Dear Diarist, fun is where you find it! (part 1)

Sheila Heti collected 500,000 words from a decade’s worth of journals, put the sentences in a spreadsheet, and sorted them alphabetically. She cut and cut and was left with 60,000 words of brilliance and mayhem, joy and sorrow. These are her alphabetical diaries.
From the book jacket for Alphabetical Diaries by Sheila Heti (hard cover 2024 U.S. edition)

When I read a similar description of Alphabetical Diaries in a book review, I was intrigued about both how Heti did it and what the results were like. (You can read the “B” chapter via a link on her site.)

After all, I’ve played with manipulating texts for fun for a long time, removing words and phrases from context, putting them into new contexts, etc. Here on the blog, there’s been The Pen’s Honour, S + 7 in 1, and The Wisdom of Oz. Longer ago there were Textoems, and back in the mists of time (1999), I did a version of the Surrealist game The Exquisite Corpse, where you could play with Edgar Rice Burroughs, Lewis Carroll, Charles Dickens, Mary Shelley, and Mark Twain. Alas, that site did not survive technological progress and is itself a corpse.

The process

Let’s dive in and see what we can figure out about how Heti constructed her book. In Part 2, I’ll talk about the results.

The book jacket blurb is a very high-level overview and (necessarily) omits a lot of detail. Fortunately, Heti has revealed many more of the details in a variety of interviews, and some of the reviews give a few more hints. (Links to the interviews and reviews are at the end of this post.) We can also fill in a couple more details on our own, or at least make reasonable guesses.

Finding sentences

In the beginning, there were the journals (Heti calls them both journals and diaries, but I’ll try to consistently call them journals; more about what they are in Part 2). Heti writes them in MS Word. For this project, she took 10 years of her journals, copied them into a new Word file and then divided them into individual sentences. Here’s where we (and she) hit the first hitches: what is a sentence, and how do you find one?

One main problem with figuring out what counts as a sentence is interruptions, which come in two types: quotations and asides/parentheticals/etc. For example, does the beginning of the previous paragraph consist of one sentence containing a parenthetical or two separate sentences, the second one inside the parentheses? I put the period after the closing parenthesis, suggesting I was thinking of it as one sentence, but I could just as easily have put it before the opening parenthesis, thus (more) clearly making two separate sentences. A similar problem occurs with quotations: are the quotations their own sentences, or are they part of one larger sentence? Extended quotations are particularly problematic.

A second main problem is what we might call dash conjunction — though I’m sure there’s a formal word for it (like parataxis).

It seems that Heti made the decision to include quotations, interruptions and dash conjunction in a single sentence. Here’s an example which has both a quotation and a dash conjunction, where the italics (in the original) indicate the quotation and the dash indicates, well, the dash conjunction.

Alphabetical Diaries: Mom said, some women want Brad Pitt—that doesn’t mean they can have him.

I want to stress that it was a decision on Heti’s part to make these long sentences. It would be perfectly reasonable to split at least the dash conjunctions into separate sentences, something like this:

My rephrase: Some women want Brad Pitt. That doesn’t mean they can have him.

The second of our initial hitches is how to find the sentences automatically. At 500,000 words, it would be quite an effort to read through the whole journals and put in line breaks at the end of every sentence, and as much as Heti loves editing (and she says editing is her preferred way of working), she also said that she did not read the journals straight through. Instead, she used Word’s search and replace function to add in the line breaks.

A starting point is to look for periods, question marks, and exclamation points and add line breaks after them. One problem is abbreviations with periods that occur in the middle of a sentence, like

Alphabetical Diaries: I can have a bigger imagination than to think, New York or L.A.?

OK, L.A. isn’t so bad (no spaces to further confuse us), but other abbreviations are:

Made up example: When I visited St. Joseph, I went down Sixth St. when I meant to go down Eighth St.

Sort of makes your head hurt, doesn’t it? It turns out there’s a whole mini subfield of computational linguistics called “sentence boundary detection” to do exactly this kind of automatic division of sentences. They’ve made great progress, so much so that they have “declared victory and moved on”, leaving bits and pieces of misanalysed sentences in their wake. I too have made a relatively simple sentence boundary detection program, which you will be able to see in action in a bit. Be that as it may, Heti made many passes of find and replace in Word (and almost certainly many manual fixes) to divide her journals into sentences.
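To make the abbreviation problem concrete, here is a minimal Python sketch. It is an illustration only: it is neither Heti's Word procedure nor the detector mentioned above, and the function names and the tiny abbreviation list are assumptions made up for the example. The first function breaks after every period, question mark, or exclamation point; the second refuses to break after a known abbreviation.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this sketch).
ABBREVIATIONS = {"st.", "mr.", "mrs.", "dr.", "l.a."}


def naive_split(text):
    """Break after every '.', '?' or '!' that is followed by whitespace."""
    return re.split(r"(?<=[.?!])\s+", text.strip())


def split_sentences(text):
    """Break after '.', '?' or '!' unless the token is a listed abbreviation."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_sentence = token[-1] in ".?!"
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Whatever is left over (for example text ending in an abbreviation).
        sentences.append(" ".join(current))
    return sentences


example = ("When I visited St. Joseph, I went down Sixth St. "
           "when I meant to go down Eighth St.")

print(naive_split(example))      # three fragments
print(split_sentences(example))  # one sentence
```

The second function keeps the St. Joseph sentence whole, but it would also glue two sentences together whenever the first one really does end in “St.”, which is exactly the kind of leftover misanalysis described above.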

Alphabetizing

The next step was to alphabetize the sentences, which is the heart of the project. To do this, Heti copied the sentences into MS Excel, one sentence per line, and then used Excel’s “Sort” button. That seems straightforward, and mechanically it is. However, it hides an issue: what exactly is the alphabetical order when there is punctuation involved? Well, in one type of example, Heti simply (eventually) did away with the punctuation: quotation marks. In the book, there are no quotation marks (at least not that I’ve noticed). As we saw above, Heti uses italics to indicate quotations. Problem solved. Of course, this is another decision that Heti made, just as she made the decision to make single sentences rather than break dash conjunctions into shorter sentences. And in fact, in an earlier version of the alphabetized journals, published in 2014, she kept the quotation marks, although in the example below they are “scare quotes” rather than speech or thought quotes.

“From My Diaries (2006–10) in Alphabetical Order” in n+1 v. 18 (2014):

All I wanted was “a physical life.”

Alphabetical Diaries (2024): All I wanted was a physical life.

A more subtle issue, and one that Heti may or may not have been aware of, is that there are different ideas about how to alphabetize words that are followed by punctuation. In particular, Microsoft’s online AI tool (currently called “Copilot”, but formerly “BingAI”) insists on a slightly different alphabetical order from the one produced by Excel, the tool Heti used. So the choice of Excel as a tool has consequences for the alphabetical order. Here are some examples from the book:

Excel

Halloween tonight.
Halloween!
Hanif was
Hanif, seeing
It would
It’s

Copilot

Halloween!
Halloween tonight.
Hanif, seeing
Hanif was
It’s
It would
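The two orders above can be reproduced with two different sort keys. What follows is a minimal Python sketch, not a description of how Excel or Copilot actually work internally: the key functions and their names are my own assumptions, chosen only because they happen to reproduce the two lists. The first compares the text character by character, so a space sorts before any punctuation or letter; the second ignores spaces and punctuation and compares only letters and digits.

```python
def space_first_key(sentence):
    """Compare every character; a space sorts before punctuation and letters."""
    return sentence.casefold()


def letters_only_key(sentence):
    """Ignore spaces and punctuation; compare letters and digits only."""
    return "".join(ch for ch in sentence.casefold() if ch.isalnum())


# Sentence openings quoted from the lists above, deliberately shuffled.
openings = [
    "It’s",
    "Hanif, seeing",
    "Halloween!",
    "It would",
    "Hanif was",
    "Halloween tonight.",
]

excel_like = sorted(openings, key=space_first_key)
copilot_like = sorted(openings, key=letters_only_key)

print(excel_like)
print(copilot_like)
```

Seen this way, the disagreement is the old one between word-by-word and letter-by-letter alphabetization: neither order is wrong, but each puts “Halloween!” in a different place.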

Editing

The biggest editing task was deciding which sentences to include. After all, the original journals contained some 500,000 words and the book contains 60,000 (according to the jacket, but 50,000 or 55,000 according to other interviews and reviews). That’s a lot of editing! But Heti loves it: she loves writing lots and lots of text and then editing it down. Perhaps not surprisingly, she makes different choices of sentences in different versions of the project, in particular between the (short) 2014 version and the 2024 book version. (There is another version in the New York Times Magazine, which I have not seen, though at least one review says that it is slightly different from the book.) I’ll have more to say about this in Part 2.

The other major editing process was creating composite characters from the people she wrote about in her journals. Heti has said that she replaced all the names (except those of famous people) with the corresponding gendered pronouns, and then constructed characters based on how she wanted the ideas of the sentences to go together. She then made up names for the characters and changed the pronouns to names where she felt appropriate. (Presumably, she retained the first and second person pronouns, though I haven’t found any place where she explicitly confirms that.) Obviously, this was an elaborate decision making process, and not one that we could infer from the book jacket description — we only know because she told us. This character formation explains the note on the copyright page of the U.S. edition:

Alphabetical Diaries: None of the characters in this book have their literal analogues in the author’s life.

One huge and deliberate consequence of the character formation (and Heti is quite explicit about this in her interviews) is that the choice of names determines where in the book sentences starting with those characters occur. So the character Lars has lots of sentences in the L chapter, and the character Lemons does as well, while the character Pavel has lots of sentences in the P chapter. Pavel does occur in the L chapter, but not very often and obviously not at the beginning of sentences. Similarly, Lars occurs occasionally in the P chapter, though Lemons does not. Of course, this consequence only holds if the sentences were re-alphabetised after she inserted the character names. In other words, the final order of sentences is almost certainly not the original sorted order. This re-alphabetisation is implicit, though not explicit, in what Heti has said, and not at all clear from simple descriptions such as the one on the book jacket.
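The effect of naming on chapter placement is easy to see in a small Python sketch. Everything here is hypothetical: the three draft sentences are made up for illustration (they are not from the book), the helper names are my own, and the only things borrowed from Heti are the character names Lars and Pavel. The sketch swaps a sentence-initial pronoun for a name and then sorts again.

```python
import re


def chapter(sentence):
    """Return the chapter letter: the first alphabetic character, upper-cased."""
    for ch in sentence:
        if ch.isalpha():
            return ch.upper()
    return ""


def name_character(sentence, pronoun, name):
    """Replace a pronoun with a name, but only at the start of the sentence."""
    return re.sub(rf"^{re.escape(pronoun)}\b", name, sentence, count=1)


# Made-up sentences, standing in for journal sentences with pronouns.
drafts = [
    "He called again last night.",
    "All I want is a nap.",
    "He was late.",
]

named = [
    name_character(drafts[0], "He", "Lars"),
    drafts[1],
    name_character(drafts[2], "He", "Pavel"),
]

before = [chapter(s) for s in sorted(drafts, key=str.casefold)]
after = [chapter(s) for s in sorted(named, key=str.casefold)]

print(before)  # chapters before naming
print(after)   # chapters after naming and re-sorting
```

Two sentences that sat side by side in the H chapter end up in different chapters once the pronouns become names, which is why a second sort has to happen after the characters are created.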

There is another interesting type of editing. Heti has said that she edited some of the sentences to make them “better” (her word). She has said that she did not invent any new sentences, and that she didn’t change the meaning of any sentences as she edited. While we don’t know what the original sentences are, we can see some sentence level editing by comparing the 2014 version with the 2024 book. Here are a couple examples with differences highlighted:

2014: Actually, he is looking around the world for another girl, and because of who he is, he will find one and be with her.
2024: Actually, he is looking around the world for another girl, and because of who he is, he will find her and be with her.

2014: All I want to tell him is that he should take care of himself—that he doesn’t need to take care of me, I can take care of myself, and he ought to take care of himself first.
2024: All I want is to tell him that he should take care of himself—that he doesn’t need to take care of me, I can take care of myself, and he ought to take care of himself first.
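For anyone who wants to find such differences mechanically, here is a minimal Python sketch that uses the standard library's difflib to compare two versions word by word. The function name is my own, and to keep the listing short the second pair is cut off after the opening clause of each sentence.

```python
import difflib


def word_changes(old, new):
    """List the word-level edits that turn `old` into `new`."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record the kind of edit, the old words and the new words.
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


pair_one = word_changes(
    "Actually, he is looking around the world for another girl, "
    "and because of who he is, he will find one and be with her.",
    "Actually, he is looking around the world for another girl, "
    "and because of who he is, he will find her and be with her.",
)

# Opening clause only (the rest of the two sentences is identical).
pair_two = word_changes(
    "All I want to tell him is that he should take care of himself",
    "All I want is to tell him that he should take care of himself",
)

print(pair_one)
print(pair_two)
```

The first pair comes out as a single replaced word (“one” becomes “her”), and the second as one word, “is”, deleted from one position and inserted in another.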

I have to say that I disagree with Heti that she didn’t change the meaning of any sentences, since the two versions of the examples above do have slightly different meanings. In the first case, (“find one” vs. “find her”), the difference could be whether he is looking for a specific other girl (“her”) or just looking for a girl in general (“one”). However, the difference is slight, and I’m not 100% convinced that it exists.

But the second example does show a clear (if subtle) difference in meaning between the two versions. We can paraphrase the 2014 version as something like “The only thing that I want to tell him is that …”, while the 2024 version can be paraphrased as “The only thing that I want to do is tell him that …”. In other words, in the 2014 version “only thing” is limited in scope, restricted to the things she could tell him, while in the 2024 version “only thing” refers to all the things she could do.

Do the purported differences matter in the broader context? Probably not. However, they do indicate that the process that Heti used is not quite what she says (and I assume, believes) it is.

There is one last type of editing that I’ll mention, and that is formatting the sentences as running text as opposed to single sentences. Heti says that this formatting choice was the suggestion of a couple friends, and that the running text format freed her to think of the project as art (presumably in contrast to the more scientific view she had at the outset). Heti says that this was a relatively late development in the project, and interestingly enough, the 2014 version does not have running text, but rather individual sentences, with certain initial phrases (not sentences) used as separators (in all capital letters):

A
A 5,000-word article.
A bark worse than its bite.
A beautiful soul, person.
A big bulky man walked past us in the road and made a Hulkish yell and then punched the wall.
A big email list.
A book like a shopping mart—all the selections.
A book that is a game.
A budget will help you to know where to go.
A CERTAIN
A certain kind of bore who has said all he is saying, said it all before, and expects to hear nothing new from you on the subject.
A certain lack of self-centeredness, belief in one’s own innate genius, and faith in hard work, long hours.
ACTUALLY
Actually, he doesn’t want to love you.
Actually, he doesn’t want you.
Actually, he is looking around the world for another girl, and because of who he is, he will find one and be with her.
ALL I WANT
All I want is some more experiences with him.
All I want is to read books for a year.
All I want to tell him is that he should take care of himself—that he doesn’t need to take care of me, I can take care of myself, and he ought to take care of himself first.
All I wanted was “a physical life.”
AN INTEREST
An interest in a wide variety of people.
An interest in casting.
An interest in doing research.
An interest in sex.
An interest in streetcar drivers.
“From My Diaries (2006–10) in Alphabetical Order” in n+1 v. 18 (2014)

Simulacrum

When I first sent my friend Will the review I’d read of Alphabetical Diaries, he replied “Wow, have you tried to have fun with this?” Well, I’m having a lot of fun with it, which is sort of what Heti has said that she hopes for, though she means that she hopes people have fun reading the book.

Another part of the fun is seeing what an automatic version of Heti’s process, a simulacrum if you will, would produce. I’ve made a web page where you provide one or more texts (or you can use some examples I’ve included), and it will determine the sentences in them (using the sentence boundary detector that I wrote), alphabetize the sentences, and select some of the sentences. You can specify certain aspects in addition to the texts, such as the sorting order and how many sentences to choose, and whether to display the results as running text (a “paragraph”) or one sentence at a time. However, the default settings follow Heti’s choices for the 2024 version for alphabetization and running text.

Of course, not all of Heti’s process is incorporated into the Alphabeticals simulacrum. In particular, the simulacrum does not do character creation or sentence-internal editing. Go ahead and try it.
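For readers who would rather see the idea in code, here is a rough Python sketch of the same pipeline: split, alphabetize, select, display. It is not the code behind the web page. The function name, the parameters, the very naive sentence splitter, and above all the choice to select sentences by random sampling are assumptions made for this illustration.

```python
import random
import re


def alphabetize_text(text, keep=3, running_text=True, seed=0):
    """Split text into sentences, sort them, keep a sample, and format it."""
    # Naive splitter: break after '.', '?' or '!' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    ordered = sorted(sentences, key=str.casefold)

    # Assumption: selection is a random sample that preserves sorted order.
    rng = random.Random(seed)
    keep = min(keep, len(ordered))
    chosen_positions = sorted(rng.sample(range(len(ordered)), keep))
    chosen = [ordered[i] for i in chosen_positions]

    # Running text ("paragraph") or one sentence per line.
    separator = " " if running_text else "\n"
    return separator.join(chosen)


sample = "We left early. A dog barked. Then it rained! Could we stay?"
print(alphabetize_text(sample, keep=4))
print(alphabetize_text(sample, keep=2, running_text=False))
```

With `keep` at least as large as the number of sentences, nothing is dropped and the result is simply the sorted text; smaller values give a different cut for each seed.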

The Alphabeticals simulacrum gives me a better appreciation of the artfulness of Heti’s book and the work that went into it. What I have found, and I think you will too, is that the results of the simulacrum just aren’t as interesting as Heti’s book. I’m sure she’s breathing a huge sigh of relief at that (though she has been working with AI chatbots in another ongoing project, so maybe she’s not too worried after all).

I’ll have more commentary in Part 2, but in the meantime, have fun!

Sources

Previous version

“From My Diaries (2006–10) in Alphabetical Order” in n+1 v. 18 (2014)

Interviews

ALOUD (Los Angeles Library Foundation): Michelle Tea

Barnes and Noble #PouredOver: Miwa Messer

Books are Magic: Lillian Fishman

The College Hill Independent (2014): Drew Dickerson

Reviews

Art Review: Celine Nguyen

The Arts Desk: India Lewis

Cleveland Review of Books: Hannah Kinney-Kobre

The Guardian: Hephzibah Anderson

The Guardian: Claire Dederer

Literary Review: Francesca Peacock

Los Angeles Times: Lynn Steger Strong

The Telegraph: Claire Allfree

toronto.com: Robert J. Wiersema

Washington Post: Rebecca Rothfeld