Now that you mention them …, Arbitrary but not random

All that work identifying names in Photo-Era is paying off. I have started looking at how people, and especially women, are mentioned in Photo-Era.

Frequency by name

There are lots of ways we could look at the names. The simplest is just to see how often people are mentioned by name (including variants of their names). [I’ve updated that previous post with a chart showing the overall frequencies of names in Photo-Era by gender.] One interesting thing is that only one of the 10 most frequent names belongs to a woman: Katherine Bingham. In addition to being a photographer herself, she was an assistant editor at Photo-Era for several years. (Here is a post by the St. Johnsbury Historical Society about a picture of hers that is in the Library of Congress collection, and Lee talked about her in her PHSNE presentation.) Another interesting thing about the frequencies is that, of the top 10 women and the top 10 men, 5 of the women have Wikipedia entries but only 3 of the men do. (I’m omitting the names for which I have not determined a gender, but none of them has a Wikipedia entry in any case.)

Woman	Rank	Rank within women	Wikipedia
Elizabeth Flint Wade	17	2	Elizabeth Flint Wade
Gertrude Kasebier	41	3	Gertrude Kasebier
Nancy Ford Cones	70	7	Nancy Ford Cones
Gerhard Sisters	87	9	Gerhard Sisters
Emily H. Hayden	99	10	Emily H. Hayden

Women among the 10 most-mentioned women in Photo-Era who have Wikipedia entries

Man	Rank	Rank within men	Wikipedia
William S. Davis	2	2	William S. Davis
Paul Lewis Anderson	7	6	Paul Lewis Anderson
Clarence H. White	9	8	Clarence H. White

Men among the 10 most-mentioned men in Photo-Era who have Wikipedia entries

Frequency by nearby mentions

Another way to look at the names is to consider what other names are mentioned nearby. The motivation is that people who are mentioned together probably have some connection, which we can try to discover once we have found them together. “Nearby” is rather vague, but we can make it more precise. First, we can measure the distance between names in terms of words and punctuation (“tokens” in linguistics jargon). Then we can limit ourselves to the closest 1% of distances, which turns out to mean any pairs that are within 26 tokens of each other. This leaves us with 30,976 unique pairs that make up the closest 1% of the distances, out of 55,851 total pairs (which also means that over half of the pairs, about 55%, are “nearby” pairs). That gives us plenty to work with.
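To make the idea concrete, here is a minimal Python sketch of measuring token distances and picking a percentile cutoff. It is not the code behind the numbers above. The data layout (a list of `(token_index, name)` mentions per issue), the choice to measure every pair of mentions of different names, the nearest-rank percentile rule, and the function names `pair_distances` and `percentile_cutoff` are all assumptions made for illustration.

```python
from itertools import combinations
import math


def pair_distances(mentions):
    """Return (name_a, name_b, distance) for every pair of mentions of different names.

    `mentions` is a list of (token_index, name) tuples for one issue
    (an assumed layout, not the one used for the post).
    """
    triples = []
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b:
            first, second = sorted((a, b))  # alphabetical order within the pair
            triples.append((first, second, abs(i - j)))
    return triples


def percentile_cutoff(distances, fraction):
    """Nearest-rank cutoff: the smallest distance d such that at least
    `fraction` of all distances are <= d."""
    ordered = sorted(distances)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


# Toy data: four mentions in one made-up issue.
mentions = [
    (0, "Wilfred A. French"),
    (5, "Alonzo H. Beardsley"),
    (40, "Katherine Bingham"),
    (300, "Phil M. Riley"),
]

triples = pair_distances(mentions)
# With so few distances we use the closest 50% instead of the closest 1%.
cutoff = percentile_cutoff([d for _, _, d in triples], 0.5)
nearby = [t for t in triples if t[2] <= cutoff]
```

On real data the fraction would be 0.01, and the cutoff it returns would play the role of the 26 tokens mentioned above.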

As an example, here are the top 4 co-occurring pairs of names, with the average distance (in tokens) between them.

Person 1	Person 2	Occurrences	Average Distance
Alonzo H. Beardsley	Wilfred A. French	104	6.7
Phil M. Riley	Wilfred A. French	98	8.1
Katherine Bingham	Wilfred A. French	62	14.0
Elizabeth Flint Wade	Wilfred A. French	57	12.7

Top 4 co-occurring pairs of names in Photo-Era

There is nothing important about the designations “Person 1” and “Person 2”: “Person 1” is simply the name in the pair that comes first alphabetically (by first name). What is striking is that the top 4 pairs all involve Wilfred A. French. However, once we realize that he was the main editor (and sometimes owner) of Photo-Era, and that the other people were all also editors at one time or another, we understand why their names would occur near each other so often: they would be together on the masthead. (Interestingly, the Wikipedia entry for Photo-Era does not mention either of the women editors that we see here, though it does mention other male editors.)
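For readers who want to see how a table like this could be built, here is a small Python sketch. It assumes the nearby mentions are available as `(name_a, name_b, distance)` triples. That layout and the function name `summarize_pairs` are hypothetical, and this is not necessarily how the table above was produced. Each pair is put in alphabetical order so that both orderings count as the same pair, and then occurrences and the mean distance are tallied.

```python
from collections import defaultdict


def summarize_pairs(nearby):
    """Aggregate (name_a, name_b, distance) triples into
    (person_1, person_2, occurrences, average_distance) rows."""
    totals = defaultdict(lambda: [0, 0])  # pair -> [occurrences, summed distance]
    for a, b, distance in nearby:
        key = tuple(sorted((a, b)))  # "Person 1" is the alphabetically first name
        totals[key][0] += 1
        totals[key][1] += distance
    rows = [
        (p1, p2, n, round(total / n, 1))
        for (p1, p2), (n, total) in totals.items()
    ]
    # Most frequent pairs first; ties broken alphabetically.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows


# Made-up sample: the first two triples are the same pair in different order.
sample = [
    ("Wilfred A. French", "Alonzo H. Beardsley", 4),
    ("Alonzo H. Beardsley", "Wilfred A. French", 9),
    ("Wilfred A. French", "Phil M. Riley", 8),
]

rows = summarize_pairs(sample)
for row in rows:
    print(*row, sep="\t")
```

Sorting the full name string puts the names in first-name order, which matches the “Person 1” convention described above.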

This is a nice example of the what and the why of the names. Mostly in these first few posts, I’ll have whats rather than whys. Hopefully, more whys will come later…

This gives you a taste of the kinds of things I’ll be looking at. In the next post, I’ll do a deeper dive into the “nearby” names. Stay tuned …

Technical notes

If you look for these names in my Photo-Era Search tool, you might not find what you expect. For example, if you look for Alonzo H. Beardsley AND Wilfred A. French, there are only 10 results in 2 issues, which is not nearly enough. One big difference is the variants: Beardsley is most often referred to as A. H. Beardsley, but the normalized form I use here is his full name.

On the other hand, if we look for A. H. Beardsley AND Wilfred A. French, we get 1,041 results in 148 issues, which is way too many. Here we need to keep in mind that the search tool looks for the names anywhere in the same issue, whereas the information reported above covers only those pairs within 26 tokens of each other. That means that A. H. Beardsley and Wilfred A. French are mentioned far more often farther apart, or even separately, than they are close together.
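The two mismatches described above can be illustrated with a short Python sketch. It is not the search tool's code or the analysis code. The variant table `VARIANTS`, the `(token_index, name)` layout, and both function names are assumptions for illustration. The first function mimics a literal same-issue AND search; the second normalizes variants and requires the two names to fall within a token window.

```python
# Hypothetical variant table mapping a written form to its normalized full name.
VARIANTS = {"A. H. Beardsley": "Alonzo H. Beardsley"}


def normalize(name):
    """Map a known variant to its full form; leave other names unchanged."""
    return VARIANTS.get(name, name)


def literal_same_issue(mentions, a, b):
    """True if both names appear, exactly as written, anywhere in the issue."""
    names = {name for _, name in mentions}
    return a in names and b in names


def normalized_within_window(mentions, a, b, window=26):
    """True if some mention of `a` and some mention of `b` (after
    normalization) are at most `window` tokens apart."""
    a, b = normalize(a), normalize(b)
    pos_a = [i for i, name in mentions if normalize(name) == a]
    pos_b = [i for i, name in mentions if normalize(name) == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


# Two made-up issues: names far apart in one, close together in the other.
far_issue = [(10, "A. H. Beardsley"), (500, "Wilfred A. French")]
close_issue = [(10, "A. H. Beardsley"), (18, "Wilfred A. French")]

# Searching for the full name misses the abbreviated variant.
print(literal_same_issue(far_issue, "Alonzo H. Beardsley", "Wilfred A. French"))
# Searching for the variant matches even though the names are far apart.
print(literal_same_issue(far_issue, "A. H. Beardsley", "Wilfred A. French"))
# The windowed check only accepts the issue where the names are close.
print(normalized_within_window(far_issue, "Alonzo H. Beardsley", "Wilfred A. French"))
print(normalized_within_window(close_issue, "Alonzo H. Beardsley", "Wilfred A. French"))
```

The literal search undercounts when the query uses a form the text rarely uses, and the same-issue search overcounts relative to the 26-token window, which is the pattern seen in the two searches above.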

Finally, the 26-token distance that marks the 1% cutoff is close to the 25-token cutoff often used as the upper limit for long “sentences” when parsing corpora (sequences of words longer than that are often not really sentences). In other words, our nearby names can be thought of as being roughly within one long sentence’s distance of each other (though not necessarily in the same sentence).
