A Look at Gender Representation over Time


I use the New York Times’ database tool to examine the prevalence of the phrases “he said” and “she said,” using them as proxies for the representation of men and women in news articles over time. Until recently, dramatic world events have tended to suppress women’s representation. Women’s representation in the news is only about one-third that of men at present, and this level was only achieved in the late 1990s. There is also a general trend of increasing usage of both proxy phrases over time, especially in the post-WWII era.


The New York Times (NYT) recently added a data visualization tool to its website, allowing people to search for the prevalence of key phrases throughout the newspaper’s history. The Guardian ran with a story that included a chart of the use of “he said” versus “she said” over time, and this has been circulating online—mostly as an indication of how lopsided the representation of men vs. women is in the news.

The following assumes that NYT usage corresponds to broad societal attitudes and that it can be used as a proxy for understanding the relative power and prominence of women vs. men in our society. At the very least, if one assumes that the NYT caters to the wealthiest top 20% or so of the population, it says something about the standing of men and women within that elite stratum. It also assumes that “he said” and “she said” are reasonable proxies for the representation of men and women in the news.

Analysis and Discussion

[Figure: fraction of NYT articles containing “he said” and “she said” over time]

First, the correlation between “he said” and “she said” is 0.89, meaning that movement in one measure usually accompanies movement in the same direction in the other. This is likely due to the particular writing styles of those employed by the Times, or to the preferences of the editors. They, in turn, are representative of shifting societal expectations and linguistic customs, again either broadly or within the upper strata of US society, depending on whom the NYT writes for. Both culture and world events tend to affect the use of these phrases.
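For anyone who wants to check a correlation figure like the one above, here is a minimal sketch of a Pearson correlation over two yearly series. It assumes the yearly percentages for each phrase have been copied out of the NYT tool by hand into two equal-length lists. The function name is my own, and the post does not say exactly how the 0.89 was computed, so treat this as one plausible way to get such a number rather than the method actually used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(he_said, she_said)` on the two yearly lists returns a value between -1 and 1. A result near 0.89 would mean the two phrases mostly rise and fall together from year to year, without saying anything about why.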

There has been a large increase in the use of both proxies over time. The period from 1851 to 1890 averaged 4% of articles using the phrase “he said,” while 1891 to 1950 more than doubled this to 9.2%. Then there is a gradual takeoff from 1950 to the mid-1970s, where “he said” becomes much more common and tops out between 20% and 25% of all articles, holding roughly steady to the present. “She said” also grows during this time, moving from 1% in 1891 – 1950 to roughly 8% in the present.
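The period averages above can be sketched as a simple mean over a range of years. This assumes the data is held in a dictionary mapping each year to the percentage of that year's articles containing the phrase. The dictionary layout and the function name are assumptions made for illustration.

```python
def period_mean(percent_by_year, start, end):
    """Unweighted mean of the yearly values for start <= year <= end (inclusive)."""
    values = [pct for year, pct in percent_by_year.items() if start <= year <= end]
    if not values:
        raise ValueError(f"no data between {start} and {end}")
    return sum(values) / len(values)
```

Note that this weights every year equally. If the number of articles per year changes a lot over the period, an article-weighted average could give a somewhat different figure, and it is not clear from the tool which one is closer to the numbers quoted here.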

Possible explanations for the dramatic difference between the pre- and post-WWII periods:

  1. In the 1800s and early 1900s, it was considered proper to reference individuals by their surnames or full names rather than via pronouns.
  2. In the 1800s and early 1900s, there were fewer direct quotations of sources. Possible sub-explanations:
    1. Space limitations. Easier to summarize someone’s statement than to quote them for any length. Perhaps quote a few choice words or phrases from them, summarize the rest in between for brevity.
    2. The inability to write down every word or record precisely what someone said, again leading to a preference for summarizing. There might be a correlation here with the prevalence of recording devices, especially portable ones. However, I would think that shorthand writing should have been able to get around this and was actually developed for this exact purpose.
    3. A shift in style toward narratives and conversations. Summarizing requires the ability to take a whole statement into account and distill it down to its essence, which is often less conversational. Quoting individuals directly personalizes an article, making it slightly more conversational and story-like than a summation. This could also reflect a gradual evolution of standards among journalists rather than in the broader culture: recall that the NYT tends to serve the wealthy and professionals, so this change could be confined to those segments of society. That would amount to a “vulgarization” of their discourse while keeping the content roughly the same.
    4. A shift in focus/content toward personality and “what people say” and away from analysis and discussion. Reasons for this could include the desire to appear balanced and impartial or the need to provide cover from accusations of distortion: a direct quote looks more legitimate and true than a summary, even if the quote is used out of context. This could also relate to sub-explanation 3 above.

These are just some guesses—no idea if I’m in the right ballpark.

[Figure: ratio of “she said” to “he said” in NYT articles over time]

The ratio of “she said” to “he said” varies from 4% to 24% prior to the Great Depression. Note a solid upward trend from 1890 to approximately 1914. This correlates closely with the movement for women’s suffrage, with the National American Woman Suffrage Association forming in 1890 and the Nineteenth Amendment being ratified in 1920. There is a dip from 1914 to 1919, as World War I drew a lot of attention to the political and military realms, which were dominated entirely by men at the time. The level picked back up after the war, then fell again around 1928 as the Great Depression took hold. The ratio did not take off again until the mid-1960s, as the second-wave feminist movement in the US developed; The Feminine Mystique was published in 1963.

If this ratio can be taken as a rough indication of the representation of women in national news and if that is an estimation of their relative standing in society, then the period from 1929 to 1965 was the worst such stretch since NYT records began in 1851. Though there was a great deal of variation prior to 1929, the 1929 – 1965 period featured consistently low numbers, no higher than 11% and hovering around 6% for most of the 1950s.
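The ratio discussed above, and a summary such as "no higher than 11%" for a given period, can be sketched as follows. This assumes two dictionaries mapping year to the percentage of articles containing each phrase. Both function names are hypothetical, and the values in any real run would have to come from the NYT tool itself.

```python
def she_to_he_ratio(he_pct, she_pct):
    """Yearly ratio of "she said" to "he said" for years present in both inputs."""
    return {
        year: she_pct[year] / he_pct[year]
        for year in sorted(he_pct.keys() & she_pct.keys())
        if he_pct[year] > 0  # skip years where the ratio would be undefined
    }

def period_max(ratio_by_year, start, end):
    """Largest yearly ratio for start <= year <= end (inclusive)."""
    values = [r for year, r in ratio_by_year.items() if start <= year <= end]
    if not values:
        raise ValueError(f"no data between {start} and {end}")
    return max(values)
```

With the real series loaded, `period_max(she_to_he_ratio(he, she), 1929, 1965)` would be the number to compare against the 11% ceiling mentioned above.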

Somewhat disconcerting for those of us born recently is the fact that the current levels of (still low) representation of women in the news were only achieved in the late 1990s. Even more disconcerting is that the last two years were the lowest in two decades, though it remains to be seen whether this is an aberration or the start of a new trend.

Note also that large, systemic crises that last for multiple years see a dip in the ratio, probably because politics, wars, and markets have historically been male-dominated. As I mentioned above, World War I was a departure from trend, as were the Great Depression and World War II, and cultural changes from these latter two locked in a strongly subordinate status for women for an additional two decades after their resolution. The Civil War can be picked out of the data, with the ratio plummeting from a range of 10% to 20% down to 4% or 5% from 1861 to 1864, then picking back up to around 10% after the war.

However, it’s difficult to pick out more recent crises. The Vietnam War does not show up in the data as a decrease in the ratio, likely because second-wave feminism was growing at the same time and the movement was a key part of the protests against the war. Women’s voices continued to be heard during wartime, and in fact increased. There is a slight stagnation in the late 1970s and early 1980s, possibly due to the end of the war (and its loss as a platform for oppositional organization) and stagflation, though I’m mostly hand-waving at this point and should stop speculating.

Even more recently, the effects of the dotcom bubble, the September 11th attacks, and the invasions of Iraq and Afghanistan show up as fluctuations of maybe 1 or 2 percentage points. In other words, not much at all. The 2008 Great Recession is completely hidden in the ratio but shows up in the two base percentage data sets, although it’s possible that the low ratio values in 2012 and 2013 are indicative of lasting, deeper effects of the Recession.

Several ideas come to mind:

  1. For the majority of the US population, the intensity of recent wars and crises has diminished greatly due to use of a volunteer army, limited US casualties, and the existence of a social safety net in the post-New Deal era.
  2. Women now participate in politics, wars, and market activities to enough of a degree that when crises occur they are now part of the focus and not relegated to the background.
  3. The wealthy and much of the upper middle class are increasingly insulated from crises that affect the rest of the nation and world, and the NYT tends to serve these audiences.

There may be some merit to (1), as the more recent wars did not feature a draft, casualties were limited compared to 20th-century wars that involved the US, and the existence of unemployment insurance, Social Security, and welfare can mitigate the worst impacts of economic crises. This would explain why recent dips in base proxy percentages, though present, are small. However, NYT reporters, editorial writers, and editors are not drawn from the general population and instead represent the wealthier strata of society. Their readers also tend to come from these strata. So the influence of the bottom 80%’s experience on their writing is limited, which limits (1)’s usefulness as an explanation. It could still be true, but it’s doubtful that it shows up in this particular set of data.

Idea (2) looks more likely. Although women are underrepresented in positions of power, they may have surpassed some minimum threshold of participation where they are now sufficient in number to be noticed and quoted in situations where men used to be the sole focus. Additionally, movements associated with third-wave feminism have been active in the US since the early 1990s and could be having an effect on coverage. Note that this explanation also means that, given the chance, women are roughly as likely as men to participate in wars, economic downturns, and political crises, which sounds right to me. This would explain why the ratio is pretty level over the last 15 years while the individual proxy values may rise and fall a bit more dramatically.

Finally, (3) is also possible, especially with rising wealth inequality in the US since the 1980s. Given the existence of Wall Street and so much high finance right in the NYT’s backyard, as well as my stated reservations about (1) above, idea (3) seems like a plausible explanation for proxy percentage changes: events that affect the wealthy show up more dramatically than those that don’t. The 2008 Great Recession shows up, while the wars in Iraq and Afghanistan do not. However, the dotcom bust doesn’t show up either, weakening the case for this explanation.


The proportional representation of women in NYT reporting since the late 1990s has increased by a factor of three compared to the 1800s and a factor of six compared to the period 1929 to 1965. However, it’s still at the very low proportion of 30% to 35%, and even these levels were only achieved in the last 20 years. The last couple of years have also seen a downward push in representation, but whether this is a new trend or an aberration remains to be seen. If these ratios are taken as a proxy for women’s status in society, we have a long way to go. However, it’s also possible that the NYT is somewhat behind the times.
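As a quick sanity check on the "factor of three" and "factor of six" figures, the sketch below works backward from the current 30% to 35% ratio to the earlier levels those factors imply. The function name is hypothetical, and the inputs are only the rounded numbers quoted in this post, not the underlying data.

```python
def implied_baseline(current_ratio, factor):
    """Earlier ratio implied by "the current level is `factor` times the old one"."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return current_ratio / factor

# Figures quoted in the text: a current ratio of 30% to 35%,
# described as 3x the 1800s level and 6x the 1929-1965 level.
for current in (0.30, 0.35):
    vs_1800s = implied_baseline(current, 3)
    vs_1929_1965 = implied_baseline(current, 6)
    print(f"current {current:.0%}: implies {vs_1800s:.1%} (1800s), "
          f"{vs_1929_1965:.1%} (1929-1965)")
```

The implied baselines come out at roughly 10% to 12% for the 1800s and 5% to 6% for 1929 to 1965, which is in line with the "around 10%" and "around 6%" levels mentioned earlier for those periods.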

Historically, women’s representation has plummeted during wars and economic crises and risen during feminist movements and political agitation for women’s rights. The last twenty years or so may have broken this dynamic as women have increasingly entered positions of power, though it is difficult to tell. There has also been a general trend toward greater use of “he said” and “she said” in the NYT, indicating a shift in the style of reporting. The various explanations offered above for this shift cannot be evaluated with the information on hand or from my own knowledge.


