Research quantifies literary trends
By Ester Khachatryan, The Dartmouth Staff
Published on Wednesday, May 9, 2012
Math department chair and computer science professor Daniel Rockmore and a team of researchers, including James Hughes Adv ’12 and Nicholas Foti Adv ’12, studied the evolution of literary styles using mathematical and statistical analysis of English-language literature to explore a “rich area of collaboration,” Rockmore said.
In findings published in the Proceedings of the National Academy of Sciences, the researchers concluded that modern authors’ literary styles differ from those of their immediate predecessors more than the styles of authors in earlier eras did, and that this stylistic gap has widened over generations in a quantifiable manner.
“The study confirms the possibility of computations and quantitative techniques as another way to look at cultural artifacts,” Rockmore said.
The researchers focused on the usage of “content-free words” to study similarities in literary style since 1550. Hughes, who studies computer science and was a co-author of the study, said that content-free words such as prepositions, conjunctions, articles and numbers are useful for analysis precisely because they carry little meaning: while content words reveal a text’s subject matter, content-free words reveal its writing style.
The research analyzed data from 7,733 books written by 537 authors available through the Gutenberg Digital Library.
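The function-word approach described above can be sketched in a few lines of code. This is a hypothetical, minimal illustration — not the researchers’ actual method or word list — assuming a style profile built from the relative frequencies of a small set of content-free words, with similarity measured as the distance between two profiles:

```python
from collections import Counter
import math

# A small, illustrative set of "content-free" (function) words;
# the study's actual word list is not reproduced here.
FUNCTION_WORDS = ["the", "a", "an", "and", "but", "or", "of", "in", "on", "to", "with"]

def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def style_distance(text_a, text_b):
    """Euclidean distance between two texts' function-word profiles;
    smaller values suggest more similar writing styles."""
    va, vb = style_vector(text_a), style_vector(text_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))
```

Because the profile ignores content words entirely, two texts on very different subjects can still register as stylistically close if their authors deploy articles, prepositions and conjunctions at similar rates.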
English professor Barbara Will said the findings reflect innovations in the fields of literature and translation that have made world literature accessible to modern authors.
“I think there is a way in which the globalizing of our world has diversified the way in which literary styles are manifested,” Will said.
The researchers found that between 1550 and 1783, the availability of literary texts was relatively low, so authors were strongly influenced by a small group of previous texts. This led to common patterns of phrasing, syntax and grammar that remained “relatively unchanged” for long periods of time.
As the number of works increased in later periods, authors were influenced by subsets of available literature, in turn producing increasingly diverse works. By the 20th century, authors were more strongly influenced by their contemporaries than predecessors, the researchers found.
“I think this type of large scale analysis can tell us things about how style changes over time at a macro level and about how style is changing,” Hughes said. “So in that sense it contributes to our understanding of the concept of literary style.”
Will said that the study is an example of “distant reading,” a new way of looking at literature that focuses on literary trends across many works.
Distant reading is part of the larger “digital humanities” movement that seeks to analyze literature using computers and mathematical techniques, rather than close reading and qualitative analysis, according to an article by Kathryn Schulz published in the New York Times on June 24, 2011.
While some scholars hail digital humanities as an opportunity to revitalize the field, English professor Aden Evens said that the new path provided by digitization will also appeal to scientists, social scientists, engineers and university administrators who allocate research opportunities and classes among disciplines.
“What I am worried about is that we are asked to set aside what most distinguishes the humanities — our ability to perform critiques of ambiguity — and instead we are being asked to supplant it with a view of knowledge that promotes certainty, statistical or numerical style analyses, absolute and clear distinctions,” Evens said.
Stanford University English professor Ursula Heise, who studies 20th-century fiction, poetry and theories of modernism and postmodernism, said that quantitative methodologies will never replace traditional readings of literary texts.
“In interpreting what the numbers mean, you have to bring in all the knowledge of literary genres, history, literary history, institutions, authors, methods of publishing and distribution that you acquire in the normal course of literary studies,” Heise said.
She said that computer-based analysis of texts can, however, provide opportunities to ask new questions about literature.
While Will said quantitative analysis provides new ways of thinking about literature, she questioned whether content-free words are sufficient to determine an author’s style.
“I think it can help just by giving us more information, but I don’t think ultimately it necessarily ‘enriches our sympathies,’ which is George Eliot’s definition of literature,” Will said. “But it can be really interesting and useful for people who want to look at trends in literary study.”
Critiques of quantifying authorial style are “incredibly pertinent,” according to Hughes, who acknowledged that researchers using automatic or computational tools do not claim to employ the “best” approach.
“The point is really that traditional analyses that take into account the intangible or unquantifiable factors can benefit from the addition of quantitative evidence,” he said.
Rockmore, Hughes and Daniel Graham, a researcher at the University of Vienna’s psychology department, also developed a method in 2011 for mathematically analyzing drawings to assess their authenticity, according to a College press release. In addition, the research team developed a web tool, “WriteLikeMe,” that enables writers to compare their writing style to the styles of past authors.