Interpreting the topic model of Signs
These pages use the results of a computer-assisted topic modeling technique to explore thematic and rhetorical patterns in the history of Signs from its first issue in 1975 up until 2014. The topic model offers pathways through the overall tendencies that have characterized the journal and novel routes to particular articles from the journal’s run so far. Here, we give some suggestions for exploring and interpreting the model using this website, and a number of feminist scholars offer their reflections on the Editorial Comments pages. Yet we hope you will discover new patterns and trends yourself—and that you may find this a useful way to (re)discover aspects of the Signs archive and to open up new channels of interpretation for the field of feminist scholarship.
Citing this topic model
Please cite this interactive topic model as follows:
Goldstone, Andrew, Susana Galán, C. Laura Lovin, Andrew Mazzaschi, and Lindsey Whitmore. An Interactive Topic Model of Signs. Signs at 40. http://signsat40.signsjournal.org/topic-model. 2014.
Each individual page displaying some part of the model has a distinctive URL, which you can copy from your web browser’s location bar. For example, the overview in the form of a list has the URL http://signsat40.signsjournal.org/topic-model/#/model/list.
The source code for this browser visualization and the R scripts used to create the topic model displayed here are available on GitHub.
About topic modeling
Topic modeling is an approach from the field of machine learning which seeks to characterize a large number of documents algorithmically in terms of a smaller number of themes or word clusters (referred to as topics). The result of applying the algorithm is a model: deliberately simplifying the documents, it attempts to describe their interconnections concisely and cogently.
A topic, for the purposes of modeling, is a collection of words that tend to occur in the same documents. One topic here is characterized by the frequency of the words political, movement, politics, and national. We have labeled this the Political movements topic, because these words are often used together to discuss political movements, but the topic itself is only the verbal pattern of co-occurrence. Because the technique focuses on words that occur together, it attempts to capture the way that the same word may be used (sometimes with different meanings) in different contexts. Thus, another topic also using the word political is characterized by words like class, political, public, and state. We have labeled this the Marxism and feminism topic: it reflects a rather different pattern of usage that also frequently includes the word political.
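To make the idea of a topic as a verbal pattern concrete, here is a minimal Python sketch, assuming a topic can be summarized by counts of the words assigned to it. The counts are invented for illustration and are not taken from our model; `top_words` is a hypothetical helper, not part of the site's code.

```python
from collections import Counter

# Invented word counts for two topics; only the top words echo the text above.
topic_word_counts = {
    "Political movements": Counter(
        {"political": 120, "movement": 95, "politics": 80, "national": 60, "class": 10}
    ),
    "Marxism and feminism": Counter(
        {"class": 110, "political": 90, "public": 70, "state": 65, "movement": 5}
    ),
}


def top_words(counts, n=4):
    """Return the n most frequent words in a topic, most frequent first."""
    return [word for word, _ in counts.most_common(n)]


for label, counts in topic_word_counts.items():
    print(label, top_words(counts))

# The same word can rank highly in more than one topic.
shared = set(top_words(topic_word_counts["Political movements"])) & set(
    top_words(topic_word_counts["Marxism and feminism"])
)
print(shared)  # {'political'}
```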
Because topics are models of word usage, the term topic is misleadingly narrow. The patterns of language are often better described as discourses, topoi, or rhetorical frames. The attraction of topic modeling is precisely its capacity to capture (if only approximately) these central concepts of cultural and political analysis.
The particular technique we have used is Latent Dirichlet Allocation, an algorithm first described by David Blei, Andrew Ng, and Michael Jordan in a 2003 article (Blei has more recently written an explanation for a broader audience as well). Like most such algorithms, it focuses its attention solely on patterns of word use, not the order of words in a document. The topic labels have been added by us after the algorithmic modeling process; they reflect our interpretive judgments.
The model treats each article in Signs as a mixture of topics, recognizing that the typical article incorporates multiple major themes or discourses. Thus, for example, Mary Hawkesworth’s 2014 editorial “Signs 2005–2015: Reflections on the Nature and Global Reach of Interdisciplinary Feminist Knowledge Production” is divided by the model among many topics, including, most prominently, a topic related to discussions of the field of women’s studies (words like studies, feminist, work, research), another focused on the broad theme of the social (social, power, experience), and a third related to globalization (world, global, states). As this example indicates, these patterns of word use are by no means definitive interpretations of the content and rhetoric of an individual text; instead, this model offers a guide for exploration and further reading. (The appearance of several other topics in small proportions, some likely irrelevant to the meaning of the text, reveals the statistical nature of the model, which is always subject to error.)
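Treating an article as a mixture amounts to counting how many of its words the model has assigned to each topic. The following is a minimal sketch, assuming we already have one topic label per token (samplers such as MALLET produce assignments of this kind). The labels and counts are hypothetical, and the proportions shown on the site may involve additional smoothing.

```python
from collections import Counter


def topic_proportions(assignments):
    """Convert per-token topic labels into topic proportions that sum to 1."""
    counts = Counter(assignments)
    total = len(assignments)
    return {topic: n / total for topic, n in counts.most_common()}


# An invented ten-token "article" whose tokens have been assigned to topics.
doc = ["Women's studies"] * 5 + ["The social"] * 3 + ["Globalization"] * 2
for topic, share in topic_proportions(doc).items():
    print(f"{topic}: {share:.0%}")
```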
This browser serves as one way of visualizing our model of the Signs archive. It presents multiple ways of viewing topics: individual topics over time; articles that are most strongly associated with particular topics; multiple topics contained in each article; and the words that contribute most significantly to each topic.
Reading the overview
The overview pages show all topics at once. There are four overview pages. In any of them, click a topic to examine it in more depth.
The topic grid
In the grid, the topics are arranged alphabetically by label. Each topic is represented by a circle enclosing its most frequent words; the relative size of a word in the circle reflects its weight in the topic. As you hover the mouse pointer over a topic, its label appears.
Topic space: a scaled map of the topics
This model overview is “to scale,” in the sense that topics are placed close to one another if they have similar distributions of frequent words. This visualization allows you to see some of the relations among topics. To see overlapping topics more clearly, hover the mouse over a topic (this also reveals topic labels). Hold Shift and drag the mouse to pan, and scroll or double-click to zoom.
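The measure of similarity and the method of projecting topics onto the plane are not specified here, so the following is only a sketch under an assumption: that similarity between topics is measured by the Jensen-Shannon divergence between their word distributions, a common choice for comparing topics. A two-dimensional map would then be obtained by passing the matrix of pairwise divergences to a scaling method such as multidimensional scaling, which we omit. The word counts are invented.

```python
import math


def normalize(counts):
    """Turn raw word counts into a probability distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def kl(p, q):
    """Kullback-Leibler divergence of p from q, in bits."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def jensen_shannon(p, q):
    """Symmetric divergence between 0 (identical) and 1 (no shared words)."""
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in set(p) | set(q)}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Invented word counts for three topics.
topics = {
    "Political movements": normalize({"political": 4, "movement": 3, "national": 2, "politics": 1}),
    "Marxism and feminism": normalize({"class": 4, "political": 3, "public": 2, "state": 1}),
    "Medieval women": normalize({"medieval": 5, "convent": 3, "nuns": 2}),
}


def nearest(label):
    """The other topic whose word distribution is closest to `label`'s."""
    others = [o for o in topics if o != label]
    return min(others, key=lambda o: jensen_shannon(topics[label], topics[o]))


print(nearest("Political movements"))  # Marxism and feminism
```

Because the two political topics share the word political, they sit closer together than either does to a topic with no words in common.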
The topic list
The list of topics presents the model as a table. Topics are identified by our own interpretive label (in italics) and by their most frequent words. To help compare topics, the table can be sorted in several ways by clicking the table’s column headers.
On the left is a miniature graph of the topic over time (visible more fully on the specific pages for each topic; see below). This gives a rough sense of the distribution of the topic over the forty years of the Signs archive. Click the column’s header (“1975–2014”) to sort by the year in which the topic attains its maximum proportion within the overall corpus, and click again to reverse the order. (Note that the y-axes of these miniature bar charts are not all on the same scale.) Sorting chronologically in this manner may be of particular interest since it can point to topics that have strong trends over time.
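Sorting by the year of maximum proportion can be sketched as follows, assuming each topic's time series is stored as a mapping from year to the proportion of that year's words assigned to the topic. The series are invented, and `peak_year` is a hypothetical helper rather than a function from the site's code.

```python
# Invented yearly proportions (year -> share of that year's words in the topic).
yearly = {
    "Medieval women": {1985: 0.004, 1989: 0.031, 1995: 0.006},
    "Globalization": {1981: 0.002, 1995: 0.010, 2010: 0.035},
    "Marxism and feminism": {1979: 0.040, 1989: 0.012, 2010: 0.005},
}


def peak_year(series):
    """Year in which a topic reaches its largest yearly proportion."""
    return max(series, key=series.get)


by_peak = sorted(yearly, key=lambda t: peak_year(yearly[t]))
print(by_peak)  # ['Marxism and feminism', 'Medieval women', 'Globalization']

# Clicking the header again reverses the order.
print(sorted(yearly, key=lambda t: peak_year(yearly[t]), reverse=True))
```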
Click the “top words” header to see the list sorted alphabetically by topic label. On the far right is the proportion of all the words in the corpus assigned to the given topic, visualized by the length of the blue bars. Sorting by corpus proportion moves topics that are assigned to a high proportion of the overall corpus to the top. These proportions are difficult to interpret; the highest-proportion topics are sometimes the least interesting parts of the model—agglomerations of very common words without a clear thematic content. To simplify matters, we have hidden some of these topics from view by default (they can be revealed using the settings dialog).
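The corpus proportion is simply each topic's share of all the tokens in the corpus. A minimal sketch with invented token counts (`corpus_proportions` is a hypothetical helper); note that nothing in this number measures whether a topic is thematically coherent.

```python
# Invented numbers of corpus tokens assigned to each topic.
tokens_per_topic = {"The social": 412_000, "Labor": 151_000, "Medieval women": 37_000}


def corpus_proportions(counts):
    """Each topic's share of all tokens in the corpus, largest share first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(topic, n / total) for topic, n in ranked]


for topic, share in corpus_proportions(tokens_per_topic):
    print(f"{topic:<15}{share:6.1%}")
```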
Topics over time
This version of the overview visualizes each topic’s relative prominence over time using a streamgraph. The height of each topic’s stream shows the proportion of the topic in a given year. Colors are used to distinguish topics from one another (colors are not, however, unique to each topic; they repeat after twenty colors). The stacking order of the topics is determined by a heuristic for making the visualization less jagged, and the topics are designated by their labels. Clicking on any of the individual streams takes you to the topic’s more detailed page.
“Islands” of prominence sometimes indicate special issues of Signs devoted to a particular topic. For example, the sudden spread of the light green stream near the bottom (the stream for the Medieval women topic) in the late 1980s largely represents the 1989 special issue, “Working Together in the Middle Ages: Perspectives on Women’s Communities.” Topics with other unusual patterns over time are also visible in this overview. To take exploration a little further, pan and zoom the view (using the mouse to scroll or holding down Shift while dragging). As you zoom, more topic labels appear, attached to moments of prominence for their topics.
By default this display shows the percentage of each year’s words accounted for by a topic. It can also be switched to display the raw number of words assigned to each topic per year; because not all years contain the same number of words, the two views can tell different stories.
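A small invented example shows why the two views can diverge: if the journal publishes more words each year, a topic's raw count can grow while its share of the year's words shrinks. The figures below are hypothetical.

```python
# Invented figures for one topic: tokens assigned to it, and all tokens, per year.
topic_tokens = {1976: 3_000, 1990: 4_500, 2010: 6_000}
year_tokens = {1976: 300_000, 1990: 600_000, 2010: 1_200_000}

# Percentage view: the topic's share of each year's words.
shares = {year: topic_tokens[year] / year_tokens[year] for year in topic_tokens}

for year in sorted(topic_tokens):
    # Raw-count view next to the percentage view.
    print(year, topic_tokens[year], f"{shares[year]:.2%}")
```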
Exploring in depth
Topic pages
Each topic’s page gives a fuller sense of the makeup of that topic. The lefthand column shows the topic as a distribution of words, with each word’s frequency within the topic indicated by a blue bar. Clicking on any of the individual words in this column will take you to a list of all of the topics in which that word appears.
The topic page includes a list of the “Top articles” for the topic. This displays the articles that contain the highest proportion of the topic. For instance, for the topic labeled Sexuality, Annamarie Jagose’s “‘Critical Extasy’: Orgasm and Sensibility in Memoirs of a Woman of Pleasure” (vol. 32, no. 2) is listed first because the model estimates that 29.4 percent of that article is a discussion of this topic (it has assigned 1239 out of 3432 tokens in Jagose’s article to this topic). No article has a higher proportion of words in Sexuality. Remember, however, that every article has multiple topics: the articles with the largest proportion of a particular topic may also have large proportions of other topics.
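Ranking by proportion rather than by raw count matters: a long article may contain many words from a topic while devoting only a small share of itself to it. A minimal sketch with invented articles and counts (`top_articles` is a hypothetical helper):

```python
# Invented per-article counts: tokens assigned to one topic, and total tokens.
articles = [
    {"title": "Article A", "topic_tokens": 1_200, "total_tokens": 4_000},
    {"title": "Article B", "topic_tokens": 900, "total_tokens": 2_000},
    {"title": "Article C", "topic_tokens": 1_500, "total_tokens": 10_000},
]


def top_articles(docs, n=2):
    """Rank articles by the share of their tokens assigned to the topic."""
    scored = [(d["topic_tokens"] / d["total_tokens"], d["title"]) for d in docs]
    scored.sort(reverse=True)
    return scored[:n]


for share, title in top_articles(articles):
    print(f"{title}: {share:.1%}")
```

Article C has the most tokens in the topic but is left out, because those tokens make up only 15 percent of a long article.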
On the upper right is the time series. This gives a sense of the topic’s prominence over time. The y-axis is the yearly proportion of words in the corpus belonging to the topic—that is, out of all the words published in Signs in a given year, the percentage devoted to the particular topic. The bars emphasize that the model does not assume a smooth evolution from year to year in topics, and neither should we, but it is still possible to see trends over time. For example, Globalization is clearly a topic whose prominence is relatively recent. However, it is not wholly absent in the early years of Signs, either. To focus on prominent articles in the topic for a given year only, click a year’s bar on the chart. Noticing that the topic has some presence in 1981, you might click on that year to view the list of 1981 articles with some words in Globalization.
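Clicking a year's bar amounts to filtering the topic's articles by publication year before ranking them. A hypothetical sketch with invented rows:

```python
# Invented rows for one topic: (year, title, proportion of the article in the topic).
rows = [
    (1981, "Article D", 0.12),
    (1981, "Article E", 0.03),
    (2009, "Article F", 0.41),
]


def articles_in_year(rows, year):
    """Articles from one year with some words in the topic, highest share first."""
    hits = [(share, title) for y, title, share in rows if y == year and share > 0]
    return [title for share, title in sorted(hits, reverse=True)]


print(articles_in_year(rows, 1981))
```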
Article pages
An article page represents the estimated proportions of the various topics in a given article. For instance, in the article page for Maxine Baca Zinn’s “Family, Race, and Poverty in the Eighties”, the model assigns the plurality, 31 percent, of the article to the Family, poverty, welfare topic and 20 percent to the broad topic concerned with The social. Further down the list, the topic related to Quantitative methods accounts for some 7 percent, but this topic is notable because it is not wholly thematic: words like percent and data (prominent in this topic) occur together often when quantitative methods are used, regardless of subject matter. The article page also offers a link to the full text of the article on JSTOR. If the article was in a special issue of the journal, that issue’s name will also be displayed and linked to JSTOR.
Word pages
The page for an individual word represents all the topics in which that word appears. Search for a particular word using the text box in the upper righthand corner. (Clicking on topic terms on a topic page also brings you to the word page.) You may also select a word from the list of all prominent words. The topic model tends to divide occurrences of each word among multiple topics (for example, difference is prominent in both Politics of difference and French feminism). This visualization helps to see the relative prominence of a single term across the topics to which it belongs; difference is rather more prominent, relatively speaking, in the Politics of difference topic than in French feminism. Click any word on this page to display that word’s prominent topics.
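One way to build a word page is to collect the topics that rank the word among their top words and order them by the word's probability within each topic. This is an assumption about how prominence is measured, not a description of the site's exact method; apart from the topic labels taken from the text, the counts and the cutoff `n` are invented.

```python
from collections import Counter

# Invented word counts for three topics.
topics = {
    "Politics of difference": Counter({"difference": 90, "identity": 60, "race": 50}),
    "French feminism": Counter({"language": 80, "desire": 55, "difference": 40}),
    "Labor": Counter({"work": 100, "wages": 70, "workers": 65}),
}


def topics_for_word(word, topics, n=3):
    """Topics in which `word` is among the top n words, ranked by its
    probability within the topic (word count / total topic count)."""
    hits = []
    for label, counts in topics.items():
        top = [w for w, _ in counts.most_common(n)]
        if word in top:
            hits.append((counts[word] / sum(counts.values()), label))
    return sorted(hits, reverse=True)


for weight, label in topics_for_word("difference", topics):
    print(f"{label}: {weight:.2f}")
```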
The word index
The word index is an alphabetical list of words that are prominent in any topic. Click on a word to see its individual word page. This index does not include all the language of the articles, only words that are highly ranked in at least one topic. A more comprehensive search interface for the journal is available on JSTOR.
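The word index can be understood as the union of each topic's top-ranked words. A sketch, assuming that "highly ranked" means among a topic's top n words for some cutoff n; the cutoff and the counts here are hypothetical.

```python
from collections import Counter

# Invented word counts for two topics.
topics = {
    "Labor": Counter({"work": 5, "wages": 4, "unions": 1}),
    "Medieval women": Counter({"convent": 6, "nuns": 3, "work": 2}),
}


def word_index(topics, n=2):
    """Alphabetical list of words ranked in the top n of at least one topic."""
    prominent = set()
    for counts in topics.values():
        prominent.update(w for w, _ in counts.most_common(n))
    return sorted(prominent)


print(word_index(topics))  # ['convent', 'nuns', 'wages', 'work']
```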
The bibliography
The bibliography lists citations for all the articles included in the model. By default, articles are listed by the issue of Signs they appear in. To look up particular authors’ contributions, you can also sort the list by author or by year and then by author. However you sort, you can also use your browser’s own “Find” function to locate a particular article by title on this page.
Not every item published in Signs is to be found in this bibliography, only those included in our topic model. See our discussion of modeling choices for an explanation of our decisions.
Special issues
Signs is distinguished by its frequent thematic special issues. Though we have not incorporated the distinction between special and regular issues into the topic model itself, the browser can highlight articles drawn from special issues, and there are suggestive connections between certain topics and particular special issues. To turn this feature on, visit the Settings dialog and click the “Highlight special-issue articles” checkbox. The bibliography also indicates which issues were special issues.
Complexities
This topic model is an algorithmic creation, and its categories are necessarily fuzzy. Because we model only the co-occurrence of words, it is possible for “topics” not to be perfectly coherent semantically. In this model, several topics appear to combine independent bodies of discourse: thus the War / Germany topic covers both discussions of war in general and discussions of Germany, which the model has combined, apparently because of the fact of World War II. (Notice the appearance of the “word” ii, as in World War II, in this topic.) We call this a “merged” topic; there are several others in this model.
Another possibility is the “intrusion” of a word into a topic where it seems out of place. This has the result of linking articles in which that intrusive word is prominent with a topic in a potentially misleading way. Thus, the common word life is, in this model, most prominent in a topic otherwise largely specific to discussions of Israel and Palestine. Some articles have a high proportion of this topic only because they use the word life, not because they discuss Israel and Palestine.
A few topics also appear not to capture clear patterns of meaning at all. These are hidden from view by default, as discussed in the section on hidden topics.
Modeling choices
Despite the complexity of modeling topics, words, and articles all at once, in other respects a topic model like the one presented here represents a drastic simplification. We have disregarded word order: our model considers only the number of times a given word occurs in a given article, not where those occurrences fall in relation to one another. The model of article composition is instead as follows.
Suppose I want to write a Signs article, operating according to the assumptions made by the model:
- I begin with only one piece of information: the number of words I want to write. Let’s imagine I am writing a 5000-word article.
- I randomly choose proportions of each of the 70 topics that will make up the article. (I don’t have an even chance of picking any topic; instead, some topics are more likely than others, as the topic proportions in the topic list suggest.)
- Now I divide up my 5000 words according to the proportions I have picked. If I have randomly picked 50% of the topic The social and 25% each of the topics Writing and reading and Labor, then I will have 2500 words to pick from The social and 1250 words from each of the other two.
- Now I randomly pick the specified fraction of my total 5000 words from each topic. Picking words is like drawing Scrabble tiles from a bag: some words are much more likely than others. Each topic assigns distinctive probabilities to each word. When I take my 2500 words at random from The social, I am likely to come up with the word social many times, power somewhat less often, nature only occasionally, and so on. When I draw 1250 words each from Writing and reading and Labor, the most frequently chosen words will be quite different.
- Leaving the words in their random order, and entirely neglecting to include prepositions, definite or indefinite articles, or punctuation, I submit the article to Signs and await publication.
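The story above is the generative process behind Latent Dirichlet Allocation, and it can be simulated directly. This Python sketch shrinks the 70 topics to three with invented word probabilities and invented Dirichlet parameters (`ALPHA`), and draws the topic proportions from a Dirichlet distribution using normalized gamma variates. One difference from the simplified narrative: as in LDA proper, each word's topic is drawn at random from the article's proportions, so the split of the 5000 words matches those proportions only approximately. Fitting a model (which MALLET does for us) runs this process in reverse, inferring topics and proportions from the observed articles; the sketch covers only the forward direction.

```python
import random
from collections import Counter

# Three toy topics, each a probability distribution over words (invented numbers).
TOPICS = {
    "The social": {"social": 0.5, "power": 0.3, "nature": 0.2},
    "Writing and reading": {"text": 0.5, "reader": 0.3, "writing": 0.2},
    "Labor": {"work": 0.5, "wages": 0.3, "workers": 0.2},
}
# Hypothetical Dirichlet parameters: larger values make a topic more likely a priori.
ALPHA = {"The social": 2.0, "Writing and reading": 1.0, "Labor": 1.0}


def dirichlet(alpha, rng):
    """Draw topic proportions from a Dirichlet via normalized gamma variates."""
    draws = {k: rng.gammavariate(a, 1.0) for k, a in alpha.items()}
    total = sum(draws.values())
    return {k: v / total for k, v in draws.items()}


def write_article(n_words, rng):
    """Generate an unordered bag of words following LDA's generative story."""
    proportions = dirichlet(ALPHA, rng)
    labels = list(proportions)
    words = []
    for _ in range(n_words):
        # Pick a topic for this word according to the article's proportions...
        topic = rng.choices(labels, weights=[proportions[t] for t in labels])[0]
        # ...then draw a word from that topic, like a tile from a Scrabble bag.
        vocab = TOPICS[topic]
        words.append(rng.choices(list(vocab), weights=list(vocab.values()))[0])
    return proportions, words


rng = random.Random(40)
proportions, article = write_article(5000, rng)
print({t: round(p, 2) for t, p in proportions.items()})
print(Counter(article).most_common(3))
```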
Much is ignored by this model: not only the order of words but the identity of the author, the date of composition, even the title. Yet this drastic simplification allows us to hand the task of categorizing the language of 1866 Signs articles to the computer, and to use it to aid our reading of the journal as a whole and as a collection of very diverse parts.
The inputs to the algorithm consisted of the digitally scanned and encoded version of the journal, with bibliographic data and text as supplied by the staff of JSTOR’s Data for Research service.
The algorithmic nature of the technique does not mean the model presented here is free of interpretation. The major interpretive choice is, of course, the decision to offer the model in the first place. But other, technical choices are interpretive as well. The algorithm requires the user to specify the number of topics in advance; we have settled on 70 topics as yielding a suggestive and interesting model of our archive. We have also not taken every page of Signs into consideration; we have limited ourselves to articles (rather than reviews) and we have excluded short texts (those under 800 words). We have omitted from consideration a list of stop words (including very common words like the and and, personal names, and certain terms common to article reference lists) which distort the modeling results. The same applies to very rare words; we have also omitted words occurring four or fewer times in the whole corpus. Though it would be possible to combine morphologically related words, we have decided it is worth preserving the different verbal associations of, for example, woman and women.
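The filtering choices just described can be sketched as a preprocessing step. This is a minimal illustration under stated assumptions, not our actual R scripts: we assume article length is counted before stop words are removed, that rare words are counted across the articles that are kept, and that texts are already tokenized into lowercase words. The stop list passed in is hypothetical. No stemming is applied, so woman and women remain distinct.

```python
from collections import Counter

MIN_DOC_WORDS = 800   # texts under 800 words are excluded
MIN_CORPUS_COUNT = 5  # words occurring four or fewer times are dropped


def prepare(docs, stop_words):
    """Filter a corpus roughly as described above (assumptions in the lead-in).

    `docs` maps a document id to its list of lowercase word tokens. Returns a
    bag of words (Counter) per kept document; word order is discarded.
    """
    long_docs = {d: toks for d, toks in docs.items() if len(toks) >= MIN_DOC_WORDS}
    corpus_counts = Counter(t for toks in long_docs.values() for t in toks)
    return {
        d: Counter(
            t for t in toks
            if t not in stop_words and corpus_counts[t] >= MIN_CORPUS_COUNT
        )
        for d, toks in long_docs.items()
    }


docs = {
    "long": ["the", "woman", "women"] * 300 + ["rare"] * 2,
    "short": ["women"] * 100,
}
print(prepare(docs, stop_words={"the"}))
```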
The software
The source code for this website and the R scripts used to create the topic model displayed here are available at github.com/signs40th/topic-model. To construct the topic model, we have used MALLET by Andrew K. McCallum et al., as well as the R mallet package by David Mimno and the dfrtopics package by Andrew Goldstone. This site builds on dfr-browser by Andrew Goldstone. It also uses the code of the following open-source projects: d3 by Mike Bostock; bootstrap by Twitter, Inc.; jQuery by the jQuery Foundation; and JSZip by Stuart Knightley.
This site works best in recent web browsers and on screens that are at least 1500 pixels wide. The total data download is approximately one megabyte. On touchscreen devices, some features may not work as expected.
This topic appears to merge two relatively independent sets of related terms.
Brittney Cooper discusses this topic in her comment.
With an emphasis on sex work in transnational contexts.
With an emphasis on the language of psychoanalysis.
With an emphasis on science studies.
With a focus on beauty and consumption.
A difficult-to-interpret collection of “argument words.”
Susan Sidlauskas discusses this topic in her comment.
Cynthia Daniels discusses this topic in her comment.
Cynthia Daniels discusses this topic in her comment.
With a focus on the psychology of sex differences.
The prominent word life does not seem to fit well in this topic.
Mary Hawkesworth discusses this topic in her comment.
Catharine R. Stimpson discusses this topic in her comment.
The words in this topic seem to be prominent in the language of interviews and autobiographical writing.
Joanna Kempner discusses this topic in her comment.
Magdalena Grabowska discusses this topic in her comment.
Suzanna Danuta Walters discusses this topic in her comment.
Mary Hawkesworth discusses this topic in her comment.
Agatha Beins discusses this topic in her comment.
With an emphasis on epistemology. Mary Hawkesworth discusses this topic in her comment.
A difficult-to-interpret collection of very common words.
A difficult-to-interpret set of thematically disconnected terms.
Kathryn Norberg discusses this topic in her comment.
In their comments, Dana Britton and Danielle Phillips discuss this topic.
With an emphasis on music and fashion.
Catharine R. Stimpson discusses this topic in her comment.
Dana Britton discusses this topic in her comment.
Discussions of the field of women’s studies itself. Kayo Denda reflects on this topic in her comment.
Kayo Denda discusses this topic in her comment.
And transnationalism. Danielle Phillips discusses this topic in her comment.
Kayo Denda discusses this topic in her comment.