The structural topic model and applied social science

Edoardo Airoldi

2013, Neural Information Processing Society

Abstract

We develop the Structural Topic Model, which provides a general way to incorporate corpus structure or document metadata into the standard topic model. Document-level covariates enter the model through a simple generalized linear model framework in the prior distributions controlling either topical prevalence or topical content. We demonstrate the model's use in two applied problems: the analysis of open-ended responses in a survey experiment about immigration policy, and understanding differing media coverage of China's rise.

Prepared for the NIPS 2013 Workshop on Topic Models: Computation, Application, and Evaluation. A forthcoming R package implements the methods described here.
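To make the role of covariates in topical prevalence concrete, the following is a minimal Python sketch of a generative process in that spirit, not the authors' implementation. It assumes a logistic-normal prior whose mean is a linear function of the covariates, independent noise with a single standard deviation (a simplification of a full covariance matrix), and fixed topic-word distributions, so covariate effects on topical content are left out. All names (`prevalence_mean`, `sample_theta`, `sample_document`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(values):
    """Map real-valued scores to proportions that sum to one."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def prevalence_mean(x, gamma):
    """Linear predictor: one value per non-reference topic.

    x     : covariate vector for one document (length P)
    gamma : K-1 coefficient vectors, each of length P (hypothetical values)
    """
    return [sum(xp * g for xp, g in zip(x, col)) for col in gamma]


def sample_theta(x, gamma, sd, rng):
    """Draw topic proportions from a logistic-normal prior.

    Assumption: independent Gaussian noise with standard deviation `sd`
    stands in for a full covariance matrix.
    """
    mu = prevalence_mean(x, gamma)
    eta = [rng.gauss(m, sd) for m in mu]
    eta.append(0.0)  # the last topic is the reference category
    return softmax(eta)


def sample_document(theta, beta, n_words, rng):
    """Draw a topic for each word, then a word index from that topic."""
    topics = rng.choices(range(len(theta)), weights=theta, k=n_words)
    return [
        rng.choices(range(len(beta[k])), weights=beta[k], k=1)[0]
        for k in topics
    ]


rng = random.Random(0)
# Hypothetical coefficients: K = 3 topics, covariates = (intercept, treatment).
gamma = [[0.0, 1.5], [0.0, -1.0]]
# Hypothetical topic-word distributions over a 3-word vocabulary.
beta = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]]

theta_treated = sample_theta([1.0, 1.0], gamma, sd=0.5, rng=rng)
words = sample_document(theta_treated, beta, n_words=20, rng=rng)
```

In this sketch, changing a document's covariate vector shifts the mean of its topic proportions, which is the sense in which covariates act on topical prevalence through the prior.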

FAQs

What distinct advantages does STM offer over traditional topic modeling approaches?

Unlike traditional topic models, STM lets document-level covariates affect both topical prevalence and topical content. This allows the analysis to be tailored to a specific corpus structure and research question.

How does the Structural Topic Model utilize generalized linear models in practice?

The model specifies priors as generalized linear models, which enhance flexibility in estimating effects of observed covariates. This methodological innovation facilitates direct estimation of quantities relevant to applied social science research.

What evidence supports the effectiveness of STM in analyzing open-ended survey responses?

The study argues that STM can substantially lower the cost of analyzing open-ended responses and demonstrates this on data from a survey experiment about immigration attitudes. Specifically, a K=3 model showed that the treatment shifted responses toward security and welfare concerns, with a larger effect for Republicans than for Democrats.

How were document-level covariates incorporated into the STM analysis of news wire articles?

The model employed K=80 topics with time and news source as covariates, revealing important trends across years from 1997 to 2006. This approach allowed the assessment of how different sources portrayed events, like Taiwan’s elections, highlighting contrasting narratives.

What issues might arise from the adoption of advanced topic models in applied social science?

Despite their advantages, applied users find adopting models like STM challenging due to the specificity required for individual corpus structures. This barrier stems from the necessity of tailoring the model fitting to unique datasets and theoretical questions.

Figures (3)

Figure 1: Plate Diagram for the Structural Topic Model

After describing the model, we demonstrate the use of STM to analyze two social science questions using open-ended responses from a survey experiment and an international newswire corpus.

Figure 2: Party ID, Treatment, and the Predicted Proportion in Fear Topic (1 of 3)

Survey researchers often face a difficult decision about whether to employ open-ended responses, which can be costly for humans to analyze, or closed-ended responses, which require the a priori specification of possible responses. We argue that STM can substantially lower the costs of analyzing open-ended responses. We use open-ended survey responses collected by [17], who study how negatively valenced emotions affect political attitudes. In one design, they use a survey experiment to study how encouraging subjects to be worried about immigration influences their reaction to immigration policy. Using a K = 3 topic model, we estimate the influence of the encouragement on the open-ended responses, conditioning on treatment and the stated party of the respondent. We find that the treatment makes respondents talk more about security and welfare concerns, while the control group stresses citizenship and the challenges immigrants face (Figure 2). The treatment effect is greater for Republicans than for Democrats, indicating that Republicans are more likely to respond to fear-provoking encouragements about immigration than Democrats. In standard terminology, partisan ID moderates the effect of the treatment.
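The notion of moderation can be illustrated with a small Python sketch: a treatment-by-party interaction term in the linear predictor for topic prevalence makes the treatment effect on a topic's share differ across parties. This is only an illustration under stated assumptions. The coefficient values are made up and are not estimates from the survey data, the function names are hypothetical, and shares are evaluated at the mean of the linear predictor (noise set to zero) rather than averaged over the prior.

```python
import math


def softmax(values):
    """Map real-valued scores to proportions that sum to one."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def topic_shares(treated, republican, coefs):
    """Topic shares implied by the linear predictor, with noise set to zero.

    coefs holds one coefficient dict per non-reference topic; the last
    topic is the reference category with a score fixed at zero.
    """
    eta = [
        c["intercept"]
        + c["treat"] * treated
        + c["rep"] * republican
        + c["treat_x_rep"] * treated * republican
        for c in coefs
    ]
    eta.append(0.0)
    return softmax(eta)


def treatment_effect(republican, coefs, topic=0):
    """Difference in a topic's share between treated and control."""
    treated_share = topic_shares(1, republican, coefs)[topic]
    control_share = topic_shares(0, republican, coefs)[topic]
    return treated_share - control_share


# Made-up coefficients for K = 3 topics; topic 0 plays the role of a
# "fear" topic. These are NOT estimates from any data.
coefs = [
    {"intercept": 0.0, "treat": 0.5, "rep": 0.2, "treat_x_rep": 0.8},
    {"intercept": 0.0, "treat": 0.0, "rep": 0.0, "treat_x_rep": 0.0},
]

effect_dem = treatment_effect(republican=0, coefs=coefs)
effect_rep = treatment_effect(republican=1, coefs=coefs)
```

With a positive interaction coefficient, `effect_rep` exceeds `effect_dem`, which mirrors the qualitative pattern described above. Setting `treat_x_rep` to zero removes the moderation on the scale of the linear predictor.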

Figure 3: Taiwanese Presidential Election Topic (1 of 80) with news-source specific content (2 of 5)

References (18)

[1] D. M. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77-84, 2012.
[2] J. Grimmer and B. M. Stewart. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3):267-297, 2013.
[3] D. M. Blei and J. D. Lafferty. Dynamic topic models. In ICML, 2006.
[4] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In UAI, 2004.
[5] A. Ahmed and E. P. Xing. Staying informed: supervised and semi-supervised multi-view topical analysis of ideological perspective. In EMNLP, pages 1140-1150, 2010.
[6] J. Eisenstein, B. O'Connor, N. A. Smith, and E. P. Xing. A latent variable model for geographic lexical variation. In EMNLP, pages 1277-1287, 2010.
[7] D. M. Blei and J. D. Lafferty. A correlated topic model of science. AAS, 1(1):17-35, 2007.
[8] D. Mimno and A. McCallum. Topic models conditioned on arbitrary features with Dirichlet-multinomial regression. In UAI, 2008.
[9] J. Eisenstein, A. Ahmed, and E. P. Xing. Sparse additive generative models of text. In ICML, pages 1041-1048, 2011.
[10] M. Taddy. Multinomial inverse regression for text analysis. JASA, 108(503):755-770, 2013.
[11] C. Wang and D. M. Blei. Variational inference in nonconjugate models. JMLR, 14:1005-1031, 2013.
[12] J. Bischof and E. M. Airoldi. Summarizing topical content with word frequency and exclusivity. In ICML, pages 201-208, 2012.
[13] M. E. Roberts, B. M. Stewart, D. Tingley, C. Lucas, J. Leder-Luis, S. Gadarian, B. Albertson, and D. Rand. Structural topic models for open-ended survey responses. Am. Journal of Political Science, Forthcoming.
[14] D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.
[15] D. Ramage, C. D. Manning, and S. Dumais. Partially labeled topic models for interpretable text mining. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 457-465. ACM, 2011.
[16] M. Paul and M. Dredze. Factorial LDA: Sparse multi-dimensional text models. In NIPS, pages 2591-2599, 2012.
[17] S. K. Gadarian and B. Albertson. Anxiety, immigration, and the search for information. Political Psychology, 2013.
[18] G. King, J. Pan, and M. E. Roberts. How censorship in China allows government criticism but silences collective expression. American Political Science Review, 107:1-18, 2013.
