The structural topic model and applied social science
2013, Neural Information Processing Systems
Abstract
We develop the Structural Topic Model, which provides a general way to incorporate corpus structure or document metadata into the standard topic model. Document-level covariates enter the model through a simple generalized linear model framework in the prior distributions controlling either topical prevalence or topical content. We demonstrate the model's use in two applied problems: the analysis of open-ended responses in a survey experiment about immigration policy, and understanding differing media coverage of China's rise.
* Prepared for the NIPS 2013 Workshop on Topic Models: Computation, Application, and Evaluation. A forthcoming R package implements the methods described here.
FAQs
What distinct advantages does STM offer over traditional topic modeling approaches?
Unlike the standard topic model, STM lets document-level covariates affect both topical prevalence and topical content. This lets the analysis be tailored to the structure of a specific corpus and to the research question at hand.
How does the Structural Topic Model utilize generalized linear models in practice?
The model specifies the prior distributions that control topical prevalence and topical content as generalized linear models of the observed document-level covariates. This makes it possible to estimate covariate effects directly, and these effects are the quantities of interest in applied social science research.
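As a rough illustration of a generalized linear model sitting inside a prior, the Python sketch below draws one document's topic proportions from a logistic-normal distribution whose mean is a linear function of the document's covariates. The logistic-normal form, the diagonal covariance, and every name and coefficient value (`draw_topic_proportions`, `gamma`, `sigma`) are assumptions made for this example, not details taken from the summary above.

```python
import math
import random


def softmax(scores):
    """Map real-valued scores to a probability vector."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def draw_topic_proportions(x, gamma, sigma, rng):
    """Draw topic proportions for one document (illustrative only).

    x     : covariate vector of length P (first entry 1.0 as an intercept)
    gamma : P x (K-1) coefficient matrix (hypothetical values)
    sigma : K-1 standard deviations (diagonal covariance is an assumption)
    rng   : random.Random instance
    """
    n_free = len(gamma[0])
    # GLM part of the prior: linear predictor mu = x . gamma
    mu = [sum(x[p] * gamma[p][k] for p in range(len(x))) for k in range(n_free)]
    # Gaussian variation around the linear predictor
    eta = [rng.gauss(mu[k], sigma[k]) for k in range(n_free)]
    # Fix the last topic's score at 0 for identifiability, then map to the simplex
    return softmax(eta + [0.0])


rng = random.Random(0)
gamma = [[0.0, 0.0],    # intercept row
         [2.0, -1.0]]   # made-up effect of a binary covariate, e.g. treatment
sigma = [0.5, 0.5]

theta_control = draw_topic_proportions([1.0, 0.0], gamma, sigma, rng)
theta_treated = draw_topic_proportions([1.0, 1.0], gamma, sigma, rng)
```

With these made-up coefficients, switching the covariate from 0 to 1 raises the expected score of the first topic and lowers that of the second, so the covariate changes how much of a document each topic is expected to occupy.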
What evidence supports the effectiveness of STM in analyzing open-ended survey responses?
The study found that STM reduces the cost of analysis and captures how topics vary across responses, demonstrated on empirical data from a survey experiment on immigration attitudes. Specifically, a model with K=3 topics revealed partisan effects in emotional responses to immigration.
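One simple way to read a covariate effect off fitted topic proportions is a difference in means between groups. The Python sketch below does that for a single topic. The proportions in `theta` and the `treated` flags are toy numbers invented for this example, `prevalence_difference` is a hypothetical helper, and the calculation ignores the uncertainty in the estimated proportions, so it should not be read as the study's estimation procedure.

```python
from statistics import mean


def prevalence_difference(theta, treated, topic):
    """Mean proportion of `topic` among treated documents minus controls.

    theta   : list of per-document topic-proportion vectors
    treated : list of booleans, one per document
    topic   : index of the topic to compare
    """
    treated_vals = [row[topic] for row, t in zip(theta, treated) if t]
    control_vals = [row[topic] for row, t in zip(theta, treated) if not t]
    return mean(treated_vals) - mean(control_vals)


# Toy topic proportions for four responses and K=3 topics (not real data)
theta = [
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.3, 0.3, 0.4],
]
treated = [True, True, False, False]

diff_topic_0 = prevalence_difference(theta, treated, topic=0)
diff_topic_1 = prevalence_difference(theta, treated, topic=1)
```

A partisan effect of the kind mentioned above would correspond to computing this difference separately within each party group and comparing the results.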
How were document-level covariates incorporated into the STM analysis of news wire articles?
The model used K=80 topics with time and news source as covariates, revealing trends over the years 1997 to 2006. This made it possible to assess how different sources portrayed events such as Taiwan’s elections, highlighting contrasting narratives.
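To make the idea of a source changing how a topic is worded more concrete, the Python sketch below builds a topic's word distribution from additive log-scale terms: a corpus-wide baseline, a topic deviation, a source deviation, and a topic-by-source interaction. The additive form, the assumption that news source acts on word use (the answer above does not say which role each covariate plays), the three-word vocabulary, and all numeric values are hypothetical.

```python
import math


def word_distribution(baseline, topic_dev, source_dev, interaction_dev):
    """Word probabilities for one topic as written by one source.

    Each argument is a list with one log-scale value per vocabulary word.
    The scores are summed word by word and normalized with a softmax.
    """
    scores = [m + kt + ks + ki
              for m, kt, ks, ki in zip(baseline, topic_dev, source_dev, interaction_dev)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and values, chosen only to show the mechanics
vocab = ["election", "democracy", "separatist"]
baseline = [math.log(0.5), math.log(0.3), math.log(0.2)]  # log word frequencies
topic_dev = [1.0, 0.5, 0.5]          # words this topic favors overall
no_shift = [0.0, 0.0, 0.0]           # no source-wide deviation in this toy case
source_a_interaction = [0.0, 1.0, -1.0]
source_b_interaction = [0.0, -1.0, 1.0]

p_source_a = word_distribution(baseline, topic_dev, no_shift, source_a_interaction)
p_source_b = word_distribution(baseline, topic_dev, no_shift, source_b_interaction)
```

Both sources share the same topic, but the interaction terms shift probability toward different words, which is one way contrasting narratives about the same event could show up in a fitted model.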
What issues might arise from the adoption of advanced topic models in applied social science?
Despite the advantages of models like STM, applied users find them challenging to adopt because the model must be fit to the specific structure of each corpus. The barrier is the need to tailor model fitting to each dataset and theoretical question.