A Survey of Automatic Personality Detection from Texts
2020
Abstract
Personality profiling has long been used in psychology to predict life outcomes. Recently, automatic detection of personality traits from written messages has gained significant attention in the computational linguistics and natural language processing communities because of its applicability in various fields. In this survey, we trace the trajectory of research on automatic personality detection from purely psychological approaches, through psycholinguistics, to recent purely natural language processing approaches applied to large datasets automatically extracted from social media. We point out what has been gained and what has been lost along that trajectory, and outline what can realistically be expected in the field.
Key takeaways
- Automatic personality detection has evolved from psychology-based methods to NLP techniques using social media data.
- The Big Five model assesses traits on continuous scales, whereas MBTI assigns discrete types, a difference that affects detection accuracy.
- Research indicates that personality detection models trained on Facebook data outperform those trained on Twitter data.
- Ethical concerns arise from using social media data for personality profiling without user consent, risking algorithmic bias.
- Future research should bridge psychology and NLP to set realistic expectations for personality detection from textual data.