The DARPA Twitter Bot Challenge

Vadim Kagan

doi:10.1109/MC.2016.183

2016, Computer

17 pages

Abstract

A number of organizations, ranging from terrorist groups such as ISIS to politicians and nation states, reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" (realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook) before they become too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015. In it, multiple teams supported by the DARPA Social Media in Strategic Communications program competed to find a set of previously identified "influence bots" that served as ground truth on a specific topic within Twitter. Past work on influence bots has often had difficulty supporting claims about accuracy, because ground truth is limited (though some exceptions exist [3,7]). Moreover, with the exception of [3], no past work has looked at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and the methods used by the three top-ranked teams.

Figures (14)

… variables cue an analyst about whether a person is suspicious. The Sentimetrix Dashboard allows analysts to query the profiles and sort them in descending order of columns (see top of Figure 1).

Figure 1. Sentimetrix Dashboard to view Twitter user information.

Figure 2. Top of detailed screen about a Competition Bot

Distance Measures. The Indiana team identified additional bots by computing the cosine similarity between users and known bots. Figure 4 shows the kernel density estimate of the cosine distance between the feature vectors of bot-bot pairs, compared with bot-human pairs. Distances between bot pairs are much smaller than those between bot-human pairs. The bot-bot distance distribution is bimodal, reflecting the presence of two types of bots designed by two teams. Sentimetrix achieved similar success using Jaccard distance.

Figure 4. Kernel density estimation of the cosine distance between bot-bot pairs (blue) and bot-human pairs.
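The Figure 4 excerpt names two distance-based techniques for finding more bots: the Indiana team used cosine similarity and Sentimetrix used Jaccard distance. It also uses a kernel density estimate to compare bot-bot and bot-human distances. The Python code below is a minimal sketch of these ideas under stated assumptions. It is not either team's implementation.

The assumptions are:

- Each user is a plain list of numeric features, and the toy vectors are invented.
- The Jaccard distance compares sets of items such as hashtags.
- The helpers `rank_candidates` and `pairwise_distances` are hypothetical.
- The density uses a Gaussian kernel with the normal-reference rule-of-thumb bandwidth, which may differ from the estimator behind Figure 4.

```python
# Sketch only: feature vectors, set contents, and helper names are illustrative assumptions.
import math
from statistics import stdev


def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumption: an all-zero vector is treated as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(a, b):
    """Jaccard distance = 1 - |A & B| / |A | B| (assumed here: sets such as hashtags)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def rank_candidates(candidates, known_bots, dist=cosine_distance):
    """Hypothetical helper: order candidates by distance to their nearest known bot."""
    scored = sorted(
        (min(dist(vec, bot) for bot in known_bots.values()), user_id)
        for user_id, vec in candidates.items()
    )
    return [(user_id, d) for d, user_id in scored]


def pairwise_distances(group_a, group_b=None, dist=cosine_distance):
    """All unordered pairs within group_a, or all cross pairs between two groups."""
    if group_b is None:
        return [dist(u, v) for i, u in enumerate(group_a) for v in group_a[i + 1:]]
    return [dist(u, v) for u in group_a for v in group_b]


def gaussian_kde(samples):
    """Gaussian KDE with the rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5)."""
    n = len(samples)  # stdev() needs at least two samples
    h = 1.06 * stdev(samples) * n ** (-1 / 5) or 1e-6  # guard against zero spread
    norm = n * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm


# Toy 3-dimensional feature vectors (illustrative values, not challenge data).
bots = {"b1": [0.9, 0.1, 0.8], "b2": [0.85, 0.15, 0.75], "b3": [0.8, 0.2, 0.9]}
humans = {"h1": [0.1, 0.9, 0.2], "h2": [0.3, 0.7, 0.1], "h3": [0.2, 0.6, 0.6]}

bot_bot = pairwise_distances(list(bots.values()))
bot_human = pairwise_distances(list(bots.values()), list(humans.values()))
kde_bot_bot, kde_bot_human = gaussian_kde(bot_bot), gaussian_kde(bot_human)
print(kde_bot_bot(0.01) > kde_bot_human(0.01))  # True: bot-bot mass sits near zero

candidates = {"u1": [0.88, 0.12, 0.79], "u2": [0.2, 0.8, 0.3]}
print(rank_candidates(candidates, bots)[0][0])  # u1, the bot-like account, ranks first
```

In this sketch, each candidate is ranked by its distance to the nearest known bot rather than to a single bot centroid. One plausible reason for that choice is the bimodal bot-bot distribution noted above: a centroid computed over two distinct bot types could fall between them and match neither.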

Vadim Kagan is a technologist with over 30 years of experience in building large-scale systems; he is a founder and president of SentiMetrix.

Linhong Zhu is a Computer Scientist at the Information Sciences Institute, University of Southern California.

Emilio Ferrara was a Research Scientist at the Indiana University Network Science Institute. Currently he is a Computer Scientist at the University of Southern California's Information Sciences Institute.

References (32)

Dickerson, J. P., Kagan, V., & Subrahmanian, V. S. (2014, August). Using sentiment to detect bots on Twitter: Are humans more opinionated than bots? In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 620-627). IEEE.
Kata, A. (2012). Anti-vaccine activists, Web 2.0, and the postmodern paradigm: An overview of tactics and tropes used online by the anti-vaccination movement. Vaccine, 30(25), 3778-3789.
Lee, K., Eoff, B. D., & Caverlee, J. (2011). Seven months with the devils: A long-term study of content polluters on Twitter. In AAAI International Conference on Weblogs and Social Media.
Kagan, V., Stevens, A., & Subrahmanian, V. S. (2015, January/February). Using Twitter sentiment to forecast the 2013 Pakistani election and the 2014 Indian election. IEEE Intelligent Systems, 2-5.
Danezis, G., & Mittal, P. (2009). SybilInfer: Detecting sybil nodes using social networks. In Network and Distributed System Security Symposium (NDSS).
Yu, H., Kaminsky, M., Gibbons, P. B., & Flaxman, A. (2006, September). SybilGuard: Defending against sybil attacks via social networks. In ACM SIGCOMM Computer Communication Review (Vol. 36, No. 4, pp. 267-278). ACM.
Weizenbaum, J. (1966). ELIZA: A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36-45.
Simmons, R. F. (1970). Natural language question-answering systems: 1969. Communications of the ACM, 13(1), 15-30.
Chu, Z., Gianvecchio, S., Wang, H., & Jajodia, S. (2012). Detecting automation of Twitter accounts: Are you a human, bot, or cyborg? IEEE Transactions on Dependable and Secure Computing, 9(6), 811-824.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
Subrahmanian, V. S., & Reforgiato, D. (2008). AVA: Adjective-verb-adverb combinations for sentiment analysis. IEEE Intelligent Systems, 23(4), 43-50.
Cesarano, C., Dorr, B., Picariello, A., Reforgiato, D., Sagoff, A., & Subrahmanian, V. (2004). Oasys: An opinion analysis system. In AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs.
Hong, L., & Davison, B. D. (2010). Empirical study of topic modeling in Twitter. In Workshop on Social Media Analytics.
Cai, D., et al. (2011). Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1548-1560.
Kriegel, H. P., & Pfeifle, M. (2005, August). Density-based clustering of uncertain data. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 672-677). ACM.
Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Patil, S., Flammini, A., & Menczer, F. (2011, March). Truthy: Mapping the spread of astroturf in microblog streams. In Proceedings of the 20th International Conference Companion on World Wide Web (pp. 249-252). ACM.
Gregory, P. R. (2014, May 11). Inside Putin's campaign of social media trolling and faked Ukrainian crimes. Forbes.com. http://www.forbes.com/sites/paulroderickgregory/2014/05/11/inside-putins-campaign-of-social-media-trolling-and-faked-ukrainian-crimes/
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of online learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
Zhu, L., Galstyan, A., Cheng, J., & Lerman, K. (2014). Tripartite graph clustering for dynamic sentiment analysis on social media. In Proceedings of SIGMOD '14.
Ghosh, R., Surachawala, T., & Lerman, K. (2011). Entropy-based classification of "retweeting" activity on Twitter. In SNA-KDD.
Ver Steeg, G., & Galstyan, A. (2012). Information transfer in social media. In Proceedings of WWW '12.
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. doi:10.1088/1742-5468/2008/10/P10008
Kumar, S., Spezzano, F., & Subrahmanian, V. S. (2014, August). Accurately detecting trolls in Slashdot Zoo via decluttering. In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 188-195). IEEE.
Shane, S., & Hubbard, B. (2014, August 20). ISIS displaying a deft command of varied media. New York Times. http://www.nytimes.com/2014/08/31/world/middleeast/isis-displaying-a-deft-command-of-varied-media.html
Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: Identifying density-based local outliers. In ACM SIGMOD Record (Vol. 29, No. 2, pp. 93-104). ACM.
Aiello, L. M., Deplano, M., Schifanella, R., & Ruffo, G. (2012). People are strange when you're a stranger: Impact and influence of bots on social networks. Links, 697(483,151), 1-566.
Lehmann, S., & Sapieżyński, P. (2013, December 4). You're here because of a robot. http://sunelehmann.com/2013/12/04/youre-here-because-of-a-robot/
Nissen, T. E. (2014). Terror.com: IS's social media warfare in Syria and Iraq. Contemporary Conflicts: Military Studies Magazine, 2(2).
V.S. Subrahmanian is a Professor of Computer Science, a past Director of the University of Maryland Institute for Advanced Computer Studies, and a founder of Sentimetrix.
Amos Azaria is a postdoctoral researcher at Carnegie Mellon University in the Machine Learning department.
Skylar Durst is a graduate student in Computer Science at California Polytechnic State University, San Luis Obispo, and a full-time data architect at Sentimetrix.
Aram Galstyan is a Project Leader at the USC Information Sciences Institute and a Research Associate Professor of Computer Science at USC.
Kristina Lerman is a Project Leader at the Information Sciences Institute and holds a joint appointment as a Research Associate Professor in the USC Computer Science Department.
Alessandro Flammini is an associate professor in the School of Informatics and Computing at Indiana University.
