Twitter Content-Based Spam Filtering

Santos, Igor; Miñambres-Marcos, Igor; Laorden, Carlos; Galán-García, Patxi; Santamaría-Ibirika, Aitor; Bringas, Pablo García

doi:10.1007/978-3-319-01854-6_46

2014, Advances in Intelligent Systems and Computing

https://doi.org/10.1007/978-3-319-01854-6_46

Abstract

Twitter has become one of the most widely used social networks and, like every popular medium, it is prone to misuse. Spam on Twitter has emerged in recent years and has become a significant problem for users. Several approaches have been proposed to determine whether a user is a spammer. However, these blacklisting systems cannot filter every spam message, and a spammer can simply create another account and resume sending spam. In this paper, we propose a content-based approach to filtering spam tweets: we apply machine learning and compression algorithms to the text of each tweet to filter out undesired tweets.
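The abstract does not specify which compression algorithms are used, so the following is only a minimal sketch of the general idea behind compression-based text filtering, not the authors' method. It assumes Python's standard `zlib` (DEFLATE) as a stand-in compressor, and the function names and the toy corpora are hypothetical. A tweet is assigned to the class whose training text "explains" it best, that is, the class for which appending the tweet adds the fewest compressed bytes.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the DEFLATE-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Compressed bytes added when message is appended to corpus.

    A small value means the message repeats material already in the corpus.
    """
    return compressed_size(corpus + message) - compressed_size(corpus)


def classify(message: str, corpora: dict) -> str:
    """Return the label whose corpus compresses the message most cheaply."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], message))


# Hypothetical toy training text (assumption: invented for illustration only).
TOY_CORPORA = {
    "spam": (
        "win a free prize now click this link\n"
        "get free followers fast click this link now\n"
        "cheap pills free shipping click here\n"
    ),
    "ham": (
        "meeting moved to friday see you at lunch\n"
        "reading a good book about compilers tonight\n"
        "just landed in bilbao the weather is great\n"
    ),
}

if __name__ == "__main__":
    for tweet in ("win a free prize now click this link",
                  "see you at lunch on friday"):
        print(tweet, "->", classify(tweet, TOY_CORPORA))
```

Because the decision depends only on the tweet text, a filter of this kind is unaffected when a spammer moves to a new account, which is the weakness of blacklisting that the abstract points out. A realistic filter would need far larger labelled corpora and, most likely, a compressor better suited to very short messages.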
