Spammer Detection In Social Bookmarking Systems
2012
Abstract
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2012
Social bookmarking systems let Web users store and organize their resources and search within them. These systems also let users share the resources they have stored on the Web with other users. On social bookmarking sites, users can join various groups based on shared interests and take part in their activities. Social bookmarking and similar systems are widespread because they work online and are easy to use. Users can connect to the internet from anywhere to access and manage their accounts. Because these systems impose no language restriction, users can tag in any language they choose. Recently, the widespread use of these systems has produced a large volume of data. This data...
Table of Contents
- Front matter: Foreword, Table of Contents, Abbreviations, List of Tables, List of Figures, List of Symbols, Summary, Özet
- 1 INTRODUCTION
- 1.1 Literature Review
- 2 BACKGROUND
- 2.1 Basics of Folksonomy and Social Bookmarking Systems
- 2.1.1 Folksonomy
- 2.1.2 Advantages
- 2.1.3 Disadvantages
- 2.2 Spam
- 2.3 Spam In Folksonomies
- 2.4 Combating with Spam
- 3 METHODOLOGY
- 3.1 Spam Detection Techniques
- 3.2 Feature Creation
- 3.2.1 Time
- 3.2.1.1 Sessions
- 3.2.1.2 Resource bombardment
- 3.2.2 Share
- 3.2.2.1 Share percentage
- 3.2.2.2 Share bombardment
- 3.3 Semantic Analysis
- 3.3.1 Page Rank
- 3.3.2 Trust Rank for bookmarking systems
- 3.3.3 Propagation using seed set in Trust Rank technique
- 3.3.4 Improvements on the technique
- 3.3.4.1 Seed set improvements
- 3.3.4.2 Coverage improvement
- 4 EXPERIMENTS
- 4.1 Data Set Description
Curriculum Vitae
Name Surname: Soghra Mehdinejad Gargari
Place and Date of Birth: Iran, 17/09/1981
Address: Mecidiyeköy, Latilokum Sokak No 19A, Daire 3, Yener Apt, Şişli, İstanbul
E-Mail: gargari@itu.edu.tr
B.Sc.: Computer Engineering, Central Tehran Azad University
M.Sc.: Computer Engineering, Istanbul Technical University
Professional Experience and Rewards:
List of Publications and Patents:
Publications/Presentations on the Thesis
- Mehdinejad G. Soghra and Şule Gündüz-Öğüdücü (2012). A Novel Framework For Spammer Detection In Social Bookmarking Systems, The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, FOSINT, 26-29 August 2012, Istanbul, Turkey.