Spam campaign detection, analysis, and investigation

sơn đinh

doi:10.1016/J.DIIN.2015.01.006

2015, Digital Investigation

10 pages

Abstract

Spam has been a major tool for criminals to conduct illegal activities on the Internet, such as stealing sensitive information, selling counterfeit goods, and distributing malware. The sheer volume of spam data makes manual analysis impractical. Moreover, most current techniques are either too complex to apply to large amounts of data or fail to extract the security insights that forensic work requires. In this paper, we present a software framework for spam campaign detection, analysis, and investigation. The proposed framework identifies spam campaigns on the fly. It also labels and scores the campaigns and gathers various information about them. The framework provides law enforcement officials with a powerful platform for investigating cyber-based criminal activities.
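As a rough illustration of what identifying, labeling, and scoring campaigns on the fly could look like, the Python sketch below groups incoming messages one at a time. It is not the framework's actual algorithm, which the abstract does not describe. The grouping key (the set of URL hosts in the body), the label (the most frequent subject words), the score (campaign size), and every name in the code are assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Campaign:
    """Hypothetical container for the messages grouped into one campaign."""
    messages: list = field(default_factory=list)
    words: Counter = field(default_factory=Counter)

    def label(self, k=2):
        # Assumed labeling rule: the k most frequent subject words.
        return " ".join(word for word, _ in self.words.most_common(k))

    def score(self):
        # Assumed scoring rule: the number of messages seen so far.
        return len(self.messages)


def signature(msg):
    # Assumed grouping key: the set of hosts of all URLs in the body.
    # Messages without any URL all share the empty signature.
    hosts = re.findall(r"https?://([^/\s]+)", msg["body"].lower())
    return frozenset(hosts)


class CampaignIndex:
    """Assigns each incoming message to a campaign as it arrives."""

    def __init__(self):
        self.campaigns = defaultdict(Campaign)

    def add(self, msg):
        key = signature(msg)
        campaign = self.campaigns[key]
        campaign.messages.append(msg)
        campaign.words.update(re.findall(r"[a-z]+", msg["subject"].lower()))
        return key
```

Each call to `add` updates exactly one campaign, so labels and scores are available at any point in the stream without reprocessing earlier messages. A realistic system would need a far more robust grouping criterion than exact host matching.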


Email databases are continually updated with active email addresses that hackers and spammers collect from different sources for illicit purposes, including spamming and sharing. The presence of multiple valid email addresses in the headers of chain and multi-recipient email messages increases the chances of successful harvesting. This paper investigates and exposes a bot-based technique for harvesting email addresses from email messages, including chain email messages and emails sent to multiple recipients. Experimental results demonstrate the designed bot's effectiveness in misusing technologies to collect email addresses from the header and body of email messages. The experiments and user studies also demonstrate the XOAuth authentication mechanism's inefficiency in blocking mailbox access and email address harvesting. The comprehensive illustration of the design should be beneficial for designing techniques to detect and mitigate such bots. The paper also suggests a few mechanisms that can significantly prevent this type of email address harvesting, and designs a method to detect and mitigate bots of this nature.

I. INTRODUCTION

Without a database of active email addresses, it is not possible for spammers to effectively send spam messages and profit from the process. Spammers are always on the hunt for new and active email addresses for spamming [1], phishing [2], [3], whaling [4], spoofing [5], cyberbullying [4], hoaxes [6], etc. Spamming and phishing attacks are also on the rise in online social networks. The use of new techniques for spamming requires the development of novel procedures to check their spread [7].

Phishing, a consequence of spamming that poses a serious risk for consumers and businesses, has been on the rise and requires contemporary measures for its control, such as filtering [8]. Spammers harvest email addresses using different methods, which include mailing lists on sale, social networking sites, group discussion forums, web crawlers, directory harvest attacks, hacking into sites, intercepted user requests to unsubscribe, guessing and cleaning, and free offerings (http://www.private.org.il/harvest.html). Hiding email addresses is no longer a viable solution to the problem: sharing email addresses with friends, family, customers, business partners, and online service providers is inevitable, because email has become a standard mode of communication on the Internet. Moreover, providing an active email address to heavily used social and e-commerce websites has become compulsory. These email addresses and user information are stored in diverse formats, mostly in the underlying databases of service providers. If such a database is hacked by spammers or others, its users are put at risk of spamming. Also, businesses and individuals publish their email addresses on their websites and blogs so that customers can communicate with them. Bots [11], [12] originated for useful purposes, such as web crawlers that perform predefined functions automatically [13], but their use for numerous malicious purposes has gone on for a long time. The illicit use includes spreading spam email messages, which are considered one of the major threats to the security and privacy of the Internet [14], [15], [16]. A bot is propagated into the vulnerable systems to gain
