Spam campaign detection, analysis, and investigation
2015, Digital Investigation
https://doi.org/10.1016/J.DIIN.2015.01.006
Abstract
Spam has been a major tool for criminals to conduct illegal activities on the Internet, such as stealing sensitive information, selling counterfeit goods, and distributing malware. The sheer volume of spam data has rendered manual analysis impractical. Moreover, most current techniques are either too complex to be applied to large amounts of data or fail to extract the security insights that forensic work requires. In this paper, we present a software framework for spam campaign detection, analysis, and investigation. The proposed framework identifies spam campaigns on the fly. It also labels and scores the campaigns and gathers various information about them. The framework provides law enforcement officials with a platform for investigating cyber-based criminal activities.