Extract Me If You Can: Abusing PDF Parsers in Malware Detectors
https://doi.org/10.14722/NDSS.2016.23483
Abstract
Owing to the popularity of the PDF format and the continued exploitation of Adobe Reader, the detection of malicious PDFs remains a concern. All existing detection techniques rely on a PDF parser to some extent, while the complexity of the PDF format leaves abundant room for parser confusion. To quantify the difference between these parsers and Adobe Reader, we create a reference JavaScript extractor by directly tapping into Adobe Reader at locations identified through a mostly automatic binary analysis technique. By comparing the output of this reference extractor against that of several open-source JavaScript extractors on a large data set obtained from VirusTotal, we identify hundreds of samples from which existing extractors fail to extract JavaScript. By analyzing these samples, we identify several weaknesses in each of these extractors. Based on these lessons, we apply several obfuscations to a malicious PDF sample, allowing it to evade all the malware detectors tested. We call this evasion technique a PDF parser confusion attack. Lastly, we demonstrate that the reference JavaScript extractor improves the accuracy of existing JavaScript-based classifiers, and we show how it can be used to mitigate these parser limitations in a real-world setting.
FAQs
What are the main findings regarding JavaScript extraction from malicious PDFs?
The study reveals that existing extractors miss JavaScript in 22.47% of malicious samples, indicating significant detection shortcomings.
How does the reference JavaScript extractor improve detection rates?
Using the reference extractor, detection rates for PJScan increase from 68% to 96% on the tested sample set.
What is the impact of parser confusion attacks on PDF detectors?
Parser confusion attacks successfully evade all tested malware detectors, including signature-based and JavaScript-based systems.
What methodologies were used to evaluate existing PDF extractors?
The research compared the performance of multiple JavaScript extractors against 163,306 PDFs, highlighting discrepancies in extraction capabilities.
What limitations does the reference JavaScript extractor face?
The reference extractor only captures JavaScript automatically executed by Adobe Reader, potentially missing delayed or user-interactive scripts.
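The comparison between a reference extractor and an open-source extractor can be illustrated with a small differential check. This is only a sketch under stated assumptions, not the authors' tooling: the function names `normalize` and `compare_extractions`, the dictionary inputs keyed by sample ID, and the 0.9 similarity threshold are all hypothetical, and Python's standard `difflib` is used here simply as one convenient way to score textual similarity.

```python
from difflib import SequenceMatcher


def normalize(js):
    """Collapse whitespace so formatting differences are not counted as mismatches."""
    return " ".join(js.split())


def compare_extractions(reference, candidate, threshold=0.9):
    """Compare per-sample JavaScript from two extractors.

    reference, candidate: dicts mapping a sample ID to extracted JavaScript text
    (hypothetical input format). Returns two lists of sample IDs:
      missed  - the reference found JavaScript, the candidate found none
      partial - both found JavaScript, but similarity is below the threshold
    """
    missed, partial = [], []
    for sample_id, ref_js in reference.items():
        ref_norm = normalize(ref_js)
        if not ref_norm:
            continue  # the reference found no JavaScript, so there is nothing to compare
        cand_norm = normalize(candidate.get(sample_id, ""))
        if not cand_norm:
            missed.append(sample_id)
        elif SequenceMatcher(None, ref_norm, cand_norm).ratio() < threshold:
            partial.append(sample_id)
    return missed, partial


if __name__ == "__main__":
    # Toy data standing in for real extractor output.
    reference = {
        "a": "app.alert(1);",
        "b": "var x = 1;\nvar y = 2;",
        "c": "",
        "d": "this.exportDataObject();",
    }
    candidate = {"a": "app.alert(1);", "b": "", "d": "zzz"}
    missed, partial = compare_extractions(reference, candidate)
    print("missed:", missed)
    print("partial:", partial)
```

Samples in `missed` correspond to the case the abstract highlights, where an existing extractor fails to recover JavaScript that Adobe Reader would run; `partial` flags outputs worth inspecting by hand.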