Safety in Numbers

2010

https://doi.org/10.21236/ADA532995

Abstract

Using large-scale distributed resources can help find vulnerabilities and malicious code. This project studied the feasibility of distributing two kinds of static analyses of machine code across large-scale donated computational cycles: conventional static analyses for finding bugs and vulnerabilities, and concolic execution for finding test cases that trigger rare, possibly maliciously hidden, code paths. We demonstrated that concolic execution is particularly well suited to large-scale distributed execution: its core computational loop is highly parallelizable, and its communication costs are small (a toy illustration of the loop follows the personnel summaries below). We assessed a large number of possible parallel architectures and experimented in depth with three. In the course of expanding and scaling our concolic engine for this application, we also devised a means to 'fuzz' its semantic representation of machine code, and thereby demonstrated a general technique for validating abstract representations of machine-code semantics (also sketched below).

Project Personnel

Dr. David Melski was co-PI of the project. He is the vice president of research at GrammaTech and has overseen many successful projects. Dr. Melski graduated summa cum laude from the University of Wisconsin in 1994 with a B.S. in Computer Sciences and Russian Studies. He received his Ph.D. in Computer Sciences from the University of Wisconsin in 2002, where his research interests included static analysis, profiling, and profile-directed optimization.

Dr. David Cok, who joined GrammaTech in January, helped direct the scientific program and implementation work of the project. He is a technical expert and leader in static analysis, software development, computational science, and digital imaging. He has been a major contributor to JML and ESC/Java2 for Java applications; his particular interests are the usability and application of static-analysis and verification technology in industrial-scale software development. Before GrammaTech, Dr. Cok was a Research Fellow in the Kodak Research Laboratory, where he held technical leadership and managerial positions. He received a Ph.D. in Physics from Harvard University in 1980.

Dr. Denis Gopan, an expert in concolic processing, contributed to the analysis and re-architecting of the concolic engine. He received a B.Sc. in Computer Science from the University of Wisconsin-Milwaukee in 1996 and a Ph.D. in Computer Science from the University of Wisconsin-Madison in 2007. His research interests are static program analysis and verification. While at Wisconsin, Dr. Gopan was awarded the Cisco fellowship for two consecutive years.

Mr. John Phillips has many years of experience in parallel computing, high-performance computing, and C++, including more than nine years of involvement in the Boost collaboration. Before joining GrammaTech in January 2010, he was a member of the faculty in Mathematics, Computer Science, and Physics at Capital University and completed several NSF- and DoE-funded projects in astrophysics and computational science.
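The abstract's parallelism claim is easy to make concrete. The following toy sketch (in Python; the program under test, all names, and the brute-force stand-in for an SMT solver are invented for illustration, not taken from the project's engine) shows why the core concolic loop distributes well: each unit of work is a single unexplored branch alternative, a worker independently finds a concrete input that drives execution down it, and the only data exchanged are short path prefixes.

# Illustrative sketch only: a real concolic engine executes machine
# code, collects symbolic path constraints, and calls an SMT solver.
# Here a tiny Python function stands in for the binary, and
# brute-force search over a small input box stands in for the solver.
from concurrent.futures import ThreadPoolExecutor

def program(x, y):
    """Program under test; returns its trace of branch outcomes."""
    trace = [x > 10]
    if x > 10:
        trace.append(y == x - 7)
        if y == x - 7:
            trace.append(True)  # a rare, possibly hidden, path
    return trace

def solve(prefix):
    """Stand-in solver: find an input whose trace starts with prefix."""
    for x in range(-50, 51):
        for y in range(-50, 51):
            trace = program(x, y)
            if trace[:len(prefix)] == prefix:
                return (x, y), trace
    return None, None

def worker(prefix):
    """One unit of distributed work: satisfy a path prefix, then emit
    every new prefix obtained by flipping a branch beyond it."""
    inp, trace = solve(prefix)
    if inp is None:
        return []  # infeasible alternative; nothing to explore
    print("input", inp, "covers path", trace)
    return [trace[:i] + [not trace[i]] for i in range(len(prefix), len(trace))]

seen, frontier = set(), [[]]  # start from the empty path prefix
with ThreadPoolExecutor(max_workers=4) as pool:
    while frontier:
        nxt = []
        for prefixes in pool.map(worker, frontier):
            for p in prefixes:
                if tuple(p) not in seen:  # master-side deduplication
                    seen.add(tuple(p))
                    nxt.append(p)
        frontier = nxt

Only the master keeps shared state (the set of branch alternatives already claimed), so replacing the thread pool with donated machines changes the transport, not the algorithm.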
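The semantics-fuzzing idea admits a similarly small sketch: run the abstract model of an instruction and a trusted concrete oracle on random operands and flag any disagreement. Everything below is invented for illustration (a single 32-bit ADD with a carry flag, and a deliberately planted bug); the project's engine validates a far richer representation of x86 semantics.

import random

MASK = 0xFFFFFFFF  # 32-bit machine words

def concrete_add(a, b):
    """Oracle: the machine's actual ADD behavior (result, carry flag)."""
    s = a + b
    return s & MASK, int(s > MASK)

def modeled_add(a, b):
    """Semantic model under test, seeded with a bug: the carry flag is
    computed from the truncated sum, so it is always 0."""
    result = (a + b) & MASK
    carry = int(result > MASK)  # BUG: should test the untruncated sum
    return result, carry

random.seed(0)
for trial in range(10_000):
    a, b = random.getrandbits(32), random.getrandbits(32)
    if modeled_add(a, b) != concrete_add(a, b):
        print(f"trial {trial}: ADD {a:#010x}, {b:#010x}: "
              f"model={modeled_add(a, b)}, oracle={concrete_add(a, b)}")
        break

Because roughly half of random operand pairs produce a carry, a planted flag bug of this kind surfaces within a few trials; the same differential setup extends to whole instruction sets when the oracle is native single-stepped execution.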

Outline

Fuzz Testing
10.2.5 Tests
10.2.6 Limitations
10.3 Experimental Results
10.4 Abstract Analysis
10.5 Test Infrastructure
10.6 Testability
10.7 Simple Value Set Analysis
10.8 Simple VSA Results
11 Using the cloud to find malicious code
11.1 Introduction
11.2 Benefits of distributed computation
11.3 Requirements
11.4 Technical and Social Challenges
11.4.1 Algorithmic Requirements
11.4.2 Security Concerns
11.4.3 Community Creation and Coherence
11.5 Conclusions
Section III: Concluding Material
12 Related Research
12.1 Static and Dynamic Analysis
12.2 Mining Temporal Specifications
12.2.1 Mining Dynamic Information
12.2.2 Mining Static Information
12.3 Passive Feedback for Cooperative Bug Isolation (CBI)
12.4 Efforts to Harness Donated Computation Cycles
13 Conclusions and Recommendations
14 References