Automatic Recovery from Runtime Failures
https://doi.org/10.1109/ICSE.2013.6606624
Abstract
We present a technique to make applications resilient to failures. This technique is intended to keep a faulty application functional in the field while the developers work on permanent and radical fixes. We target field failures in applications built on reusable components. In particular, the technique exploits the intrinsic redundancy of those components by identifying workarounds, that is, alternative uses of the faulty components that avoid the failure. The technique is currently implemented for Java applications but makes few or no assumptions about the nature of the application, and it works without interrupting the execution flow of the application and without restarting its components. We demonstrate and evaluate this technique on four mid-size applications and two popular libraries of reusable components affected by real and seeded faults. In these cases the technique is effective, maintaining the application fully functional with between 19% and 48% of the failure-...
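The idea of a workaround can be illustrated with a minimal Java sketch. It is not the implementation described in the paper: the class `WorkaroundSketch`, the method `runWithWorkarounds`, and the component `FaultyList` with its seeded fault are all hypothetical names introduced here. The sketch assumes that a failure shows up as a runtime exception and that the component offers two redundant ways to reach the same result (one `addAll` call versus repeated `add` calls). It sidesteps state restoration by building a fresh component instance for each attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

final class WorkaroundSketch {

    // Runs the original operation; if it fails with a runtime exception,
    // tries each alternative in order and returns the first result obtained.
    // Returns an empty Optional when every candidate fails.
    static <T> Optional<T> runWithWorkarounds(Supplier<T> original,
                                              List<Supplier<T>> alternatives) {
        List<Supplier<T>> candidates = new ArrayList<>();
        candidates.add(original);
        candidates.addAll(alternatives);
        for (Supplier<T> candidate : candidates) {
            try {
                // Results are assumed non-null; a null result counts as a failure.
                return Optional.of(candidate.get());
            } catch (RuntimeException failure) {
                // Assumption: an exception signals the failure. Try the next candidate.
            }
        }
        return Optional.empty();
    }

    // Hypothetical reusable component with a seeded fault in addAll.
    static final class FaultyList {
        private final List<Integer> items = new ArrayList<>();

        void add(int x) {
            items.add(x);
        }

        void addAll(List<Integer> xs) {
            if (xs.size() > 1) {
                throw new IllegalStateException("seeded fault in addAll");
            }
            items.addAll(xs);
        }

        List<Integer> snapshot() {
            return new ArrayList<>(items);
        }
    }

    // Builds a list from the input, falling back to repeated add calls
    // when the addAll call fails.
    static List<Integer> demo() {
        List<Integer> input = Arrays.asList(1, 2, 3);

        Supplier<List<Integer>> original = () -> {
            FaultyList list = new FaultyList();
            list.addAll(input);              // fails because of the seeded fault
            return list.snapshot();
        };
        Supplier<List<Integer>> alternative = () -> {
            FaultyList list = new FaultyList();
            for (int x : input) {
                list.add(x);                 // redundant path that avoids addAll
            }
            return list.snapshot();
        };

        return runWithWorkarounds(original, Collections.singletonList(alternative))
                .orElseThrow(IllegalStateException::new);
    }
}
```

Calling `WorkaroundSketch.demo()` exercises the faulty `addAll` path first and then recovers through the alternative. A real system would also need a way to restore the component's state before retrying, which this sketch avoids by construction.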