Know You Neighbor: Fast Static Prediction of Test Flakiness

2021, IEEE Access

https://doi.org/10.1109/ACCESS.2021.3082424

Abstract

Context: Flaky tests plague regression testing in Continuous Integration environments by slowing down change releases and wasting testing time and effort. Despite the growing interest in mitigating the burden of test flakiness, how to efficiently and effectively detect flaky tests is still an open problem. Objective: In this study, we present and evaluate FLAST, an approach designed to statically predict test flakiness. FLAST leverages vector-space modeling, similarity search, dimensionality reduction, and k-Nearest Neighbor classification to detect test flakiness in a timely and efficient manner. Method: To gain insights into the efficiency and effectiveness of FLAST, we conduct an empirical evaluation of the approach on 13 real-world projects, for a total of 1,383 flaky and 26,702 non-flaky tests. We carry out a quantitative comparison of FLAST with state-of-the-art methods for detecting test flakiness, using a balanced dataset comprising 1,402 real-world flaky tests and as many non-flaky ones. Results: From the results we observe that the effectiveness of FLAST is comparable with the state of the art, while providing considerable gains in efficiency. In addition, the results show that, by tuning its threshold, FLAST can be made more conservative, so as to reduce false positives at the cost of missing more potentially flaky tests. Conclusion: The collected results demonstrate that FLAST provides a fast, low-cost, and reliable approach that can be used to guide test rerunning, or to gate the inclusion of new potentially flaky tests.
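The pipeline named in the abstract (vector-space modeling, dimensionality reduction, similarity-based kNN classification, threshold tuning) can be sketched in a few lines of Python. The sketch below is a minimal illustration, not the authors' implementation: the use of scikit-learn, the \w+ token pattern, the projection size of 100 dimensions, the neighborhood size k = 7, and the default 0.5 decision threshold are all assumed placeholder choices.

```python
# Minimal sketch of a FLAST-style static flakiness predictor.
# Assumptions (not from the paper): scikit-learn components, a plain
# \w+ tokenizer, 100 projected dimensions, and k = 7 neighbors.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.random_projection import SparseRandomProjection


def train_predictor(test_sources, labels, k=7):
    """Fit a kNN flakiness predictor on test-method source strings.

    labels: 1 for flaky, 0 for non-flaky.
    """
    # Vector-space modeling: bag-of-tokens representation of each test.
    vectorizer = CountVectorizer(token_pattern=r"\w+")
    vectors = vectorizer.fit_transform(test_sources)

    # Dimensionality reduction: a Johnson-Lindenstrauss-style sparse
    # random projection that approximately preserves pairwise distances.
    projector = SparseRandomProjection(n_components=100, random_state=0)
    reduced = projector.fit_transform(vectors)

    # Similarity search + classification: distance-weighted kNN.
    knn = KNeighborsClassifier(n_neighbors=k, weights="distance")
    knn.fit(reduced, labels)
    return vectorizer, projector, knn


def predict_flaky(vectorizer, projector, knn, new_tests, threshold=0.5):
    """Flag tests whose estimated flakiness probability >= threshold.

    Raising the threshold makes the prediction more conservative:
    fewer false positives, more missed flaky tests.
    """
    reduced = projector.transform(vectorizer.transform(new_tests))
    return knn.predict_proba(reduced)[:, 1] >= threshold
```

Under this sketch, raising the threshold above 0.5 corresponds to the conservative configuration described in the abstract: fewer non-flaky tests are wrongly flagged, at the price of missing more genuinely flaky ones.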

Authors

ROBERTO VERDECCHIA is currently pursuing a double Ph.D. degree in computer science at the Vrije Universiteit Amsterdam, The Netherlands, and the Gran Sasso Science Institute, L'Aquila, Italy. He is currently a Research Associate with the Software and Sustainability Group (S2), Vrije Universiteit Amsterdam. His research interests include the adoption of empirical methods to improve software development and system evolution, with particular emphasis on technical debt, software architecture, software testing, and software energy efficiency.

EMILIO CRUCIANI received the M.Sc. degree in engineering (computer science) from Sapienza University of Rome, Rome, Italy, in 2016, and the Ph.D. degree in computer science from the Gran Sasso Science Institute, L'Aquila, Italy, in 2019. From 2019 to 2020, he was a Postdoctoral Researcher with the COATI Team, INRIA Sophia Antipolis Méditerranée, France, and he is currently a Postdoctoral Researcher with the Efficient Algorithms Group, University of Salzburg, Austria. His research interests include the analysis of stochastic processes on complex networks and the design and implementation of scalable and efficient algorithms for massive datasets.

BRENO MIRANDA received the master's degree in computer science from the Federal University of Pernambuco, Brazil, in 2011, and the Ph.D. degree in computer science from the University of Pisa, Italy, in 2016. He is currently an Assistant Professor with the Federal University of Pernambuco. His research interests include software engineering, with a particular focus on software testing.

ANTONIA BERTOLINO is currently a Research Director with the Institute of Information Science and Technologies "Alessandro Faedo" (ISTI) of the Italian National Research Council (CNR), Pisa, Italy. Her research covers a broad range of topics and techniques within software testing. She has published more than 200 papers in international journals, conferences, and workshops. She has participated in several collaborative projects, including, more recently, the European projects ElasTest, Learn PAd, and CHOReOS. She serves regularly on the program committees of top conferences in software engineering and software testing, such as ESEC/FSE, ICSE, ISSTA, and ICST. She currently serves as a Senior Associate Editor of the Journal of Systems and Software (Elsevier), and as an Associate Editor of ACM Transactions on Software Engineering and Methodology, Empirical Software Engineering (Springer), and the Journal of Software: Evolution and Process (Wiley).