Bellwethers: A Baseline Method For Transfer Learning
2018, IEEE Transactions on Software Engineering
https://doi.org/10.1109/TSE.2018.2821670

Abstract
Transfer learning has been the subject of much recent research. In practice, that research means that models are unstable, since they are continually revised whenever new data arrives. This paper offers a very simple "bellwether" transfer learner. Given N datasets, we find the one that produces the best predictions on all the others. This "bellwether" dataset is then used for all subsequent predictions (when its predictions start failing, one may seek another bellwether). Bellwethers are interesting since they are very simple to find (wrap a for-loop around standard data miners). They simplify the task of making general policies in software engineering: as long as one bellwether remains useful, stable conclusions for N datasets can be reached by reasoning over that bellwether. This paper shows that the bellwether approach works for multiple datasets from various domains in SE. From this, we conclude that (1) the bellwether method is a useful (and simple) transfer learner; (2) unlike bellwethers, other, more complex transfer learners do not generalize to all SE domains; (3) bellwethers are a baseline method against which future transfer learners should be compared; (4) when building increasingly complex automatic methods, researchers should pause and compare their more sophisticated methods against simpler alternatives.
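As a minimal sketch of the "wrap a for-loop around standard data miners" idea from the abstract: assuming each project's data is available as an (X, y) feature/label pair, the sketch below trains a model on each candidate dataset and scores it on all the others, returning the candidate with the best overall score. The choice of scikit-learn's RandomForestClassifier as the "standard data miner" and median F1 as the selection criterion are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# A sketch of the bellwether search: for each candidate dataset,
# train a standard learner on it, test on every other dataset,
# and keep the candidate whose cross-dataset performance is best.
# The learner and scoring metric here are illustrative choices.
from statistics import median

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score


def find_bellwether(datasets):
    """datasets: dict mapping a project name to an (X, y) pair.
    Returns the name of the dataset that predicts best, by median
    F1 score, when its model is tested on all other datasets."""
    scores = {}
    for name, (X_train, y_train) in datasets.items():
        model = RandomForestClassifier(random_state=0)
        model.fit(X_train, y_train)
        # Evaluate this candidate on all the *other* datasets.
        f1s = [f1_score(y_test, model.predict(X_test))
               for other, (X_test, y_test) in datasets.items()
               if other != name]
        scores[name] = median(f1s)
    return max(scores, key=scores.get)
```

Per the abstract, the dataset this loop selects would then be reused for all subsequent predictions, and the search rerun only once the bellwether's predictions start failing.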