Liyan Song

Follower

Following

Public Views

Interests

Uploads

Papers by Liyan Song

On the validity of retrospective predictive performance evaluation procedures in just-in-time software defect prediction

Empirical Software Engineering, Aug 31, 2023

Just-In-Time Software Defect Prediction (JIT-SDP) is concerned with predicting whether software c... more Just-In-Time Software Defect Prediction (JIT-SDP) is concerned with predicting whether software changes are defect-inducing or clean. It operates in scenarios where labels of software changes arrive over time with delay, which in part corresponds to the time we wait to label software changes as clean (waiting time). However, clean labels decided based on waiting time may be different from the true labels of software changes, i.e., there may be label noise. This typically overlooked issue has recently been shown to affect the validity of continuous performance evaluation procedures used to monitor the predictive performance of JIT-SDP models during the software development process. It is still unknown whether this issue could potentially also affect evaluation procedures that rely on retrospective collection of software changes such as those adopted in JIT-SDP research studies, affecting the validity of the conclusions of a large body of existing work. We conduct the first investigation of the extent with which the choice of waiting time and its corresponding label noise would affect the validity of retrospective performance evaluation procedures. Based on 13 GitHub projects, we found that the choice of waiting time did not have a significant impact on the validity and that even small waiting times resulted in high validity. Therefore, (1) the estimated predictive performances in JIT-SDP studies are likely reliable in view of different waiting times, and (2) future studies can make use of not only larger (5k+ software changes), but also smaller (1k software changes) projects for evaluating performance of JIT-SDP models.

Download

A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development

IEEE Transactions on Software Engineering, Feb 1, 2023

Just-In-Time Software Defect Prediction (JIT-SDP) uses machine learning to predict whether softwa... more Just-In-Time Software Defect Prediction (JIT-SDP) uses machine learning to predict whether software changes are defect-inducing or clean. When adopting JIT-SDP, changes in the underlying defect generating process may significantly affect the predictive performance of JIT-SDP models over time. Therefore, being able to continuously track the predictive performance of JIT-SDP models during the software development process is of utmost importance for software companies to decide whether or not to trust the predictions provided by such models over time. However, there has been little discussion on how to continuously evaluate predictive performance in practice, and such evaluation is not straightforward. In particular, labeled software changes that can be used for evaluation arrive over time with a delay, which in part corresponds to the time we have to wait to label software changes as 'clean' (waiting time). A clean label assigned based on a given waiting time may not correspond to the true label of the software changes. This can potentially hinder the validity of any continuous predictive performance evaluation procedure for JIT-SDP models. This paper provides the first discussion of how to continuously evaluate predictive performance of JIT-SDP models over time during the software development process, and the first investigation of whether and to what extent waiting time affects the validity of such continuous performance evaluation procedure in JIT-SDP. Based on 13 GitHub projects, we found that waiting time had a significant impact on the validity. Though typically small, the differences in estimated predicted performance were sometimes large, and thus inappropriate choices of waiting time can lead to misleading estimations of predictive performance over time. Such impact did not normally change the ranking between JIT-SDP models, and thus conclusions in terms of which JIT-SDP model performs better are likely reliable independent of the choice of waiting time, especially when considered across projects.

Download

Spectral Decomposition for Optimal Graph Index Prediction

Lecture Notes in Computer Science, 2013

Recently, there has been ample of research on indexing for structural graph queries. However, as ... more Recently, there has been ample of research on indexing for structural graph queries. However, as verified by our experiments with a large number of random graphs and scale-free graphs, the performances of indexes of graph queries may vary greatly. Unfortunately, the structures of graph indexes are too often complex and ad-hoc; and deriving an accurate performance model appears a daunting task. As a result, database practitioners may encounter difficulties in choosing the optimal index for their data graphs. In this paper, we address this problem by a spectral decomposition for predicting relative performances of graph indexes. Specifically, given a graph, we compute its spectrum. We propose a similarity function to compare the spectrums of graphs. We adopt a classification algorithm to build a model and a voting algorithm for the prediction of the optimal index. Our empirical studies on a large number of random graphs and scale-free graphs and four structurally distinguishable indexes demonstrate that our spectral decomposition is robust and almost always exhibits accuracies higher than 70%.

Download

1 So ware E ort Interval Prediction via Bayesian Inference and Synthetic Bootstrap Resampling

Soware eort estimation (SEE) usually suers from inherent uncertainty arising from predictive m... more Soware eort estimation (SEE) usually suers from inherent uncertainty arising from predictive model limitations and data noise. Relying on point estimation only may ignore the uncertain factors and lead project managers (PMs) to wrong decision-making. Prediction intervals (PIs) with condence levels (CLs) present a more reasonable representation of reality, potentially helping PMs to make beer informed decisions and enable more exibility in these decisions. However, existing methods for PIs either have strong limitations, or are unable to provide informative PIs. To develop a ‘beer’ eort predictor, we propose a novel PI estimator called Synthetic Bootstrap ensemble of Relevance Vector Machines (SynB-RVM) that adopts Bootstrap resampling to produce multiple RVM models based on modied training bags whose replicated data projects are replaced by their synthetic counterparts. We then provide three ways to ensemble those RVM models into a nal probabilistic eort predictor, from w...

Download

An investigation of cross-project learning in online just-in-time software defect prediction

Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020

Just-In-Time Software Defect Prediction (JIT-SDP) is concerned with predicting whether software c... more Just-In-Time Software Defect Prediction (JIT-SDP) is concerned with predicting whether software changes are defect-inducing or clean based on machine learning classifiers. Building such classifiers requires a sufficient amount of training data that is not available at the beginning of a software project. Cross-Project (CP) JIT-SDP can overcome this issue by using data from other projects to build the classifier, achieving similar (not better) predictive performance to classifiers trained on Within-Project (WP) data. However, such approaches have never been investigated in realistic online learning scenarios, where WP software changes arrive continuously over time and can be used to update the classifiers. It is unknown to what extent CP data can be helpful in such situation. In particular, it is unknown whether CP data are only useful during the very initial phase of the project when there is little WP data, or whether they could be helpful for extended periods of time. This work thus provides the first investigation of when and to what extent CP data are useful for JIT-SDP in a realistic online learning scenario. For that, we develop three different CP JIT-SDP approaches that can operate in online mode and be updated with both incoming CP and WP training examples over time. We also collect 2048 commits from three software repositories being developed by a software company over the course of 9 to 10 months, and use 19,8468 commits from 10 active open source GitHub projects being developed over the course of 6 to 14 years. The study shows that training classifiers with incoming CP+WP data can lead to improvements in G-mean of up to 53.90% compared to classifiers using only WP data at the initial stage of the projects. For the open source projects, which have been running for longer periods of time, using CP data to supplement WP data also helped the classifiers to reduce or prevent large drops in predictive performance that may occur over time, leading to up to around 40% better G-Mean during such periods. Such use of CP data was shown to be beneficial even after a large number of WP data were received, leading to overall G-means up to 18.5% better than those of WP classifiers.

Download

A novel automated approach for software effort estimation based on data augmentation

Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018

Download

Software Effort Interval Prediction via Bayesian Inference and Synthetic Bootstrap Resampling

ACM Transactions on Software Engineering and Methodology, 2019

Software effort estimation (SEE) usually suffers from inherent uncertainty arising from predictiv... more Software effort estimation (SEE) usually suffers from inherent uncertainty arising from predictive model limitations and data noise. Relying on point estimation only may ignore the uncertain factors and lead project managers (PMs) to wrong decision making. Prediction intervals (PIs) with confidence levels (CLs) present a more reasonable representation of reality, potentially helping PMs to make better-informed decisions and enable more flexibility in these decisions. However, existing methods for PIs either have strong limitations or are unable to provide informative PIs. To develop a “better” effort predictor, we propose a novel PI estimator called Synthetic Bootstrap ensemble of Relevance Vector Machines (SynB-RVM) that adopts Bootstrap resampling to produce multiple RVM models based on modified training bags whose replicated data projects are replaced by their synthetic counterparts. We then provide three ways to assemble those RVM models into a final probabilistic effort predict...

Download

The potential benefit of relevance vector machine to software effort estimation

Proceedings of the 10th International Conference on Predictive Models in Software Engineering - PROMISE '14, 2014

Where a licence is displayed above, please note the terms and conditions of the licence govern yo... more Where a licence is displayed above, please note the terms and conditions of the licence govern your use of this document. When citing, please reference the published version. Take down policy While the University of Birmingham exercises care and attention in making items available there are rare occasions when an item has been uploaded in error or has been deemed to be commercially or otherwise sensitive.

Download

The impact of parameter tuning on software effort estimation using learning machines

Proceedings of the 9th International Conference on Predictive Models in Software Engineering, 2013

Background: The use of machine learning approaches for software effort estimation (SEE) has been ... more Background: The use of machine learning approaches for software effort estimation (SEE) has been studied for more than a decade. Most studies performed comparisons of different learning machines on a number of data sets. However, most learning machines have more than one parameter that needs to be tuned, and it is unknown to what extent parameter settings may affect their performance in SEE. Many works seem to make an implicit assumption that parameter settings would not change the outcomes significantly. Aims: To investigate to what extent parameter settings affect the performance of learning machines in SEE, and what learning machines are more sensitive to their parameters. Method: Considering an online learning scenario where learning machines are updated with new projects as they become available, systematic experiments were performed using five learning machines under several different parameter settings on three data sets. Results: While some learning machines such as bagging using regression trees were not so sensitive to parameter settings, others such as multilayer perceptrons were affected dramatically. Combining learning machines into bagging ensembles helped making them more robust against different parameter settings. The average performance of k-NN across different projects was not so much affected by different parameter settings, but the parameter settings that obtained the best average performance across time steps were not so consistently the best throughout time steps as in the other approaches. Conclusions: Learning machines that are more/less sensitive to different parameter settings were identified. The different sensitivity obtained by different learning machines shows that sensitivity to parameters should be considered as one of the criteria for evaluation of SEE approaches. A good learning machine for SEE is not only one which is able to achieve superior performance, but also one that is either less dependent on parameter settings or to which good parameter choices are easy to make.

Download

A Region Enhanced Discrete Multi-Objective Fireworks Algorithm for Low-Carbon Vehicle Routing Problem

Complex System Modeling and Simulation

A constrained multi-objective optimization model for the low-carbon vehicle routing problem (VRP)... more A constrained multi-objective optimization model for the low-carbon vehicle routing problem (VRP) is established. A carbon emission measurement method considering various practical factors is introduced. It minimizes both the total carbon emissions and the longest time consumed by the sub-tours, subject to the limited number of available vehicles. According to the characteristics of the model, a region enhanced discrete multi-objective fireworks algorithm is proposed. A partial mapping explosion operator, a hybrid mutation for adjusting the sub-tours, and an objective-driven extending search are designed, which aim to improve the convergence, diversity, and spread of the non-dominated solutions produced by the algorithm, respectively. Nine low-carbon VRP instances with different scales are used to verify the effectiveness of the new strategies. Furthermore, comparison results with four state-of-the-art algorithms indicate that the proposed algorithm has better performance of convergence and distribution on the low-carbon VRP. It provides a promising scalability to the problem size.

Download

An Investigation of Cross-Project Learning in Online Just-In-Time Software Defect Prediction

Conference: International Conference on Software Engineering (ICSE), 2020

Download

Liyan Song

Uploads

Papers by Liyan Song

Log In