MetaBags: Bagged Meta-Decision Trees for Regression

2019, Lecture Notes in Computer Science

https://doi.org/10.1007/978-3-030-10925-7_39

Abstract

Ensembles are popular methods for solving practical supervised learning problems. They reduce the risk of deploying underperforming models in production-grade software. Although the problem is critical, methods for learning heterogeneous regression ensembles have rarely been proposed, and in the classical ML literature stacking, cascading, and voting are mostly restricted to classification problems. Regression poses distinct learning challenges that may result in poor performance, even when using well-established homogeneous ensemble schemas such as bagging or boosting. In this paper, we introduce MetaBags, a novel, practically useful stacking framework for regression. MetaBags is a meta-learning algorithm that learns a set of meta-decision trees designed to select one base model (i.e., expert) for each query, and it focuses on inductive bias reduction. The meta-decision trees are learned using different types of meta-features specially created for this purpose. Each meta-decision tree is learned on a different bootstrap sample of the data and, given a new example, selects a suitable base model to compute a prediction; these predictions are then aggregated into a single output. This procedure is designed to learn a model with a fair bias-variance trade-off, and its improvement over the base models' performance correlates with the prediction diversity of the different experts on specific subregions of the input space. The proposed method and meta-features are designed so that they enable good predictive performance even in subregions of the input space that are not adequately represented in the available training data. We performed exhaustive empirical testing of the method, evaluating both generalization error and scalability on synthetic, open, and real-world application datasets. The obtained results show that our method significantly outperforms existing state-of-the-art approaches.
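To make the procedure described above concrete, the following Python sketch illustrates the core MetaBags loop as summarized in the abstract only; it is not the authors' reference implementation. Everything beyond the abstract is an assumption made for illustration: the raw input features stand in for the paper's purpose-built meta-features, pre-fitted scikit-learn regressors play the role of the base experts, the meta-target is the per-instance best expert on the training set, and aggregation is a plain average over the selections of the bagged meta-decision trees.

```python
# Minimal sketch of the MetaBags idea from the abstract -- NOT the authors' code.
# Assumptions: base_models are already-fitted regressors ("experts"); the raw
# features replace the paper's specialized meta-features; the meta-target is
# the expert with the smallest absolute error per training example.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class MetaBagsSketch:
    def __init__(self, base_models, n_meta_trees=10, random_state=0):
        self.base_models = base_models      # pre-fitted base regressors
        self.n_meta_trees = n_meta_trees    # number of bagged meta-decision trees
        self.rng = np.random.default_rng(random_state)
        self.meta_trees = []

    def fit(self, X, y):
        # Meta-target: index of the expert with the lowest absolute error per example.
        preds = np.column_stack([m.predict(X) for m in self.base_models])
        best_expert = np.argmin(np.abs(preds - y[:, None]), axis=1)

        # Learn each meta-decision tree on a different bootstrap sample (bagging).
        n = X.shape[0]
        for _ in range(self.n_meta_trees):
            idx = self.rng.integers(0, n, size=n)
            tree = DecisionTreeClassifier(max_depth=5)
            tree.fit(X[idx], best_expert[idx])
            self.meta_trees.append(tree)
        return self

    def predict(self, X):
        # Each meta-tree selects one expert per query; the chosen experts'
        # predictions are averaged into a single output.
        preds = np.column_stack([m.predict(X) for m in self.base_models])
        chosen = np.column_stack([t.predict(X) for t in self.meta_trees])
        return np.take_along_axis(preds, chosen, axis=1).mean(axis=1)
```

A usage sketch under the same assumptions: fit, for example, an SVR, a random forest, and a gradient-boosting regressor on the training data, pass them to MetaBagsSketch, call fit(X_train, y_train), and then predict(X_test) to obtain the aggregated per-query expert predictions.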

Table 3: Detailed predictive performance results, comparing base learners (SVR, PPR, RF, GB) vs. MetaBags (top) and state-of-the-art model-integration methods vs. MetaBags, including its variations (bottom). Results report the mean and (standard error) of RMSE per dataset; the last rows give wins and losses based on a two-sample t-test at the stated significance level.