Academia.edu

Random Forest Regression

About this topic
Random Forest Regression is an ensemble learning method that utilizes multiple decision trees to predict continuous outcomes. It aggregates the predictions of individual trees to improve accuracy and control overfitting, making it robust against noise and capable of handling large datasets with complex relationships.
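
The aggregation described above can be illustrated with a minimal from-scratch sketch. It is not taken from any paper listed on this page; the function names (`build_tree`, `fit_forest`, `predict_forest`) and the defaults (`max_depth=4`, `min_leaf=2`, `n_trees=25`) are assumptions chosen for illustration. Each tree is grown on a bootstrap sample, considers a random subset of features at each split, and the forest prediction is the plain average of the tree predictions. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used instead.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (used as the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, rng, max_features, depth=0, max_depth=4, min_leaf=2):
    """Grow one regression tree. A leaf is a number; a split is a 4-tuple."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return mean(targets)
    n_features = len(rows[0])
    # Random feature subset at each split: the "random" part of the forest.
    features = rng.sample(range(n_features), min(max_features, n_features))
    best, best_score = None, sse(targets)
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, t), score
    if best is None:  # no split reduces the error: stop here
        return mean(targets)
    f, t = best
    lo = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, targets) if r[f] > t]

    def grow(part):
        return build_tree([r for r, _ in part], [y for _, y in part],
                          rng, max_features, depth + 1, max_depth, min_leaf)

    return (f, t, grow(lo), grow(hi))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, targets, n_trees=25, max_features=1, seed=0):
    """Fit n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng, max_features))
    return forest


def predict_forest(forest, row):
    """Aggregate by averaging the predictions of the individual trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Toy data (made up for illustration): y depends on the first feature only.
toy_rows = [[i / 10, (i * 7 % 10) / 10] for i in range(40)]
toy_targets = [3.0 * r[0] for r in toy_rows]
toy_forest = fit_forest(toy_rows, toy_targets, n_trees=5, max_features=2)
print(predict_forest(toy_forest, [2.0, 0.5]))
```

Averaging many trees grown on different bootstrap samples is what reduces variance relative to a single deep tree; the exhaustive threshold scan here is kept simple for readability and is not tuned for large datasets.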

Key research themes

1. How can variable selection and feature importance methods enhance Random Forest Regression accuracy and interpretability in high-dimensional or complex data?

This research area focuses on identifying optimal variable subsets and measuring feature importance in Random Forest Regression (RFR) models to improve prediction accuracy and interpretability, especially when dealing with high-dimensional data or when predictor variables are correlated, categorical, or of mixed types. Understanding and managing variable selection reduces noise, limits bias, and enhances model generalization.

Key finding: This study presents a stepwise Random Forest (SRF) variable selection method that outperforms standard variable selection techniques such as Boruta, VSURF, and linear stepwise regression for predicting forest growing stem... Read more
Key finding: Introduces the Intervention in Prediction Measure (IPM), a novel variable importance measure for Random Forests independent of prediction performance metrics and adaptable to multivariate responses. IPM, based on tree... Read more
Key finding: Proposes xRF, an improved Random Forest algorithm incorporating unbiased feature sampling by separating informative from uninformative features using p-value and chi-square tests before splitting. This approach addresses bias... Read more
Key finding: Demonstrates that pre-estimation dimension reduction (targeting) via supervised variable pre-selection (e.g., LASSO) enhances Random Forest Regression performance by increasing the probability of splits on strong predictors,... Read more
Key finding: Develops a methodology to select significant variables within Random Forest classification models applied to chemical data (NMR spectra) to interpret the influence of variables on maximum pour point (MPP) of crude oil. Using... Read more
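
As a generic illustration of variable importance, the sketch below computes model-agnostic permutation importance: the increase in mean squared error when one feature column is shuffled. This is a standard baseline technique, not the SRF, IPM, or xRF methods summarized above. The names `permutation_importance` and `toy_predict`, and the toy data, are hypothetical; `toy_predict` stands in for a fitted forest's prediction function.

```python
import random
from statistics import mean


def mse(predict, rows, targets):
    """Mean squared error of a prediction function on a dataset."""
    return mean((predict(r) - y) ** 2 for r, y in zip(rows, targets))


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that feature."""
    rng = random.Random(seed)
    baseline = mse(predict, rows, targets)
    importances = []
    for f in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)  # break the link between feature f and the target
            shuffled = [list(r[:f]) + [v] + list(r[f + 1:])
                        for r, v in zip(rows, column)]
            increases.append(mse(predict, shuffled, targets) - baseline)
        importances.append(mean(increases))
    return importances


# Hypothetical stand-in for a fitted model: uses feature 0 and ignores feature 1.
def toy_predict(row):
    return 3.0 * row[0]


toy_rows = [[i / 10, (i * 7 % 10) / 10] for i in range(50)]
toy_targets = [3.0 * r[0] for r in toy_rows]
print(permutation_importance(toy_predict, toy_rows, toy_targets))
```

A feature the model never uses gets an importance of zero, because shuffling it leaves every prediction unchanged. Correlated predictors can share or mask importance under this measure, which is one motivation for the alternative measures listed above.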

2. How can Random Forest Regression be adapted or combined with ensemble and optimization techniques to improve predictive speed and accuracy in real-world applications?

This theme investigates the development of ensemble variants, pruning methods, and hybrid frameworks of Random Forest Regression (RFR) to optimize computational efficiency and maintain or improve predictive accuracy. It addresses practical constraints in healthcare, environmental modeling, manufacturing, and sensor-based systems where faster inference or better generalized performance is critical.

Key finding: Introduces CLUB-DRF, a pruned Random Forest ensemble that clusters similar trees to inject diversity and select representatives, resulting in a substantially smaller model (over 92% pruning) with equivalent or improved... Read more
Key finding: Proposes a novel soft sensor combining Random Forest Regression (RFR) and Partial Least Squares (PLS) for dynamic process modeling, showing improved one-step-ahead prediction accuracy and stability over traditional models... Read more
Key finding: Develops RFR models to surrogate computational fluid dynamics (CFD) simulations of turbulent flow characteristics in curved pipes, producing significant computational cost reductions while maintaining high prediction accuracy... Read more
Key finding: Combines Random Forest with Random Search optimization (RS-RF) to predict soil erosion status, achieving improved classification metrics (accuracy, MCC, F1-score) by optimizing RF hyperparameters via metaheuristics. The study... Read more
Key finding: Presents hybrid models coupling Random Forest Regression with metaheuristic optimizers FDA, GJO, and GTO for predicting maximum dry density of soil. The RFGJ (RF with GJO optimizer) model achieved the highest R² (0.9966) and... Read more
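
The hyperparameter search idea mentioned above can be sketched as a plain random search loop. This is a minimal illustration, not the RS-RF procedure or the FDA, GJO, or GTO optimizers from the listed studies. The function `random_search`, the search space, and `toy_validation_error` are assumptions; in real use the evaluation function would train a forest with the sampled settings and return its validation error.

```python
import random


def random_search(evaluate, space, n_iter=20, seed=0):
    """Sample n_iter random configurations and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a forest's hyperparameters.
toy_space = {
    "n_trees": [10, 25, 50, 100, 200],
    "max_depth": [2, 4, 6, 8, 10],
    "max_features": [1, 2, 3],
}


def toy_validation_error(params):
    """Made-up stand-in for 'fit a forest, return validation error'."""
    return (abs(params["n_trees"] - 50) / 100
            + abs(params["max_depth"] - 6)
            + abs(params["max_features"] - 2))


best, score = random_search(toy_validation_error, toy_space, n_iter=30, seed=1)
print(best, score)
```

Random search evaluates a fixed budget of configurations regardless of how large the grid is, which is why it is a common baseline before metaheuristic optimizers are considered.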

3. What are the practical applications of Random Forest Regression in diverse domains such as healthcare, environmental monitoring, remote sensing, and manufacturing for accurate and interpretable predictive modeling?

This theme highlights the application of Random Forest Regression (RFR) in real-world, domain-specific problems where accurate prediction and model interpretability are necessary. It surveys how RFR supports decision-making in healthcare systems, hydrology, mobile network performance, manufacturing machining processes, and ecological modeling, showing its adaptability across multidisciplinary datasets.

Key finding: Demonstrates that Random Forest Regression, trained on remotely sensed data, can predict streamflow in a snowmelt-dominated mountainous watershed with higher accuracy and less calibration effort compared to the Soil and Water... Read more
Key finding: Develops an IoT-enabled healthcare system using sensors connected via Raspberry Pi and employing Random Forest Regression models to accurately predict physiological measures such as heart rate and blood pressure. The system... Read more
Key finding: Applies multiple machine learning models including Random Forest Regression to predict mobile network throughput performance (downlink bit rate) based on cellular network parameters. The Random Forest model obtained highest... Read more
Key finding: Assesses Random Forest Regression performance on count data sets with overdispersion, comparing it with classical count-based generalized linear models. Results indicate RF achieves comparable or better predictive accuracy,... Read more
Key finding: Evaluates machine learning algorithms including Random Forest Regression to model the relationship between machining process parameters and surface roughness. RFR's ensemble learning capability effectively captures... Read more

All papers in Random Forest Regression

One problem with new medications is their poor water solubility, which can be addressed using supercritical methods. This study aims to predict the solubility of raloxifene and the density of supercritical CO2 using... more
The integration of deep learning techniques into smart power systems has gained significant attention due to their potential to optimize energy management, enhance grid reliability, and enable efficient utilization of renewable energy... more
Machine learning has played a major role in recent years in image detection, spam recognition, speech commands, product recommendation, and medical diagnosis. Present machine learning algorithms help us in enhancing security alerts,... more
This research investigates the predictive power of pre-release metadata in forecasting IMDb movie ratings. Unlike prior studies that rely on post-release variables such as box office income or audience reviews, our approach focuses... more
This study provides an in-depth examination of advanced machining processes and their optimization using various machine learning algorithms. The study focuses on key machining parameters such as cutting speed (m/min), feed rate (mm/rev), and... more
It is generally agreed that models that better simulate historical and current features of climate should also be the ones that more reliably simulate future climate. This article describes the ability of a selection of global... more
In this work, we propose a novel boosting-based machine learning algorithm called EvoBoost, invented by Sudip Barua. Gradient boosting has emerged as a cornerstone technique in machine learning, achieving state-of-the-art performance in... more
This report illustrates the application of the Random Forest Regressor model to forecasting the water levels of five major lake basins: Lake Tana, Lake Kainji, Lake Nasser, and Lakes Victoria and Turkana.... more
Appropriate selection of gridded precipitation data is very important for the region where long-term precipitation observations are not available. An approach based on compromise programing (CP) is proposed to select the gridded... more
Planning crops for the next season is a tedious task for farmers, because it is difficult to predict the prices their crop will fetch in a particular season, which will typically be based on dynamic weather... more
Compaction is the process of densifying soil by reducing the amount of air space within it. The level of compaction needed for a particular soil is determined by its dry density, which reaches its highest point when the soil has the... more
Mobile network management and drive tests provide services that give a clear insight into the quality of mobile network coverage and other wireless networks including identifying areas of poor signal quality and identification of black... more
Forest fires are a significant environmental hazard with increasing frequency due to climate change. Early predictions and mitigation are essential for minimizing the damage caused by these fires. In this paper, we develop a... more
Exploiting Renewable energy to the maximum extent possible in an electric vehicle charging station (EVCS) is the key in supporting the anticipated carbon reduction from the electric vehicles (EVs). Knowing the expected load and the solar... more
Progress in sensor technologies has allowed real-time monitoring of soil water. It is a challenge to model soil water content based on remote sensing data. Here, we retrieved and modeled surface soil moisture (SSM) at the U.S. Climate... more
Energy is one of the most critical and costly resources, playing a vital role in our daily lives. As technology advances, the demand for energy also increases. This work proposes a model for predicting energy consumption in smart homes,... more
To predict physiological indicators such as heart rate, blood pressure, and body heat, this study develops an Internet of Things (IoT)-based healthcare approach built on random forest regression models and mean square error... more
In this paper, a digital twin of the network of heating systems for smart cities is developed using the example of the city of Almaty. The study used machine learning algorithms to estimate future thermal energy consumption and develop... more
Death investigations often include an effort to establish the postmortem interval (PMI) in cases in which the time of death is uncertain. The postmortem interval can lead to the identification of the deceased and the validation of witness... more
A change-point (piecewise linear regression) model fitted to the pre-retrofit data as the counterfactual for the savings calculation is considered to be the best approach to evaluating the energy savings of building retrofits (ASHRAE... more
Drivers spend an enormous amount of time searching for parking spots every year. Wasted time, carbon emissions, and air pollution are the costs of hunting for parking spots without proper prediction. In this paper, we have proposed... more
Storing carbon (C) within soils is significant for maintaining soil-health and reinforces the feedback loop of C loss from soils as CO2 to the atmosphere. Seasonal variation with increased temperatures and inconsistent precipitation as... more
Cars have become a necessity in this modern world. Every middle class family needs a vehicle or a mode of transport in order to move from one place to another. Not everyone is able to afford a new vehicle as they are costly and there’s an... more
The imaging spectroscopy mission EnMAP aims to assess the state and evolution of terrestrial and aquatic ecosystems, examine the multifaceted impacts of human activities, and support a sustainable use of natural resources. Once in... more
Wind energy is one of the sustainable and clean energy resources that are uncertain because of its high fluctuation and stochastic volatility. Its uncertainty follows wind speed and occurs at multiple timescales. This fact necessitates... more
Possible changes in rainfall extremes in Peninsular Malaysia were assessed in this study using an ensemble of four GCMs of CMIP5. The performance of four bias correction methods was compared, and the most suitable method was used for... more
2.3.4 Rising Temperature and Heat Waves in South Asia; 2.3.5 Extreme Temperature and Heat Waves in Pakistan; 2.3.6 Analysis of Temperature Trends; 2.3.6.1 Non-Parametric Methods of Trend Analysis; 2.3.6.2 Trends Under the Hypothesis of... more
Rice is the primary staple food source for Indonesian people, with consumption increasing so that rice production needs to be increased. Rice drought is one of the problems that can hamper rice production. This research aims to determine... more
According to figures released by the WHO, heart disease is one of the deadliest diseases in the world. Heart disease typically affects people aged 30-70. Its causes vary and include obesity, pressure... more
Drought is a slow-developing phenomenon that accumulates over time and affects various sectors. It is a natural hazard that occurs each year, particularly in Indonesia during the Australian Monsoon period. During a drought event,... more
Many factors affect the cost of health insurance, and it is quite a difficult task to analyze the pattern in those features. We use a regression model to recognize and learn a complex pattern that enables us to predict the... more
This paper proposes a fully automated atlas-based pancreas segmentation method from CT volumes utilizing atlas localization by regression forest and atlas generation using blood vessel information. Previous probabilistic atlas-based... more
This paper presents a fully automated atlas-based pancreas segmentation method from CT volumes utilizing 3D fully convolutional network (FCN) feature-based pancreas localization. Segmentation of the pancreas is difficult because it has... more
This paper presents a colon deformation estimation method, which can be used to estimate colon deformations during colonoscope insertions. Colonoscope tracking or navigation system that navigates a physician to polyp positions during a... more
Human activity has dramatically altered the environment around the world. Most of these modifications occurred in natural habitats due to human activities. On the other hand, the variations in climatic conditions required to sustain... more
Land use classification is the basis for further policy making in many fields, including agriculture. Effective methods for land use/land cover (LULC) classification are essential for later application in policy making. The development of... more
Background and Aims: Storing carbon (C) within soils is significant for maintaining soil-health and reinforces the feedback loop of C loss from soils as CO2 to the atmosphere. Seasonal variation with increased temperatures and... more