Regression Analysis and Prediction of Medical Insurance Cost
2022
Abstract
Many factors affect the cost of health insurance, and it is quite difficult to analyze the pattern in those features. We use a regression model to recognize and learn a complex pattern that enables us to predict the cost of medical insurance. In this paper, we use a dataset from Kaggle that consists of 6 features and 1338 instances. We apply several regression models and examine which model performs best at predicting medical insurance cost. Regression is a statistical procedure for estimating the value of a dependent variable from an independent variable. Regression measures the association between variables. It is a modeling technique in which a dependent variable is predicted from one or more independent variables. Regression analysis is the most widely used of all statistical techniques. This article explains the basic concepts and shows how regression calculations can be done. INTRODUCTION ...
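The idea of estimating a dependent variable from a single independent variable can be illustrated with a minimal sketch of ordinary least squares. This is not the authors' code: the function names are hypothetical, and the five data points are made up for illustration and are not taken from the Kaggle dataset. The independent variable is assumed here to be something like a policyholder's age, and the dependent variable the insurance cost.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(y, y_pred):
    """Share of the variance in y explained by the predictions."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative values (assumption: NOT from the paper's dataset).
ages = [20, 30, 40, 50, 60]                           # independent variable
costs = [2500.0, 4100.0, 6200.0, 7900.0, 10300.0]     # dependent variable

intercept, slope = fit_simple_linear_regression(ages, costs)
predicted = [intercept + slope * a for a in ages]
r2 = r_squared(costs, predicted)

print(f"cost = {intercept:.1f} + {slope:.1f} * age, R^2 = {r2:.4f}")
```

With more than one independent variable the same least-squares principle applies, but the coefficients are usually obtained with a linear algebra routine rather than these closed-form sums.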
Related papers
International Journal of Environmental Research and Public Health, 2021
The increasing healthcare cost imposes a large economic burden for the Japanese government. Predicting the healthcare cost may be a useful tool for policy making. A database of the area-basis public health insurance of one city was analyzed to predict the medical healthcare cost by the dental healthcare cost with a machine learning strategy. The 30,340 subjects who had continued registration of the area-basis public health insurance of Ebina city during April 2017 to September 2018 were analyzed. The sum of the healthcare cost was JPY 13,548,831,930. The per capita healthcare cost was JPY 446,567. The proportion of medical healthcare cost, medication cost, and dental healthcare cost was 78%, 15%, and 7%, respectively. By the results of the neural network model, the medical healthcare cost proportionally depended on the medical healthcare cost of the previous year. The dental healthcare cost of the previous year had a reducing effect on the medical healthcare cost. However, the effec...
International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022
Insurance is a policy that helps cover or reduce losses from expenses incurred through various risks. A number of variables affect how much insurance costs, and these factors together determine the cost of an insurance policy. Machine learning (ML) can make the insurance sector more effective. ML is a well-known research area in computational and applied mathematics. It is an aspect of computational intelligence that exploits historical data and can be applied in a wide range of applications and systems. Because ML has limitations, predicting medical insurance costs with ML approaches remains an open problem in the healthcare industry and requires further investigation and improvement. Using machine learning algorithms, this study provides a computational intelligence approach for predicting healthcare insurance costs. The proposed approach uses Linear Regression, Decision Tree Regression, and Gradient Boosting Regression, with Streamlit as a framework. We used a medical insurance cost dataset acquired from the KAGGLE repository for cost prediction, and the regression models' forecasts of insurance costs are compared by accuracy.
International Journal of Engineering Research and, 2020
In this thesis, we analyse personal health data to predict the insurance amount for individuals. Three regression models, namely Multiple Linear Regression, Decision Tree Regression, and Gradient Boosting Decision Tree Regression, were used to compare and contrast the performance of these algorithms. A dataset was used to train the models, and the trained models were used to make predictions. The predicted amounts were then compared with the actual data to test and verify each model, and the accuracies of the models were compared. It was found that multiple linear regression and gradient boosting algorithms performed better than the linear regression and decision tree. Gradient boosting is best suited in this case because it takes much less computational time to achieve the same performance metric, though its performance is comparable to multiple regression.
arXiv (Cornell University), 2023
Predictive modeling in healthcare continues to be an active actuarial research topic as more insurance companies aim to maximize the potential of Machine Learning (ML) approaches to increase their productivity and efficiency. In this paper, the authors deployed three regression-based ensemble ML models that combine variations of decision trees through Extreme Gradient Boosting (XGBoost), Gradient-boosting Machine (GBM), and Random Forest (RF) methods in predicting medical insurance costs. Explainable Artificial Intelligence (XAi) methods SHapley Additive exPlanations (SHAP) and Individual Conditional Expectation (ICE) plots were deployed to discover and explain the key determinant factors that influence medical insurance premium prices in the dataset. The dataset used comprised 986 records and is publicly available in the KAGGLE repository. The models were evaluated using four performance evaluation metrics: R-squared (R²), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results show that all models produced impressive outcomes; however, the XGBoost model achieved a better overall performance although it also expended more computational resources, while the RF model recorded a lesser prediction error and consumed far fewer computing resources than the XGBoost model. Furthermore, we compared the outcome of both XAi methods in identifying the key determinant features that influenced the PremiumPrices for each model, and whereas both XAi methods produced similar outcomes, we found that the ICE plots showed in more detail the interactions between each variable than the SHAP analysis, which seemed to be more high-level. It is the aim of the authors that the contributions of this study will help policymakers, insurers, and potential medical insurance buyers in their decision-making process for selecting the right policies that meet their specific needs.
Proceedings
People’s health care cost prediction is nowadays a valuable tool to improve accountability in health care. In this work, we study if an interpretable method can reach the performance of black-box methods for the problem of predicting health care costs. We present an interpretable regression method based on the Dempster-Shafer theory, using the Evidence Regression model and a discount function based on the contribution of each dimension. Optimal parameters are learned using gradient descent. The k-nearest neighbors’ algorithm was also used to speed up computations. With the transparency of the evidence regression model, it is possible to create a set of rules based on a patient’s vicinity. When making a prediction, the model gives a set of rules for such a result. We used Japanese health records from Tsuyama Chuo Hospital to test our method, which includes medical checkups, exam results, and billing information from 2016 to 2017. We compared our model to an Artificial Neural Network ...
Editora Científica Digital eBooks, 2022
Machine learning projects have been providing a better patient experience in care services. Healthcare has many issues, and its cost is an essential indicator for insurance providers. In this context, the purpose of the present paper is to offer a comparison of machine learning approaches for the prediction of American medical insurance cost, using a dataset provided by the Kaggle community with 1,338 instances. The focus was not on winning any competition, but on developing a preliminary assessment of the algorithms' performance. Linear Regression regularizations were compared with more sophisticated algorithms (KNR, SVR, Simple Tree, Random Forest, and XGBoost) in terms of accuracy, measured with R² and RMSE, along with computational time. Linear Regression and its regularizations presented good accuracy with five-fold cross-validation. However, GridSearchCV selection of the best parameters achieved superior performance for the more advanced algorithms, except Support Vector Machine, which did not exhibit competitive accuracy. Computational time proved to be an interesting assessment and, depending on the organizational context, the simple tree (R² 0.88) would occasionally overcome the others, since it had a competitive computational time compared to XGBoost and Random Forest, the ones with the highest accuracy. The present study has contributed by demonstrating the value of machine learning for health insurance price prediction and the importance of applying comparative performance metrics to the algorithms not only in accuracy, but also in computational time.
Technology in Society, 2020
Covid-19 Short Commentaries. In recognition of the swift and negative impact of Covid-19 upon innovation, entrepreneurial behaviour and SME performance, we have instigated a series of short commentaries to reflect upon the potential implications of the crisis. These are published as notes critically evaluating issues covering finance, pivoting, policy issues, gender, and innovation, to name but a few, using considered arguments to generate informed speculation upon how the rapid, and largely negative, effects upon contemporary markets will impact upon entrepreneurs and their ventures. These commentaries will be offered as free downloads on the website in Online First, with the first three included in the August issue of the journal. This commentary by Brown et al. critically evaluates the impact of the virus upon seedcorn finance for nascent entrepreneurial ventures. Further commentaries will be published on Online First as they become available and in hard copy in future issues of the journal. We hope our readership find them informative in guiding contemporary debate and fuelling future research.
Operations Research, 2008
The rising cost of health care is one of the world's most important problems. Accordingly, predicting such costs with accuracy is a significant first step in addressing this problem. Since the 1980s, there has been research on the predictive modeling of medical costs based on (health insurance) claims data using heuristic rules and regression methods. These methods, however, have not been appropriately validated using populations that the methods have not seen. We utilize modern data-mining methods, specifically classification trees and clustering algorithms, along with claims data from over 800,000 insured individuals over three years, to provide rigorously validated predictions of health-care costs in the third year, based on medical and cost data from the first two years. We quantify the accuracy of our predictions using unseen (out-of-sample) data from over 200,000 members. The key findings are: (a) our data-mining methods provide accurate predictions of medical costs and represent a powerful tool for prediction of health-care costs, (b) the pattern of past cost data is a strong predictor of future costs, and (c) medical information only contributes to accurate prediction of medical costs of high-cost members.
Journal of Healthcare Engineering, 2022
Medical costs are one of the most common recurring expenses in a person’s life. Based on different research studies, BMI, ageing, smoking, and other factors are all related to greater personal medical care costs. The estimates of the expenditures of health care related to obesity are needed to help create cost-effective obesity prevention strategies. Obesity prevention at a young age is a top concern in global health, clinical practice, and public health. To avoid these restrictions, genetic variants are employed as instrumental variables in this research. Using statistics from public huge datasets, the impact of body mass index (BMI) on overall healthcare expenses is predicted. A multiview learning architecture can be used to leverage BMI information in records, including diagnostic texts, diagnostic IDs, and patient traits. A hierarchy perception structure was suggested to choose significant words, health checks, and diagnoses for training phase informative data representations, b...
Lecture Notes in Networks and Systems, 2019
Among Cigna's claimant population with at least one year of continuous medical or pharmacy eligibility over 2014-2015 (N = 2.7 million), our objective was to accurately identify high-cost claimants and identify clinical and demographic cost drivers among individuals with commercial health plan benefits. High-cost claimants were defined as those with annual costs over $100,000. We collected 800+ potential risk factors and utilized multivariable weighted logistic regression on an oversampled model dataset to estimate odds ratios for clinical and demographic factors available in claims data. We used decision tree methodology to assist in variable selection/reduction. High-cost claimants (n = 17,702) comprised only 0.6% of the 2015-2016 population, but accounted for over 20% of 2015-2016 total costs. Our optimized maximum likelihood estimation model identified cost drivers including: actuarial prospective episode-related group (ERG) risk score, prior-year medical claim costs, prior-year pharmacy claim costs, gaps in care/noncompliance score, hemophilia, short stature, and end-stage renal disease. Our findings show that weighted logistic regression modeling with oversampling techniques can be used to identify high-cost claimants in the upcoming year more accurately than traditional maximum likelihood estimation. Managed care decision makers should use prospective claims data analyses to target and implement intervention programs, with the goal of managing care among those at risk for incurring catastrophic costs.
