About this sample
Words: 924 |
Pages: 6|
5 min read
Published: Feb 13, 2024
Chronic kidney disease (CKD) is a condition in which the kidneys gradually lose function over time. It can lead to heart problems and eventually to end-stage renal disease (ESRD). The prevalence of CKD in India is roughly 800 per million people. In this paper, we explored using machine learning to predict CKD and compared seven different algorithms. We started with 24 parameters plus the class attribute and held out 25% of the data for testing. We evaluated the models using fivefold cross-validation and assessed the system's performance with classification accuracy, the confusion matrix, specificity, and sensitivity.
CKD is a lasting reduction in kidney function that can progress to ESRD, which requires either ongoing dialysis or a kidney transplant to sustain life. CKD also affects how many medications are eliminated from the body. In routine practice, a laboratory serum creatinine value is entered into a formula that estimates the glomerular filtration rate, and that estimate is used to establish whether a patient has CKD. CKD is becoming a major threat in developing and underdeveloped countries, mainly due to diseases like diabetes and high blood pressure. Other risk factors include heart disease, obesity, and a family history of CKD. Treatments like dialysis or kidney transplants are very costly, so early detection is needed.
In the US, around 117,000 patients developed ESRD requiring dialysis in 2013, with more than 663,000 on dialysis. In 2012, 5.6% of the total medical budget, about $28 billion, was spent on ESRD. In India, CKD is widespread, with a prevalence of 800 per million population and ESRD at 150-200 per million population. We considered seven machine learning classifiers: Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Naïve Bayes, Stochastic Gradient Descent classifier, Decision Trees, and Random Forest. We used standard performance metrics to estimate each classifier's performance when designing the computer-aided diagnosis system.
We used the CKD dataset from the UCI Machine Learning Repository. It includes 24 attributes plus a binary class attribute. Of the 24, 11 are numerical, two are categorical with five levels, and the rest are binary. In the class attribute, one indicates that CKD is present and zero that it is not. The dataset has 400 instances: 250 with CKD and 150 without. We used 300 instances for training the algorithms and 100 for testing. The attributes are age, blood pressure, specific gravity, albumin, sugar, red blood cells, pus cell, pus cell clumps, bacteria, blood glucose random, blood urea, serum creatinine, sodium, potassium, hemoglobin, packed cell volume, white blood cell count, red blood cell count, hypertension, diabetes mellitus, coronary artery disease, appetite, pedal edema, anaemia, and class.
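The 300/100 split and the fivefold cross-validation mentioned in this essay can be sketched with index bookkeeping alone. This is a minimal illustration, not the procedure actually used: the function names, the seed, and the choice to draw the five folds from the 300 training instances are all assumptions, and the split here is a plain shuffle rather than a class-stratified one.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n_samples-1 and hold out a test portion.

    Hypothetical helper: the seed and the unstratified shuffle are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)

def k_fold_indices(indices, k=5):
    """Partition `indices` into k folds and yield (train, validation) pairs."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Every fold except the i-th one is used for fitting.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# 400 instances, 25% held out for testing, five folds over the remainder.
train_idx, test_idx = train_test_split_indices(400, test_fraction=0.25)
folds = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx), [len(val) for _, val in folds])
```

Each fold serves once as the validation set while the other four are used for fitting, so every training instance is validated exactly once.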
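A confusion matrix is a performance measurement for classification problems with two or more classes. For a binary problem, it shows the four possible combinations of predicted and actual values: true negatives (TN), false positives (FP), false negatives (FN), and true positives (TP).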
Actual class | Predicted Negative | Predicted Positive |
---|---|---|
Negative cases | TN | FP |
Positive cases | FN | TP |
Table 1: Confusion Matrix (CM)
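To make the four cells of Table 1 concrete, the following sketch tallies them from a list of actual and predicted labels. The function name `confusion_counts` and the toy label lists are illustrative assumptions, not data from the study; the label convention (one for CKD, zero for not CKD) follows the class attribute described earlier.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TN, FP, FN, and TP for a binary problem (hypothetical helper)."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # positive case predicted positive
            else:
                fn += 1  # positive case predicted negative
        else:
            if p == positive:
                fp += 1  # negative case predicted positive
            else:
                tn += 1  # negative case predicted negative
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp}

# Toy labels for illustration only: 1 = CKD, 0 = not CKD.
actual = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 0, 1, 1, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)
```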
We also define some evaluation measures:
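The definitions themselves do not appear in this copy of the essay. Assuming the standard definitions are meant, the three measures named in the introduction can be written in the notation of Table 1 as:

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP}
\end{align}
```

Sensitivity is the share of actual CKD cases that are detected, and specificity is the share of non-CKD cases that are correctly cleared.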
The seven machine learning algorithms were compared. All techniques were trained and tested by the proposed method. The confusion matrices for each algorithm are shown in Table 2.
Model | TN | FP | FN | TP |
---|---|---|---|---|
Logistic Regression | 38 | 0 | 0 | 62 |
Support Vector Machine | 36 | 2 | 2 | 60 |
K-Nearest Neighbor | 38 | 0 | 2 | 60 |
Naïve Bayes | 38 | 0 | 3 | 59 |
Stochastic Gradient Descent | 38 | 0 | 0 | 62 |
Decision Trees | 38 | 0 | 3 | 59 |
Random Forest | 38 | 0 | 0 | 62 |
Table 2: Confusion matrices of all the algorithms on the 100 test instances (38 not CKD, 62 CKD).
Figure 2 shows the average accuracy of the classifiers. From the results, Logistic Regression, Random Forest, and the SGD classifier give the highest accuracy (100%). Decision Tree, SVM, Naïve Bayes, and KNN have average accuracies of 97.0%, 96.0%, 97.0%, and 97.0%, respectively. Overall accuracy is not enough on its own, because an incorrect prediction in either class can harm the patient. Therefore, sensitivity and specificity values are also used to evaluate the proposed methods.
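As a worked check, accuracy, sensitivity, and specificity can be recomputed directly from the counts in Table 2. This is a sketch assuming the standard definitions of the three measures; the names `TABLE_2` and `metrics` are illustrative and not from the original study.

```python
# (TN, FP, FN, TP) for each model, copied from Table 2.
TABLE_2 = {
    "Logistic Regression": (38, 0, 0, 62),
    "Support Vector Machine": (36, 2, 2, 60),
    "K-Nearest Neighbor": (38, 0, 2, 60),
    "Naive Bayes": (38, 0, 3, 59),
    "Stochastic Gradient Descent": (38, 0, 0, 62),
    "Decision Trees": (38, 0, 3, 59),
    "Random Forest": (38, 0, 0, 62),
}

def metrics(tn, fp, fn, tp):
    """Standard binary-classification measures (assumed definitions)."""
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "sensitivity": tp / (tp + fn),  # share of CKD cases detected
        "specificity": tn / (tn + fp),  # share of non-CKD cases cleared
    }

results = {model: metrics(*counts) for model, counts in TABLE_2.items()}
for model, m in results.items():
    print(f"{model:28s} acc={m['accuracy']:.3f} "
          f"sens={m['sensitivity']:.3f} spec={m['specificity']:.3f}")
```

These figures describe the single 100-instance test split. For K-Nearest Neighbor the counts give 98 correct out of 100, whereas the 97.0% quoted above is described as an average, so the two need not agree.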
We trained seven different machine learning models to predict CKD. Logistic Regression, the SGD classifier, and Random Forest provided the best results, surpassing the other classifiers in accurately detecting CKD. Training these models on a larger and more varied set of attributes may improve their predictions further, and a larger dataset would give more confidence in the results. Hospitals and diagnostic centers could use this approach for faster, digitized analysis when predicting CKD.