A Comparative Analysis of Seven Algorithms Using a Comprehensive Dataset

Table of contents

  1. Introduction
  2. Methodology
  3. Dataset
  4. Machine Learning Techniques
  5. Evaluation Parameters
  6. Confusion Matrix
  7. Results
  8. Conclusion

Introduction

Chronic kidney disease (CKD) is a condition in which the kidneys gradually lose function over time. It can lead to cardiovascular complications and eventually to end-stage renal disease (ESRD). The prevalence of CKD is roughly 800 per million population. In this paper, we explored the use of machine learning to predict CKD and compared seven different algorithms. We started with 24 predictor attributes plus the class attribute and held out 25% of the data for testing. The models were evaluated using fivefold cross-validation, and the system's performance was assessed with classification accuracy, the confusion matrix, specificity, and sensitivity.

CKD is a lasting reduction in kidney function that can progress to ESRD, which requires either ongoing dialysis or a kidney transplant to sustain life. CKD also affects the elimination of many medications from the body. In routine practice, a laboratory serum creatinine value is used to estimate kidney function: it is entered into a formula that estimates the glomerular filtration rate, which establishes whether a patient has CKD. CKD is becoming a major threat in developing and underdeveloped countries, mainly because of diseases such as diabetes and high blood pressure. Other risk factors include heart disease, obesity, and a family history of CKD. Treatments such as dialysis and kidney transplantation are very costly, so early detection is essential.

In the US, around 117,000 patients developed ESRD requiring dialysis in 2013, and more than 663,000 were on dialysis. In 2012, 5.6% of the total medical budget, about $28 billion, was spent on ESRD. In India, CKD is widespread, with roughly 800 cases per million population and ESRD at 150-200 per million population. We considered seven machine learning classifiers: Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, Stochastic Gradient Descent classifier, Decision Trees, and Random Forest. We designed a computer-aided diagnosis system and used standard performance metrics to estimate each classifier's performance.

Methodology

Dataset

We used a CKD dataset from the UCI Machine Learning Repository. It includes 24 attributes plus a binary class attribute. Of these, 11 are numerical, two are categorical with five levels, and the rest are binary. In the class attribute, a value of one indicates that CKD is present and zero indicates that it is not. The dataset has 400 instances: 150 samples without CKD and 250 with CKD. We used 300 instances for training the algorithms and 100 for testing. The attributes are age, blood pressure, specific gravity, albumin, sugar, red blood cells, pus cell, pus cell clumps, bacteria, blood glucose random, blood urea, serum creatinine, sodium, potassium, hemoglobin, packed cell volume, white blood cell count, red blood cell count, hypertension, diabetes mellitus, appetite, pedal edema, anemia, and class.
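
As a rough illustration only, the sketch below shows how such a dataset could be loaded and split 300/100 with pandas and scikit-learn. The file name, column labels, and class encoding are assumptions on our part; the UCI repository distributes the data in ARFF format, so some conversion and cleaning would be needed first.

    # Hypothetical sketch: load a CSV copy of the UCI CKD data and reproduce the
    # 300/100 train/test split described above. File and column names are assumed.
    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("chronic_kidney_disease.csv")   # assumed local copy of the dataset

    X = df.drop(columns=["class"])                   # the 24 predictor attributes
    y = (df["class"] == "ckd").astype(int)           # 1 = CKD present, 0 = not CKD

    # 400 instances: hold out 100 (25%) for testing, keeping the class balance
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=100, stratify=y, random_state=0)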

Machine Learning Techniques

  • Logistic Regression: Logistic Regression (LR) is a linear model for classification. It models the conditional distribution of the boolean class label Y given an example X, P(Y|X), and classifies as follows:
    P(Y=1|X) = 1 / (1 + exp(w0 + Σ_{i=1..n} w_i X_i))
    P(Y=0|X) = exp(w0 + Σ_{i=1..n} w_i X_i) / (1 + exp(w0 + Σ_{i=1..n} w_i X_i))
  • Support Vector Machine: SVM is a popular method for predicting the category of data. It finds the optimal hyperplane between data of two classes in the training data by solving an optimization problem.
  • K-Nearest Neighbors: KNN classifies an unknown example by finding the closest training examples in the feature space and predicting its class from those neighbors, using the Euclidean distance:
    d(x, y) = √( Σ_{i=1..k} (x_i − y_i)² )
  • Naïve Bayes: Naïve Bayes classifiers are based on Bayes' theorem. Each feature is assumed to be independent of the others and contributes independently to the class probability. Its parameters are estimated by maximum likelihood.
  • SGD Classifier: This is a Logistic Regression classifier based on Stochastic Gradient Descent optimization. It performs a parameter update for each training example x^(i) and label y^(i):
    θ = θ − η ∇_θ J(θ; x^(i), y^(i))
  • Decision Tree: This method builds a structure with a root node, branches, and leaf nodes. It divides the data into classes based on the attribute values found in the training samples.
  • Random Forest: RF is an ensemble classifier consisting of a collection of tree-structured classifiers (multiple tree predictors). It uses a random selection of input attributes to build each individual base decision tree.
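
A minimal sketch of how these seven classifiers might be instantiated and compared with scikit-learn follows. The specific hyperparameters, and the assumption that the features have already been encoded, imputed, and scaled, are ours rather than the essay's.

    # Illustrative only: the seven classifiers named above with scikit-learn
    # defaults, scored by fivefold cross-validation on the training split.
    from sklearn.linear_model import LogisticRegression, SGDClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Support Vector Machine": SVC(),
        "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
        "SGD Classifier": SGDClassifier(loss="log_loss"),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }

    for name, model in models.items():
        # X_train, y_train come from the earlier (assumed) split; features are
        # expected to be numeric with missing values already imputed.
        scores = cross_val_score(model, X_train, y_train, cv=5)
        print(f"{name}: mean fivefold CV accuracy = {scores.mean():.3f}")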

Evaluation Parameters

Confusion Matrix

A confusion matrix is a performance measurement for classification problems with two or more classes. It shows the four combinations of predicted and actual values.

                 Predicted Negative   Predicted Positive
Negative cases   TN                   FP
Positive cases   FN                   TP

Table 1: Confusion Matrix (CM)

We also define some evaluation measures:

  • Accuracy = (TN + TP) / (TN + TP + FN + FP)
  • Recall (Sensitivity) = TP / (TP + FN)
  • Specificity = TN / (TN + FP)
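
As a small worked sketch of these three measures, the snippet below writes them directly in terms of the confusion-matrix counts; the SVM counts that appear later in Table 2 are used purely as an example.

    # Sketch: the three evaluation measures defined above, in terms of TN, FP, FN, TP.
    def accuracy(tn, fp, fn, tp):
        return (tn + tp) / (tn + tp + fn + fp)

    def sensitivity(tn, fp, fn, tp):
        # recall for the positive (CKD) class
        return tp / (tp + fn)

    def specificity(tn, fp, fn, tp):
        # true-negative rate for the negative (not CKD) class
        return tn / (tn + fp)

    # Example with the SVM counts from Table 2: TN=36, FP=2, FN=2, TP=60
    print(accuracy(36, 2, 2, 60))     # 0.96
    print(sensitivity(36, 2, 2, 60))  # ~0.968
    print(specificity(36, 2, 2, 60))  # ~0.947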

Results

The seven machine learning algorithms were compared. All techniques were trained and tested using the procedure described above. The confusion matrices for each algorithm are shown in Table 2.

Model                         TN    FP    FN    TP
Logistic Regression           38    0     0     62
Support Vector Machine        36    2     2     60
K-Nearest Neighbors           38    0     2     60
Naïve Bayes                   38    0     3     59
Stochastic Gradient Descent   38    0     0     62
Decision Trees                38    0     3     59
Random Forest                 38    0     0     62

Table 2: Confusion matrices of all the algorithms (negative class: not CKD; positive class: CKD).
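
For illustration, confusion matrices in this form could be produced on the 100 held-out test instances roughly as follows; the models dictionary and the train/test split are the assumed objects from the earlier sketches, not code from the essay.

    # Hypothetical sketch: fit each model on the training split and report its
    # test-set confusion matrix as TN/FP/FN/TP, as in Table 2.
    from sklearn.metrics import confusion_matrix

    for name, model in models.items():
        model.fit(X_train, y_train)
        cm = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
        tn, fp, fn, tp = cm.ravel()   # scikit-learn orders the 2x2 matrix [[TN, FP], [FN, TP]]
        print(f"{name}: TN={tn} FP={fp} FN={fn} TP={tp}")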

Figure 2 shows the average accuracy of the classifiers. From the results, Logistic Regression, Random Forest, and the SGD Classifier give the highest accuracy (100%). Decision Tree, the SVM classifier, Naive Bayes, and KNN have average accuracies of 97.0%, 96.0%, 97.0%, and 97.0%, respectively. The accuracy on each class is vital, because incorrect predictions can harm the patient; therefore, sensitivity and specificity values are also used to evaluate the proposed methods.

Conclusion

We trained seven different machine learning models to predict CKD. Logistic Regression, the SGD Classifier, and Random Forest provided the best results, surpassing the other classifiers in accurately detecting CKD. If these models were trained on a more varied and extensive range of attributes, they could yield even more accurate predictions, and a larger dataset would give additional confidence in the results. Hospitals and diagnostic centers could use such a system for faster, digitized CKD screening.
