About this sample
Words: 2799 | Pages: 6 | 14 min read
Published: Feb 13, 2024
Chronic kidney disease (CKD) is a condition characterized by a gradual loss of kidney function over time. It increases the risk of cardiovascular disease and end-stage renal disease. The approximate prevalence of CKD is 800 per million population (pmp) [1]. In this paper we use a machine learning approach to predict CKD, presenting a comparative analysis of seven machine learning algorithms. The study uses 24 parameters in addition to the class attribute, and 25% of the data set is held out to test the predictions. The data are evaluated using fivefold cross-validation, and the performance of the system is assessed using classification accuracy, the confusion matrix, specificity, and sensitivity.
Chronic kidney disease (CKD) is a permanent reduction in kidney function that can progress to end-stage renal disease (ESRD), requiring either ongoing dialysis or a kidney transplant to maintain life. CKD also affects how many medications are eliminated from the body [2]. In routine practice, a laboratory serum creatinine value is used to estimate kidney function by incorporating it into a formula that estimates the glomerular filtration rate and establishes whether a patient has CKD. CKD is becoming a major threat in developing and underdeveloped countries. Its main causes are diseases such as diabetes and high blood pressure; other risk factors include heart disease, obesity, and a family history of chronic kidney disease. Its treatments, dialysis or kidney transplantation, are very costly, which makes early detection essential. In the United States [3], about 117,000 patients developed ESRD requiring dialysis in 2013, while more than 663,000 prevalent patients were on dialysis; 5.6% of the total medical budget, about $28 billion, was spent on ESRD in 2012. In India, the prevalence of CKD is about 800 per million population and that of ESRD is 150–200 per million population. We consider seven machine learning classifiers: Logistic Regression, Support Vector Machine, K-Nearest Neighbour, Naïve Bayes, Stochastic Gradient Descent classifier, Decision Trees, and Random Forest. Finally, a set of standard performance metrics is used to evaluate each classifier in the computer-aided diagnosis system: the confusion matrix, classification accuracy, specificity, and sensitivity.
Our research uses a CKD dataset [4] that is openly accessible from the UCI machine learning repository. The data set consists of 24 attributes (i.e., predictors) in addition to the binary class attribute. Of the 24 attributes, 11 are numerical, two are categorical with five levels, and the remaining attributes are binary, coded as zero for abnormal instances and one for normal ones. In the class attribute, one codes the presence of CKD and zero its absence. The dataset contains 400 instances: 150 samples without kidney disease (not present) and 250 samples with kidney disease (present). Of these 400 instances, 300 are used to train the classification algorithms and 100 are used to test the resulting models. The attributes in the dataset are age, blood pressure, specific gravity, albumin, sugar, red blood cells, pus cell, pus cell clumps, bacteria, blood glucose random, blood urea, serum creatinine, sodium, potassium, haemoglobin, packed cell volume, white blood cell count, red blood cell count, hypertension, diabetes mellitus, appetite, pedal edema, anaemia, and class.
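The 300/100 train/test partition described above can be sketched as follows. This is a minimal illustration using placeholder records standing in for the 400 UCI instances (the record contents and the fixed seed are assumptions, not part of the original study):

```python
import random

# Placeholder stand-ins for the 400 CKD records from the UCI repository;
# in practice each record would hold the 24 attributes plus the class label.
records = [{"id": i} for i in range(400)]

random.seed(0)          # fixed seed so the split is reproducible
random.shuffle(records)

# 75% (300 instances) for training, 25% (100 instances) for testing
train, test = records[:300], records[300:]
```

Any comparable shuffled split would serve; the key point is that the held-out 100 instances are never seen during training.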
Logistic Regression (LR) is a linear classification model [ref-1]. LR models the conditional distribution of the boolean class label Y given the example X, P(Y|X), and classifies Y as follows:
P(Y=1|X) = 1 / (1 + exp(w_0 + Σ_{i=1}^{n} w_i X_i))
P(Y=0|X) = exp(w_0 + Σ_{i=1}^{n} w_i X_i) / (1 + exp(w_0 + Σ_{i=1}^{n} w_i X_i))
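The two class probabilities above can be computed directly. A minimal sketch, with illustrative weights w_0, w (not fitted to the CKD data):

```python
import math

def p_y1(x, w0, w):
    # P(Y=1|X) = 1 / (1 + exp(w0 + sum_i w_i * x_i)), as in the text
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(z))

def p_y0(x, w0, w):
    # P(Y=0|X) = exp(z) / (1 + exp(z)); the two probabilities sum to 1
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return math.exp(z) / (1.0 + math.exp(z))
```

Classification then simply picks whichever of P(Y=1|X) and P(Y=0|X) is larger.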
For classification problems, the support vector machine (SVM) is a popular data mining method used to predict the category of data [ref-2]. The main idea of SVM is to find the optimal hyperplane separating the two classes in the training data; SVM finds this hyperplane by solving an optimization problem.
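Once the optimal hyperplane (w, b) has been found, prediction reduces to checking which side of it a point falls on. A sketch of that decision rule, assuming an already-trained hyperplane (the training optimization itself is beyond this snippet):

```python
def svm_predict(x, w, b):
    # Linear SVM decision rule: the sign of w·x + b decides the class
    # (+1 on one side of the hyperplane, -1 on the other).
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```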
K-nearest neighbours (KNN) is a classification method that classifies unknown examples by searching for the closest data in the pattern space [ref-3]. KNN predicts the class using the Euclidean distance, defined as follows:
d(x, y) = √( Σ_{i=1}^{k} (x_i − y_i)² )
The Euclidean distance d(x, y) is used to find the k closest examples in the pattern space, and the class of the unknown example is identified by a majority vote among its neighbours.
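The distance-plus-majority-vote procedure above can be sketched in a few lines (the toy training points are illustrative, not CKD data):

```python
import math
from collections import Counter

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs.
    # Sort by distance to the query, keep the k nearest, and
    # return the majority label among them.
    neighbours = sorted(train, key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```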
Naïve Bayes classifiers are probabilistic classifiers based on Bayes' theorem [ref-4]. In Naïve Bayes, each feature is treated as independent of the others, and each contributes independently to the probability; the higher the probability, the higher the chance that a data point belongs to that class. The algorithm uses maximum likelihood estimation for prediction. It is fast and can be used for real-time predictions such as sentiment analysis.
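For the binary attributes in the CKD data, a Bernoulli variant of this idea is a natural fit. A minimal sketch (with Laplace smoothing, which the paper does not specify but which keeps unseen values from zeroing the product):

```python
def train_nb(X, y):
    # Bernoulli Naive Bayes: estimate the class prior P(c) and the
    # per-feature likelihood P(x_i = 1 | c) from binary training data.
    classes = set(y)
    priors, likelihoods = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(y)
        likelihoods[c] = [
            (sum(r[i] for r in rows) + 1) / (len(rows) + 2)  # Laplace smoothing
            for i in range(len(X[0]))
        ]
    return priors, likelihoods

def predict_nb(priors, likelihoods, x):
    # Pick the class maximising P(c) * prod_i P(x_i | c),
    # i.e. the naive-independence form of Bayes' theorem.
    best, best_p = None, -1.0
    for c, prior in priors.items():
        p = prior
        for i, xi in enumerate(x):
            p *= likelihoods[c][i] if xi else 1 - likelihoods[c][i]
        if p > best_p:
            best, best_p = c, p
    return best
```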
The SGD classifier is a logistic regression classifier trained with stochastic gradient descent optimisation. In contrast to batch gradient descent, stochastic gradient descent (SGD) performs a parameter update for each training example x^(i) and label y^(i) [ref-5]:
θ ← θ − η · ∇_θ J(θ; x^(i); y^(i))
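One such per-example update can be sketched as follows. This sketch assumes the conventional parameterization P(Y=1|X) = sigmoid(θ·x) with logistic loss; the learning rate η = 0.1 is an illustrative choice:

```python
import math

def sgd_step(theta, x, y, eta=0.1):
    # One stochastic gradient descent update on a single example (x, y):
    # theta <- theta - eta * grad_theta J(theta; x, y) for logistic loss.
    z = sum(t * xi for t, xi in zip(theta, x))
    pred = 1.0 / (1.0 + math.exp(-z))        # sigmoid(theta · x)
    grad = [(pred - y) * xi for xi in x]     # gradient of the log-loss
    return [t - eta * g for t, g in zip(theta, grad)]
```

Iterating this step over shuffled training examples is what distinguishes SGD from batch gradient descent, which averages the gradient over the whole training set before each update.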
A decision tree is a classification method frequently used in data mining tasks [ref-6]. It is a structure consisting of a root node, branches, and leaf nodes that divides the data into classes based on the attribute values found in the training samples.
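The attribute-based splitting described above can be illustrated in its simplest form, a depth-1 tree (decision stump) that tries every observed value of one attribute as a threshold and keeps the split with the fewest training errors. This is a sketch of the splitting idea, not the full recursive algorithm:

```python
def fit_stump(X, y, feature):
    # Try each observed value of one attribute as a threshold and keep
    # the (threshold, left_label, right_label) split that misclassifies
    # the fewest training samples.
    best = None
    for t in sorted({x[feature] for x in X}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if x[feature] <= t else right) != label
                for x, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1:]

def stump_predict(x, feature, threshold, left_label, right_label):
    # Route the sample down the left or right branch of the split.
    return left_label if x[feature] <= threshold else right_label
```

A full decision tree applies this search recursively, choosing the best attribute at each node until the leaves are (nearly) pure.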
Random Forest (RF) is an ensemble classifier consisting of a collection of tree-structured classifiers h(x, y_k), where each tree is grown from a random vector sampled independently and with the same distribution for all trees in the forest. Randomization is introduced by randomly selecting the input attributes used to grow each individual base decision tree [ref-7]. Random forests differ from other tree-based methods in that a modified tree-learning algorithm chooses, at each candidate split, a random subset of the features. The reason for doing this is the correlation of the trees in an ordinary bootstrap sample: if one or a few features are strong predictors of the response variable (target output), those features will be selected in many of the decision trees, causing the trees to become correlated.
The confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. For binary classification it is a table with four combinations of predicted and actual values.
| Predicted Negative | Predicted Positive |
Negative cases | TN | FP |
Positive cases | FN | TP |
Table 1. Confusion matrix (CM)
Then we may define some evaluation measures.
(A) Accuracy = (TN + TP) / (TN + TP + FN + FP)
(R) Recall (Sensitivity) = TP / (TP + FN)
(S) Specificity = TN / (TN + FP)
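The three measures above translate directly into code. A minimal sketch, checked against the logistic regression counts reported in Table 2 (TN = 38, FP = 0, FN = 0, TP = 62):

```python
def accuracy(tn, fp, fn, tp):
    # (A) fraction of all predictions that are correct
    return (tn + tp) / (tn + tp + fn + fp)

def sensitivity(tp, fn):
    # (R) recall / true positive rate: correctly detected CKD cases
    return tp / (tp + fn)

def specificity(tn, fp):
    # (S) true negative rate: correctly cleared non-CKD cases
    return tn / (tn + fp)

print(accuracy(38, 0, 0, 62))  # → 1.0 for the logistic regression matrix
```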
The results of seven machine learning algorithms are compared in the experiments. All machine learning techniques are trained and tested by the proposed method. The confusion matrix of each algorithm is shown in Table 2.
Model | Actual Not CKD | Actual CKD |
Logistic Regression | 38 (TN) 0 (FP) | 0 (FN) 62 (TP) |
Support Vector Machine | 36 (TN) 2 (FP) | 2 (FN) 60 (TP) |
K-Nearest Neighbor | 38 (TN) 0 (FP) | 2 (FN) 60 (TP) |
Naïve Bayes | 38 (TN) 0 (FP) | 3 (FN) 59 (TP) |
Stochastic Gradient Descent | 38 (TN) 0 (FP) | 0 (FN) 62 (TP) |
Decision Trees | 38 (TN) 0 (FP) | 3 (FN) 59 (TP) |
Random Forest | 38 (TN) 0 (FP) | 0 (FN) 62 (TP) |
Table 2. Confusion matrices of all the algorithms.
Figure 2 shows the average accuracy of the seven classifiers. From the experimental results, it can be seen that the logistic regression, random forest, and SGD classifiers give the highest accuracy, at 100%, while the decision tree, SVM, naïve Bayes, and KNN classifiers produce average accuracies of 97.0%, 96.0%, 97.0%, and 98.0% respectively. The accuracy on each class is also important, because an incorrect prediction may be a detriment to the patient; therefore, the sensitivity and specificity values are used in the experiments to evaluate the performance of the proposed methods.
We trained seven different machine learning models to predict the presence of chronic kidney disease. Of all the models compared, logistic regression, the SGD classifier, and random forest provide the best results; they surpass the other classifiers and detect chronic kidney disease more precisely. If these models are trained on a more varied and extensive range of attributes, they may yield more accurate predictions, and the results would be more reliable with a larger dataset. Hospitals and diagnostic centres can use this approach for faster, digitized analysis when predicting chronic kidney disease.