The Treatment of Missing Values – One of The Most Important Issues in Data Preprocessing

Categories: Data Mining

Published: Mar 28, 2019

The treatment of missing values (MVs) is an important issue in data preprocessing for data mining. MVs arise for two main reasons. First, attributes are often aggregated from different sources, and a given case may not exist in every source. Second, values may be omitted at reporting time. The simplest way of dealing with MVs is to discard every case that contains at least one MV. However, this is practical only when few cases contain MVs and when analyzing the complete cases alone will not seriously bias inference. For example, in our study, 10%-30% of students are missing their high school GPA or SAT scores. Simply discarding these students is not an option, as most of them are international or transfer students, who constitute an important subset of the population. Discarding the variables is not practical either, as they have been shown to be important predictors of students’ performance. Thus, it is important to apply an appropriate imputation strategy to the data.
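The trade-off between discarding incomplete cases and imputing them can be illustrated with a minimal sketch. The records, the field names `hs_gpa` and `sat`, and the choice of mean imputation are all assumptions made for illustration; the essay does not state which imputation strategy it applies.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value (MV).
students = [
    {"id": 1, "hs_gpa": 3.6, "sat": 1250},
    {"id": 2, "hs_gpa": None, "sat": 1100},
    {"id": 3, "hs_gpa": 3.1, "sat": None},
    {"id": 4, "hs_gpa": 2.8, "sat": 1010},
    {"id": 5, "hs_gpa": None, "sat": None},
]
FEATURES = ("hs_gpa", "sat")


def complete_cases(rows):
    """Listwise deletion: keep only rows with no MV in any feature."""
    return [r for r in rows if all(r[f] is not None for f in FEATURES)]


def mean_impute(rows):
    """Replace each MV with the mean of the observed values of that feature."""
    means = {f: mean(r[f] for r in rows if r[f] is not None) for f in FEATURES}
    return [
        {**r, **{f: means[f] if r[f] is None else r[f] for f in FEATURES}}
        for r in rows
    ]


kept = complete_cases(students)   # rows that survive deletion
imputed = mean_impute(students)   # all rows, with MVs filled in

print(f"complete cases: {len(kept)} of {len(students)}")
print(f"after imputation: {len(imputed)} of {len(students)}")
```

In this toy data, deletion loses more than half of the cases, while imputation keeps all of them at the cost of filling in values that were never observed. Mean imputation is only the simplest stand-in here; it shrinks the variance of the imputed feature, which is one reason more careful strategies are usually preferred.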

There are also a variety of data mining methods. Unlike traditional explanatory models, where the goal is to explore the relationship between an outcome variable and explanatory variables, the goal of a data mining model is to make predictions on a new data set. There is a target variable, which can be either continuous or categorical. There are also predictors, called features, which measure a set of characteristics of the sample members. A prediction model is built from the current data and then applied to new data, where a new set of feature values is used to make predictions. Different data mining methods rely on different algorithms and therefore yield different prediction performance. According to Luengo, imputation methods can improve the performance of different categories of data mining methods, as there may be an interaction between the imputation strategy and the data mining method. We would like to explore how this interaction plays out on our data. In this chapter, we first introduce the imputation strategies applied in this dissertation. We then introduce the data mining methods applied to our data.
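The build-then-predict workflow described above can be sketched in a few lines. The nearest-centroid classifier used here is only a stand-in chosen for brevity, not one of the methods the essay evaluates, and the feature values (a GPA and an SAT score per student) and labels are hypothetical.

```python
from math import dist


def fit_centroids(X, y):
    """Build the model: one mean feature vector (centroid) per class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, X_new):
    """Apply the model: assign each new case to the class with the closest centroid."""
    return [
        min(centroids, key=lambda label: dist(centroids[label], x))
        for x in X_new
    ]


# Current data: features (hypothetical GPA, SAT) and a categorical target.
X_train = [[3.6, 1250], [3.4, 1200], [2.1, 900], [2.4, 950]]
y_train = ["pass", "pass", "fail", "fail"]

model = fit_centroids(X_train, y_train)

# New data: only feature values are known; the target is predicted.
X_new = [[3.5, 1230], [2.2, 920]]
predictions = predict(model, X_new)
print(predictions)
```

The features are left unscaled to keep the sketch short, so the SAT score dominates the distance. A different method, or the same method after rescaling, could produce different predictions on the same data, which is the point the paragraph makes about prediction performance.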

Third, SMOTE, a commonly used over-sampling method, will be introduced to deal with the imbalanced data issue. Imbalanced data typically refers to classification problems in which the classes are not represented equally. For example, our data set contains around 3000 students in total, of whom 90% are labeled as passing students and the remaining 10% as failing students. Most machine learning methods do not work well on imbalanced data, so techniques are needed to address the imbalance. SMOTE is one of them.
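The core idea of SMOTE is to create synthetic minority cases by interpolating between a minority case and one of its nearest minority-class neighbours. The following is a simplified sketch of that idea under stated assumptions, not a full or reference implementation: the function name `smote_sketch`, the toy points, the class sizes, and the choice of `k` are all hypothetical.

```python
import random
from math import dist


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority ("fail") cases and the size of the majority class.
minority = [[2.1, 900.0], [2.4, 950.0], [2.0, 880.0], [2.6, 1000.0]]
majority_count = 12

# Over-sample the minority class until both classes have the same size.
synthetic = smote_sketch(minority, majority_count - len(minority))
print(len(minority) + len(synthetic), "minority cases after over-sampling")
```

With the proportions quoted above (roughly 2700 passing and 300 failing students), fully balancing the classes this way would require about 2400 synthetic failing cases. Over-sampling is normally applied to the training data only, so that evaluation still reflects the original class distribution.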

This essay was reviewed by

Alex Wood
