The Treatment of Missing Values – One of the Most Important Issues in Data Preprocessing

Published: Mar 28, 2019

The treatment of missing values (MVs) is an important issue in data pre-processing for data mining. MVs arise for two main reasons. First, attributes may be aggregated from different sources, and a given case may not exist in every source. Second, values may simply be omitted when the data are reported. The simplest way of dealing with MVs is to discard every case that contains at least one MV. However, this is practical only when few cases contain MVs and when analyzing the complete cases alone will not seriously bias inference. In our study, for example, 10%-30% of students are missing their high school GPA or SAT scores. Simply discarding these students is not an option, as most of them are international or transfer students, who constitute an important subset of the population. Nor is it practical to discard these variables, as they have been shown to be important predictors of students' performance. Thus, it is important to apply an appropriate imputation strategy to the data.
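
To make the trade-off concrete, the following minimal sketch compares discarding incomplete cases with a simple imputation. The records, the field names `hs_gpa` and `sat`, and the choice of median imputation are illustrative assumptions, not the data or the imputation strategy used in this study.

```python
from statistics import median

# Hypothetical student records; None marks a missing value (MV).
students = [
    {"id": 1, "hs_gpa": 3.6, "sat": 1250},
    {"id": 2, "hs_gpa": None, "sat": 1180},  # e.g. a transfer student
    {"id": 3, "hs_gpa": 3.1, "sat": None},   # e.g. an international student
    {"id": 4, "hs_gpa": 3.9, "sat": 1400},
    {"id": 5, "hs_gpa": 2.8, "sat": 1010},
]
FEATURES = ["hs_gpa", "sat"]


def complete_cases(rows, features):
    """Listwise deletion: keep only rows with no missing feature."""
    return [r for r in rows if all(r[f] is not None for f in features)]


def impute_median(rows, features):
    """Replace each missing value with the median of the observed values."""
    medians = {
        f: median(r[f] for r in rows if r[f] is not None) for f in features
    }
    return [
        {**r, **{f: medians[f] if r[f] is None else r[f] for f in features}}
        for r in rows
    ]


kept = complete_cases(students, FEATURES)
imputed = impute_median(students, FEATURES)

print(f"complete cases: {len(kept)} of {len(students)}")
print(f"cases after imputation: {len(imputed)} of {len(students)}")
```

In this toy example, deletion drops two of the five students, while imputation keeps all of them at the cost of filling in values that were never observed.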

A variety of data mining methods are also available. Unlike traditional explanatory models, where the goal is to explore the relationship between an outcome variable and explanatory variables, the goal of a data mining model is to make predictions on a new data set. There is a target variable, which can be either continuous or categorical, and there are predictors, called features, which measure a set of characteristics of the sample members. A prediction model is built from the current data and then applied to new data, where a new set of feature values is used to make predictions. Different data mining methods use different algorithms and thus yield different prediction performance. According to Luengo, imputation methods can improve the performance of data mining methods in different categories, as there may be an interaction between imputation strategies and data mining methods. We would like to explore how this works on our data. In this chapter, we will first introduce the imputation strategies applied in this dissertation. Then, we will introduce the data mining methods applied to our data.
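
The build-then-predict workflow described above can be sketched as follows. This is only an illustration: the k-nearest-neighbour classifier, the toy records, and the names `fit_scaler`, `scale`, and `predict` are assumptions made for the example, not the methods or data used in this dissertation.

```python
import math

# Hypothetical current data: (features, target), target 1 = pass, 0 = fail.
# Features are (high school GPA, SAT score).
train = [
    ((3.8, 1350.0), 1),
    ((3.5, 1200.0), 1),
    ((3.2, 1150.0), 1),
    ((2.1, 900.0), 0),
    ((2.4, 950.0), 0),
]


def fit_scaler(rows):
    """Learn per-feature minimum and range so features share a scale."""
    cols = list(zip(*(x for x, _ in rows)))
    return [(min(c), (max(c) - min(c)) or 1.0) for c in cols]


def scale(x, scaler):
    """Apply min-max scaling learned from the current data."""
    return tuple((v - lo) / span for v, (lo, span) in zip(x, scaler))


def predict(x_new, rows, scaler, k=3):
    """Majority vote among the k nearest training cases."""
    q = scale(x_new, scaler)
    nearest = sorted(rows, key=lambda r: math.dist(q, scale(r[0], scaler)))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if votes * 2 > k else 0


# Build the model on current data, then apply it to new feature values.
scaler = fit_scaler(train)
new_students = [(3.6, 1300.0), (2.2, 920.0)]
predictions = [predict(x, train, scaler) for x in new_students]
print(predictions)
```

Swapping in a different classifier changes only `predict`, which is one way to see why different methods can give different prediction performance on the same data.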

Third, SMOTE, a commonly used over-sampling method, will be introduced to deal with the imbalanced data issue. Imbalanced data typically refers to classification problems in which the classes are not represented equally. For example, our data set contains around 3000 students in total, of whom 90% are labeled as passing and the remaining 10% as failing. Most machine learning methods do not work well on imbalanced data, so techniques are needed to address the imbalance. SMOTE is one of them.
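
The core idea of SMOTE is to create synthetic minority cases by interpolating between a minority case and one of its nearest minority-class neighbours. The sketch below is a simplified illustration of that idea, not a full implementation: it picks base cases at random rather than cycling through every minority case, and the toy points and the function name `smote` are assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class, excluding x
        neighbours = sorted(
            (m for m in minority if m is not x),
            key=lambda m: math.dist(x, m),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # New point lies on the segment between x and its neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical toy data with a 90/10 class split (36 vs. 4 cases).
majority = [(3.0 + 0.05 * i, 3.0 - 0.02 * i) for i in range(36)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]

# Over-sample the minority class until both classes are the same size.
synthetic = smote(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))
```

Because each synthetic point lies between two real minority cases, the new cases stay inside the region the minority class already occupies instead of being exact duplicates.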

This essay was reviewed by
Alex Wood

Cite this Essay

The Treatment of Missing Values – One of the Most Important Issues in Data Preprocessing. (2019, March 27). GradesFixer. Retrieved November 19, 2024, from https://gradesfixer.com/free-essay-examples/the-treatment-of-missing-values-one-of-the-most-important-issues-in-data-preprocessing/