Published: Sep 20, 2018
With the development of information technology, a large number of databases and a huge amount of data have been generated in various areas. Research on databases and information technology has given rise to approaches for storing and manipulating this valuable data to support decision making. Data mining is the process of extracting useful information and patterns from large amounts of data. It is also called the knowledge discovery process, knowledge mining from data, knowledge extraction, data analysis, or pattern analysis.
Data mining is a logical process that searches a large amount of raw data for useful information. Its main goal is to find previously unknown patterns. Once found, these patterns can be used to make decisions and to support machine learning and predictive analysis. The process has three stages:
A. Exploration: First, the data is cleaned and transformed, the important variables are identified, and the nature of the data is determined based on the problem.
B. Pattern Identification: Once the data has been explored, refined, and defined for the specific variables, the second step is to identify patterns and choose those that make the best predictions.
C. Deployment: Finally, the patterns are put to use to achieve the desired outcome. [2]
Knowledge is discovered from available databases using different kinds of algorithms and techniques, such as classification, clustering, regression, artificial intelligence, neural networks, association rules, decision trees, genetic algorithms, and the nearest neighbour method.
Classification is a data mining technique that assigns categories to a collection of data in order to aid more accurate prediction and analysis. The decision tree is one of its several methods. The goal is to create a set of classification rules that will answer a question, make a decision, or predict behavior. To start, a set of training data is developed that contains a certain set of attributes as well as the likely outcome. The job of the classification algorithm is to discover how that set of attributes leads to its outcome. Types of classification models include decision trees, neural networks, and support vector machines.
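The idea of learning classification rules from training data can be sketched with a small decision tree learner. This is a minimal illustration in the style of ID3 (splitting on the attribute that leaves the lowest entropy), not a method prescribed by the sources. The weather-style attributes, the labels, and all function names are hypothetical.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split leaves the lowest weighted entropy."""
    def split_entropy(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(attributes, key=split_entropy)

def build_tree(rows, labels, attributes):
    """Leaves are class labels; inner nodes are (attribute, branches, majority label)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in set(row[attr] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (attr, branches, majority)

def classify(tree, row):
    """Walk the tree; fall back to the majority label for unseen attribute values."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        tree = branches.get(row[attr], majority)
    return tree

# Hypothetical training data: attributes plus the known outcome for each record.
training_rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
training_labels = ["play", "play", "play", "stay"]

tree = build_tree(training_rows, training_labels, ["outlook", "windy"])
print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```

The learned tree is effectively a set of if-then rules: each path from the root to a leaf reads as one classification rule over the attributes.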
Clustering can be described as the identification of similar classes of objects. Using clustering techniques, we can identify dense and sparse regions in the object space and discover the overall distribution pattern and the correlations among data attributes. Classification can also be an effective means of distinguishing groups or classes of objects, but it becomes costly, so clustering can be used as a pre-processing step for attribute subset selection and classification. Examples include forming groups of customers based on purchasing patterns and categorizing genes with similar functionality. The different types of clustering methods are partitioning methods, hierarchical (agglomerative or divisive) methods, density-based methods, grid-based methods, and model-based methods.
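As a rough sketch of a partitioning method, the following implements a basic k-means loop that groups customers by purchasing pattern. The customer figures (visits per month, average spend) are invented for illustration, and seeding the centroids with the first k points is a simplifying assumption made to keep the run deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic k-means: assign each point to its nearest centroid, then recompute centroids."""
    # Assumption: use the first k points as initial centroids so the result is reproducible.
    centroids = [tuple(points[i]) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # The mean of each cluster becomes its new centroid; empty clusters keep the old one.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per month, average spend per visit).
customers = [(1, 20), (2, 25), (1.5, 22), (10, 200), (11, 210), (12, 190)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

On this toy data the two groups separate into occasional low-spend customers and frequent high-spend customers. Real data would normally need feature scaling and a more careful choice of k and of the initial centroids.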
Regression techniques can be adapted for prediction. Regression analysis can be used to model the relationship between one or more independent variables and a dependent variable. In data mining, the attributes that are already known are the independent variables, and what we want to predict is the response variable. Unfortunately, many real-world problems are not simple prediction problems. For instance, sales volumes, stock prices, and product failure rates are all very difficult to predict because they may depend on complex interactions of multiple predictor variables. Therefore, more complex techniques (e.g., logistic regression, decision trees, or neural nets) may be necessary to forecast future values. The same model types can often be used for both regression and classification. For example, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks can also create both classification and regression models.
The different types of regression methods are linear regression, multivariate linear regression, nonlinear regression, and multivariate nonlinear regression.
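The simplest of these, linear regression with a single independent variable, can be sketched with the ordinary least squares closed-form solution. The advertising and sales figures below are made up for illustration, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predict the response variable for a new value of the independent variable."""
    return slope * x + intercept

# Hypothetical data: advertising spend (known attribute) vs. sales (response variable).
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(spend, sales)
print(slope, intercept, predict(slope, intercept, 5))
```

Because the toy data lies exactly on a line, the fit recovers it exactly. Real data would leave residual error, which is where the more complex techniques mentioned above come in.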
Association and correlation analysis is usually used to find frequent itemsets in large data sets. Findings of this type help businesses make decisions about, for example, catalogue design, cross-marketing, and customer shopping behavior analysis. Association rule algorithms need to be able to generate rules with confidence values less than one. However, the number of possible association rules for a given data set is generally very large, and a high proportion of the rules are usually of little value.
The different types of association rules are multilevel association rules, multidimensional association rules, and quantitative association rules.
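To make support and confidence concrete, here is a small sketch that scores single-item rules over a handful of shopping baskets. The baskets, thresholds, and function names are illustrative assumptions. Practical algorithms such as Apriori prune the search space rather than enumerating every pair as this sketch does.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def pair_rules(transactions, min_support, min_confidence):
    """Enumerate rules of the form {a} -> {b} that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            if s < min_support:
                continue
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(confidence({"bread"}, {"milk"}, baskets))  # a rule with confidence below one
print(pair_rules(baskets, min_support=0.5, min_confidence=0.7))
```

Raising the two thresholds is the usual way to discard the large share of rules that carry little value.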
A neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so that it can predict the correct class labels of the input tuples. Neural networks have a remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. They are well suited to continuous-valued inputs and outputs. Neural networks are best at identifying patterns or trends in data and are well suited to prediction and forecasting.
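Learning by adjusting weights can be illustrated with the simplest possible case, a single unit trained with the perceptron rule. This is only a sketch: practical networks have many units and are usually trained with backpropagation. The logical AND task, the learning rate, and the function names are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Fire (output 1) when the weighted sum of the inputs plus the bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, learning_rate=0.5, max_epochs=50):
    """Perceptron rule: nudge each weight in proportion to the prediction error."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                errors += 1
                weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if errors == 0:
            break  # every training tuple is classified correctly
    return weights, bias

# Hypothetical training tuples: the class label is the logical AND of the two inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(weights, bias, predictions)
```

A single unit like this can only learn linearly separable classes. Connecting units into layers is what lets a network capture the more complex patterns described above.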
Data mining is an essential process in which intelligent methods are applied to extract data patterns. It plays an important role in finding patterns, forecasting, and discovering knowledge in different fields of information technology. Data mining techniques and algorithms such as classification and clustering help find patterns based on similar characteristics of the data. Data mining has a wide application domain in almost every industry where data is generated. This is why it is considered one of the most important frontiers in database and information systems and one of the most promising interdisciplinary developments in information technology.
[1] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, 3rd edition, Morgan Kaufmann.
[2] Bharati M. Ramageri, “Data Mining Techniques and Applications”, Indian Journal of Computer Science and Engineering, Vol. 1, No. 4, ISSN: 0976-5166, pp. 301-305.
[3] Ke Jie, Dong Hongbin, Tan Chengyu and Liang Yiwen, “PBWA: A Provenance-Based What-If Analysis Approach for Data Mining Processes”, Chinese Journal of Electronics, Vol. 26, No. 5, Sept. 2017.
[4] LiHua Wang BeiHang Zijun Zhou, “Congestion Prediction for Urban Areas by Spatiotemporal Data Mining”, International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, IEEE, 2017, 978-1-5386-2209-4/17.
[5] Sagardeep Roy and Anchal Garg, “Analyzing Performance of Students by Using Data Mining Techniques: A Literature Survey”, 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON), GLA University, Mathura, Oct 26-28, 2017, 978-1-5386-3004-4/17.