Predictive Modeling and Machine Learning

Published: Apr 11, 2019

Table of contents

  1. Supervised learning
  2. Unsupervised learning
  3. Reinforcement learning
  4. Ensemble learning

Machine learning is the process of automatically extracting patterns from historical data in order to forecast future events, and it can be used to build predictive models (Kelleher et al., 2015). Perry (2013) defines machine learning as automated algorithms that extract structure from historical data, generalise the results and make predictions for future data. As Shouval et al. (2013) further explain, machine learning starts without a predefined model; instead, the model is created by learning patterns in the data being used. Kelleher et al. (2015), Perry (2013) and Shouval et al. (2013) classify machine learning into two main types, supervised and unsupervised learning, with a third type, reinforcement learning, explained in Sutton and Barto (2015). Two or more machine learning algorithms can also be combined and their results used in a complementary manner, in what is known as ensemble learning (Polikar, 2010; Brown, 2010). These learning types have been evaluated in order to select a learning method for this research and are described below.

Supervised learning

Supervised learning algorithms build models by learning relationships between descriptive features (input) and target features (output) based on historic datasets (Kelleher et al., 2015). The algorithm is trained by supplying it with known inputs and their matching responses, and from the learned relationship it can predict responses for unknown inputs (Shouval et al., 2013). According to Shouval et al. (2013), supervised learning can be further grouped into regression and classification.

Regression algorithms - linear regression predicts values of a continuous nature, such as time measurements, while logistic regression predicts values of a discrete nature, such as male/female (Shouval et al., 2013). The regression types of machine learning algorithms are therefore not considered suitable for an index-producing model.
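The difference between a continuous and a discrete prediction can be illustrated with a minimal sketch. It is not taken from the cited sources: the function names, the toy data and the fixed `weight` and `bias` values are assumptions chosen only for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def logistic(z):
    """Map any real score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, weight, bias, labels=("class_0", "class_1")):
    """Discrete prediction: threshold the logistic probability at 0.5."""
    return labels[1] if logistic(weight * x + bias) >= 0.5 else labels[0]

# Linear regression: the prediction is a continuous value.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted_value = slope * 5 + intercept

# Logistic regression: the prediction is one of two discrete classes.
# The weight and bias are hypothetical, not fitted from data.
predicted_class = classify(2.0, weight=1.5, bias=-1.0)
```

In practice the logistic weights would be estimated from labeled data; they are fixed here to keep the sketch short.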

Classification algorithms - decision trees are classification algorithms that assemble a tree from a labeled dataset, with the root at the top, branching downwards and terminating at the leaves (Suknovic et al., 2011). The root and each of the branches are discrete functions with attribute-value pairs that require a decision to be made in order to proceed to the next tree depth or level (Barros et al., 2015). As Barros et al. (2015) explain, each branch has exactly one incoming input and can have two or more output branches, while the root has only output branches and no input. The leaves, also known as terminals, are at the end of the tree and each one represents an output class. Suknovic et al. (2011) explain that, from a labeled dataset of known attributes and their corresponding values, the algorithm can learn the classification and predict a class for a future, unseen set of attribute values. The advantages of decision trees are ease of use and ease of interpretation, as stated by Shalev-Shwartz and Ben-David (2014). Decision tree algorithms are proposed for use in this study.

  • CART
  • Naive Bayes
  • K-NN
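The decision tree structure described above (a root, branches that each test an attribute, and leaves that hold a class) can be sketched as follows. This is a simplified CART-style learner using Gini impurity and binary threshold splits, written under stated assumptions: the toy dataset, the class labels and the `max_depth` default are hypothetical and do not come from the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature index, threshold) with the lowest weighted Gini."""
    best, best_score = None, float("inf")
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_score, best = score, (f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; a leaf is simply a class label."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root down to a leaf and return its class."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical labeled dataset with a single attribute.
rows = [(1,), (2,), (3,), (10,), (11,), (12,)]
labels = ["a", "a", "a", "b", "b", "b"]
tree = build_tree(rows, labels)
```

Calling `predict(tree, (2,))` walks the learned tree for an unseen attribute value and returns the class stored at the leaf it reaches.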

Unsupervised learning

Unlike supervised learning, unsupervised algorithms are fed only descriptive features, with no corresponding target features. The algorithm learns by identifying relationships in the inputs and either groups them into clusters, finds associations among them or detects anomalous behaviour.

Association algorithms - in machine learning, association algorithms are rule-based algorithms that learn by discovering interesting relationships among data points in a dataset (Rudin et al., 2013). According to Al-Maolegi and Arkok (2014), the extraction of association rules from databases is widely performed on sales transactions to build patterns between items. The basic concepts in association are to detect frequent items in a dataset and to generate association rules based on how the items occur together (Al-Maolegi & Arkok, 2014). Apriori is the main association algorithm in unsupervised machine learning (Al-Maolegi & Arkok, 2014), although Eclat and FP-Growth are also used (Heaton, 2017). The strength of association learning and its algorithms lies in the interestingness of the association between items and their frequency of occurrence. This will not be helpful in the initial modeling of the RPPI.
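The two ideas above, detecting frequent items and scoring rules by how often items occur together, can be sketched in a few lines. This is a simplified level-wise search in the spirit of Apriori, not the full algorithm from the cited papers: candidate pruning is reduced to a direct support check, and the transactions and the `min_support` value are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow itemsets only from items that are still frequent."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items
               if support(transactions, [i]) >= min_support]
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(transactions, itemset)
        size += 1
        pool = sorted({i for itemset in current for i in itemset})
        current = [frozenset(c) for c in combinations(pool, size)
                   if support(transactions, c) >= min_support]
    return frequent

# Hypothetical sales transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Here `rule_confidence` is the share of transactions containing bread that also contain milk, which is one common way to measure how strongly two items are associated.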

Anomaly detection algorithms - Blomquist and Möller (2015) define anomalies as patterns of data that do not conform to what has been specified as normal behaviour. Anomaly detection algorithms are used to detect anomalies, that is, data points that do not follow the norm in a dataset. These algorithms grew out of statistical methods for pruning and cleansing datasets of outliers (Goldstein & Uchida, 2016). The authors detail how these algorithms are used today to detect fraud in the financial industry and as intrusion detection tools in information security. These algorithms do not fit the modelling of the RPPI and are not considered for this study.
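One of the simplest statistical approaches to flagging outliers is a z-score test, sketched below. It is offered only as an illustration of the general idea and is not a method named in the cited sources; the data and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev

def z_score_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates from the norm
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical measurements with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = z_score_anomalies(readings)
```

Because the mean and standard deviation are themselves pulled towards extreme values, this test can miss anomalies in small or heavily contaminated samples.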

Clustering algorithms - clustering is the partitioning of data into groups of similar objects, called clusters. Objects in a cluster share similarities and are dissimilar to objects in other clusters (Al-Haddad & Aldabbagh, 2015). Al-Haddad and Aldabbagh (2015) explain that clustering creates structure from unlabeled data by separating the data according to its properties. They describe three types of clustering, namely exclusive, overlapping and hierarchical clustering. In exclusive clustering, an object can belong to only one group. In overlapping clustering, an object can belong to more than one group. In hierarchical clustering, objects are clustered in a hierarchical manner and can belong to more than one cluster. The clustering algorithm considered for the study is k-means.

The k-means algorithm is a clustering algorithm that partitions data into a number of clusters predetermined by the user, by iterating through the dataset to find commonalities (Goswami, 2015). The k-means algorithm is easy to understand and use (Goswami, 2015) and can be used in any research field (Morissette & Chartier, 2013). Al-Haddad and Aldabbagh (2015) and Goswami (2015) explain the process as follows: initially, random cluster centres called centroids are selected, and every object in the dataset is assigned to the centroid closest to it to form a cluster. When all objects have been clustered, new centroids are calculated from the previous clusters, and the process is repeated until the clusters no longer change. According to Goswami (2015), the limitations of k-means are that the number of clusters has to be predetermined, every object must belong to a cluster and random centroid seeds must be selected. If the appropriate number of clusters is not known in advance, the resulting clusters may be of little value. K-means clusters can be negatively affected by outliers because of the requirement that every object must belong to a cluster. To mitigate the weakness of random initial centroids, other statistical techniques for choosing the initial centroids may be used. The k-means algorithm will be one of the algorithms used in the construction of the RPPI model.
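The steps described above (choose random centroids, assign each object to its nearest centroid, recompute the centroids, repeat until nothing changes) can be sketched as follows. This is a minimal illustration rather than the implementation used in the study; the function names, the `seed` and `max_iter` parameters and the sample points are assumptions.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k random objects as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every object to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid from the objects assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return labels, centroids

# Hypothetical two-dimensional data forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 2.0), (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
labels, centroids = k_means(points, k=2)
```

The number of clusters `k` must be supplied by the caller, and every point receives a label even if it is an outlier, which mirrors the limitations noted above.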

Reinforcement learning

Reinforcement learning is a type of learning that allows the learner, or learning agent, to determine its behaviour based on environmental conditions; the learner is not told what to do but must discover which actions bring the most reward (Ayodele, 2010). Sutton and Barto (2014, 2015) explain how it differs from unsupervised learning, in that there are no structures or patterns to determine the action, and from supervised learning, in that there are no labelled datasets; the agent must take decisions and continue to learn from its own experiences and the rewards received. The two features that most distinguish reinforcement learning are ‘trial-and-error search and delayed reward’, according to Sutton and Barto (2014, 2015, 2016, 2017). A practical application of reinforcement learning is a game of chess between a machine and a person. The machine cannot learn every move for every possible move of the opponent, so it must calculate which move gives the best reward towards winning. Reinforcement learning algorithms are used in online, state-based systems in which decisions have to be taken from one state to the next (Sutton and Barto, 2014, 2015, 2016, 2017), such as game theory and simulation-based models, and will therefore not be utilised for this research.
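Trial-and-error search and delayed reward can be illustrated with tabular Q-learning on a toy problem. The sketch below is an assumption-laden example, not something taken from the cited texts: the five-state corridor, the reward of 1.0 at the final state and all parameter values are hypothetical.

```python
import random

N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = ("left", "right")

def step(state, action):
    """Move one cell along the corridor; only reaching the goal is rewarded."""
    if action == "left":
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on the number of steps per episode
            if rng.random() < epsilon:
                # Trial and error: occasionally try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Otherwise take the best-known action, breaking ties randomly.
                best = max(q[state].values())
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward = step(state, action)
            # Delayed reward: value flows backwards from the goal over time.
            target = reward + gamma * max(q[next_state].values())
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

The agent is never told that moving right is correct; `policy` reflects whatever it has learned from the rewards it received during training.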

Ensemble learning

Ensemble learning is machine learning in which the outputs of multiple algorithms are combined to achieve the best output (Brown, 2010). The combined learners, known as a committee (Brown, 2010), can come from different learning categories such as classification, regression and clustering, and decisions can be made by voting, probability, ranking or any other statistical technique. Polikar (2010) attests to the use of ensemble algorithms in a wide spectrum of fields to address many challenges that are inherent in machine learning, such as error correction, estimation and imbalances.

The following ensemble methods are described by van Hasselt and Wiering (2015):

  • The majority voting (MV) method combines the best action of each algorithm and bases its final decision on the number of times an action is preferred by each algorithm,
  • The rank voting (RV) method lets each algorithm rank the different actions and combines these rankings to select a final action,
  • The Boltzmann multiplication (BM) method is based on using Boltzmann exploration for each algorithm and multiplies the Boltzmann probabilities of each action computed by each algorithm, and
  • The Boltzmann addition (BA) method is similar to the BM method, but adds the Boltzmann probabilities of actions.
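The four combination rules listed above can be sketched as follows. This is an interpretation of the descriptions in the list, not code from van Hasselt and Wiering: each algorithm is assumed to supply a list of action values, the `temperature` parameter is an assumption, and for simplicity each function returns the single most preferred action rather than sampling from the combined preferences.

```python
import math
from collections import Counter

def softmax(values, temperature=1.0):
    """Boltzmann probabilities for a list of action values."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])

def majority_voting(action_values):
    """Each algorithm votes for its best action; the most-voted action wins."""
    votes = Counter(argmax(values) for values in action_values)
    return argmax([votes[a] for a in range(len(action_values[0]))])

def rank_voting(action_values):
    """Each algorithm ranks all actions; the ranks are summed per action."""
    n_actions = len(action_values[0])
    totals = [0] * n_actions
    for values in action_values:
        order = sorted(range(n_actions), key=lambda a: values[a])
        for rank, action in enumerate(order):  # worst action gets rank 0
            totals[action] += rank
    return argmax(totals)

def boltzmann_multiplication(action_values, temperature=1.0):
    """Multiply the Boltzmann probabilities each algorithm gives to an action."""
    prefs = [1.0] * len(action_values[0])
    for values in action_values:
        for a, p in enumerate(softmax(values, temperature)):
            prefs[a] *= p
    return argmax(prefs)

def boltzmann_addition(action_values, temperature=1.0):
    """Add the Boltzmann probabilities each algorithm gives to an action."""
    prefs = [0.0] * len(action_values[0])
    for values in action_values:
        for a, p in enumerate(softmax(values, temperature)):
            prefs[a] += p
    return argmax(prefs)

# Hypothetical action values from three algorithms over three actions.
action_values = [
    [1.0, 2.0, 0.5],
    [0.2, 0.1, 0.9],
    [0.3, 0.8, 0.1],
]
chosen = {
    "MV": majority_voting(action_values),
    "RV": rank_voting(action_values),
    "BM": boltzmann_multiplication(action_values),
    "BA": boltzmann_addition(action_values),
}
```

In this example two of the three algorithms prefer the action at index 1, so all four rules agree; with closer action values the rules can disagree.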

AdaBoost algorithms, as described in Polikar (2010), are the most popular in ensemble learning and cover both classification and regression. Brown (2010) indicates that weak algorithms can be boosted considerably by employing AdaBoost ensemble techniques; stronger algorithms, however, do not perform much better when used with it. At this stage of the study, it is difficult to know the outcome of the selected algorithms with certainty, and it is not known whether ensemble algorithms will be required.

This essay was reviewed by Alex Wood

Cite this Essay

Predictive Modeling and Machine Learning. (2019, April 10). GradesFixer. Retrieved November 19, 2024, from https://gradesfixer.com/free-essay-examples/predictive-modeling-and-machine-learning/