About this sample
Words: 1227 | Pages: 3 | 7 min read
Published: Sep 18, 2018
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.
In this paper we present a predictive analysis of what sorts of people were likely to survive, using machine learning tools to predict with accuracy which passengers survived the tragedy. Index Terms - Machine learning.
Machine learning means the application of any computer-enabled algorithm to a data set to find a pattern in the data. This encompasses essentially all types of data science algorithms: supervised, unsupervised, segmentation, classification, or regression. A few important areas where machine learning can be applied are:

- Handwriting recognition: convert written letters into digital letters
- Language translation: translate spoken or written languages (e.g. Google Translate)
- Speech recognition: convert voice snippets to text (e.g. Siri, Cortana, and Alexa)
- Image classification: label images with appropriate categories (e.g. Google Photos)
- Autonomous driving: enable cars to drive themselves (e.g. NVIDIA and the Google Car)

Features are the observations that are used to form predictions:

- For image classification, the pixels are the features.
- For voice recognition, the pitch and volume of the sound samples are the features.
- For autonomous cars, data from the cameras, range sensors, and GPS are the features.

Extracting relevant features is important for building a model. The source of a message is an irrelevant feature when classifying images, but it is relevant when classifying emails, because spam often originates from known reported sources.
Every machine learning algorithm works best under a given set of conditions, and making sure your algorithm fits its assumptions and requirements ensures superior performance. You cannot use just any algorithm in any condition. For a classification problem such as this one, you should try algorithms such as Logistic Regression, Decision Trees, SVM, Random Forest, etc. What is Logistic Regression?
Logistic Regression is a classification algorithm. It is used to predict a binary outcome given a set of independent variables. To represent binary categorical outcome, we use dummy variables. You can also think of logistic regression as a special case of linear regression when the outcome variable is categorical, where we are using log of odds as dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.
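As a minimal sketch of the logit idea described above (pure Python; the feature encoding and coefficient values are hypothetical, not fitted to the Titanic data):

```python
import math

def predict_survival_probability(features, weights, bias):
    """Logistic regression: probability = sigmoid(bias + w . x).

    The linear combination is the log of odds; the sigmoid (inverse
    logit) maps it to a probability between 0 and 1.
    """
    log_odds = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for features [is_female, is_third_class, age / 10]
weights = [2.5, -1.2, -0.3]
bias = 0.5

# A first-class woman aged 29: high predicted survival probability
p = predict_survival_probability([1.0, 0.0, 2.9], weights, bias)
```

With all-zero inputs the model outputs exactly 0.5, the decision boundary of the logit function.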
Performance of a logistic regression model:

- AIC (Akaike Information Criterion): the analogous metric to adjusted R² in logistic regression is AIC. AIC is a measure of fit which penalizes the model for the number of model coefficients; therefore, we always prefer the model with the minimum AIC value.
- Null deviance and residual deviance: null deviance indicates the response predicted by a model with nothing but an intercept; the lower the value, the better the model. Residual deviance indicates the response predicted by a model on adding independent variables; again, the lower the value, the better the model.
- Confusion matrix: a tabular representation of actual vs. predicted values. It helps us find the accuracy of the model and avoid overfitting.
- McFadden R², called a pseudo R²: when analyzing data with a logistic regression, an exact equivalent of R-squared does not exist; however, several pseudo R-squareds have been developed to evaluate the goodness of fit of logistic models.

accuracy = (true positives + true negatives) / total number of predictions
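The accuracy formula can be computed directly from the four cells of a binary confusion matrix; a minimal sketch (the counts passed in below are illustrative):

```python
def accuracy(tp, tn, fp, fn):
    """accuracy = (true positives + true negatives) / all predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total

# Illustrative counts for a binary classifier on 177 test cases
acc = accuracy(tp=48, tn=97, fp=20, fn=12)
```

A perfect classifier (no false positives or false negatives) scores exactly 1.0.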
A decision tree is a hierarchical tree structure that can be used to divide a large collection of records into smaller sets of classes by applying a sequence of simple decision rules. A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous (mutually exclusive) classes. The attributes can be any type of variable, from binary, nominal, and ordinal to quantitative values, while the classes must be of qualitative type (categorical, binary, or ordinal). In short, given data of attributes together with its classes, a decision tree produces a sequence of rules (or series of questions) that can be used to recognize the class. One rule is applied after another, resulting in a hierarchy of segments within segments. The hierarchy is called a tree, and each segment is called a node. With each successive division, the members of the resulting sets become more and more similar to each other; hence, the algorithm used to construct a decision tree is referred to as recursive partitioning. Decision tree applications:

- predicting whether tumor cells are benign or malignant
- classifying credit card transactions as legitimate or fraudulent
- classifying buyers from non-buyers
- deciding whether or not to approve a loan
- diagnosing various diseases based on symptoms and profiles
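A fitted tree's "sequence of rules" can be read as nested questions. A hand-written sketch of what such rules might look like for this problem (the splits and thresholds below are hypothetical illustrations, not the tree learned in this paper):

```python
def predict_survived(sex, pclass, age):
    """Apply a hypothetical chain of decision rules, one tree node per question.

    Returns 1 (survived) or 0 (did not survive).
    """
    if sex == "female":
        if pclass <= 2:
            return 1          # leaf: upper-class women, predominantly survivors
        return 1 if age < 30 else 0  # third-class women split on age
    # male branch
    if age < 10 and pclass <= 2:
        return 1              # leaf: young boys in upper classes
    return 0                  # leaf: remaining men

label = predict_survived(sex="female", pclass=1, age=40)
```

Each `if` corresponds to one internal node; each `return` is a leaf holding the majority class of that segment.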
Our approach to solving the problem:
The data we collected is still raw data, which is very likely to contain mistakes, missing values, and corrupt values. Before drawing any conclusions from the data, we need to do some data preprocessing, which involves data wrangling and feature engineering. Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis. Feature engineering attempts to create additional relevant features from the existing raw features in the data and to increase the predictive power of learning algorithms.
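As a sketch of both preprocessing steps on a toy record (the field names follow the familiar Titanic columns, but the specific cleaning and derived features are assumptions, not this paper's exact pipeline):

```python
def wrangle(record, median_age):
    """Data wrangling: fill a missing Age value with the data set median."""
    cleaned = dict(record)
    if cleaned.get("Age") is None:
        cleaned["Age"] = median_age
    return cleaned

def engineer(record):
    """Feature engineering: derive FamilySize and IsAlone from raw fields."""
    out = dict(record)
    out["FamilySize"] = record["SibSp"] + record["Parch"] + 1
    out["IsAlone"] = 1 if out["FamilySize"] == 1 else 0
    return out

# A raw row with a missing age, one sibling/spouse, two parents/children
row = {"Age": None, "SibSp": 1, "Parch": 2}
row = engineer(wrangle(row, median_age=28.0))
```

The derived FamilySize/IsAlone features are a common choice for this data set because family composition is one of the factors the conclusion highlights.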
Data set description: the original data has been split into two groups, a training data set (70%) and a test data set (30%). The training set should be used to build your machine learning models. The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger; it is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
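The 70/30 split described above can be sketched with a seeded shuffle (pure Python; the helper name and seed are assumptions for illustration):

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly, then slice off the first fraction as training."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Split 100 record indices 70/30
train, test = train_test_split(range(100))
```

Shuffling before slicing matters: the raw file may be ordered (e.g. by passenger class), and an unshuffled split would put unrepresentative data in the test set.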
Measures
Results: after training with the algorithms, we validate the trained algorithms on the test data set and measure each algorithm's performance (goodness of fit) with a confusion matrix, using 70% of the data as the training set and 30% as the test set.

Confusion matrix for decision tree

Training data set:
             Reference
Prediction     0     1
         0   395    71
         1    45   203

Test data set:
             Reference
Prediction     0     1
         0    97    20
         1    12    48
Confusion matrix for logistic regression

Training data set:
             Reference
Prediction     0     1
         0   395    12
         1    21   204

Test data set:
             Reference
Prediction     0     1
         0    97    12
         1    21    47
Enhancements and reasoning: predicting the survival rate with other machine learning algorithms, such as random forests or various support vector machines, may improve the accuracy of prediction for the given data set.
Conclusion: The analyses revealed interesting patterns across individual-level features. Factors such as socioeconomic status, social norms, and family composition appeared to have an impact on likelihood of survival. These conclusions, however, were derived from findings in the data. The accuracy of predicting the survival rate with the decision tree algorithm (83.7%) is higher than with logistic regression (81.3%) for the given data set.