Survival Prediction for Titanic Data Using Machine Learning Algorithms

About this sample

Words: 1227 | Pages: 3 | 7 min read

Published: Sep 18, 2018

Table of contents

  1. Introduction
  2. Literature survey
  3. Decision Trees
  4. Methodology
  5. Experimental Analysis and Discussion

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.


In this paper we perform a predictive analysis of what sorts of people were likely to survive, using machine learning tools to predict with accuracy which passengers survived the tragedy. Index Terms - Machine learning.


Machine learning means the application of any computer-enabled algorithm that can be applied to a data set to find a pattern in the data. This encompasses essentially all types of data science algorithms: supervised, unsupervised, segmentation, classification, or regression. A few important areas where machine learning can be applied are:
  • Handwriting recognition: convert written letters into digital letters
  • Language translation: translate spoken or written languages (e.g. Google Translate)
  • Speech recognition: convert voice snippets to text (e.g. Siri, Cortana, and Alexa)
  • Image classification: label images with appropriate categories (e.g. Google Photos)
  • Autonomous driving: enable cars to drive themselves (e.g. NVIDIA and Google Car)

Features are the observations used to form predictions:
  • For image classification, the pixels are the features.
  • For voice recognition, the pitch and volume of the sound samples are the features.
  • For autonomous cars, data from the cameras, range sensors, and GPS are the features.

Extracting relevant features is important for building a model: the source of an email is an irrelevant feature when classifying images, but it is relevant when classifying emails, because spam often originates from reported sources.

Literature survey

Every machine learning algorithm works best under a given set of conditions, and making sure your algorithm fits its assumptions ensures superior performance; you cannot use just any algorithm under any condition. For a binary classification problem such as this one, you should try algorithms such as Logistic Regression, Decision Trees, SVM, and Random Forest.

Logistic Regression

Logistic Regression is a classification algorithm used to predict a binary outcome given a set of independent variables. To represent the binary categorical outcome, we use dummy variables. You can also think of logistic regression as a special case of linear regression where the outcome variable is categorical and the log of odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.
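The logit relationship above can be sketched in a few lines of code. This is a minimal illustration, not a fitted model: the coefficients b0 and b1 are made-up values chosen only to show how the log of odds maps to a probability.

```python
import numpy as np

# The linear predictor is the log of odds: log(p / (1 - p)) = b0 + b1 * x.
# The sigmoid inverts that link, mapping log-odds back to a probability.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -1.0, 2.5                # illustrative coefficients, not fitted values
x = np.array([0.0, 0.5, 1.0])
p = sigmoid(b0 + b1 * x)          # predicted probability of the event
outcome = (p >= 0.5).astype(int)  # binary outcome via a 0.5 threshold
print(p, outcome)
```

Thresholding the probability at 0.5 is the usual default for turning the fitted probabilities into the binary survived/died prediction.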

Performance of a logistic regression model:
  • AIC (Akaike Information Criterion): the analogue of adjusted R² in logistic regression. AIC is a measure of fit that penalizes the model for the number of model coefficients, so we always prefer the model with the minimum AIC value.
  • Null deviance and residual deviance: null deviance indicates the response predicted by a model with nothing but an intercept, while residual deviance indicates the response predicted by a model after adding independent variables. In both cases, the lower the value, the better the model.
  • Confusion matrix: a tabular representation of actual vs. predicted values. It helps us find the accuracy of the model and avoid overfitting. From it, accuracy = (true positives + true negatives) / total observations.
  • McFadden R²: known as a pseudo R². When analyzing data with a logistic regression, an exact equivalent of R-squared does not exist; however, several pseudo R-squareds have been developed to evaluate the goodness of fit of logistic models.
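These metrics can all be computed from the model's log-likelihoods. The paper does not name its tooling, so the sketch below assumes scikit-learn on synthetic data; the deviance, AIC, and McFadden R² are derived by hand from the fitted probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: two features with a noisy linear effect on the outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]

# Log-likelihood of the fitted model and of an intercept-only (null) model
ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
p_null = y.mean()
ll_null = np.sum(y * np.log(p_null) + (1 - y) * np.log(1 - p_null))

residual_deviance = -2 * ll_model    # lower is better
null_deviance = -2 * ll_null         # intercept-only baseline
k = X.shape[1] + 1                   # number of coefficients, incl. intercept
aic = 2 * k - 2 * ll_model           # penalizes extra coefficients
mcfadden_r2 = 1 - ll_model / ll_null # pseudo R-squared
print(aic, null_deviance, residual_deviance, mcfadden_r2)
```

Adding informative predictors drives the residual deviance below the null deviance; the AIC only drops if the gain in fit outweighs the penalty for the extra coefficients.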

Decision Trees

A decision tree is a hierarchical tree structure that can be used to divide a large collection of records into smaller sets of classes by applying a sequence of simple decision rules. A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous (mutually exclusive) classes. The attributes of the classes can be any type of variable (binary, nominal, ordinal, or quantitative), while the class itself must be qualitative (categorical, binary, or ordinal). In short, given data attributes together with their classes, a decision tree produces a sequence of rules (or series of questions) that can be used to recognize the class. One rule is applied after another, resulting in a hierarchy of segments within segments. The hierarchy is called a tree, and each segment is called a node. With each successive division, the members of the resulting sets become more and more similar to each other; hence, the algorithm used to construct a decision tree is referred to as recursive partitioning. Decision tree applications include:
  • predicting tumor cells as benign or malignant
  • classifying credit card transactions as legitimate or fraudulent
  • classifying buyers from non-buyers
  • deciding whether or not to approve a loan
  • diagnosing various diseases based on symptoms and profiles
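The recursive partitioning described above can be seen directly with a small sketch. The toy records and their 0/1 encoding are hypothetical, chosen so the printed tree shows how each rule splits the data into more homogeneous nodes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy records: [passenger class (1-3), sex (0 = male, 1 = female)] -- hypothetical encoding
X = np.array([[1, 1], [1, 0], [2, 1], [2, 0], [3, 1], [3, 0], [3, 0], [1, 1]])
y = np.array([1, 0, 1, 0, 1, 0, 0, 1])  # 1 = survived, 0 = died

# Recursive partitioning: each split divides records into purer segments (nodes)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["pclass", "sex"]))
```

On this toy data a single question ("sex <= 0.5?") already yields pure leaves, so the printed tree is one rule deep; real data needs a deeper hierarchy of segments within segments.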


Methodology

Our approach solves the problem in the following steps:

  1. Collect the raw data needed to solve the problem.
  2. Import the dataset into the working environment.
  3. Preprocess the data, which includes data wrangling and feature engineering.
  4. Explore the data and prepare a model for performing analysis using machine learning algorithms.
  5. Evaluate the model and iterate until we get satisfactory model performance.
  6. Compare the results and select the model that gives the more accurate result.
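The six steps above can be sketched end to end. This is a hedged outline, not the paper's actual script: the tiny inline DataFrame stands in for the imported raw data (in practice step 2 would be a pd.read_csv call), and the column names follow the standard Titanic dataset.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 1-2: collect and import the raw data (a small synthetic stand-in here)
df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 1, 2, 3] * 5,
    "Sex":      ["female", "male"] * 25,
    "Age":      [29, 35, None, 40, 22, 30, 28, 50, None, 19] * 5,
    "Survived": [1, 0, 1, 0, 1, 0, 0, 1, 1, 0] * 5,
})

# Step 3: preprocessing -- impute missing ages, encode sex numerically
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Step 4: explore/prepare -- 70/30 split for modelling
X, y = df[["Pclass", "Sex", "Age"]], df["Survived"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Steps 5-6: evaluate each candidate model and compare held-out accuracy
scores = {}
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)):
    scores[type(model).__name__] = accuracy_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
print(scores)
```

The loop at the end is the "compare and select" step: whichever model scores higher on the unseen 30% would be carried forward.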

The data we collected is still raw data, which is very likely to contain mistakes, missing values, and corrupt values. Before drawing any conclusions from the data we need to do some preprocessing, which involves data wrangling and feature engineering. Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis. Feature engineering attempts to create additional relevant features from the existing raw features in the data, to increase the predictive power of the learning algorithms.
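The wrangling/engineering distinction can be made concrete with pandas. The rows below are hypothetical but follow the shape of the standard Titanic passenger data; imputing the missing age is wrangling, while FamilySize and Title are engineered features derived from the raw columns.

```python
import pandas as pd

# Hypothetical raw rows in the shape of the Titanic passenger data
raw = pd.DataFrame({
    "Name":  ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "Age":   [22.0, None, 26.0],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# Data wrangling: clean up missing/corrupt values
raw["Age"] = raw["Age"].fillna(raw["Age"].median())

# Feature engineering: derive new, more predictive features from raw ones
raw["FamilySize"] = raw["SibSp"] + raw["Parch"] + 1
raw["Title"] = raw["Name"].str.extract(r",\s*([A-Za-z]+)\.", expand=False)
print(raw)
```

Titles such as "Mrs" vs. "Miss" carry age and social-status signal that the raw Name string hides from the learning algorithm, which is the point of the engineering step.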

Experimental Analysis and Discussion

Data set description: The original data has been split into two groups: a training dataset (70%) and a test dataset (30%). The training set should be used to build your machine learning models. The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger; it is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
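This setup, where test-set ground truth is withheld, can be sketched as follows; the tiny train/test frames and the passenger IDs are invented for illustration only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Invented labelled training rows and unlabelled test rows (Sex: 0 = male, 1 = female)
train = pd.DataFrame({"Pclass": [1, 3, 2, 3, 1, 2], "Sex": [1, 0, 1, 0, 0, 1],
                      "Survived": [1, 0, 1, 0, 0, 1]})
test = pd.DataFrame({"PassengerId": [892, 893], "Pclass": [3, 1], "Sex": [0, 1]})

model = DecisionTreeClassifier(random_state=0)
model.fit(train[["Pclass", "Sex"]], train["Survived"])

# Ground truth for the test set is withheld; we submit one predicted outcome per passenger
submission = pd.DataFrame({"PassengerId": test["PassengerId"],
                           "Survived": model.predict(test[["Pclass", "Sex"]])})
print(submission)
```

The output is one row per test passenger with the model's predicted 0/1 outcome, which is exactly what gets scored against the withheld labels.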


Results: After training the algorithms, we validate them on the test data set and measure their performance, using the confusion matrix as a goodness-of-fit check. 70% of the data was used as the training set and 30% as the test set.

Confusion matrix for the decision tree (training data):

Reference   Predicted 0   Predicted 1
0                   395            71
1                    45           203

Confusion matrix for the decision tree (test data):

Reference   Predicted 0   Predicted 1
0                    97            20
1                    12            48

Confusion matrix for logistic regression (training data):

Reference   Predicted 0   Predicted 1
0                   395            12
1                    21           204

Confusion matrix for logistic regression (test data):

Reference   Predicted 0   Predicted 1
0                    97            12
1                    21            47
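As a sanity check, the headline accuracies can be recomputed from the confusion matrices above (reference class in rows, predictions in columns): the 83.7% quoted for the decision tree appears to come from its training matrix, while 81.3% for logistic regression matches its test matrix up to rounding.

```python
# Accuracy = (true positives + true negatives) / total, read off a 2x2 matrix
def accuracy(cm):
    # cm[i][j]: count of records with reference class i and predicted class j
    return (cm[0][0] + cm[1][1]) / (cm[0][0] + cm[0][1] + cm[1][0] + cm[1][1])

dt_train = [[395, 71], [45, 203]]  # decision tree, training data
dt_test  = [[97, 20], [12, 48]]    # decision tree, test data
lr_test  = [[97, 12], [21, 47]]    # logistic regression, test data

print(accuracy(dt_train))  # about 0.837 -- the 83.7% in the conclusion
print(accuracy(dt_test))   # about 0.819
print(accuracy(lr_test))   # about 0.814 -- the 81.3% in the conclusion
```
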

Enhancements and reasoning: Predicting the survival rate with other machine learning algorithms, such as random forests and various support vector machines, may improve the accuracy of prediction for this data set.


Conclusion: The analyses revealed interesting patterns across individual-level features. Factors such as socioeconomic status, social norms, and family composition appeared to have an impact on the likelihood of survival. These conclusions, however, were derived from findings in the data. The accuracy of predicting the survival rate using the decision tree algorithm (83.7%) is higher than that of logistic regression (81.3%) for the given data set.

This essay was reviewed by Alex Wood.
