This essay has been submitted by a student. This is not an example of the work written by professional essay writers.

Survival Prediction for Titanic Data Using Machine Learning Algorithms

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

In this paper we carry out a predictive analysis of what sorts of people were likely to survive, and use machine learning tools to predict accurately which passengers survived the tragedy.

Index Terms: Machine learning.


Machine learning is the application of any computer-enabled algorithm to a data set in order to find a pattern in the data. This covers essentially all types of data science algorithms: supervised, unsupervised, segmentation, classification, and regression.

A few important areas where machine learning can be applied are:

  - Handwriting recognition: convert written letters into digital letters
  - Language translation: translate spoken and/or written languages (e.g. Google Translate)
  - Speech recognition: convert voice snippets to text (e.g. Siri, Cortana, and Alexa)
  - Image classification: label images with appropriate categories (e.g. Google Photos)
  - Autonomous driving: enable cars to drive themselves (e.g. NVIDIA and Google Car)

Features are the observations that are used to form predictions:

  - For image classification, the pixels are the features
  - For voice recognition, the pitch and volume of the sound samples are the features
  - For autonomous cars, data from the cameras, range sensors, and GPS are the features

Extracting relevant features is important for building a model. For example, the source of a mail is an irrelevant feature when classifying images, but it is relevant when classifying emails, because spam often originates from reported sources.

Literature survey

Every machine learning algorithm works best under a given set of conditions. Making sure your algorithm fits its assumptions and requirements helps it perform well; you cannot use any algorithm in any condition. When the outcome to be predicted is categorical, as it is here, you should try algorithms such as Logistic Regression, Decision Trees, SVM, and Random Forest.

Logistic Regression

Logistic Regression is a classification algorithm. It is used to predict a binary outcome given a set of independent variables. The binary categorical outcome is represented with a dummy (0/1) variable. Logistic regression can also be thought of as a special case of linear regression for a categorical outcome variable, in which the log of odds is used as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting the data to a logit function.
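The relationship between log of odds and probability described above can be sketched in a few lines. This is only an illustration: the coefficients, the intercept, and the two feature values are made up and were not fitted to the Titanic data.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p: float) -> float:
    """Logit function: the inverse of sigmoid."""
    return math.log(p / (1.0 - p))


def predict_proba(features, coefficients, intercept):
    """Probability of the positive class for one observation.

    The log of odds is modelled as a linear function of the features.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return sigmoid(z)


# Hypothetical coefficients for two made-up features (assumption, not fitted).
coefficients = [1.2, -0.8]
intercept = -0.5

p = predict_proba([1.0, 2.0], coefficients, intercept)
label = 1 if p >= 0.5 else 0  # threshold the probability to get a binary outcome
```

Applying `log_odds` to the predicted probability recovers the linear combination of the features, which is the sense in which the log of odds acts as the dependent variable.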

Performance of a logistic regression model:

  - AIC (Akaike Information Criterion): The analogue of adjusted R² in logistic regression is AIC. AIC is a measure of fit that penalizes the model for its number of coefficients. Therefore, we prefer the model with the minimum AIC value.
  - Null deviance and residual deviance: Null deviance indicates how well the response is predicted by a model with nothing but an intercept. The lower the value, the better the model. Residual deviance indicates how well the response is predicted by the model once the independent variables are added. The lower the value, the better the model.
  - Confusion matrix: A tabular representation of actual vs. predicted values. It is used to find the accuracy of the model, and comparing it across training and test data helps to detect overfitting. Accuracy = (true positives + true negatives) / (total number of predictions).
  - McFadden R²: Also called pseudo R². When analyzing data with a logistic regression, an exact equivalent of R-squared does not exist. However, several pseudo R-squareds have been developed to evaluate the goodness of fit of logistic models.
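As a rough illustration of how these quantities relate, the sketch below computes the deviances, AIC, and McFadden R² from a log-likelihood. The labels and predicted probabilities are toy values chosen for the example, and the formulas assume ungrouped 0/1 outcomes, where deviance equals -2 times the log-likelihood.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of 0/1 labels under predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(y_true, p_pred)
    )


def deviance(log_lik):
    """Deviance for ungrouped binary data: -2 * log-likelihood."""
    return -2.0 * log_lik


def aic(log_lik, n_coefficients):
    """AIC = 2k - 2 log L; the 2k term penalizes extra coefficients."""
    return 2.0 * n_coefficients - 2.0 * log_lik


def mcfadden_r2(log_lik_model, log_lik_null):
    """McFadden pseudo R² = 1 - log L(model) / log L(null)."""
    return 1.0 - log_lik_model / log_lik_null


# Toy labels and made-up fitted probabilities (assumption, not real results).
y = [1, 0, 1, 1, 0, 0]
p_model = [0.9, 0.2, 0.7, 0.8, 0.3, 0.1]
# An intercept-only model predicts the overall positive rate for everyone.
p_null = [sum(y) / len(y)] * len(y)

ll_model = log_likelihood(y, p_model)
ll_null = log_likelihood(y, p_null)

null_deviance = deviance(ll_null)
residual_deviance = deviance(ll_model)
model_aic = aic(ll_model, n_coefficients=2)
pseudo_r2 = mcfadden_r2(ll_model, ll_null)
```

A residual deviance well below the null deviance, and a pseudo R² closer to 1, both indicate that the independent variables improve on the intercept-only model.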

Decision Trees

A decision tree is a hierarchical tree structure that can be used to divide a large collection of records into smaller sets of classes by applying a sequence of simple decision rules. A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous (mutually exclusive) classes. The attributes can be variables of any type (binary, nominal, ordinal, or quantitative), while the classes must be qualitative (categorical, binary, or ordinal). In short, given data consisting of attributes together with their classes, a decision tree produces a sequence of rules (or series of questions) that can be used to recognize the class. One rule is applied after another, resulting in a hierarchy of segments within segments. The hierarchy is called a tree, and each segment is called a node. With each successive division, the members of the resulting sets become more and more similar to each other. Hence, the algorithm used to construct a decision tree is referred to as recursive partitioning.

Decision tree applications:

  - predicting tumor cells as benign or malignant
  - classifying credit card transactions as legitimate or fraudulent
  - classifying buyers and non-buyers
  - deciding whether or not to approve a loan
  - diagnosing various diseases based on symptoms and profiles
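Recursive partitioning can be sketched as follows. This is a minimal illustration that uses Gini impurity as the splitting criterion, which is an assumption, since the essay does not state which criterion was used. The two features (a female flag and a passenger class) and the labels are made-up toy data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label in the node is identical."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until nodes are pure or max_depth is reached."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the rules from the root down to a leaf."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Toy data (assumption): each row is [is_female, passenger_class].
rows = [[1, 1], [1, 2], [1, 3], [0, 1], [0, 2], [0, 3]]
labels = [1, 1, 0, 1, 0, 0]
tree = build_tree(rows, labels)
```

Each nested dictionary is one node, so the structure mirrors the hierarchy of segments within segments described above.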


Our approach to solving the problem:

  1. Collect the raw data needed to solve the problem.
  2. Import the data set into the working environment.
  3. Preprocess the data, which includes data wrangling and feature engineering.
  4. Explore the data and prepare a model for performing analysis using machine learning algorithms.
  5. Evaluate the model and iterate until we get satisfactory model performance.
  6. Compare the results and select the model that gives the most accurate result.

The data we collected is still raw data, which is very likely to contain mistakes, missing values, and corrupt values. Before drawing any conclusions from the data, we need to do some data preprocessing, which involves data wrangling and feature engineering. Data wrangling is the process of cleaning and unifying messy and complex data sets for easy access and analysis. Feature engineering attempts to create additional relevant features from the existing raw features in the data in order to increase the predictive power of the learning algorithms.
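The two preprocessing steps could look roughly like the sketch below. The records are made up, and the column names (`Age`, `Sex`, `SibSp`, `Parch`) and derived features (`FamilySize`, `IsAlone`) are assumptions borrowed from the commonly used public Titanic data set; the essay does not say which columns or derived features were actually used.

```python
from statistics import median

# Made-up records; column names are an assumption (see the note above).
raw = [
    {"Age": 22.0, "Sex": "male", "SibSp": 1, "Parch": 0},
    {"Age": None, "Sex": "female", "SibSp": 0, "Parch": 0},
    {"Age": 38.0, "Sex": "female", "SibSp": 1, "Parch": 2},
    {"Age": 30.0, "Sex": "male", "SibSp": 0, "Parch": 0},
]


def wrangle(records):
    """Data wrangling: fill missing ages with the median and encode sex as 0/1."""
    fill = median(r["Age"] for r in records if r["Age"] is not None)
    cleaned = []
    for r in records:
        row = dict(r)  # copy so the raw data stays untouched
        if row["Age"] is None:
            row["Age"] = fill
        row["Sex"] = 1 if row["Sex"] == "female" else 0
        cleaned.append(row)
    return cleaned


def engineer(records):
    """Feature engineering: derive family size and a travelling-alone flag."""
    out = []
    for r in records:
        row = dict(r)
        row["FamilySize"] = row["SibSp"] + row["Parch"] + 1  # +1 for the passenger
        row["IsAlone"] = int(row["FamilySize"] == 1)
        out.append(row)
    return out


prepared = engineer(wrangle(raw))
```

Median imputation is only one of several reasonable ways to handle missing values; it is used here because it is simple and robust to outliers.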

Experimental Analysis and Discussion

Data set description: The original data has been split into two groups: a training data set (70%) and a test data set (30%). The training set should be used to build your machine learning models. The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
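A 70/30 split of this kind can be sketched as below. The helper name `train_test_split`, the seed, and the ten placeholder passenger IDs are assumptions made for illustration; the essay does not describe how its split was produced.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder passenger IDs standing in for the full records (assumption).
passenger_ids = list(range(1, 11))
train, test = train_test_split(passenger_ids)
```

Shuffling before cutting avoids a biased split if the original file happens to be ordered, and the fixed seed means the same passengers end up in the test set on every run.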


Results

After training the algorithms, we validate them with the test data set and measure their performance (goodness of fit) using confusion matrices. 70% of the data is used as the training data set and 30% as the test data set.

Confusion matrix for decision tree, training data set:

               Reference
  Prediction     0     1
           0   395    71
           1    45   203

Confusion matrix for decision tree, test data set:

               Reference
  Prediction     0     1
           0    97    20
           1    12    48

Confusion matrix for logistic regression, training data set:

               Reference
  Prediction     0     1
           0   395    12
           1    21   204

Confusion matrix for logistic regression, test data set:

               Reference
  Prediction     0     1
           0    97    12
           1    21    47
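As a small worked example, accuracy can be recomputed from the two test-set confusion matrices reported above. The sketch assumes rows are predictions and columns are reference values; for accuracy the orientation does not matter, since only the diagonal and the total are used.

```python
def accuracy(matrix):
    """Share of correct predictions: diagonal counts divided by all counts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Counts copied from the test-set matrices in the text.
decision_tree_test = [[97, 20], [12, 48]]
logistic_regression_test = [[97, 12], [21, 47]]

dt_acc = accuracy(decision_tree_test)        # 145 / 177
lr_acc = accuracy(logistic_regression_test)  # 144 / 177
print(f"{dt_acc:.3f} {lr_acc:.3f}")          # prints "0.819 0.814"
```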

Enhancements and reasoning

Predicting the survival rate with other machine learning algorithms, such as random forests and various support vector machines, may improve the accuracy of prediction for the given data set.

Conclusion: The analyses revealed interesting patterns across individual-level features. Factors such as socioeconomic status, social norms, and family composition appeared to have an impact on the likelihood of survival. These conclusions, however, were derived from findings in the data. The accuracy of predicting survival with the decision tree algorithm (83.7%) is higher than with logistic regression (81.3%) for the given data set.

Cite this Essay

Survival Prediction for Titanic Data Using Machine Learning Algorithms. (2018, September 04). GradesFixer. Retrieved May 24, 2022, from