# Black Friday Consumer Analysis

## Introduction

Black Friday has been the busiest shopping day of the year in the United States for many years. Many stores offer heavily promoted sales on Black Friday and open early to maximize the day's sales. Black Friday has become a major shopping day for several reasons.

It is the unofficial start of the Christmas shopping season. Black Friday deals are no longer limited to the day itself: many retailers, including Amazon, offer deals earlier and earlier, upstaging Black Friday itself. Competition increases every year, and so do the deals and the profits.

Because Black Friday is the highest sales day, it generates a large volume of data. We therefore chose this dataset from Kaggle to analyze the transactions made in a retail store on Black Friday. We help the store understand trends across the various categories and predict the purchase amount (the dependent variable) from the other given attributes (the independent variables) using linear regression.

## PROBLEM STATEMENT

Our key objective is to analyze the attributes of the dataset we have chosen, i.e., the Black Friday sales dataset from Kaggle. The attributes vary in type, so their nature and distribution must be explored first. Since this is a sensitive analysis, we do not restrict our solution to a single statement; instead, we perform exploratory data analysis with different types of visualizations and use linear regression to model the purchase amount against the gender variable. Before we begin, we summarize the data thoroughly, identify the key attributes, and examine the correlations among them. Once we have a thorough understanding of the distribution and types of the data, it becomes easier to fit the model and interpret its outcome.

## DATA INFORMATION AND DESCRIPTION

The dataset we chose is a sample of transactions made in a retail store. The store wants to better understand customer purchase behavior across different products. We treat this as a regression problem: predicting the dependent variable (the purchase amount) from the information contained in the other variables.

The dataset comprises about 550,000 Black Friday transactions from the retail store. It contains both numerical and categorical variables, as well as missing values.

The dataset has 12 attributes:

- User_ID (customer identifier)
- Product_ID (product identifier)
- Gender (male or female)
- Age (age group, in years)
- Occupation (occupation ID of each customer)
- City_Category (category of the customer's city)
- Stay_in_Current_City_Years (number of years the customer has stayed in the current city)
- Marital_Status
- Product_Category_1
- Product_Category_2
- Product_Category_3
- Purchase (purchase amount in dollars)

The list above describes each attribute. The attributes are a mix of numerical and categorical types: Purchase is numerical, while attributes such as Age, Gender, Occupation, and Marital_Status are categorical (some of them encoded as integers).

We have about 550,000 recorded observations of the attributes listed above. First, we check for missing or uncleaned data using R's `is.na` function, which flags NA (missing) entries. As noted above, the dataset does contain missing values; they appear mainly in the optional product-category columns (Product_Category_2 and Product_Category_3), so they must be handled before modeling. We then load the data and summarize its distribution and key statistics.
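The report performs this check with R's `is.na`. As a language-neutral illustration, the minimal Python sketch below counts empty fields per column of a CSV file, assuming (as is typical for CSV exports) that a missing value appears as an empty field. The sample rows, their values, and the `count_missing` helper are hypothetical, and only a subset of the dataset's columns is included.

```python
import csv
import io

# Hypothetical excerpt with a subset of the dataset's columns; in practice
# you would open the downloaded CSV file instead of using this string.
SAMPLE_CSV = """User_ID,Gender,Product_Category_2,Product_Category_3,Purchase
1000001,F,,,8000
1000002,M,6,14,15000
1000003,M,14,,7000
"""


def count_missing(csv_text):
    """Return {column name: number of empty fields} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # A short row yields None; an empty string is a blank CSV field.
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


if __name__ == "__main__":
    for column, n in count_missing(SAMPLE_CSV).items():
        print(f"{column}: {n} missing")
```

Columns with a non-zero count need a decision (dropping the rows, imputing a value, or treating "missing" as its own category) before the data is used for regression.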

## EXPLORATORY DATA ANALYSIS AND VISUALIZATIONS

Exploratory data analysis (EDA) is the critical process of performing an initial investigation of the data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.

The first EDA in this analysis concerns the gender of the consumers and shows which gender made more purchases during the Black Friday sale. We found that men dominated the Black Friday consumers: there were about 4,000 male consumers, compared with only about 1,000 female consumers.
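Because every row in the dataset is a transaction, a customer who bought several items appears several times, so counting customers (as opposed to transactions) by gender requires de-duplicating User_ID first. A minimal stdlib Python sketch of both counts, using hypothetical (User_ID, Gender) rows and illustrative helper names:

```python
from collections import defaultdict

# Hypothetical (User_ID, Gender) pairs, one per transaction row.
transactions = [
    ("1000001", "F"), ("1000001", "F"), ("1000002", "M"),
    ("1000003", "M"), ("1000003", "M"), ("1000004", "M"),
]


def customers_by_gender(rows):
    """Count distinct User_IDs per gender (a customer may have many rows)."""
    users = defaultdict(set)
    for user_id, gender in rows:
        users[gender].add(user_id)
    return {gender: len(ids) for gender, ids in users.items()}


def transactions_by_gender(rows):
    """Count transaction rows per gender, without de-duplicating customers."""
    counts = defaultdict(int)
    for _, gender in rows:
        counts[gender] += 1
    return dict(counts)
```

On these rows the two views differ (one female customer with two transactions, three male customers with four transactions), which is why it matters which count a chart reports.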

The second EDA concerns the age of the customers and shows which age group made more purchases during the Black Friday sale.

The next EDA concerns the location of the customers and shows which city category made more purchases. We also examine a related question here: which city category accounts for the highest total purchase amount.

The next EDA concerns the occupation of the customers and shows which occupations made more purchases during the Black Friday sale.

The last EDA concerns the purchase amount itself and shows how purchase amounts are distributed across the Black Friday transactions.
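Each of these EDA steps reduces to grouping transactions by one attribute (Age, City_Category, or Occupation) and then counting purchases or summing the Purchase amount, while the last step bins Purchase values into a histogram. A minimal stdlib Python sketch, using hypothetical (City_Category, Purchase) pairs, illustrative helper names, and an arbitrary bin width:

```python
from collections import defaultdict

# Hypothetical (City_Category, Purchase) pairs, one per transaction.
rows = [("A", 8000), ("A", 15000), ("B", 1500), ("B", 1000), ("C", 8000)]


def summarize_by(pairs):
    """Return {category: (number_of_purchases, total_purchase_amount)}."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for category, amount in pairs:
        counts[category] += 1
        totals[category] += amount
    return {c: (counts[c], totals[c]) for c in counts}


def histogram(amounts, bin_width):
    """Count amounts per bin [k * bin_width, (k + 1) * bin_width), keyed by lower edge."""
    bins = defaultdict(int)
    for amount in amounts:
        bins[(amount // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


summary = summarize_by(rows)
# Category with the largest total purchase amount.
top_city = max(summary, key=lambda c: summary[c][1])
distribution = histogram([amount for _, amount in rows], bin_width=5000)
```

Swapping City_Category for Age or Occupation gives the other groupings; the resulting counts and totals are the numbers a bar chart or histogram would plot.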

## LINEAR REGRESSION ANALYSIS

Regression analysis is a statistical method for examining the relationship between a dependent variable and one or more independent variables. It helps determine which factors matter most, which can be ignored, and how strongly each factor is associated with the outcome.

To understand regression analysis, two terms are essential:

Dependent Variable: This is the main factor we are trying to understand or predict; in this analysis, the purchase amount.

Independent Variable: These are the factors that we hypothesize have an impact on the dependent variable.
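In the standard multiple linear regression form, the relationship between the two kinds of variables can be written as below, where $y_i$ is the dependent variable for transaction $i$ (here, the purchase amount), $x_{i1}, \dots, x_{ip}$ are the $p$ independent variables, and $\varepsilon_i$ is the error term. Which attributes enter as $x$'s, and how categorical ones such as Gender are encoded (for example as 0/1 dummy variables), is a modeling choice assumed here rather than something the report specifies:

$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n
$$

Ordinary least squares then chooses the coefficients that minimize the sum of squared residuals:

$$
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2
$$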

To perform a regression analysis, we therefore define a dependent variable that we hypothesize is driven by one or more independent variables.
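Since the problem statement models the purchase amount against gender, a minimal sketch of that single-predictor case is shown below in plain Python, using the closed-form least-squares estimates (slope = covariance / variance of the predictor). The data are hypothetical, and coding "M" as 1 and "F" as 0 is an assumed, arbitrary baseline choice; the report's own analysis presumably used R rather than this code.

```python
from statistics import mean

# Hypothetical data: Gender and Purchase amount for five transactions.
genders = ["M", "F", "M", "F", "M"]
purchases = [9000.0, 8000.0, 11000.0, 6000.0, 10000.0]


def encode_gender(values):
    """Dummy-code Gender: 1.0 for 'M', 0.0 for 'F' (assumed baseline is 'F')."""
    return [1.0 if v == "M" else 0.0 for v in values]


def fit_simple_ols(x, y):
    """Closed-form ordinary least squares for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


b0, b1 = fit_simple_ols(encode_gender(genders), purchases)
```

With a single 0/1 predictor, the intercept equals the mean purchase of the baseline group and the slope equals the difference between the two group means: here approximately 7000 (the mean of the "F" rows) and 3000 (10000 minus 7000). Adding further predictors turns this into the multiple regression form, which is usually fitted with a library routine instead of closed-form sums.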

## CONCLUSION

Overall, our EDA of this Black Friday dataset yielded several insights. We analyzed how consumers at the store were distributed across numerous categories, such as Gender, Age, Occupation, and Stay in Current City. We also identified the top purchasing consumers on Black Friday and grouped the products into ‘best sellers’ and ‘worst sellers.’ In addition, we computed several purchase metrics for Black Friday, including the average amount spent per consumer and the total amount across the various categories.

After the EDA, we moved on to regression analysis and used linear regression to build a model that predicts purchase amounts for the store on Black Friday. We found multiple variables that have an impact on the dependent variable, i.e., the purchase amount.

