An Artificial Immune System (AIS) is inspired by the natural immune system and mimics the body’s learning and defense mechanisms. In recent years its application has grown in many sectors, including data mining. In this paper, we describe a fuzzy AIS algorithm applied to credit scoring in the banking sector. It uses a set of fuzzy if-then rules, which makes the algorithm human-understandable. The proposed classifier is expected to achieve high accuracy and to be competitive with other classifiers.
“Buy now, pay later” is a very popular and tempting offer made by financial and retail firms to grow their customer base. However, both parties, the lender and the customer, need to assess the risk of default, in which the customer is unable to pay back the loan. It is important to understand the credit obligation and to pay back what is owed by the end of the loan term. Deciding to whom a loan should be offered, and for how much, has therefore been the most prominent risk that lenders need to assess.
The ability of credit scoring to shorten and improve decision making has always been a pivotal point, and it has attracted researchers seeking to help both lenders and the public. Advances in technology, together with various statistical methods and algorithms, have since reduced this risk. The higher the credit score, the more creditworthy the customer appears to the lender. Scoring is based on predictive modeling that assesses the likelihood of a customer defaulting on a financial obligation: it analyzes the customer’s data together with other data to predict the probability that the customer will default.
In the past, traditional statistical methods were used to generate credit scores for banks and financial institutions. With advances in technology and data mining over the past decades, newer and more advanced data analysis algorithms have been applied to the problem of credit scoring, and these are more reliable and efficient than traditional methods. They include parametric statistical methods such as logistic regression, non-parametric methods such as decision trees and k-nearest neighbors, and computational approaches such as artificial neural networks (ANNs). Advanced hybrid methods such as genetic programming and fuzzy algorithms have also been applied to predicting customers’ credit scores.
Logistic regression has been one of the most popular tools for classification problems. It is suitable for different kinds of distributions, and several methods have been combined with the general logistic model to increase its accuracy and decrease its error rate. ANNs became the next most popular tool, with accuracy superior to other traditional methods, especially on non-linear problems. They were developed to mimic the neurophysiology of the human brain and can be viewed as non-linear regression models.
Fuzzy logic provides a mathematical tool for interpreting and manipulating information in a way that resembles human communication and reasoning. Instead of the strict true/false values of Boolean logic, it reasons with degrees of truth between 0 and 1. Genetic programming extracts intelligent relationships in a system and represents them as tree-based structures that can be read as if-then rules.
Artificial Immune Systems (AIS), based on the natural immune system, are a recent and ongoing area of research. An AIS is an adaptive system inspired by theoretical immunology and by observed immune functions and principles, and it has been applied to problem solving and predictive analysis. We propose a model that combines AIS with fuzzy logic to predict credit scores for banks and financial institutions.
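Various methods have been developed for credit scoring over the years, and with advances in data mining algorithms and techniques the models used have been continually upgraded to produce better and more accurate results. Historically, logistic regression, a conventional statistical method, has been used mostly for classification problems. It is suitable for different kinds of distributions, and several methods have been combined with the general logistic model to increase its accuracy and decrease its error rate. The generalized logistic regression model is the general form of the binary and multinomial logistic regression models. For a response with $J$ classes, taking class $J$ as the reference category, the multinomial logistic regression model is given by

$$P(Y = j \mid \mathbf{x}) = \frac{\exp(\boldsymbol{\beta}_j^{\top}\mathbf{x})}{1 + \sum_{k=1}^{J-1} \exp(\boldsymbol{\beta}_k^{\top}\mathbf{x})}, \qquad j = 1, \dots, J-1,$$

with $P(Y = J \mid \mathbf{x}) = 1 \big/ \bigl(1 + \sum_{k=1}^{J-1} \exp(\boldsymbol{\beta}_k^{\top}\mathbf{x})\bigr)$, where $\mathbf{x} = (1, x_1, \dots, x_p)^{\top}$ is the vector of explanatory variables (with a leading 1 for the intercept), $Y$ is the response variable, and $\boldsymbol{\beta}_j$ is a $(p+1)$ vector of regression coefficients for the $j$th class.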
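The multinomial logistic model, with class $J$ as the reference category, can be evaluated directly. The following Python sketch computes the class probabilities for one applicant; the function name and the example coefficients are hypothetical, and in practice the coefficients would be estimated from data (for example by maximum likelihood).

```python
import math
from typing import List, Sequence


def multinomial_logit_probs(x: Sequence[float],
                            betas: List[Sequence[float]]) -> List[float]:
    """Class probabilities P(Y = j | x) for j = 1..J, class J as reference.

    x     -- explanatory variables x_1..x_p (the intercept 1 is prepended here)
    betas -- J - 1 coefficient vectors, each of length p + 1 (intercept first)
    """
    xs = [1.0, *x]
    if any(len(beta) != len(xs) for beta in betas):
        raise ValueError("each beta_j must have p + 1 entries")
    # Linear predictors beta_j^T x for the non-reference classes.
    scores = [sum(b * v for b, v in zip(beta, xs)) for beta in betas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)  # reference class J
    return probs


# Hypothetical coefficients for a 3-class problem with p = 2 variables.
print(multinomial_logit_probs([0.4, 1.2], [[0.1, 0.5, -0.3], [-0.2, 0.8, 0.1]]))
```

With $J = 2$ (a single coefficient vector) the function reduces to ordinary binary logistic regression, which is the usual setting for good/bad credit scoring.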
However, its accuracy was observed to decrease on non-linear data, and ANNs were suggested to overcome this.
A typical ANN model consists of three layers: an input layer, a hidden layer, and an output layer. The input layer passes the input features to the hidden layer, which combines them using adjustable weights and applies a transfer (activation) function before sending the result to the output layer. Interconnecting many such neurons allows the network to detect non-linear relationships in the data, and ANNs have shown higher accuracy than traditional statistical methods such as logistic regression. The three-layer perceptron is depicted in Figure 1.
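A minimal sketch of the forward pass through such a three-layer perceptron, assuming sigmoid transfer functions and fully connected layers. The weights shown are placeholders: a real network would learn them, for example by back-propagation.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic transfer function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs: Sequence[float], weights: List[List[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: weighted sum of inputs plus bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out) -> List[float]:
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(x, w_hidden, b_hidden)
    return dense(hidden, w_out, b_out)


# Placeholder weights: 2 input features, 3 hidden neurons, 1 output neuron.
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [[1.0, -1.0, 0.5]]
b_out = [0.0]
print(mlp_forward([0.7, 0.3], w_hidden, b_hidden, w_out, b_out))
```

The single output lies in (0, 1) and could be read as a default probability; this interpretation is an assumption of the sketch.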
However, ANNs have been criticized for performing poorly on small datasets or on data with irrelevant attributes. Many methods have been proposed to address this, but they make the model more complicated and are limited by a long training process.
A fuzzy system represents and manipulates data in a way that resembles human reasoning and communication. A fuzzy variable is characterized by its name, a set of fuzzy values (linguistic labels), and a membership function that assigns each real value within a predefined range a degree of membership in each fuzzy value. A fuzzy system has four main components: a fuzzifier, which translates crisp values into fuzzy values; an inference engine, which applies a fuzzy reasoning mechanism to obtain a fuzzy output; a defuzzifier, which translates the fuzzy output back into a crisp value; and a knowledge base, which contains the rule base and the initial database. The structure is depicted in Figure 2.
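To make the four components concrete, here is a toy single-input sketch. Everything in it is a hypothetical illustration: the input variable `income` (scaled to [0, 1]), its three triangular fuzzy values, the three rules, and the defuzzifier, which takes a firing-strength-weighted average of rule output centres (a simplification of centroid defuzzification).

```python
from typing import Dict, List, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


# Knowledge base (hypothetical).
# Database: fuzzy values of the input variable "income", scaled to [0, 1].
INCOME: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
# Rule base: "if income is <label> then risk is <centre of output set>".
RULES: Dict[str, float] = {"low": 0.9, "medium": 0.5, "high": 0.1}


def fuzzify(x: float) -> Dict[str, float]:
    """Fuzzifier: crisp value -> membership degree in each fuzzy value."""
    return {label: triangular(x, *abc) for label, abc in INCOME.items()}


def infer(memberships: Dict[str, float]) -> List[Tuple[float, float]]:
    """Inference engine: each rule fires with its antecedent's membership."""
    return [(memberships[label], centre) for label, centre in RULES.items()]


def defuzzify(fired: List[Tuple[float, float]]) -> float:
    """Defuzzifier: firing-strength-weighted average of output centres."""
    total = sum(w for w, _ in fired)
    if total == 0:
        raise ValueError("no rule fired")
    return sum(w * c for w, c in fired) / total


risk = defuzzify(infer(fuzzify(0.25)))  # income 0.25 is half "low", half "medium"
print(risk)
```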
Artificial Immune Systems (AIS) are among the newest methods applied to credit scoring. They are based on models of the natural immune system and use theoretical and observed immune functions, principles, and models for problem solving. Key concepts include clonal selection, mutation, and affinity measures. Combined with fuzzy logic, AIS has clear advantages in managing uncertainty and vagueness, and it produces rules that users can easily interpret.
We propose an algorithm based on the clonal selection principle, which is used to explain the basic features of an adaptive immune response to antigens. The main idea is that only the B-cells that recognize the antigens are selected to proliferate. In this model, no distinction is made between B-cells and their receptors, so every cell is simply called a B-cell.
The algorithm uses a large population of B-cells. Each B-cell has a limited lifespan (age) and represents a rule, which is coded as a string as described below.
The classifier follows a four-step process: initialization, rule generation, rule learning, and termination. During initialization, an initial set of B-cells is created from the antigens in the dataset. The rule generation phase repeatedly runs the AIS algorithm until a maximum number of iterations (maxIteration) is reached. Each iteration performs clonal selection, which selects B-cells according to their fitness (their capacity to proliferate); hypermutation of the selected B-cells; and replacement, in which B-cells die once they mature and their age reaches 0. The age of a B-cell is updated as follows:
$$\text{newAge} = \text{oldAge} + \text{defaultAge} \times \text{fitness} \quad \text{if newFitness} > \text{oldFitness}$$
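The rule generation loop can be sketched as below. It assumes rules are strings over the symbols 0 to 5 (the coding given later for linguistic values), a user-supplied fitness function, and two choices the text leaves open: the `fitness` in the age formula is taken to be the improved fitness, and a cell that does not improve simply ages by one generation. Dead cells are replaced by random ones. The names `BCell`, `hypermutate`, `new_age`, and `generation` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

ALPHABET = "012345"  # 0 = don't care, 1..5 = linguistic values


@dataclass
class BCell:
    rule: str       # one symbol per attribute
    fitness: float
    age: float


def hypermutate(rule: str, rate: float, rng: random.Random) -> str:
    """Replace each symbol by a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else s
                   for s in rule)


def new_age(old_age: float, default_age: float,
            old_fitness: float, new_fitness: float) -> float:
    if new_fitness > old_fitness:
        # Age rule from the text: an improving cell lives longer.
        return old_age + default_age * new_fitness
    return old_age - 1  # assumption: otherwise it ages by one generation


def generation(pop: List[BCell], fitness_fn: Callable[[str], float],
               n_attrs: int, default_age: float = 5.0, n_select: int = 5,
               n_clones: int = 4, rate: float = 0.2,
               rng: Optional[random.Random] = None) -> List[BCell]:
    """One AIS iteration: clonal selection, hypermutation, replacement."""
    rng = rng if rng is not None else random.Random()
    ranked = sorted(pop, key=lambda c: c.fitness, reverse=True)
    for rank, cell in enumerate(ranked):
        if rank < n_select:
            # Clone the fittest cells, hypermutate, keep the best clone.
            clones = [hypermutate(cell.rule, rate, rng) for _ in range(n_clones)]
            best = max(clones, key=fitness_fn)
            best_fit = fitness_fn(best)
            cell.age = new_age(cell.age, default_age, cell.fitness, best_fit)
            if best_fit > cell.fitness:
                cell.rule, cell.fitness = best, best_fit
        else:
            cell.age -= 1
    # Replacement: cells whose age reaches 0 die; random cells take their place.
    survivors = [c for c in ranked if c.age > 0]
    while len(survivors) < len(pop):
        rule = "".join(rng.choice(ALPHABET) for _ in range(n_attrs))
        survivors.append(BCell(rule, fitness_fn(rule), default_age))
    return survivors


def fit(rule: str) -> float:
    """Hypothetical toy fitness: share of attributes coded "medium" (3)."""
    return rule.count("3") / len(rule)


MAX_ITERATION = 10
rng = random.Random(0)
pop = [BCell(r, fit(r), 5.0) for r in ("1234", "0000", "3311", "5555", "2222", "3030")]
for _ in range(MAX_ITERATION):
    pop = generation(pop, fit, n_attrs=4, rng=rng)
```

In the real classifier, `fitness_fn` would measure how well a rule classifies the training antigens; the toy fitness only exercises the loop.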
In the rule learning phase, the best B-cell in the AIS population is chosen and added to the rule set if it increases the classification rate. Finally, the termination conditions are checked; any stopping condition can be used to end the loop. If the condition is met, learning for the current class is finished and the algorithm moves on to the next class. If not, the process is repeated with the next population run of AIS. The number of rules learned for each class can also be fixed in advance.
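The rule learning and termination steps amount to a sequential loop per class. The sketch below assumes two hypothetical callables: `run_ais`, which runs one AIS population and returns its rules best first, and `accuracy`, which scores a rule set on the training data. The run budget (`max_runs`) and per-class rule limit (`max_rules`) stand in for the unspecified stopping conditions.

```python
from typing import Callable, Hashable, List, Sequence


def learn_rules_for_class(run_ais: Callable[[], Sequence[str]],
                          accuracy: Callable[[List[str]], float],
                          rule_set: List[str],
                          max_rules: int = 5,
                          max_runs: int = 20) -> List[str]:
    """Grow `rule_set` with rules for one class (hypothetical helpers)."""
    best_acc = accuracy(rule_set)
    learned = 0
    for _ in range(max_runs):           # stopping condition: run budget
        if learned >= max_rules:        # stopping condition: rules per class
            break
        candidate = run_ais()[0]        # best B-cell of this population run
        acc = accuracy(rule_set + [candidate])
        if acc > best_acc:              # keep it only if the rate improves
            rule_set.append(candidate)
            best_acc = acc
            learned += 1
    return rule_set


def learn_all(classes: Sequence[Hashable],
              run_ais_for: Callable[[Hashable], Sequence[str]],
              accuracy: Callable[[List[str]], float]) -> List[str]:
    """Learn rules class by class, moving on once a class is finished."""
    rules: List[str] = []
    for cls in classes:
        learn_rules_for_class(lambda: run_ais_for(cls), accuracy, rules)
    return rules


# Toy usage: candidates come from a fixed list; "A" and "C" are the useful rules.
_candidates = iter([["A"], ["B"], ["A"], ["C"], ["D"]])
_useful = {"A", "C"}
rules = learn_rules_for_class(lambda: next(_candidates),
                              lambda rs: len(set(rs) & _useful) / len(_useful),
                              [], max_rules=5, max_runs=5)
print(rules)
```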
In a fuzzy classifier system, fuzzy if-then rules of the following form are used for an n-dimensional pattern classification problem:
Rule $R_j$: If $x_1$ is $A_{j1}$ and $x_2$ is $A_{j2}$ and $\dots$ and $x_n$ is $A_{jn}$, then Class $C_j$ with $CF = CF_j$,
where $R_j$ is the label of the $j$th fuzzy if-then rule; $A_{j1}, \dots, A_{jn}$ are the antecedent fuzzy sets on the unit interval $[0, 1]$; $C_j$ is the consequent class; and $CF_j$ is the certainty factor (grade of certainty) of the fuzzy rule $R_j$.
Introducing “don’t care” conditions reduces the number of antecedent conditions and makes the rules shorter and easier for humans to understand. Each attribute is described by a set of linguistic values whose fuzzy sets come from a homogeneous partition of the attribute’s domain into triangular fuzzy sets. For a given pattern classification problem, any tailored membership functions can be used in the fuzzy classifier.
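A minimal sketch of the homogeneous triangular partition, assuming attributes scaled to [0, 1] and five linguistic values. Label 0 encodes “don’t care”, which has membership 1 everywhere, so a rule’s compatibility with a pattern (taken here as the product of its antecedent memberships, a common choice) ignores that attribute.

```python
from typing import Sequence

K = 5  # linguistic values 1..5: S, MS, M, ML, L


def membership(label: int, x: float, k: int = K) -> float:
    """Triangular membership of x in fuzzy set `label` of a homogeneous
    partition of [0, 1] into k sets; label 0 ("don't care") is always 1."""
    if label == 0:
        return 1.0
    peak = (label - 1) / (k - 1)
    width = 1.0 / (k - 1)  # distance from a peak to the neighbouring peaks
    return max(0.0, 1.0 - abs(x - peak) / width)


def compatibility(rule: Sequence[int], pattern: Sequence[float]) -> float:
    """Compatibility of a pattern with a rule: product of its memberships."""
    result = 1.0
    for label, x in zip(rule, pattern):
        result *= membership(label, x)
    return result


# Rule "x1 is don't care and x2 is medium" ignores x1 entirely.
print(compatibility([0, 3], [0.9, 0.5]))  # → 1.0
```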
The grade of certainty of a fuzzy rule is calculated in the following steps.
Step 1: Calculate the compatibility of each training pattern with the rule.
Step 2: Calculate the relative compatibility grades of the training patterns.
Step 3: Find the class with the maximum relative compatibility. If two or more classes share the maximum value, or no training pattern is compatible with the fuzzy rule, the consequent class cannot be determined.
Step 4: Calculate the grade of certainty CFj.
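The four steps can be sketched as follows. The input is each training pattern’s compatibility grade with the rule (Step 1) together with the pattern’s class. Two details are assumptions, since the text does not fix them: Step 2 is taken as the total compatibility per class, s_h, and Step 4 uses a widely used definition from the fuzzy rule-based classification literature, CF = (s_C − s̄) / Σ_h s_h, where s̄ is the average of s_h over the other classes.

```python
from typing import Dict, Hashable, Iterable, Optional, Sequence, Tuple


def consequent_and_cf(grades: Iterable[Tuple[float, Hashable]],
                      classes: Sequence[Hashable]
                      ) -> Tuple[Optional[Hashable], float]:
    """Consequent class and grade of certainty of one fuzzy rule.

    grades  -- (compatibility of a pattern with the rule, class of that
               pattern) for every training pattern (Step 1)
    classes -- all class labels (at least two)
    """
    # Step 2 (assumed form): total compatibility per class.
    s: Dict[Hashable, float] = {c: 0.0 for c in classes}
    for mu, cls in grades:
        s[cls] += mu
    total = sum(s.values())
    best = max(s.values())
    winners = [c for c, v in s.items() if v == best]
    # Step 3: undetermined on a tie or when no pattern is compatible.
    if total == 0.0 or len(winners) > 1:
        return None, 0.0
    # Step 4: CF = (s_C - mean of the other classes) / sum over all classes.
    mean_others = (total - best) / (len(s) - 1)
    return winners[0], (best - mean_others) / total


print(consequent_and_cf([(0.5, 1), (1.0, 1), (0.5, 0)], [0, 1]))
```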
The winner rule for a pattern is the rule with the maximum product of compatibility grade and grade of certainty. Since each fuzzy rule is coded as a string, the following linguistic values can be used:
0: don’t care (DC), 1: small (S), 2: medium-small (MS), 3: medium (M), 4: medium-large (ML), and 5: large (L).
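Putting the string coding and the winner-rule principle together, the sketch below decodes a rule string and classifies a pattern with single-winner reasoning. It assumes five triangular fuzzy sets on [0, 1] (peaks at 0, 0.25, 0.5, 0.75, 1) and rules given as hypothetical (antecedent string, class, CF) triples; ties between rules go to the first rule, which is a simplification.

```python
from typing import List, Optional, Sequence, Tuple

# String coding of linguistic values from the text.
LABELS = {"0": "DC", "1": "S", "2": "MS", "3": "M", "4": "ML", "5": "L"}


def decode(rule: str) -> List[str]:
    """Turn a rule string such as "0305" into readable labels."""
    return [LABELS[s] for s in rule]


def mu(symbol: str, x: float) -> float:
    """Triangular membership on [0, 1] with peaks at 0, 0.25, ..., 1
    (assumed homogeneous partition); "0" (don't care) is always 1."""
    k = int(symbol)
    if k == 0:
        return 1.0
    return max(0.0, 1.0 - abs(x - (k - 1) / 4) * 4)


def classify(pattern: Sequence[float],
             rules: Sequence[Tuple[str, int, float]]) -> Optional[int]:
    """Single-winner reasoning: the rule with the largest
    compatibility * CF decides the class; None if no rule fires."""
    best_score, best_class = 0.0, None
    for antecedent, cls, cf in rules:
        score = cf
        for s, x in zip(antecedent, pattern):
            score *= mu(s, x)
        if score > best_score:  # first rule wins a tie (simplification)
            best_score, best_class = score, cls
    return best_class


# Hypothetical rules: (antecedent string, class, CF).
RULES = [("30", 1, 0.8), ("10", 0, 0.9)]
print(decode("0305"), classify([0.5, 0.7], RULES))
```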
Two datasets are available from the UCI Machine Learning Repository: the Australian and German credit datasets. The Australian credit dataset contains 690 instances in 2 classes: class 1 has 307 instances and class 0 has 383. The German credit dataset contains 1000 instances in 2 classes: class 1 has 700 instances and class 0 has 300.
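Because both datasets are imbalanced, a trivial classifier that always predicts the majority class already reaches a noticeable accuracy. This short calculation, using the class counts above, gives that baseline, which any proposed classifier should beat.

```python
from typing import Dict

# Class counts of the two UCI credit datasets.
DATASETS: Dict[str, Dict[int, int]] = {
    "Australian": {1: 307, 0: 383},
    "German": {1: 700, 0: 300},
}


def majority_baseline(counts: Dict[int, int]) -> float:
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


for name, counts in DATASETS.items():
    print(f"{name}: {sum(counts.values())} instances, "
          f"majority baseline {majority_baseline(counts):.3f}")
```

This gives roughly 55.5% for the Australian data and 70% for the German data.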
Both datasets must be pre-processed: missing values handled, attributes normalized, and feature selection applied to find which attributes contribute most to the model’s accuracy. If required, attribute names and values would be replaced with meaningless symbols for confidentiality. The accuracy of the AIS classifier would be compared with that of other classifiers, such as logistic regression, back-propagation neural networks, rough sets, and support vector machines, based on the numbers of correctly classified and misclassified instances and on ROC curves.
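A sketch of one possible pre-processing and evaluation pipeline in plain Python. The specific choices are assumptions: missing values (`None`) are replaced by the column mean, attributes are min-max scaled to [0, 1] to match the domain of the fuzzy sets, and the ROC curve is summarized by its area (AUC), computed as the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one.

```python
from typing import List, Optional, Sequence, Tuple


def impute_and_scale(column: Sequence[Optional[float]]) -> List[float]:
    """Fill missing values (None) with the column mean, then min-max
    scale the column to [0, 1]."""
    present = [v for v in column if v is not None]
    if not present:
        raise ValueError("column has no observed values")
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in column]
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0] * len(filled)
    return [(v - lo) / (hi - lo) for v in filled]


def confusion(y_true: Sequence[int],
              y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """(TP, TN, FP, FN) for labels 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: probability that a random positive scores
    higher than a random negative, ties counting one half (O(n_pos * n_neg))."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need instances of both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs on four test instances.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.4, 0.4]
y_pred = [int(s >= 0.5) for s in scores]
tp, tn, fp, fn = confusion(y_true, y_pred)
print("correct:", tp + tn, "misclassified:", fp + fn,
      "AUC:", roc_auc(y_true, scores))
```

The pairwise AUC is quadratic in the number of instances, which is still fast for datasets of about 1000 instances such as these.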
The project would take approximately two and a half months to build. The first milestone would be exploring and understanding the data: its source, data types, missing values, and outliers. The second milestone would be data pre-processing and applying preliminary algorithms to check their accuracy. We would then apply feature selection, discretization, or aggregation to the data, apply the AIS classifier, and compare it with other classifiers. To support this, we also maintain a Logic of Problem document on the Critical Thinking site, which helps us address problems we may face while building the model.
I chose this project as my final capstone project to showcase my understanding of and interest in data analysis. I am taking an advanced data mining course, which adds to my previous knowledge of data analysis, and I am also learning about new research and advances in this area, which will deepen my knowledge and understanding of the concepts.