This essay has been submitted by a student. This is not an example of the work written by professional essay writers.

Detecting Phishing Website Using Associative Classification


A phishing scam is a well-known form of fraud in which victims are tricked into revealing confidential information, especially financial information. Phishing is a significant problem involving fraudulent email and websites that trick unsuspecting users into revealing private information, and it takes several forms, including deceptive phishing, malware-based phishing, and DNS-based phishing. Phishing websites are fake websites created by dishonest people to mimic the web pages of real websites. Victims of phishing attacks may expose sensitive financial information to an attacker, who might then use it for financial and criminal activities. This paper reviews existing work on phishing detection and response techniques, together with apoptosis, and presents the design, implementation, and evaluation of various techniques for detecting phishing websites. It also investigates feature selection, aiming to determine the set of features that is most effective in terms of classification performance.

As online technology has grown, so have online activities such as advertising, gaming, and e-commerce. The rise in online financial activity has been accompanied by a rise in online fraud, in which phishing plays a major role as a way of illegally obtaining private personal details. Phishing attacks against financial institutions have become a regular occurrence, raising concern about how to increase security for banks and for online shopping sites such as eBay and Amazon. Fraudulent schemes conducted via the Internet are generally difficult to trace and prosecute, and they cost individuals and businesses millions of dollars each year. From computer viruses to website hacking and financial fraud, Internet crime became a larger concern than ever in the 1990s and early 2000s. In response, various anti-phishing tools were developed to counter such illegal online activities.

Phishing itself has also evolved rapidly in order to evade the anti-phishing tools developed to counter it. Phishing emails typically contain links to a fraudulent website where recipients are asked to type in personal information such as a username and password or account details, which the website then captures. A phishing email is usually sent to a large number of people, and the phishers may also track the percentage of recipients who read the email and entered their information. It is very difficult for individuals to tell whether they are visiting a genuine site or a malicious one. Phishing is also understood to be a sort of brand spoofing or carding.

As a result, researchers are attempting to reduce the risk and vulnerabilities associated with phishing. Some researchers define phishing as a new type of network attack in which the attacker creates a replica of an existing web page to fool users, for example by using specially designed e-mails or instant messages, into submitting personal, financial, or password data to what they think is their service provider’s website.

Phishing Detection using Content-Based Associative Classification Data Mining [1]. This paper aims to prevent phishing using a data mining technique. The MCAC algorithm is reported to detect phishing activity with high efficiency, but it does not consider content-based features of websites. The paper proposes adding content and page style features to the algorithm and changing the system for better performance. It presents the proposed method and its flowchart, along with all the website features considered during the experimental analysis.

Content-Based Approach for Detection of Phishing Sites [2]. This paper presents the design, implementation, and evaluation of CANTINA, a content-based approach to detecting phishing websites. It also discusses the design and evaluation of several heuristics developed to reduce false positives. The experiments show that CANTINA is good at detecting phishing sites, correctly labeling approximately 95% of them.

Phishing Websites Detection based on Phishing Characteristics in the Web page Source Code [3]. This paper proposes a phishing detection approach based on checking the webpage source code. Phishing characteristics are derived from the W3C standards to evaluate the security of a website, and the webpage source code is checked for each of them; whenever a phishing characteristic is found, the initial secure weight is decreased. Finally, a security percentage is calculated from the final weight: a high percentage indicates a secure website, while a lower one indicates that the website is most likely a phishing website. The authors check the source code of a legitimate website and a phishing website and compare their security percentages, finding that the phishing website has a lower security percentage than the legitimate one. The approach can therefore detect a phishing website by checking for phishing characteristics in the webpage source code.

An Associative Classification Data Mining Approach for Detecting Phishing Websites [4]. This paper proposes a new associative classification (AC) algorithm, called Phishing Associative Classification (PAC), for detecting phishing websites. PAC employs a novel methodology for constructing the classifier, which results in classifiers of moderate size. The algorithm improves the effectiveness and efficiency of a known algorithm called MCAR by introducing a new prediction procedure and adopting a different rule pruning procedure.

Detection and Prediction of Phishing Websites using Classification Mining Techniques [5]. This paper investigates feature selection, aiming to determine the set of features that is most effective in terms of classification performance. It compares two known feature selection methods in order to determine the final set of features for phishing detection using data mining. Experimental tests on a dataset with a large number of features were carried out using the Information Gain and Correlation Features set methods. Further, two data mining algorithms, namely PART and IREP, were trained on different sets of selected features to show the pros and cons of the feature selection process.

Associative Classification Mining for Website Phishing Classification [6]. In this article, an associative classification (AC) data mining algorithm that uses association rule methods to build classification systems (classifiers) is developed and applied to the important problem of phishing classification. The proposed algorithm employs a classifier building method that discovers vital rules that can be used to detect phishing activity based on a number of significant website features. Experiments were conducted with the proposed algorithm and three other rule-based algorithms on real legitimate and fake websites collected from different sources. The results reveal that the proposed algorithm is highly competitive with the other rule-based classification algorithms with respect to accuracy rate.
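The idea of building a classifier from association rules can be illustrated with a deliberately small sketch. It is not the MCAC, MCAR, or PAC procedure from the papers above: it mines only single-feature rules of the form (feature = value) → class, and the feature names, thresholds, and helper names (`mine_rules`, `classify`) are assumptions made for illustration.

```python
from collections import Counter


def mine_rules(records, labels, min_support=0.2, min_confidence=0.7):
    """Mine rules (feature, value) -> label that pass both thresholds.

    support    = fraction of all records matching the item AND the label
    confidence = fraction of records matching the item that carry the label
    """
    total = len(records)
    item_counts = Counter()
    rule_counts = Counter()
    for record, label in zip(records, labels):
        for item in record.items():  # item is a (feature, value) pair
            item_counts[item] += 1
            rule_counts[(item, label)] += 1

    rules = []
    for (item, label), count in rule_counts.items():
        support = count / total
        confidence = count / item_counts[item]
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, label, support, confidence))

    # Rank rules by confidence, then by support, highest first.
    rules.sort(key=lambda rule: (rule[3], rule[2]), reverse=True)
    return rules


def classify(record, rules, default):
    """Return the label of the first ranked rule that matches the record."""
    for (feature, value), label, _support, _confidence in rules:
        if record.get(feature) == value:
            return label
    return default  # no rule fired


# Hypothetical toy data set, invented for this example.
records = [
    {"ip_in_url": "yes", "domain_age": "young"},
    {"ip_in_url": "yes", "domain_age": "young"},
    {"ip_in_url": "no", "domain_age": "young"},
    {"ip_in_url": "no", "domain_age": "old"},
    {"ip_in_url": "no", "domain_age": "old"},
]
labels = ["phishing", "phishing", "phishing", "legitimate", "legitimate"]

rules = mine_rules(records, labels)
print(classify({"ip_in_url": "no", "domain_age": "old"}, rules, "legitimate"))
```

Real AC algorithms also mine rules with several features on the left-hand side and prune redundant rules; the sketch only shows how support and confidence turn frequent patterns into a ranked rule list.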


Financial and governmental institutions offer a variety of financial services to their clients. Online banking and online shopping became popular in the late 1980s. Nowadays, almost all banks around the globe offer many online services to their clients, while online shopping has become a major sector of the world economy.

Phishing is a method of imitating the official or genuine websites of an organization such as a bank, an institute, or a social networking site. The word ‘phishing’ initially emerged in the 1990s. Early hackers often replaced ‘f’ with ‘ph’ to coin new words in the hacker community, since they usually hacked by phone. Phishing is thus a new word derived from ‘fishing’: it refers to an attacker luring users to a fake website by sending them fake e-mails (or instant messages) and stealthily obtaining the victim’s personal information, such as a username, password, or national security ID. Phishing is mainly an attempt to steal users’ private credentials, such as usernames, passwords, PINs, or credit card details, and it is carried out by trained hackers or attackers.

Another line of approaches for detecting phishing websites relies on a machine learning or data mining algorithm that recognizes a phishing website based on a set of characteristics, or features, extracted from the website. The features are those recognized by experts as distinguishing characteristics of a phishing website (e.g., the uniform resource locator (URL) and the age of the domain). According to these approaches, phishing is a pattern recognition problem that can be solved by choosing the “right” set of features and a “suitable” pattern discovery or recognition algorithm.

CANTINA is a content-based approach to detecting phishing websites, based on the term frequency-inverse document frequency (TF-IDF) information retrieval algorithm. CANTINA examines the content of a page to determine whether or not the site is a phishing website.

CANTINA includes several heuristics in its model.

Age of Domain

This heuristic checks whether the age of the domain name is greater than 12 months. It rests on the observation that phishing sites are short-lived, with a lifespan of about 4.5 days. However, the heuristic does not account for phishing sites based on existing websites where criminals have broken into the web server, nor does it account for phishing sites hosted on otherwise legitimate domains, for example in space provided by an ISP for personal homepages.
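A minimal sketch of the age check, assuming the domain's creation date has already been obtained (for example from a WHOIS record). The function name `domain_is_old_enough` is hypothetical, and treating a domain that is exactly 12 months old as passing is an assumption of this sketch.

```python
from datetime import date


def domain_is_old_enough(created: date, today: date, min_months: int = 12) -> bool:
    """Return True when at least `min_months` full months separate the dates."""
    months = (today.year - created.year) * 12 + (today.month - created.month)
    if today.day < created.day:
        months -= 1  # the last month is not complete yet
    return months >= min_months


# A domain registered five days ago fails the check.
print(domain_is_old_enough(date(2021, 1, 10), date(2021, 1, 15)))
```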

Suspicious URL

This heuristic checks whether the page’s URL contains the symbol ‘@’ or ‘-’. An ‘@’ symbol in the URL indicates that the string to its left can be discarded, so that only the part of the string to the right of the symbol is considered. A ‘-’ symbol is rarely used in legitimate sites.
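The check can be sketched with Python's standard `urllib.parse` module. The function name `is_suspicious_url` is hypothetical, and looking for ‘-’ only in the host name (rather than anywhere in the URL) is an assumption of this sketch. The example URLs use reserved `.example` names.

```python
from urllib.parse import urlsplit


def is_suspicious_url(url: str) -> bool:
    """Flag URLs that contain '@' or whose host name contains '-'."""
    host = urlsplit(url).hostname or ""  # empty when the URL has no scheme/host
    return "@" in url or "-" in host


# Everything left of '@' is treated as user info, so the real host is the
# part on the right.
deceptive = "http://www.paypal.com@evil.example/login"
print(urlsplit(deceptive).hostname)
print(is_suspicious_url(deceptive))
```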

Suspicious Links

This heuristic checks whether the links in the page satisfy the Suspicious URL condition above. A link that satisfies the condition is marked as suspicious.

IP Address

This heuristic checks whether the given URL contains an IP address as its domain.
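One way to sketch this check is with the standard `ipaddress` and `urllib.parse` modules. The function name `host_is_ip_address` is hypothetical. The sketch recognizes only standard dotted IPv4 and bracketed IPv6 literals, so obfuscated forms such as a single decimal number are not detected.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url: str) -> bool:
    """Return True when the host part of the URL is an IP address literal."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary host names
    except ValueError:
        return False
    return True


print(host_is_ip_address("http://192.168.10.5/login.php"))
```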


All images on the website, including the website logo, should load from the website’s own URL rather than from another website, so all links should be internal links, not external ones. Therefore, we check the source code for any external links.
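The external-link check can be sketched with the standard `html.parser` module. The class and function names are hypothetical, and the set of tags inspected (`a`, `img`, `link`, `script`) is an assumption. A resource counts as external here when its host differs from the page's host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tag -> attribute that carries the resource URL (an assumed, partial list).
URL_ATTRIBUTES = {"a": "href", "img": "src", "link": "href", "script": "src"}


class ResourceCollector(HTMLParser):
    """Collect the raw URLs referenced by selected tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRIBUTES.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def external_resources(page_url: str, html: str) -> list:
    """Return absolute URLs in `html` whose host differs from the page's host."""
    page_host = urlsplit(page_url).hostname
    collector = ResourceCollector()
    collector.feed(html)
    external = []
    for raw in collector.urls:
        absolute = urljoin(page_url, raw)  # resolve relative links
        host = urlsplit(absolute).hostname
        if host is not None and host != page_host:
            external.append(absolute)
    return external


sample = (
    '<a href="/about">About</a>'
    '<img src="http://cdn.other.example/logo.png">'
    '<a href="https://www.example.com/help">Help</a>'
)
print(external_resources("https://www.example.com/", sample))
```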


TF-IDF stands for Term Frequency-Inverse Document Frequency, and the TF-IDF weight is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Variations of the TF-IDF weighting scheme are often used by search engines as a central tool in scoring and ranking a document’s relevance given a user query.

Typically, the TF-IDF weight is composed of two terms:

The first term is the normalized Term Frequency (TF): the number of times a word appears in a document, divided by the total number of words in that document;

The second term is the Inverse Document Frequency (IDF), computed as the logarithm of the number of the documents in the corpus divided by the number of documents where the specific term appears.

TF: Term Frequency.

The TF measures how frequently a term occurs in a document. Since documents differ in length, a term may appear many more times in a long document than in a short one. Thus, the term frequency is often divided by the document length (i.e., the total number of terms in the document) as a way of normalization:

TF (t) = (Number of times term t appears in a document) / (Total number of terms in the document).

IDF: Inverse Document Frequency

The IDF measures how important a term is. While computing TF, all terms are considered equally important. However, certain terms, such as “is”, “of”, and “that”, may appear many times but have little importance. Thus we need to weigh down the frequent terms while scaling up the rare ones, by computing the following:

IDF (t) = log_e (Total number of documents / Number of documents with term t in it).
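The two formulas can be combined into a short sketch. Documents are assumed to be already tokenized into lists of lowercase words, the toy corpus is invented for illustration, and returning 0.0 for a term that appears in no document is an assumption made to avoid dividing by zero.

```python
import math
from collections import Counter


def tf(term, document):
    """TF(t) = occurrences of t in the document / total terms in the document."""
    if not document:
        return 0.0
    return Counter(document)[term] / len(document)


def idf(term, corpus):
    """IDF(t) = log_e(total documents / documents containing t)."""
    containing = sum(1 for document in corpus if term in document)
    if containing == 0:
        return 0.0  # assumption: unseen terms get zero weight
    return math.log(len(corpus) / containing)


def tf_idf(term, document, corpus):
    return tf(term, document) * idf(term, corpus)


def top_terms(document, corpus, k=5):
    """Return the k highest-weighted terms; ties are broken alphabetically."""
    scores = {term: tf_idf(term, document, corpus) for term in set(document)}
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


corpus = [
    ["login", "to", "your", "bank", "account"],
    ["the", "bank", "is", "open"],
    ["the", "weather", "is", "nice"],
]
print(tf_idf("bank", corpus[0], corpus))
print(top_terms(corpus[0], corpus, k=2))
```

In this corpus “bank” appears in two of the three documents, so it scores lower than words that occur only in the first document, which is the down-weighting the IDF term is meant to provide.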


WHOIS (pronounced as the phrase who is) is a query and response protocol that is widely used for querying databases that store the registered users or assignees of an Internet resource, such as a domain name, an IP address block, or an autonomous system, but is also used for a wider range of other information. The protocol stores and delivers database content in a human-readable format.
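A WHOIS exchange is simple enough to sketch with a plain socket: the client opens a TCP connection to port 43, sends the query followed by a carriage return and line feed, and reads until the server closes the connection. The function names are hypothetical, the caller must supply the WHOIS server to ask, and the layout of the returned text differs between registries, so no parsing is attempted here.

```python
import socket


def build_whois_query(domain: str) -> bytes:
    """A WHOIS query is the domain name terminated by CR LF."""
    return domain.strip().encode("ascii") + b"\r\n"


def whois_lookup(domain: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send the query to `server` and return the raw, human-readable reply."""
    with socket.create_connection((server, port), timeout=timeout) as connection:
        connection.sendall(build_whois_query(domain))
        chunks = []
        while True:
            data = connection.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


print(build_whois_query("example.com"))
```

Calling `whois_lookup` requires network access; the registration date in the reply is what an age-of-domain check would consume.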

Locating Phishing Server:

The host name in a URL resolves to an IP address.

Using this IP address, our system will locate the phishing server.
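The first step, going from a URL to the server's IP address, can be sketched with the standard `socket` module. The function name `resolve_server_ip` is hypothetical, only IPv4 is handled, and mapping the address to a physical location would need an external geolocation data source that is outside this sketch.

```python
import socket
from urllib.parse import urlsplit


def resolve_server_ip(url: str) -> str:
    """Return the IPv4 address of the host named in the URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name found in URL: {url!r}")
    # Performs a DNS lookup for names; an IPv4 literal is returned unchanged.
    return socket.gethostbyname(host)


print(resolve_server_ip("http://127.0.0.1/login"))
```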

Phishing is a significant problem involving fraudulent email and websites that trick unsuspecting users into revealing private information. Here, we have presented the design, implementation, and evaluation of the CANTINA and TF-IDF techniques for detecting phishing websites. The first module, i.e. the user module, has been put to work and the required changes have been implemented.

Detecting Phishing Website Using Associative Classification. (2018, April 21). GradesFixer. Retrieved May 21, 2022, from