The Internet occupies an ever-larger space in human life and has become prominent in almost every sector worldwide. Its basic advantage is fast communication and the quick transfer of information through various modes.
With changing technology, the Internet is used not only for gaining knowledge but also for communication. It has become a medium for exchanging and expressing one's ideas. Currently, people mostly use social networking sites to connect with other people and to share information with them. A social network is a wide network of individuals interconnected by interpersonal relationships.
Individuals exchange large amounts of data in the form of pictures, videos, and text. This generated data is known as social network data, and it helps determine various aspects of society. Data mining is the process of inspecting data from different perspectives to find previously unknown patterns. One significant data mining task, association rule mining, helps discover associations, correlations, statistically relevant patterns, causality, and emerging patterns in social networks.
Earlier, people communicated either verbally or non-verbally; non-verbal communication took place through letters, newspaper notices, drafts, and the like. Such communication had limitations and was quite confined, as there were few means for non-verbal exchange at a distance. The Internet, also known as the network of networks, enabled people to gather information globally. Initially the only use of the web was to gather and share information. In due course, the need to collect, share, and contribute information kept increasing, and ultimately sparked the gathering, analysis, and channelling of huge volumes of data in a precise manner. Data creation, collection, storage, retrieval, and presentation became part and parcel of the knowledge society.
Eventually, the Internet became not only a medium for gaining knowledge but also a medium of communication. Presently, millions of people use the Internet to express their ideas and share information, mostly through social networking sites and blogs. Social networking has thus escalated around the world with noteworthy speed. Many social networking sites are now available, such as Facebook and Twitter; Facebook had more than 1.44 billion active users in 2015.
This has resulted in a drastic boom in the emergence of social sites. Twitter, for example, became popular in a short span of time due to its simple and innovative features, such as tweets: short text messages that are fast to post and easy to collect. Millions of tweets are posted every day, and the information gathered from them can help in decision making.
A social network is basically a network of individuals connected by interpersonal relationships. Social network data refers to the data generated by people socializing on social media. When analyzed and mined, this user-generated data helps examine several aspects of the socializing community. This can be accomplished through Social Network Analysis (SNA), the mapping and measuring of relationships, which therefore plays a decisive role in portraying the various characteristics of the socializing community.
Data from various social networking sites is stored in files and other repositories. Analyzing and interpreting such a huge amount of data together yields interesting knowledge that can support further decisions. Data mining, also known as knowledge discovery, is the process of finding unknown insights by analyzing data from different perspectives: patterns are discovered in large datasets, and information is extracted from the data and transformed into an understandable structure. The terms data mining and knowledge discovery in databases (KDD) are often used interchangeably, but strictly speaking data mining is the analysis step of the knowledge discovery process.
One significant data mining task, association rule mining, helps discover associations, correlations, statistically related patterns, causality, and emerging patterns in social networks.
Frequent itemset mining plays an important role in many data mining tasks that try to discover interesting patterns from databases, such as association rules, correlations, sequences, classifiers, and clusters. Among these, the mining of association rules is one of the most prominent problems. Recognizing sets of items, products, or features that often appear together in a given database can be seen as one of the most basic tasks in data mining.
For example, the association rule {bread, potatoes} -> {sandwich} would reveal that a customer who buys bread and potatoes together is likely to also buy a sandwich. Here {bread, potatoes} is the antecedent and {sandwich} is the consequent, and the strength of the rule is measured by its support and confidence. Such knowledge can be used for decision making. Consider a social network environment that collects and shares user-generated text documents (e.g. discussion threads, blogs, etc.). It would be profitable to know which words people generally use in discourse on a specific topic, or which sets of words are often used together. For example, in a discussion topic on the American election, frequent use of the word 'economy' shows that the economy is the most important aspect of the political debate.
Hence, a frequent itemset of size one can be a good marker of the central discussion topic, and frequent itemsets of size two can show what the other important factors are. A frequent itemset mining algorithm run on the set of text documents produced over a social network can therefore reveal the central topic of discussion and the patterns of word usage in discussion threads and blogs. With the exponential growth of social network data towards a terabyte or more, it has become difficult to analyze the data on a single machine. The Apriori algorithm, one of the best-known methods for mining frequent itemsets in a transactional database, is proving inefficient at handling this ever-increasing data. To deal with this problem, the MapReduce framework, a programming model widely used in cloud computing, is employed.
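As a concrete illustration, the level-wise Apriori idea can be written in a few lines of Python. This is a minimal single-machine sketch, not the distributed algorithm discussed later; the function name and the toy documents are illustrative:

```python
def apriori(transactions, min_support):
    """Find all itemsets whose support count meets min_support."""
    # Frequent 1-itemsets: count each individual item.
    items = {i for t in transactions for i in t}
    freq = {frozenset([i]): sum(1 for t in transactions if i in t) for i in items}
    freq = {s: c for s, c in freq.items() if c >= min_support}
    result = dict(freq)
    k = 2
    while freq:
        # Candidate generation: join frequent (k-1)-itemsets.
        candidates = {a | b for a in freq for b in freq if len(a | b) == k}
        # Support counting over the whole database.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        freq = {s: n for s, n in counts.items() if n >= min_support}
        result.update(freq)
        k += 1
    return result

# Tokenized discussion documents, treated as transactions.
docs = [{"economy", "election", "vote"},
        {"economy", "election"},
        {"economy", "policy"}]
print(apriori(docs, min_support=2))
```

Here the word 'economy' appears in all three documents, so its singleton itemset is the strongest marker of the discussion topic, exactly as described above.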
Hadoop is an open-source platform, licensed under the Apache v2 license, that provides the analytical technologies and computational power required to work with large volumes of data. The Hadoop framework allows users to store and process big data in a distributed environment, across many computers connected in a cluster, using simple programming models.
It is designed so that thousands of machines can be managed from a single server, each offering local storage and computation. It breaks data into manageable chunks, replicates them, and distributes multiple copies across the nodes of a cluster so that the data can later be processed quickly and reliably. Rather than relying on hardware to deliver high availability, the Apache Hadoop software library is itself designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers. Hadoop is also used to conduct analysis of data. The core components of Apache Hadoop are a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce.
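The MapReduce model behind Hadoop can be illustrated with a small single-process simulation in Python. The shuffle step that Hadoop performs between the two phases is modelled here with an in-memory dictionary; the function names are illustrative, not Hadoop APIs:

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit an (item, 1) pair for every item in a transaction line."""
    for item in record.split():
        yield item, 1

def reduce_phase(key, values):
    """Reduce: sum all the counts emitted for one key."""
    return key, sum(values)

def run_mapreduce(records):
    # Shuffle: group the mapper output by key before reducing.
    groups = defaultdict(list)
    for rec in records:
        for k, v in map_phase(rec):
            groups[k].append(v)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())

print(run_mapreduce(["bread milk", "bread butter", "milk"]))
```

On a real cluster, Hadoop runs many mapper and reducer instances in parallel on different nodes and handles the grouping, distribution, and failure recovery itself; only the two user-defined functions change.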
Association rule mining is a method of discovering relations between variables in large databases. It was introduced by Rakesh Agrawal for checking regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems.
For example, bread, tomatoes, and mayonnaise together suggest a sandwich: supermarket sales data may imply that a customer who purchases tomatoes and mayonnaise together is also likely to buy a sandwich. Such rules can then be used for decision making.
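The support and confidence measures used to rank such rules can be computed directly from the transaction data. A minimal sketch, assuming set-valued baskets and the toy data below:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

baskets = [{"tomato", "mayonnaise", "sandwich"},
           {"tomato", "mayonnaise", "sandwich"},
           {"tomato", "bread"},
           {"mayonnaise", "bread"}]

print(support({"tomato", "mayonnaise", "sandwich"}, baskets))      # 0.5
print(confidence({"tomato", "mayonnaise"}, {"sandwich"}, baskets)) # 1.0
```

In this toy dataset, every basket that contains both tomato and mayonnaise also contains a sandwich, so the rule {tomato, mayonnaise} -> {sandwich} has confidence 1.0 and support 0.5.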
T. Karthikeyan and N. Ravikumar, in their review paper, concluded that much attention has been given to the performance and scalability of algorithms, but not to the quality of the rules generated. According to them, the algorithms could be enhanced to reduce execution time and complexity and to improve accuracy. They further concluded that more work is needed on designing an efficient algorithm with fewer I/O operations, by reducing database scans in the association rule mining process.
This paper gives a theoretical survey of some existing association rule mining algorithms. The underlying concepts are introduced first, followed by an overview of related research work; the pros and cons of each algorithm are discussed, and an inference is drawn in conclusion.
Rakesh Agrawal and Ramakrishnan Srikant proposed two algorithms, Apriori and AprioriTid, for finding association rules between items in a large database of sales transactions. Both use a seed set of frequent itemsets to generate new candidate itemsets, counting the actual support of these candidates at the end of each pass, until no new large itemsets are found.
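The candidate-generation step of Apriori, a self-join of the frequent (k-1)-itemsets followed by subset pruning, can be sketched as follows. The name apriori_gen mirrors the function in Agrawal and Srikant's paper, while the toy input is illustrative:

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into candidate k-itemsets, then
    prune any candidate that has an infrequent (k-1)-subset."""
    candidates = set()
    for a in prev_frequent:          # join step: union pairs of seeds
        for b in prev_frequent:
            union = a | b
            if len(union) == k:
                candidates.add(union)
    # Prune step: every (k-1)-subset of a candidate must be frequent.
    return {c for c in candidates
            if all(frozenset(s) in prev_frequent
                   for s in combinations(c, k - 1))}

L2 = {frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"b", "c"})}
print(apriori_gen(L2, 3))
```

The prune step is what gives Apriori its name: by the a-priori property, no superset of an infrequent itemset can be frequent, so such candidates need not be counted at all.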
J. Han, J. Pei, and Y. Yin developed a systematic FP-tree-based mining method, called FP-growth, for mining frequent patterns based on a fragment-growth concept. The problem is tackled in three ways: a compact data structure, the FP-tree, in which only frequent items have nodes; an FP-tree-based pattern growth that examines a pattern's conditional base, constructs its conditional FP-tree, and mines it recursively; and a divide-and-conquer strategy instead of a bottom-up candidate-generation search.
S. Cong, J. Han, J. Hoeflinger, and D. Padua developed a new strategy for mining frequent itemsets from terabyte-scale datasets on cluster systems, centred on a sampling-based framework for parallel data mining. The framework takes processor performance, the memory hierarchy, and the available network into account. The resulting algorithm extended the fastest sequential algorithm to a parallel setting, and hence used all the available resources effectively.
W. Fang, K. K. Lau, and P. V. Sander introduced GPUMiner, a new approach to data mining that utilizes new-generation graphics processing units (GPUs). The system relies on the massively multi-threaded SIMD (Single Instruction, Multiple Data) architecture provided by GPUs. GPUMiner consists of three components: a CPU-based storage and buffer manager that handles data and I/O transfer between the GPU and the CPU; a CPU-GPU co-processing parallel mining module; and a GPU-based mining visualization module.
Two FP-tree-based techniques, a lock-free dataset-tiling parallelization and a cache-conscious FP-array, were proposed in "Optimization of frequent itemset mining on multiple-core processors". They address the low utilization of multi-core systems, effectively improve data locality, and exploit hardware and software prefetching. The FP-tree building algorithm is also reworked as a lock-free parallel algorithm.
To divide the frequent itemset mining task in a top-down fashion, C. Aykanat, E. Ozkural, and B. Ucar developed a distribution scheme based on partitioning the database transactions. The method operates on a graph whose vertices correspond to frequent items and whose edges correspond to frequent itemsets of size two. A vertex separator partitions this graph so that the distribution of the items can be decided and each part mined independently. Two new mining algorithms were derived from this scheme; both replicate the items corresponding to the separator, with one algorithm replicating the associated work and the other recomputing it.
MapReduce-based algorithms for association rule mining have also been studied, since the performance of single-machine algorithms is limited by memory and CPU resources. Building on the MapReduce model described by J. Dean and S. Ghemawat, an improved Apriori algorithm on the Hadoop platform can manage massive datasets across a large number of nodes and handle problems involving larger, multi-dimensional datasets.
For cloud computing, Jongwook Woo and Yuhang Xu proposed a Market Basket Analysis algorithm based on (key, value) pairs, whose code can be executed on a Map/Reduce platform. The algorithm uses a joining technique to produce paired items, and each transaction is sorted into alphabetical order before the (key, value) pairs are generated, so that the same pair is never emitted under two different orderings.
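The sorted pair generation can be sketched as follows. The function name is illustrative; the point is that sorting each transaction first makes every pair a canonical key, so counts for the same pair aggregate correctly in the reduce phase:

```python
from itertools import combinations

def paired_item_keys(transaction):
    """Emit ((item1, item2), 1) for every item pair in one transaction.
    Sorting first guarantees ('bread', 'milk') and ('milk', 'bread')
    produce the same key."""
    for pair in combinations(sorted(set(transaction)), 2):
        yield pair, 1

print(list(paired_item_keys(["milk", "bread", "eggs"])))
```

A reducer summing the values per key then yields, for each item pair, the number of transactions in which the pair co-occurs, which is exactly the pair's support count.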
Nick Cercone and Zahra Farzanyar proposed a new, efficient method for mining frequent itemsets based on the Map/Reduce framework, applied to social network data. Their improved Map/Reduce Apriori algorithm reduces the number of partial frequent itemsets generated during the Map and Reduce phases and improves processing time.
Nirupma Tivari and Anutha Sharama presented a survey of association rule mining using genetic algorithms, in which the techniques are classified by approach. Genetic algorithms proved extremely robust at mining association rules: when the technique was applied to a synthetic database, the desired rules were included in the results, comprising both general rules and rules with negated attributes. Major changes were still needed to reduce the complexity of these algorithms using distributed computing.
D. Kerana Hanirex and K. P. Kaliyamurthie improved the efficiency of finding frequent itemsets with the help of a genetic algorithm. Initially, a population of randomly generated transactions is created; the algorithm then continually transforms the population by executing the steps of fitness evaluation, selection, recombination, and replacement.
Arvind Jaiswal and Gaurav Dubey proposed discovering the best association rules and optimizing them using a genetic algorithm. The population is transformed continually by executing the following steps: first, fitness evaluation calculates the fitness of each individual; then selection chooses individuals from the current population to act as parents in recombination; in recombination, new offspring are produced from the parents using the genetic operators crossover and mutation; finally, some individuals, usually the parents, are replaced by the offspring. The algorithm automatically extracts association rules from large datasets, and its output is optimized against multiple quality measures such as interestingness, confidence, support, and comprehensibility.
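The fitness, selection, recombination, and replacement loop described above can be sketched with a minimal generational genetic algorithm. The bitstring encoding and the toy fitness function (counting 1-bits) are illustrative stand-ins for the rule encodings and rule-quality measures used in these papers:

```python
import random

def genetic_search(fitness, length=10, pop_size=20, generations=50, seed=1):
    """Minimal generational GA: evaluate fitness, select parents by
    tournament, recombine with one-point crossover and bit-flip
    mutation, then replace the population with the offspring."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Selection: binary tournament.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, length)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:                # bit-flip mutation
                i = rng.randrange(length)
                child[i] ^= 1
            offspring.append(child)
        pop = offspring                           # generational replacement
    return max(pop, key=fitness)

# Toy fitness: number of 1-bits, standing in for a rule-quality score.
best = genetic_search(fitness=sum)
print(sum(best))
```

In a rule-mining setting, the bitstring would encode the presence or negation of attributes in a candidate rule, and the fitness would combine measures such as support, confidence, and comprehensibility.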
Reza Sheibani and Amir Ebrahimzadeh proposed Improved Cluster Based Association Rules (ICBAR). This mining algorithm can efficiently discover large itemsets, reduces the large number of candidate itemsets, and compares the data only with partial cluster tables.
To extract associations between the details of social network data.
To design a programming model which can work in parallel using MapReduce.
The Apriori algorithm uses the MapReduce framework to find frequent itemsets in social network data, and a genetic algorithm is then used to generate association rules from those frequent itemsets.
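For illustration, one simple deterministic way to derive rules from mined frequent itemsets is to test every antecedent/consequent split against a confidence threshold. This is a stand-in sketch for the rule-generation stage only; the proposed system uses a genetic algorithm for this step:

```python
from itertools import combinations

def generate_rules(freq_itemsets, min_conf):
    """For each frequent itemset, try every non-empty proper subset as
    the antecedent and keep rules whose confidence meets min_conf.
    freq_itemsets maps frozensets to their support counts."""
    rules = []
    for itemset, sup in freq_itemsets.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                conf = sup / freq_itemsets[ante]
                if conf >= min_conf:
                    rules.append((ante, itemset - ante, conf))
    return rules

# Support counts as produced by a frequent itemset mining pass.
freq = {frozenset({"economy"}): 3,
        frozenset({"election"}): 2,
        frozenset({"economy", "election"}): 2}
for ante, cons, conf in generate_rules(freq, min_conf=0.9):
    print(set(ante), "->", set(cons), round(conf, 2))
```

Here only {election} -> {economy} survives the 0.9 confidence threshold, since every document mentioning the election also mentions the economy, while the reverse rule has confidence of only two-thirds.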
With the rapid growth of the telecom industry, most users can now easily access the Internet, which has indirectly increased the popularity of social networking sites. Social networking is growing at a tremendous rate, and these sites contain huge amounts of data, so mining that data is useful. The developed system is fast, as it exploits parallel processing: it finds association rules using the EAMRGA algorithm, and for optimization it applies a genetic algorithm to obtain optimized, relevant rules. The experimental work shows that the efficiency of the implemented algorithm improved by 39% and the accuracy of the generated rules increased by 25%.
As future work, we will face data in the range of terabytes; handling that much data while reducing processing time will be the main goal. A hierarchical or parallel approach can be used for this job and can be implemented with large-scale use of the features provided by Hadoop.