Cloud data storage is a service in which data is remotely maintained, managed, and backed up. The service lets users store files online so that they can access them from any location via the Internet. Many users expect cloud computing to reshape information technology processes. A huge amount of data is stored in the cloud, and it needs to be retrieved efficiently. Retrieving information from the cloud can take a long time when the data is not stored in an organized way, which makes data mining important in cloud computing. Data mining and cloud computing can be integrated (Integrated Data Mining and Cloud Computing, IDMC) to provide agility and quick access to the technology. With cloud computing, users use a variety of devices, including PCs, laptops, smartphones, and PDAs, to access programs, storage, and application-development platforms over the Internet through services offered by cloud computing providers. Advantages of cloud computing include cost savings, high availability, and easy scalability. This work therefore presents a survey of cloud data storage and of cluster analysis for using the stored data in various business intelligence applications. It also proposes a new model of cluster analysis that provides clustering as a service.
Data clustering is a technique for analyzing data and extracting meaningful patterns from raw data sets. Here, "meaningful" refers to the patterns or knowledge recovered from the training samples, which are then used to recognize new data that matches a learned pattern. Two main kinds of learning techniques are used: supervised learning and unsupervised learning. Both evaluate the data and build a mathematical model that is used to identify similar patterns in newly arrived data and classify them into predefined groups.
In supervised learning, the data is processed together with its class labels, and the class labels act as a teacher for the learning algorithm. In unsupervised learning, by contrast, the data does not contain class labels that could serve as a teacher, so the data is categorized using the similarity and dissimilarity of the input training samples. Supervised learning is therefore known as classification of data, while unsupervised learning supports cluster analysis. This work uses unlabelled data, so cluster analysis is the data analysis technique applied. Clustering is the unsupervised classification of patterns or input samples (observations, data items, or feature vectors) into groups; in data mining, these groups are known as clusters. The clustering problem is to group a given collection of unlabelled patterns into meaningful clusters. In a sense, labels are associated with clusters as well, but these category labels are data-driven; that is, they are obtained solely from the data.
Clustering technique background.
Clustering is one of the most popular data mining techniques for finding useful, previously unknown patterns in a large data repository. Clustering groups data into clusters such that elements in the same cluster are highly similar, while elements in different clusters are dissimilar. Clustering methods fall into two broad categories: (i) hard clustering and (ii) soft clustering. In hard clustering, also known as exclusive clustering, each document can belong to only one cluster. In soft clustering, also known as overlapping clustering, the same document can belong to more than one cluster.
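The difference between hard and soft clustering can be illustrated with a minimal Python sketch. It assumes that clusters are represented by centroids and that soft memberships are proportional to inverse distance; both choices, and the function names hard_assign and soft_assign, are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def hard_assign(point, centroids):
    """Return the index of the single nearest centroid (exclusive membership)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def soft_assign(point, centroids, eps=1e-9):
    """Return one membership weight per centroid; the weights sum to 1.

    Weights are proportional to inverse distance, an illustrative choice.
    """
    inverse = [1.0 / (math.dist(point, c) + eps) for c in centroids]
    total = sum(inverse)
    return [w / total for w in inverse]

# Hypothetical centroids and a point lying closer to the first one.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

print(hard_assign(point, centroids))  # one cluster index only
print(soft_assign(point, centroids))  # a degree of membership in every cluster
```

With hard assignment the point belongs to exactly one cluster, while soft assignment gives it a share in both, with the larger share going to the nearer centroid.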
Raw versus clustered data.
This section introduced data clustering and the domain selected for study, cloud data storage. The next section reviews the different kinds of clustering algorithms to explain the techniques behind cluster analysis.
Types of clustering technique.
A large number of clustering algorithms and methods are available. Some essential techniques are described below:
Partitioning Method. In this approach, n data objects are given and k partitions of the data are required, where k ≤ n. The partitioning algorithm generates k partitions that satisfy the following conditions: a. Each group contains at least one object. b. Each object is a member of exactly one group.
Hierarchical Methods. A hierarchical method organizes the clusters into a hierarchy. This can be achieved in the following ways:
Agglomerative Approach. This approach works bottom-up. It first creates a separate group for each data object. It then merges the groups that are most similar to one another. This process is repeated until all groups have been merged into a single group or until the termination condition holds.
Divisive Approach. This approach works top-down. The process starts with a single cluster containing all data objects and then keeps splitting larger clusters into smaller ones. This process continues until the termination condition holds. Hierarchical methods are inflexible: once a merge or split has been performed, it can never be undone.
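The bottom-up procedure can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: it assumes single linkage (the distance between two clusters is the distance between their closest points) and uses a target number of clusters as the termination condition. Both are assumptions made for the example, as are the function names.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: the distance of their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def agglomerative(points, target_clusters):
    """Merge the two closest clusters until only target_clusters remain."""
    clusters = [[p] for p in points]        # start with one cluster per object
    while len(clusters) > target_clusters:  # termination condition (assumed)
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # a merge is never undone later
        del clusters[j]
    return clusters

# Hypothetical 2-D points forming two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = agglomerative(points, target_clusters=3)
print(result)
```

Because every pass scans all cluster pairs, this sketch is only suitable for small inputs; it is meant to show the merge loop, not to scale to cloud-sized data.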
Density-Based Methods. This technique uses the notion of density. The main idea is to keep growing a cluster as long as the density of the neighborhood exceeds a certain threshold; that is, for each data point within a given cluster, the neighborhood of a given radius must contain at least a minimum number of points.
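One well-known algorithm of this kind is DBSCAN. The following Python sketch follows its general outline under simplifying assumptions: Euclidean distance, a brute-force neighborhood search, and the parameter names eps (radius) and min_pts (minimum number of points), which are chosen here for illustration.

```python
import math

NOISE = -1

def region_query(points, index, eps):
    """Indices of all points within eps of points[index], including itself."""
    return [j for j, q in enumerate(points) if math.dist(points[index], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it lies in no dense region."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE               # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:                        # keep expanding while density holds
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id      # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, so expand through it
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (4, 20)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Unlike the partitioning method, the number of clusters is not given in advance; it follows from the radius and the density threshold, and isolated points are reported as noise.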
Grid-Based Method. This method quantizes the object space into a large number of cells that together form a grid. The method has the following advantages: • Its primary benefit is fast processing. • The processing time depends only on the number of cells in the object space.
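The quantization step can be sketched as follows. The example assumes square cells of a fixed size and a simple count threshold for calling a cell dense; the names grid_cell, dense_cells, cell_size, and min_count are hypothetical and chosen for this illustration. Merging adjacent dense cells into clusters is left out.

```python
from collections import Counter

def grid_cell(point, cell_size):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(int(coordinate // cell_size) for coordinate in point)

def dense_cells(points, cell_size, min_count):
    """Count points per cell and keep the cells holding at least min_count points."""
    counts = Counter(grid_cell(p, cell_size) for p in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Hypothetical 2-D points.
points = [(0.2, 0.3), (0.8, 0.1), (0.5, 0.9),
          (3.1, 3.4), (3.7, 3.9),
          (7.5, 0.2)]
cells = dense_cells(points, cell_size=1.0, min_count=2)
print(cells)
```

After a single pass over the data, all further work operates on cells rather than on individual objects, which is the reason processing cost is tied to the number of cells.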
Model-Based Methods. In a model-based scheme, a model is hypothesized for each cluster, and the method then finds the data that best fits that model. This approach offers a way to determine the number of clusters automatically from standard statistics while taking outliers and noise into account. As a result, it yields robust clustering methods.
Constraint-Based Method. This method performs clustering subject to application-oriented or user-oriented constraints. These constraints express the expectations for, or the properties of, the desired clustering results. They also give users an interactive way to communicate with the clustering process.
One of the cloud services on offer is data storage. Before cloud computing, important industrial data was stored internally on local storage media. From music files to pictures to sensitive documents, the cloud invisibly backs up files and folders and removes the need for an endless and costly search for extra storage space. When data volumes are large, cloud storage removes the need to buy an external hard drive or delete old files to make room for new ones. Many organizations have therefore moved to the cloud for storage and pay for the amount of space they use. Cloud storage is convenient and cost-effective. It works by storing files on a server somewhere on the Internet rather than on the local hard drive, which allows users to back up, sync, and access data across multiple devices as long as they have an Internet connection.
Much research has been done to improve the performance of cloud computing, and data mining algorithms have been applied in various ways to manage the huge amount of data in the cloud. Related work in this field includes the following. Bhupendra Panchal and R.K. Kapoor proposed clustering and caching methodologies for improving performance. The main idea is to make replicas of the data available at each data center, so that even if one data center goes down, the data is still available because the second data center is clustered with the first. Kashish Ara Shakil and Mansaf Alam proposed an approach that manages cloud data through clustering, using k-median as the clustering technique. A. Mahendiran et al. proposed an implementation of the k-means clustering algorithm in cloud computing for large datasets. Kriti Srivastava proposed an implementation of the agglomerative hierarchical clustering algorithm to gain benefits such as scalability, elasticity, and the ability to handle large datasets.
PROPOSED MODEL: IMPROVING SUPERVISED LEARNING ALGORITHMS WITH CLUSTERING
Clustering is an unsupervised machine learning approach, but it can also be used to improve the accuracy of supervised machine learning algorithms: the data points are clustered into similar groups, and the resulting cluster labels are used as independent variables in the supervised algorithm. The impact of clustering on model accuracy can be examined on a classification problem in R, using 3000 observations with 100 predictors of stock data to predict whether a stock will go up or down. The dataset contains 100 independent variables, X1 to X100, representing the profile of a stock, and one outcome variable Y with two levels: 1 for a rise in the stock price and -1 for a drop.
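The idea of feeding cluster labels to a supervised learner can be sketched as follows. The sketch is written in Python rather than R and uses a handful of made-up two-dimensional rows, not the stock dataset described above; the plain k-means routine, its fixed seeding from the first k rows, and the helper names are all assumptions made for illustration.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def k_means(rows, k, iterations=20):
    """Plain k-means; the first k rows seed the centroids so runs are repeatable."""
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(r, centroids)].append(r)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def add_cluster_feature(rows, centroids):
    """Append each row's cluster index as one extra predictor column."""
    return [list(r) + [nearest(r, centroids)] for r in rows]

# Hypothetical toy predictors (not the stock dataset described above).
X = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.3, 0.9), (4.8, 5.0)]
centroids = k_means(X, k=2)
X_augmented = add_cluster_feature(X, centroids)
print(X_augmented)
```

The augmented rows can then be passed to any classifier in place of the original predictors. Whether the extra column actually improves accuracy has to be checked on held-out data.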
We have discussed the various ways of performing clustering, an unsupervised learning technique with applications in a large number of domains. We also described how clustering can be used to improve the accuracy of a supervised machine learning algorithm.
Although clustering is easy to implement, some important aspects need care, such as treating outliers in the data and making sure each cluster has a sufficient population. The proposed method has several advantages: it provides fast access to data, provides statistics on the usage of cloud storage space, is scalable, and helps in mining large, heterogeneous data sets. Future work on the proposed model is to apply other clustering algorithms to cloud storage and compare the results to find the best clustering algorithm for cloud storage.