About this sample
Words: 1602 | Pages: 4 | 9 min read
Published: Jul 10, 2019
Abstract
In recent years, data clustering by metaheuristic methods has become popular in the data mining field. These methods suffer from an optimization problem that is addressed in this paper. The problem occurs when the cluster centroids encoded in an individual of the population (a particle in this paper) do not play the role of cluster centers. We use the law of gravity to solve this problem. After each particle clusters the data, the centroids are moved toward the center of mass of the data in the desired cluster by a gravity-law process: each data point in a cluster exerts a force on the cluster centroid and pulls it toward the cluster's center of mass. Particles are evaluated after this enhancement by a selected internal Clustering Validation Index (CVI). We examined several CVIs and found Xu, Du, and WB to be the most accurate. The proposed method is compared with several clustering methods, including Particle Swarm clustering methods and well-known clustering methods, using the Jaccard index. The results show that our method is more accurate.
Introduction
The purpose of clustering is to group similar samples together in a cluster and dissimilar samples in different clusters. Various methods have been proposed for data clustering, divided into different branches: partitioning, hierarchical, density-based, and grid-based approaches are the main clustering methods. Partitioning methods have received much attention, and the most popular partitioning method is K-means (Jain, 2010). K-means has some disadvantages, the most basic of which are: not every desired objective function can be used, there is a possibility of getting stuck in local optima, and the number of clusters must be specified in advance. The objective function of K-means only takes into account the distance within a cluster and does not care about the distance between clusters. On the other hand, many cluster validity indices (CVIs) have been introduced that take into account both within-cluster and between-cluster distances, so we can use these CVIs as the objective function of a clustering method to address the first problem mentioned above. For the second problem, we can use a general optimizer that rarely gets stuck in local optima. If an optimizer can choose the best number of clusters according to the objective function, the last problem is solved as well. Metaheuristic methods such as Particle Swarm Optimization (PSO) (van der Merwe & Engelbrecht, 2003) and its variations (Cura, 2012; Valente de Oliveira, Szabo, & de Castro, 2017), the Genetic Algorithm (GA) (Maulik & Bandyopadhyay, 2000) and its variations, Artificial Bee Colony (ABC) (Ozturk, Hancer, & Karaboga, 2015; Yan, Zhu, Zou, & Wang, 2012), and the Gravitational Search Algorithm (GSA) (Dowlatshahi & Nezamabadi-pour, 2014) have been proposed for these problems. All these methods suffer from another problem, which we address in this paper.
The problem happens when the cluster centroids taken from an individual of the population do not play the role of cluster centers. For example, in Fig. 1 you can see 3 clusters and 2 types of centroids (squares and circles) extracted from 2 different particles in PSO. If we use these particles, the clustering results of the two particles are exactly the same, but the fitnesses of the two particles are different. This means that particles performing the same clustering may receive different fitness values, which causes problems in optimization. This problem affects population diversity and even the exploration and exploitation of the optimizer.
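This issue can be demonstrated with a small sketch (the data, centroid positions, and the within-cluster-distance fitness below are illustrative assumptions, not from the paper): two particles induce exactly the same partition of the data but receive different fitness values.

```python
import numpy as np

def assign(data, centroids):
    """Assign each point to its nearest centroid (the clustering a particle induces)."""
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def sse(data, centroids, labels):
    """Within-cluster sum of squared distances, a typical fitness function."""
    return sum(np.sum((data[labels == k] - c) ** 2) for k, c in enumerate(centroids))

data = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])

# Two hypothetical particles: same induced partition, different centroid positions.
p1 = np.array([[0.1, 0.05], [5.05, 4.95]])   # near the cluster centers
p2 = np.array([[1.0, 1.0], [4.0, 4.0]])      # shifted, but same nearest-centroid split

l1, l2 = assign(data, p1), assign(data, p2)
print((l1 == l2).all())                      # True: identical clustering
print(sse(data, p1, l1), sse(data, p2, l2))  # different fitness values
```

Because both particles encode the same clustering, rewarding them differently distorts the optimizer's search landscape, which is exactly the degeneracy the proposed method removes.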
1-2 PSC
The basic form of the PSO algorithm was introduced in (Kennedy & Eberhart, 1995) and later modified in (Shi & Eberhart, 1998). In the algorithm, a swarm of S particles flies stochastically through an N-dimensional search space, where each particle's position represents a potential solution to an optimization problem. Each particle p with current position xp and current velocity vp remembers its personal best solution so far, bp. The swarm remembers the best solution globally achieved so far, bS. The particles experience attraction toward the best solutions and, after some time, the swarm typically converges to an optimum.
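The basic PSO loop described above can be sketched as follows. The inertia and acceleration coefficients, swarm size, and the sphere test function are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(fitness, dim, n_particles=20, iters=200, w=0.72, c1=1.49, c2=1.49):
    """Minimal global-best PSO minimizing `fitness` (illustrative parameters)."""
    x = rng.uniform(-5, 5, (n_particles, dim))   # current positions x_p
    v = np.zeros((n_particles, dim))             # current velocities v_p
    bp = x.copy()                                # personal bests b_p
    bp_f = np.array([fitness(p) for p in x])
    bs = bp[bp_f.argmin()].copy()                # global best b_S
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # attraction toward personal and global bests
        v = w * v + c1 * r1 * (bp - x) + c2 * r2 * (bs - x)
        x = x + v
        f = np.array([fitness(p) for p in x])
        better = f < bp_f
        bp[better], bp_f[better] = x[better], f[better]
        bs = bp[bp_f.argmin()].copy()
    return bs

best = pso(lambda p: np.sum(p ** 2), dim=2)  # minimize the sphere function
print(best)  # converges close to the origin
```

In the clustering setting, `fitness` would be replaced by a CVI evaluated on the clustering a particle encodes.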
Due to its stochastic nature, PSO can avoid some local optima. However, for the basic form of the PSO algorithm, premature convergence to a local optimum is a common problem. Therefore, several modifications or extensions of the basic form have been introduced (Poli, Kennedy, & Blackwell, 2007), like the Perturbed PSO (Xinchao, 2010), Orthogonal Learning PSO (Zhan, Zhang, Li, & Shi, 2011), or different local neighborhood topologies, e.g., the Fully Informed PSO (Mendes, Kennedy, & Neves, 2004).
In clustering, as in other PSO applications, each particle's position should represent a potential solution to the problem. Most often this is realized by encoding the position of particle p as xp = {mp,1, …, mp,j, …, mp,K}, where mp,j represents the jth (potential) cluster centroid in an N-dimensional data space and K is the number of clusters. Each element of the K-dimensional position of the particle, xp, is thus an N-dimensional position in the data space. Different particle encodings have also been proposed, like the partition-based encoding (Jarboui, Cheikh, Siarry, & Rebai, 2007), where each particle is a vector of n integers, n being the number of data items to be clustered, and the ith element represents the cluster label assigned to item i, i ∈ {1, …, n}.
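Decoding the centroid-based encoding is a reshape of the flat particle vector followed by a nearest-centroid assignment. The data and particle values below are illustrative assumptions.

```python
import numpy as np

def decode_and_cluster(xp, K, data):
    """Split a flat particle position into K centroids in the N-dim data space
    and assign each data point to its nearest centroid."""
    N = data.shape[1]
    centroids = xp.reshape(K, N)   # xp = {m_1, ..., m_K}, each N-dimensional
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, d.argmin(axis=1)

data = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.0], [4.1, 3.9]])
xp = np.array([0.05, 0.1, 4.05, 3.95])   # K=2 centroids flattened into one vector
centroids, labels = decode_and_cluster(xp, K=2, data=data)
print(labels)  # [0 0 1 1]
```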
The major limitation of this encoding is the need to manually define the number of clusters, K, a priori. The clustering technique proposed in (Omran, Salman, & Engelbrecht, 2006) overcame this limitation by using binary PSO to select which of the potential particle centroids should be included in the final solution, but in that technique the K-means algorithm was used to refine the centroid positions.
Another particle encoding for PSO clustering was proposed in (Das, Abraham, & Konar, 2008). Given a user-defined maximum number of clusters, Kmax, the position of particle p is encoded as a (Kmax + Kmax × N)-dimensional vector xp = {Tp,1, …, Tp,Kmax, mp,1, …, mp,j, …, mp,Kmax}, where Tp,j, j ∈ {1, …, Kmax}, is an activation threshold in the range [0, 1] and mp,j represents the jth (potential) cluster centroid. If Tp,j > 0.5, the corresponding jth centroid is included in the solution; otherwise, the cluster defined by the jth centroid is inactive. The minimum number of clusters is defined to be two: if there are fewer than two active clusters in a solution, one or two randomly selected activation thresholds, Tp,j, are re-initialized to values above 0.5.
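Decoding this threshold-based encoding can be sketched as follows; the particle values and the exact form of the re-initialization repair are illustrative assumptions.

```python
import numpy as np

def decode_thresholds(xp, k_max, n_dim):
    """Decode a particle encoded as [T_1..T_Kmax, m_1..m_Kmax]: centroid j is
    active iff its activation threshold T_j > 0.5. If fewer than two clusters
    are active, randomly chosen thresholds are pushed above 0.5 (repair step,
    illustrative form)."""
    rng = np.random.default_rng(1)
    thresholds = xp[:k_max].copy()
    centroids = xp[k_max:].reshape(k_max, n_dim)
    while (thresholds > 0.5).sum() < 2:
        j = rng.integers(k_max)
        thresholds[j] = rng.uniform(0.5, 1.0)
    return centroids[thresholds > 0.5]

# Kmax = 3 potential centroids in 2-D; only the first and third are active.
xp = np.array([0.9, 0.2, 0.7,  0.0, 0.0,  5.0, 5.0,  9.0, 9.0])
active = decode_thresholds(xp, k_max=3, n_dim=2)
print(active)  # the two centroids whose thresholds exceed 0.5
```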
1-3 Gravitational clustering
A method using Newton's law of universal gravitation for clustering was introduced in (Bahrololoum, Nezamabadi-Pour, & Saryazdi, 2015). Each data point Xi = (Xi,1, …, Xi,N) is assumed to be located in an N-dimensional space, where N is the number of features. The clusters are compact, and a representative point (centroid) is used to represent each cluster. The main idea of the algorithm is to consider a movable gravity object (agent) as the centroid of a cluster and each data point as a fixed gravity object. In this gravity system, the fixed objects apply gravitational force to the agents and change their positions in the feature space. The optimum positions for the cluster centroids are expected to be obtained when the forces applied to the agents approach zero. The algorithm can deal with noisy data and outliers and performs well on unbalanced groups. This is due to Newton's law of gravity, in which the gravitational force between two objects is inversely proportional to the square of the distance between them; therefore, noisy data points and outliers that are far from the groups have less effect on the positions of the cluster centroids (agents). Agent displacement (centroid movement) is proportional to the total force exerted on the agent by the fixed objects. Agents are expected to move towards the center of gravity and stop in an area where the gravitational force field approaches zero. It should be noted that fixed objects are not allowed to apply force to each other.
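The outlier-robustness argument can be illustrated with a small sketch. Instead of simulating the force dynamics directly, the version below uses a stable fixed-point form of the same idea, relocating the agent to the inverse-square-distance-weighted mean of the data; the data, weighting, and iteration count are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def gravity_refine(agent, data, iters=50, eps=1e-9):
    """Iteratively relocate an agent (cluster centroid) to the weighted mean of
    the fixed data points, with weights inversely proportional to the squared
    distance, so far-away outliers contribute little."""
    for _ in range(iters):
        d2 = np.sum((data - agent) ** 2, axis=1) + eps   # squared distances
        w = 1.0 / d2                                     # inverse-square weights
        agent = (w[:, None] * data).sum(axis=0) / w.sum()
    return agent

# Three points form a dense group near the origin; the last point is an outlier.
data = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [10.0, 10.0]])
agent = gravity_refine(np.array([1.0, 1.0]), data)
print(agent)  # settles in the dense group, barely pulled by the outlier
```

A plain (unweighted) mean would land near (2.55, 2.55), dragged far from the group by the single outlier, which is the contrast the inverse-square law is meant to avoid.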
2- Proposed method
The proposed method combines PSC and gravitational clustering to solve the problem mentioned in the previous section. The method improves each particle in PSO using the law of gravity to make sure that particles which group the data into the same clusters get the same fitness and the same position.
The algorithm starts with some random particles. The particle structure is designed to optimize the objective function along with the number of clusters (Fig. 2). After initialization, the algorithm decodes each particle and extracts the centroids. The data is clustered by the centroids extracted from the particle. Now the algorithm takes advantage of the law of gravity. Each cluster has a centroid that plays the role of an agent in gravitational clustering. All data points in a cluster exert a force on the agent, and the agent moves towards the data. The force of each data point on the agent is calculated by formula 1 in each iteration.
μ(t+1) ← μ(t) + λ · (x_j − μ(t)) / ‖x_j − μ(t)‖², ∀ x_j ∈ C    (1)
where μ is the centroid vector, λ is the discount factor that controls the speed of the agent, and x_j is the jth data point in the cluster. After some iterations, the agent reaches a place in the data space where the total force becomes zero. That point is called the cluster's center of mass. The process of this adjustment is shown in fig 2.
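The centroid adjustment of formula 1 can be sketched as follows; the discount factor value, iteration count, and sample cluster are illustrative assumptions, not values from the paper.

```python
import numpy as np

def formula1_adjust(mu, cluster, lam=0.02, iters=30):
    """Refine one cluster centroid by the gravity update of formula 1:
    mu <- mu + lam * (x_j - mu) / ||x_j - mu||^2, applied for every x_j
    in the cluster, repeated for several iterations."""
    mu = mu.astype(float).copy()
    for _ in range(iters):
        for xj in cluster:
            diff = xj - mu
            mu = mu + lam * diff / (np.dot(diff, diff) + 1e-12)
    return mu

cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mu0 = np.array([3.0, 3.0])          # a poorly placed centroid from a particle
mu = formula1_adjust(mu0, cluster)
print(mu)  # pulled toward the cluster's center of mass at (0.5, 0.5)
```

Because every particle's centroids are refined the same way before evaluation, particles that induce the same clustering end up with centroids in the same positions, which is the degeneracy fix the method aims for.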
After the gravitational algorithm is performed, the adjusted cluster centroids are written back into the particle, and the particle is evaluated by the fitness function. This ensures that if two or more particles group the data into the same clusters, they become similar. Each particle then updates its velocity and position by the standard PSO formulas:

v_p(t+1) = w·v_p(t) + c1·r1·(b_p − x_p(t)) + c2·r2·(b_S − x_p(t))
x_p(t+1) = x_p(t) + v_p(t+1)
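As a fitness function, the paper uses an internal CVI such as the WB index. A sketch of the WB index, based on its common definition as m·SSW/SSB (within-cluster over between-cluster sum of squares, smaller is better), is shown below; the definition is an assumption from general CVI literature, not copied from this paper, and the data is illustrative.

```python
import numpy as np

def wb_index(data, centroids, labels):
    """WB index sketch: m * SSW / SSB, where SSW is the within-cluster sum of
    squares and SSB the between-cluster sum of squares (smaller is better)."""
    m = len(centroids)
    mean = data.mean(axis=0)
    ssw = sum(np.sum((data[labels == k] - c) ** 2) for k, c in enumerate(centroids))
    ssb = sum((labels == k).sum() * np.sum((c - mean) ** 2)
              for k, c in enumerate(centroids))
    return m * ssw / ssb

data = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
centroids = np.array([[0.1, 0.0], [4.1, 4.0]])
labels = np.array([0, 0, 1, 1])
print(wb_index(data, centroids, labels))  # small value: compact, well-separated clusters
```

Unlike the K-means objective, this index rewards both compactness (small SSW) and separation (large SSB), which is why such CVIs are usable as general objective functions here.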
As shown in fig 4, ordinary PSO converges quickly to a local optimum, whereas when we use gravitational clustering to improve the particles, the method explores more.