Study on Simple K Mean and Modified K Mean Clustering Technique



Abstract: The main purpose of this review is to provide a comprehensive overview of the simple k-means and modified k-means clustering techniques. Clustering is an active research topic in areas such as statistics, pattern recognition, and machine learning. Cluster analysis is a data mining tool for large, multi-purpose databases. Clustering is a data mining technique in which the data is divided into groups, so that similar objects share a group and dissimilar objects fall into different groups. Clustering is a typical example of unsupervised classification.



Due to the increased availability of computer hardware and software and the rapid computerization of businesses, large volumes of data have been collected and stored in databases. Researchers estimate that the amount of information in the world doubles every 20 months [1]. However, raw data cannot be used directly. Its true value lies in the information that can be extracted from it to support decision making. In most areas, data analysis has traditionally been a manual process. When the scale of data processing and exploration exceeds human capabilities, people turn to information technologies to automate the process. Data mining is one of the newest research activities in computer science and is defined as the extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from a huge amount of data. Data mining is the process of analyzing data from different angles and summarizing it into useful information [2].

Data mining involves extracting, transforming, and loading transaction data into the data warehouse system. Data mining tasks include anomaly detection, association rule learning, classification, regression, summarization, and clustering. Data mining has become one of the major research areas because the expansion of computer and software technology has led organizations to rely heavily on these technologies. Data mining concepts and methods can be applied in a variety of areas such as marketing, medicine, real estate, customer relationship management, engineering, and web mining. In this paper, cluster analysis is performed using simple k-means clustering and modified k-means clustering. Normalization and indexing are an important pre-processing step that standardizes the values of all variables from their dynamic range to a specific range. Cluster analysis is a data mining technique used for data segmentation. By clustering data, people can see how the data is distributed, observe the nature of each cluster, and conduct further studies on specific clusters. The purpose of cluster analysis is that the objects in one group should be similar to each other and different from the objects in other groups. A clustering is better when there is greater similarity within a group and greater difference between groups. Thus, an algorithm must be applied to the raw data to extract useful information from it. Different clustering algorithms, based on different techniques, have been successfully designed and applied to various data mining problems. The most commonly used families of clustering algorithms are hierarchical, partitioning, density-based, and grid-based.
The popular clustering techniques proposed so far are either partitioning-based or hierarchical, and both approaches have their own advantages and disadvantages in terms of the number of clusters, the shape of clusters, and cluster overlap. Clusters are obtained only after a clustering algorithm has been applied to the raw data.

Partitioning Clustering

The data objects are separated into non-overlapping clusters so that each object is in exactly one subset. Because checking all possible systems of subsets is computationally infeasible, greedy heuristics in the form of iterative optimization are used instead. In practice this means different relocation schemes that iteratively reassign points among the k clusters [3].
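To give a rough sense of why exhaustive search over partitions is impractical, the number of ways to split n objects into k non-empty, non-overlapping subsets is the Stirling number of the second kind. The sketch below computes it with the standard recurrence; the function name `stirling2` is chosen here for illustration and does not come from the essay.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n objects into k non-empty subsets."""
    if n == k:
        return 1          # includes the empty partition of zero objects
    if k == 0 or k > n:
        return 0          # no valid partition exists
    # The n-th object either joins one of the k existing subsets
    # or forms a new subset on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


print(stirling2(10, 3))   # 9330 partitions for only 10 objects and 3 clusters
print(stirling2(100, 4))  # already an astronomically large integer
```

Even for modest n the count grows far too quickly to enumerate, which is why iterative relocation heuristics such as k-means are used instead.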

Simple K-Means Clustering

It is a partitioning method that finds mutually exclusive, spherical clusters. It creates a specified number of disjoint, flat (non-hierarchical) groups. The K-Means algorithm organizes objects into k partitions, where each partition represents a cluster. We start with an initial set of means and classify cases by their distances to these centres. Then we recalculate the cluster means, using the cases assigned to each cluster, and reclassify all cases based on the new set of means. We keep repeating this step until the cluster means do not change between successive steps. Finally, we recalculate the cluster means once more and assign the cases to their permanent clusters. [4]

Method for Simple K means clustering

  1. Input: k = number of clusters; D = data set that contains n objects.
  2. Output: a set of k clusters.


  1. Randomly choose k objects from D as the initial cluster centres.
  2. Repeat.
  3. Reassign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster.
  4. Update the cluster means, i.e. calculate the mean value of the objects for each cluster.
  5. Until no change in the assignments.
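The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the essay: it assumes Euclidean distance as the similarity measure, and the function names, the `max_iter` safety cap, the fixed random seed, and the toy data are all choices made here for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: randomly choose k objects from D as the initial centres.
    centres = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):  # Step 2: repeat ...
        # Step 3: assign each object to the nearest cluster mean.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centres[j]))
            for p in data
        ]
        # Step 5: ... until the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: update each cluster mean from its current members.
        for j in range(k):
            members = [p for p, a in zip(data, labels) if a == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = mean_point(members)
    return centres, labels


# Toy data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
        (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, labels = k_means(data, k=2)
print(labels)
print(centres)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster receives index 0 depends on the random initial centres.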

Modified K Mean Clustering

The modified k-means approach is designed to improve the run time, the number of iterations, and the sum of squared errors, and it gives much better results than simple k-means clustering carried out with an existing tool. It is simple to use and also provides a graphical user interface. In modified k-means clustering, the data is first reduced with a normalization method, and parameters such as the time taken, the number of iterations, and the sum of squared errors improve as a result. The modification is based on a normalization-and-indexing method. In this method the input data is first reduced using indexing, so that the raw data is obtained in sequence once the Euclidean distances between the different groups have been calculated. After that, the data is normalized to reduce the sum of squared errors and the number of iterations, which results in a shorter clustering run time. According to the method's flowchart, the first step is to supply the raw data for conversion using indexing and to calculate the Euclidean distances, so that objects can be grouped into clusters of similar and dissimilar items.
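The essay does not state which normalization formula the modified approach uses, so the sketch below is only an assumption: it rescales each variable to the range [0, 1] with min-max normalization, and it computes the sum of squared errors for a given clustering. The function names and the sample values are hypothetical, and the indexing step is not modelled because the text gives no details about it.

```python
def min_max_normalize(data):
    """Rescale every column of the data to the range [0, 1]."""
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple(
            (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for x, lo, hi in zip(row, lows, highs)
        )
        for row in data
    ]


def sum_squared_error(data, centres, labels):
    """Sum of squared Euclidean distances from each point to its centre."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centres[label]))
        for point, label in zip(data, labels)
    )


# Hypothetical raw data whose two variables have very different ranges.
raw = [(2.0, 100.0), (4.0, 300.0), (6.0, 500.0)]
scaled = min_max_normalize(raw)
print(scaled)  # [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]

# SSE of an example clustering of the scaled data into two clusters.
example_centres = [(0.25, 0.25), (1.0, 1.0)]
example_labels = [0, 0, 1]
print(sum_squared_error(scaled, example_centres, example_labels))  # 0.25
```

Without rescaling, the second variable would dominate every Euclidean distance. Putting the variables on a common range before clustering is one plausible reason the sum of squared errors and the iteration count could improve.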


K-means clustering is the most important type of partitioning clustering. In partitioning clustering, the objects are divided into clusters according to their distances. In k-means clustering, k cluster centres are selected; objects that are less distant from a given centre are placed in that centre's group, and objects that are farther from it are placed in a different cluster. In this document, simple k-means clustering is described using the WEKA tool on medical data sets, namely Pima, AIDS, and Breast Cancer. Modified k-means clustering, on the other hand, is described based on the normalization and indexing approach implemented in .NET, which requires less time, fewer iterations, and a lower sum of squared errors to perform the clustering.
