Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data mining tools allow enterprises to predict future trends. Data mining techniques are used in many research areas including mathematics, cybernetics, genetics and marketing. While data mining techniques are means to drive efficiencies and predict customer behavior, if used correctly. Data mining is in fact, considered to be as a crucial technique for the business attempting to focus on customer satisfaction. The following are some of the data mining techniques.
Association: association is one of the best known data mining technique in an association; a pattern is discovered based on relationship between items in the same transaction. That is the reason why association of techniques is also known as relation technique. This is widely used in market based analysis to identify a set of products that customers secretly purchase together.
Classification: classification is a classic data mining technique based on machine learning. Basically they are used to classify each item in a data into one of a predefined set of classes or groups. Classification methods make use of mathematical techniques such as decision trees, linear programming, neural work and statistics. In classification, we develop the software that can learn how to classify data into groups. For example, we can apply classification in application that “given all records of employees who left the company, predict who will probably leave the company in a future period.” In this case, we divide the records of employees into two groups that named “leave” and “stay”. And then we can ask our data mining software to classify the employees into different groups.
Clustering: clustering is a data mining technique that makes meaningful or useful cluster of objects which have similar characteristics using automatic technique. The clustering techniques defines the classes and puts objects in each class, while in the classification techniques, objects are assigned into pre-defined classes. To make the concept clearer, we can take book management in the library as an example. In the library, there is wide range of books on various topics available. The challenge is how to keep those books in a way that readers can take several books on a particular topic without hassle. By using the clustering technique, we can keep books that have some kinds of similarities in one cluster or one shelf and label it with a meaningful name. If readers want to grab books in that topic, they would only have to go to that shelf instead of looking for the entire library.
Prediction: the prediction, as its name implied, is one of the data mining techniques that discover the relationship between independent variables and relationship between dependent and independent variables. For instance, the prediction analysis technique can be used in the scale to predict profit for the future if we consider the sale as an independent variable; profit could be a dependent variable. Then based on historical sale and profit data, we can draw a fitted regression curve used for profit prediction.
Sequential patterns: sequential pattern analysis is one of the data mining techniques that seek to discover or identify similar patterns, regular events or trends in transaction data over a business periods. In sales, with historical transaction data, business can identify a set of items that customers buy together different times in a year. Then business can use this information to recommend customers buy it with betters deals based on their purchasing frequency in their past.
Decision trees: the decision tree is one of the most commonly used data mining techniques because the models are very easy for the users to understand. In decision tree technique, the root of a decision tree is simple question or condition that has multiple answers. Each answer then leads to a set off questions or conditions that helps us determine the data so that we can make the final decision based on it.
An algorithm in data mining (or machine learning) is a set heuristics and calculations that create models in a data. To create a model, the algorithm first analysis the data you provide looking for specific types of patterns or trends. This algorithm uses the results of the analysis over much iteration to find optimal parameters for creating the mining model. These parameters are then applied across the entire data set of extract actionable patterns and detailed statistics. The mining model that an algorithm creates from your data can take various forms, including:
- Set of clusters that describe how the cases in a dataset are related
- A decision tree that predicts an outcome, and describes how different criteria affects that outcome
- A mathematical model that forecasts sales
- A set of rules that describes how products are grouped in a transaction, and the probabilities that the products are purchased together.
Choosing the right algorithm to use for a specific analytical task can be a challenge, while you can use different algorithm to perform the same business task. For example, you can use Microsoft decision trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can only identify columns that do not affect the final mining model. Choosing an algorithm by type: Classification algorithm: predicts one or more discrete variables, based on the other attributes in a dataset. Regression algorithm: predict one or more continuous numeric variables, such as profit or loss based on other attributes in the dataset. Segmentation algorithms: divides the data into group, or clusters, of items that have similar properties. Association algorithms: find correlates between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market based analysis. Sequence analysis algorithms: summarize frequent sequences, or an episode in a data such has series of clicks in a web site, or a series of log events predicting machine maintenance. However, there is no reason that you should be limited to one algorithm in your solutions. Experience analysts will sometimes use one algorithm to determine the most effective inputs and then apply it for a different algorithm to predict a specific outcome based on the data. You might also use multiple algorithms within a single solution to perform separate tasks.