The banking sector consists of public-sector, private-sector, and foreign banks, along with smaller regional and cooperative banks. IT-based banking products, services, and solutions are now widely available in the market. The most common are phone banking; ATM facilities; credit, debit, and smart cards; Internet and mobile banking; and the SWIFT and INFINET networks.
Fraud in banking rises every year and imposes a significant cost on the economy. Most banks use data mining for customer segmentation and productivity, as well as for credit scoring and approval, predicting payment default, marketing, and detecting fraudulent transactions.
The banking industry is expanding its branch network across many regions and industries while offering more functionality to its customers. Operating at this scale requires storing a huge volume of information in a secure and well-structured format. In addition, banks are rapidly developing their services, for example by providing online transactions, offering various policies to their customers, and giving access to many other functions. As a result, banks now hold a huge amount of data, and that volume is growing rapidly.
Handling such a large volume of data raises the problem of data security, which is the most important task of data mining. Data mining has introduced various techniques and algorithms that help to identify important patterns in the data stored in a database and that support important decisions.
The banking industry across the world has undergone tremendous changes in the way business is conducted. With the recent implementation, greater acceptance, and wider usage of electronic banking, capturing transactional data has become easier and, at the same time, the volume of such data has grown considerably. It is beyond human capability to analyze this huge amount of raw data and to transform it effectively into useful knowledge for the organization. Data mining can help solve business problems by finding patterns, associations, and correlations that are hidden in the business information stored in databases.
Over the past few years, a number of review articles have appeared in conference or journal publications. Bolton and Hand, for example, have reviewed statistical methods of detecting fraud, including credit card fraud, money laundering, telecommunications fraud, etc. Phua et al. present a survey of data mining-based fraud detection research, including credit transaction fraud, telecoms subscription fraud, automobile insurance fraud and the like. Others have reviewed insurance fraud and financial statement fraud.
The AdaCost method was developed from AdaBoost for credit card fraud detection and led to the cost-sensitive metaheuristic, which can be applied in many applications where false positives and false negatives carry different costs. Comparative studies of neural networks (NN) and Bayesian networks (BN) in credit card fraud detection have also been reported, with results favoring BN.
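To illustrate why unequal error costs matter, the following is a minimal sketch of a cost-sensitive decision rule. It is not AdaCost itself, only the simpler expected-cost comparison that motivates cost-sensitive methods. The function names, the fraud probability input, and the cost figures are hypothetical and chosen purely for illustration.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which flagging is cheaper than accepting.

    cost_fp: assumed cost of flagging a legitimate transaction (false positive).
    cost_fn: assumed cost of accepting a fraudulent transaction (false negative).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def should_flag(p_fraud, cost_fp, cost_fn):
    """Return True when flagging has the lower expected cost.

    p_fraud is an estimated probability of fraud produced by any model.
    """
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must lie in [0, 1]")
    expected_cost_if_flagged = (1.0 - p_fraud) * cost_fp
    expected_cost_if_accepted = p_fraud * cost_fn
    return expected_cost_if_flagged < expected_cost_if_accepted


# Illustrative costs only: a missed fraud is assumed to cost 19 times
# as much as a false alarm.
threshold = cost_threshold(cost_fp=5.0, cost_fn=95.0)
flag_likely = should_flag(0.10, cost_fp=5.0, cost_fn=95.0)
flag_unlikely = should_flag(0.01, cost_fp=5.0, cost_fn=95.0)
```

Under these assumed costs, a transaction is worth flagging even when the estimated fraud probability is far below one half, which is the practical effect of treating false negatives as more expensive than false positives.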
Statistical fraud detection methods have been divided into two broad categories: supervised and unsupervised. In supervised fraud detection methods, models are estimated based on the samples of fraudulent and legitimate transactions, to classify new transactions as fraudulent or legitimate. In unsupervised fraud detection, outliers or unusual transactions are identified as potential cases of fraudulent transactions. Both these fraud detection methods predict the probability of fraud in any given transaction.
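As a rough illustration of the unsupervised case, the sketch below flags unusual transaction amounts with a robust outlier score based on the median and the median absolute deviation (the modified z-score). This is one simple outlier rule among many, not a method taken from the studies cited here. The function name, the sample amounts, and the cutoff of 3.5 are assumptions, and the score is an outlier measure rather than a fraud probability.

```python
from statistics import median


def flag_unusual_amounts(amounts, cutoff=3.5):
    """Return indices of amounts whose modified z-score exceeds the cutoff.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. It is less distorted by extreme values than
    a score built from the mean and standard deviation.
    """
    if not amounts:
        return []
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    if mad == 0:
        # No spread in the data, so no amount can be scored as unusual.
        return []
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > cutoff
    ]


# Hypothetical amounts for one card; the last one is far from the rest.
sample_amounts = [20, 25, 22, 19, 24, 21, 23, 20, 22, 500]
unusual = flag_unusual_amounts(sample_amounts)
```

No labels are needed here, which is the defining feature of the unsupervised setting; a supervised method would instead fit a model to past transactions already marked as fraudulent or legitimate.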
Predictive models for credit card fraud detection are in active use in practice.
Considering the profusion of data mining techniques and applications in recent years, however, there have been relatively few reported studies of data mining for credit card fraud detection. One such study evaluates several techniques, including support vector machines and random forests, for predicting credit card fraud. The study focuses on the impact of aggregating transaction-level data on fraud prediction performance. It examines aggregation over different time periods on two real-life datasets and finds that aggregation can be advantageous, with the length of the aggregation period being an important factor. Aggregation was found to be especially effective with random forests, which performed better than the other techniques, though logistic regression and support vector machines also performed well. Support vector machines and random forests are sophisticated data mining techniques that have been noted in recent years to show superior performance across different applications.
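Aggregation of transaction-level data can be sketched as follows. The example derives, for each transaction, the count, total, and mean amount of the same card's transactions within a trailing time window. The record layout (card identifier, timestamp, amount), the feature names, and the 7-day window are assumptions for illustration and do not reproduce the feature set of the study described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_features(transactions, window_days=7):
    """Add trailing-window aggregates to each (card_id, timestamp, amount).

    The window covers the current transaction and earlier transactions on
    the same card that occurred no more than window_days before it.
    """
    window = timedelta(days=window_days)
    history = defaultdict(list)  # card_id -> [(timestamp, amount), ...]
    features = []
    # Process in time order so that each card's history only holds the past.
    for card_id, ts, amount in sorted(transactions, key=lambda t: t[1]):
        recent = [(t, a) for t, a in history[card_id] if ts - t <= window]
        recent.append((ts, amount))
        history[card_id] = recent  # entries outside the window are dropped
        total = sum(a for _, a in recent)
        features.append({
            "card_id": card_id,
            "timestamp": ts,
            "amount": amount,
            "txn_count": len(recent),
            "total_amount": total,
            "mean_amount": total / len(recent),
        })
    return features


# Hypothetical transactions for two cards.
sample = [
    ("A", datetime(2024, 1, 1), 10.0),
    ("A", datetime(2024, 1, 3), 30.0),
    ("B", datetime(2024, 1, 4), 5.0),
    ("A", datetime(2024, 1, 20), 50.0),
]
rows = aggregate_features(sample, window_days=7)
```

Changing `window_days` is the simplest way to explore how the aggregation period affects the derived features, which is the factor the study identifies as important. The resulting rows could then be passed to any classifier, such as a random forest.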