Data mining
Data mining is the use of intelligent information management tools to discover knowledge and extract information that supports the decision making process in an organization. It is an approach to discovering patterns of behavior in large data sets by exploring the data, fitting different models and investigating relationships in vast repositories.
The information extracted with a data mining tool can be used in such areas as decision support, prediction, sales forecasts, financial and risk analysis, estimation and optimization.
Sample real-world uses of data mining applications include:
CRM - aids customer classification and retention campaigns
Web site traffic analysis - visitor behavior prediction or relevant content delivery
Public sector - detection of occurrences of fraud such as money laundering and tax evasion, matching of crime and terrorist patterns, etc.
Genomics research - analysis of vast data stores
The most widely known and commonly encountered data mining techniques are:
Statistical modeling, which uses mathematical equations to perform the analysis. The most popular statistical models are generalized linear models (of which linear regression and logistic regression are the best-known special cases) and discriminant analysis.
Decision list models and decision trees
Neural networks
Genetic algorithms
Screening models
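As a rough illustration of the statistical modeling technique listed above, the following minimal sketch fits a simple linear regression by ordinary least squares using only the Python standard library. The function names (fit_simple_linear_regression, predict), the variable names and the numbers are all hypothetical and invented for this example; real data mining tools fit far richer models on far larger data sets.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up illustrative data (assumption): advertising spend vs. sales.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_simple_linear_regression(ad_spend, sales)
forecast = predict(model, 6.0)  # sales forecast for an unseen spend level
print(model, forecast)
```

Because the toy data lie exactly on the line y = 2x + 1, the fit recovers a slope of 2 and an intercept of 1. With noisy real-world data the same formulas give the line that minimizes the squared error.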
Data mining tools offer a number of data discovery techniques that help the analyst understand the data and identify the relevant set of attributes in it:
Data manipulation, which consists of constructing new data subsets derived from existing data sources.
Browsing, auditing and visualization of the data, which help identify atypical or suspected relationships between variables in the data.
Hypothesis testing
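The discovery techniques above can be sketched on a toy data set. The example below audits the values for atypical entries, derives new subsets from the original records, and then runs a hypothesis test on the subsets. It is a sketch under stated assumptions, not a description of how any particular tool works: the records, the function names (flag_atypical, exact_permutation_p_value), the use of a modified z-score for auditing and the choice of an exact permutation test are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, median

# Hypothetical toy records, invented purely for illustration.
orders = [
    {"region": "north", "amount": 12.0},
    {"region": "north", "amount": 15.0},
    {"region": "north", "amount": 14.0},
    {"region": "south", "amount": 8.0},
    {"region": "south", "amount": 9.0},
    {"region": "south", "amount": 7.0},
    {"region": "south", "amount": 95.0},  # deliberately atypical value
]


def flag_atypical(values, threshold=3.5):
    """Auditing: return values whose modified z-score, based on the median
    absolute deviation (MAD), exceeds the threshold. The constants 0.6745
    and 3.5 are a conventional choice, assumed here."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


def exact_permutation_p_value(a, b):
    """Hypothesis testing: two-sided exact permutation test on the
    difference in means, enumerating every regrouping of the pooled data."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    n = len(pooled)
    extreme = total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Auditing: look for atypical amounts in the raw data.
amounts = [o["amount"] for o in orders]
atypical = flag_atypical(amounts)

# Data manipulation: derive new subsets from the existing records.
clean = [o for o in orders if o["amount"] not in atypical]
north = [o["amount"] for o in clean if o["region"] == "north"]
south = [o["amount"] for o in clean if o["region"] == "south"]

# Hypothesis testing: do the two regions differ in mean order amount?
p_value = exact_permutation_p_value(north, south)
print(atypical, p_value)
```

On this toy data the audit flags only the amount 95.0, and the permutation test yields a p-value of 0.1, since 2 of the 20 possible regroupings are at least as extreme as the observed split. Enumerating every regrouping is only practical for very small samples; larger data sets would call for random sampling of regroupings or a parametric test.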
Some of the most significant data mining tools are:
SPSS Clementine
SAS Enterprise Miner
IBM DB2 Intelligent Miner
STATISTICA Data Miner
Pentaho Data Mining (WEKA)
Isoft Alice