The following is a copy of the seminar summary that will be distributed in the library. For your convenience, I've placed an online copy here.
What is data mining?
- Data mining is the process of applying statistical techniques to databases in order to find hidden patterns or relationships
- It is a methodology for discovering information and a tool for business
- It is not a magic tool. It requires knowledge of business and familiarity with statistics, though many commercial data mining packages are reducing these requirements.
- Commercial packages often use simpler models to describe findings (see below).
Statistical Data Mining Models
- There are two basic ways to use data mining:
- Predictive models, which are used to predict future trends based on present or known data.
- Descriptive models, which examine current information to uncover relationships and trends that are not obvious.
- Predictive models are based on classification, regression, and time-series forecasting.
- Descriptive models are based on clustering, association analysis, and sequence discovery.
- Simpler, more frequently used models
- Neural networks
- A collection of interconnected nodes: input nodes (the parameters) feed into, and influence, an output variable.
- Each input has a "weighting" that sets how strongly it influences the output.
- Neural networks are "trained" by comparing estimated outputs with actual outputs and then adjusting the weights to improve the model.
- Problems: it is difficult to decide which input factors to use, and trained nets tend to favour the data they were trained on (overfitting).
- Decision Trees
- A set of rules or conditions that lead to a decision.
- Limitations: the tree structure does not let upper-level splits take lower-level ("future") splits into account; not all cases can always be enumerated.
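The neural-network training described above (compare estimated outputs with real outputs, then adjust the weights) can be sketched with a single artificial neuron. This is a minimal illustration under stated assumptions, not the method of any particular commercial package: the sigmoid activation, the squared-error gradient update, the learning rate and epoch count, and the names `predict` and `train` are all illustrative choices, and the logical-OR training set is a toy example rather than business data.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    # Estimated output: weighted sum of the inputs passed through the sigmoid
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train(samples, n_inputs, learning_rate=0.5, epochs=2000):
    # Assumed hyperparameters (learning_rate, epochs) are illustrative, not tuned
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            estimate = predict(weights, bias, inputs)
            # Compare the estimated output with the real output...
            error = estimate - target
            # ...and nudge each weight against the squared-error gradient
            grad = error * estimate * (1.0 - estimate)
            weights = [w - learning_rate * grad * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * grad
    return weights, bias

# Toy training set: logical OR (hypothetical illustration, not seminar data)
SAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

if __name__ == "__main__":
    w, b = train(SAMPLES, n_inputs=2)
    for inputs, target in SAMPLES:
        print(inputs, "target:", target, "estimate:", round(predict(w, b, inputs), 3))
```

After training, the estimate for `[0, 0]` should fall below 0.5 and the other three above it. A single neuron can only separate classes with a straight line; the multi-layer networks used in data mining packages stack many such nodes, which adds flexibility but can also make overfitting more likely.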
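The decision-tree bullets above describe a tree as a set of rules leading to a decision, with the limitation that upper-level splits are fixed before the lower ("future") ones are considered. That limitation comes from how trees are commonly grown: greedily, one split at a time. The sketch below assumes Gini impurity as the split criterion (one common choice among several) and uses hypothetical, made-up customer records with attributes `age` and `member`; `build_tree`, `best_split` and `classify` are illustrative names, not a library API.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, attributes):
    # Greedy: choose the attribute that lowers impurity most *right now*,
    # with no look-ahead at the splits that would follow it.
    best_attr, best_score = None, gini(labels)
    for attr in attributes:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr  # None means no split improves purity

def build_tree(rows, labels, attributes):
    attr = best_split(rows, labels, attributes)
    if attr is None:
        # Leaf: the decision is the majority label
        return Counter(labels).most_common(1)[0][0]
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in sorted({row[attr] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (attr, branches)

def classify(tree, row):
    # Follow one rule (condition) per level until reaching a leaf decision
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]  # assumes the value was seen in training
    return tree

# Hypothetical customer records and outcomes (made up for illustration)
ROWS = [
    {"age": "young", "member": "yes"},
    {"age": "young", "member": "no"},
    {"age": "old", "member": "yes"},
    {"age": "old", "member": "no"},
    {"age": "old", "member": "no"},
    {"age": "young", "member": "no"},
]
LABELS = ["respond", "ignore", "respond", "respond", "respond", "ignore"]

if __name__ == "__main__":
    tree = build_tree(ROWS, LABELS, ["age", "member"])
    print(tree)
    print(classify(tree, {"age": "young", "member": "yes"}))
```

On this toy data the tree splits first on `age` and then, for young customers only, on `member`, so it reads as rules such as "if age is old, predict respond". Because `age` was chosen before `member` was ever examined, nothing learned lower in the tree can revise that first choice.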
Data Mining as a Practical Business Tool
- Data mining is an evolutionary development. It is a natural extension of information technology.
- Evolutionary stages in data management: data collection (1960s), data access (1980s), data warehousing and decision support (1990s), data mining (emerging today)
- Data mining is a new technology still in its infancy, but it continues to mature and grow as part of the standard suite of business information system tools.
Practical Applications
- Companies cited as using data mining: Blockbuster Entertainment, Wal-Mart Stores, the NBA, Mellon Bank
- Many other areas of possible application: retail and marketing, banking/finance, insurance, manufacturing, energy, service industries.
- Case Study: Wal-Mart and Barbie Dolls
- Data mining result: customers who buy Barbie dolls (Wal-Mart sells one every 20 seconds) have a 60% likelihood of buying one of three types of candy bars.
- Wal-Mart took no immediate action, but several possibilities were suggested: rearranging the placement of Barbie dolls and candy bars in the store; co-packaging (putting candy bars and Barbie dolls together in one package); promotional pricing (modifying prices to increase sales, although the savings to the consumer are largely illusory); promotion programs (giving away free Barbie-related products with candy bar purchases); and creating Barbie-shaped candy.
- Application: Improving Call Centre Service
- New technology (improved voice recognition) transcribes phone conversations into text databases.
- Applying data mining techniques to these transcripts can reveal previously unknown information about the conversations.
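The Wal-Mart Barbie finding in the case study above is an association rule of the form "Barbie doll → one of three candy bars", and its "60% likelihood" corresponds to what association analysis calls the rule's confidence: among baskets containing a Barbie doll, the share that also contain candy. A minimal sketch, assuming transactions are stored as sets of item names; the function name `rule_stats`, the baskets, and the item names `candy_a`, `candy_b`, `candy_c` are hypothetical, made up so that the toy confidence comes out at 0.6. They are not Wal-Mart data.

```python
def rule_stats(baskets, antecedent, consequents):
    """Support and confidence for the rule: antecedent -> at least one of consequents."""
    with_antecedent = [b for b in baskets if antecedent in b]
    # Baskets that contain the antecedent AND at least one consequent item
    matches = [b for b in with_antecedent if b & consequents]
    support = len(matches) / len(baskets)
    confidence = len(matches) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

# Hypothetical toy baskets (made up for illustration; not Wal-Mart data)
BASKETS = [
    {"barbie", "candy_a"},
    {"barbie", "candy_c", "batteries"},
    {"barbie", "diapers"},
    {"barbie", "candy_b"},
    {"barbie", "soda"},
    {"candy_a", "soda"},
    {"diapers", "batteries"},
]
CANDY = {"candy_a", "candy_b", "candy_c"}  # hypothetical stand-ins for the three candy bars

if __name__ == "__main__":
    support, confidence = rule_stats(BASKETS, "barbie", CANDY)
    print(f"support={support:.3f} confidence={confidence:.2f}")
    # prints: support=0.429 confidence=0.60
```

Support says how common the combined purchase is overall (3 of 7 toy baskets); confidence says how often Barbie buyers also pick up candy (3 of 5). Real mining tools search for all rules above chosen support and confidence thresholds, for example with the Apriori algorithm, rather than testing one rule by hand.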
Future Trends
- Increased use of data mining; maturing technology (more companies will use data mining in production information systems); analysis of multimedia data (film footage, voice communications, photographs).