The following is a copy of the seminar summary that will be distributed in the library.  For your convenience, I've placed an on-line copy here.

What is data mining?

  • Data mining is the process of applying statistical techniques to databases in order to find hidden patterns or relationships.
  • It is a methodology for discovering information and a tool for business.
  • It is not a magic tool. It requires knowledge of the business and familiarity with statistics, though many commercial data mining packages are reducing these requirements.
  • Commercial packages often use simpler models to describe findings (see below).

Statistical Data Mining Models

  • Two basic ways to use data mining:
    • Predictive models, which predict future trends from present or known data.
    • Descriptive models, which examine current information to find relationships and trends that are not obvious.
    • Predictive models are based on classification, regression, time-series forecasting, or clustering.
    • Descriptive models are based on association analysis and sequence discovery.
  • Simpler, more frequently used models
    • Neural networks
      • A collection of nodes (inputs, or parameters) that influence an output variable.
      • Each input variable has a "weighting".
      • Neural networks are "trained" by comparing estimated outputs with real outputs; the weights are then adjusted to improve the model.
      • Problems: it is difficult to decide on input factors, and trained nets tend to favour the data they were trained on (overfitting).
    • Decision Trees
      • A set of rules or conditions that lead to a decision.
      • Limitations: the choice of an upper-level split does not take into account its effect on lower or "future" splits; not all cases can always be enumerated.
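
The training idea described above for neural networks (compare estimated outputs with real outputs, then adjust the weights) can be sketched in a few lines of Python. This is only a simplified illustration with a single output node, not a full multi-layer network and not the method of any particular commercial package. The function names, the sample data, the learning rate, and the number of passes are all assumptions made for the example.

```python
# Minimal sketch: one output node with a weighting per input, trained by
# comparing estimated outputs with real outputs (an error-correction rule).
# All names and numbers here are illustrative assumptions.

def predict(weights, bias, inputs):
    """Return the node's estimated output: a weighted sum of the inputs."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def train(samples, learning_rate=0.05, epochs=2000):
    """Repeatedly nudge the weights to shrink the estimate-vs-real gap."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, real_output in samples:
            # Difference between the real output and the current estimate.
            error = real_output - predict(weights, bias, inputs)
            # Each weight moves in proportion to the error and its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical data generated from: output = 2*x1 + 1*x2 + 0.5
samples = [
    ((0.0, 0.0), 0.5),
    ((1.0, 0.0), 2.5),
    ((0.0, 1.0), 1.5),
    ((1.0, 1.0), 3.5),
    ((2.0, 1.0), 5.5),
]

weights, bias = train(samples)
print("learned weights:", weights, "bias:", bias)
```

Because the made-up data follow an exact linear rule, the learned weights settle close to the values used to generate it. Real business data are noisier, which is where the problems noted above (choosing input factors, favouring the training data) appear.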

Data Mining as a Practical Business Tool

  • Data mining is an evolutionary development. It is a natural extension of information technology.
  • Evolutionary stages in data management: data collection (1960s), data access (1980s), data warehousing and decision support (1990s), data mining (emerging today)
  • Data mining is a new technology still in its infancy, but it continues to mature and grow as part of the standard suite of business information systems tools.

Practical Applications

  • Blockbuster Entertainment, Wal-Mart Stores, NBA, Mellon Bank
  • Many other areas of possible application: retail and marketing, banking/finance, insurance, manufacturing, energy, service industries.
  • Case Study: Wal-Mart and Barbie Dolls
    • Data mining result: customers who buy Barbie dolls (Wal-Mart sells one every 20 seconds) have a 60% likelihood of buying one of three types of candy bars.
    • No immediate action from Wal-Mart, but several possibilities were suggested: rearranging the positioning of Barbie dolls and candy bars in the store, co-packaging (putting candy bars and Barbie dolls together in one package), promotional pricing (modifying prices to increase sales; the savings to the consumer are largely illusory), promotion programs (giving away free Barbie-related products with candy bar purchases), and creating Barbie-shaped candy.
  • Application: Improving Call Centre Service
    • New technology (improved voice recognition) exports phone conversations to text databases.
    • Applying data mining techniques to these databases can reveal previously unknown information about phone conversations.
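
A finding like the Barbie doll and candy bar result is the kind of figure that association analysis reports as the "confidence" of a rule: of the baskets that contain the first item, what fraction also contain the second? The sketch below shows one way that calculation might look. The baskets are invented for illustration and were deliberately chosen so the rule comes out at 60%; they are not Wal-Mart data, and the function and item names are hypothetical.

```python
# Minimal sketch of the support and confidence of an association rule
# "antecedent -> any of the consequents", computed over shopping baskets.
# The baskets below are invented for illustration only.

def rule_stats(baskets, antecedent, consequents):
    """Return (support, confidence) for the rule over a list of item sets."""
    with_antecedent = [b for b in baskets if antecedent in b]
    # Baskets that hold the antecedent AND at least one consequent item.
    with_both = [b for b in with_antecedent if consequents & b]
    support = len(with_both) / len(baskets)
    confidence = (len(with_both) / len(with_antecedent)
                  if with_antecedent else 0.0)
    return support, confidence

# Hypothetical baskets: 5 of 10 contain a doll, and 3 of those 5 also
# contain one of the three candy bars.
baskets = [
    {"doll", "candy_bar_a"},
    {"doll", "candy_bar_b", "batteries"},
    {"doll", "candy_bar_c"},
    {"doll", "wrapping_paper"},
    {"doll"},
    {"candy_bar_a", "soda"},
    {"batteries"},
    {"soda", "candy_bar_b"},
    {"wrapping_paper"},
    {"soda"},
]
candy_bars = {"candy_bar_a", "candy_bar_b", "candy_bar_c"}

support, confidence = rule_stats(baskets, "doll", candy_bars)
print(f"support={support:.0%}, confidence={confidence:.0%}")
```

Confidence alone can mislead if the second item is popular with everyone, so in practice it is usually read alongside support and the item's overall purchase rate.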

Future Trends

  • Increased usage of data mining, maturity of technology (more companies will use data mining in production information systems), analysis of multimedia data (film footage, voice communication, photography analysis).
