Introduction to IBM SPSS Modeler and Data Mining is a three-day, instructor-led classroom course that provides an overview of data mining and the fundamentals of using IBM SPSS Modeler.

Course Outline

The course provides an overview of data mining and the fundamentals of using IBM SPSS Modeler. The principles and practice of data mining are illustrated using the CRISP-DM methodology.

The course structure follows the stages of a typical data mining project: reading data, data exploration, data transformation, modeling, and effective interpretation of results. The course covers the basics of reading, exploring, and manipulating data with IBM SPSS Modeler, and then creating and using successful models.

Prerequisites

  • General computer literacy (no statistical background is necessary)
  • An understanding of your organization’s data, as well as any of your organization’s business issues that are relevant to the use of data mining

High-level Curriculum

Lesson 1: Course Introduction

  • Explain the course outline

Lesson 2: Introduction to Data Mining

  • Explain the stages of the CRISP-DM process model
  • Describe successful data mining projects and the reasons why projects fail
  • Describe the skills needed for data mining

Lesson 3: Working with Streams

  • Describe the different areas of the Modeler User Interface
  • Work with Nodes and Supernodes
  • Run, open and save a stream
  • Access the help function within Modeler

Lesson 4: Data Mining Tour

  • Explain the primary concepts used in data mining
  • Build, evaluate and deploy a model
  • Use the Sort and Filter nodes

Lesson 5: Collecting Initial Data

  • Explain the concepts of “data structure”, “records”, “fields”, “unit of analysis”, and “storage”
  • Read data from and export data to various file formats

Lesson 6: Data Understanding

  • Examine the distributions of categorical and continuous fields
  • Explain the most common ways of handling missing data
  • Explain how to set Modeler to check data quality and select valid records

Lesson 7: Setting the Unit of Analysis

  • Remove duplicate records
  • Aggregate data
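
As a rough illustration of what setting the unit of analysis involves, the sketch below uses plain Python rather than Modeler itself. The records, the field names (customer_id, order_id, amount), and the helper functions are all made up for this example. It removes duplicate purchase records and then aggregates them so that each row describes one customer instead of one purchase:

```python
from collections import defaultdict

# Hypothetical transaction-level records: one row per purchase.
transactions = [
    {"customer_id": 1, "order_id": "A", "amount": 20.0},
    {"customer_id": 1, "order_id": "A", "amount": 20.0},  # exact duplicate
    {"customer_id": 1, "order_id": "B", "amount": 5.0},
    {"customer_id": 2, "order_id": "C", "amount": 12.5},
]


def remove_duplicates(records):
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def aggregate_by_customer(records):
    """Collapse purchases to one row per customer (order count and total)."""
    totals = defaultdict(lambda: {"n_orders": 0, "total_amount": 0.0})
    for rec in records:
        row = totals[rec["customer_id"]]
        row["n_orders"] += 1
        row["total_amount"] += rec["amount"]
    return dict(totals)


per_customer = aggregate_by_customer(remove_duplicates(transactions))
print(per_customer)
```

After these two steps the unit of analysis has changed from the purchase to the customer, which is usually the level at which a customer-focused model is built.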

Lesson 8: Integrating Data

  • Add records from multiple datasets into one dataset
  • Add fields from multiple datasets into one dataset
  • Use sampling for testing purposes
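
The three ideas in this lesson can be sketched in plain Python (not Modeler, which handles them through its own nodes). The datasets and field names below are invented for the example and are only assumptions about what such data might look like:

```python
import random

# Hypothetical datasets; all field names are invented for this example.
customers_2022 = [{"customer_id": 1, "region": "North"}]
customers_2023 = [{"customer_id": 2, "region": "South"}]
spend = {1: {"total_amount": 25.0}, 2: {"total_amount": 12.5}}

# Adding records: stack datasets that share the same fields.
all_customers = customers_2022 + customers_2023

# Adding fields: match on a key field and combine fields from both sources.
merged = [
    {**cust, **spend[cust["customer_id"]]}
    for cust in all_customers
    if cust["customer_id"] in spend
]

# Sampling: draw a random subset for quick testing (seeded for repeatability).
rng = random.Random(0)
sample = rng.sample(merged, k=1)

print(merged)
```

Adding records makes the dataset longer, adding fields makes it wider, and sampling keeps test runs quick while a stream is still being developed.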

Lesson 9: Deriving and Filling Fields

  • Use CLEM to transform data
  • Use the Derive node to create a new field
  • Use the Reclassify node
  • Use the Reorder node to reorder fields

Lesson 10: Looking for Relationships

  • Examine the relationship between two categorical fields
  • Examine the relationship between two continuous fields
  • Examine the relationship between one continuous field and one categorical field

Lesson 11: Introduction to Modeling

  • Modeling objectives
  • Introduction to Classification
  • Introduction to Segmentation

Lesson 12: Course Summary

  • Course Objectives Review