The Data Mining Template includes three slides that introduce common data mining processes: the Cross Industry Standard Process for Data Mining (CRISP-DM) with its six phases, the five phases of SEMMA, and the five stages of the Knowledge Discovery in Databases (KDD) process.
Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. You can learn more about this concept from Wikipedia.
Next, we will explore three typical data mining processes: CRISP-DM, SEMMA, and KDD.
Slide 1, Cross Industry Standard Process for Data Mining.
The Cross Industry Standard Process for Data Mining (CRISP-DM) is a widely adopted methodology that provides a structured approach to planning a data mining project. It is a six-phase model that includes:
- Business Understanding: This phase focuses on understanding the project objectives and requirements from a business perspective, and then converting this knowledge into a data mining problem definition.
- Data Understanding: This phase involves collecting initial data, describing it, exploring it, and verifying its quality to uncover potential issues.
- Data Preparation: The data is cleaned, preprocessed, and transformed into the proper format for mining.
- Modeling: Various modeling techniques are selected and applied, and their parameters are calibrated to optimal values.
- Evaluation: Once a model is built, it is important to evaluate it to ensure it meets the business objectives defined in the first phase.
- Deployment: The deployment of a data mining solution can be as simple as generating a report or as complex as implementing a repeatable data mining process across the organization.
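The ordering of the six phases can be sketched in a few lines of Python. This is only an illustrative sketch: the `Phase` enum and the `next_phase` helper are hypothetical names, and the rule that a failed evaluation sends the project back to Business Understanding is an assumption based on the iterative nature of the process, not something the template prescribes.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, numbered in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current, meets_objectives=True):
    """Return the phase that follows `current`, or None after Deployment.

    Assumption: if Evaluation shows the model does not meet the business
    objectives, the project returns to Business Understanding.
    """
    if current is Phase.EVALUATION and not meets_objectives:
        return Phase.BUSINESS_UNDERSTANDING
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current.value + 1)


if __name__ == "__main__":
    phase = Phase.BUSINESS_UNDERSTANDING
    while phase is not None:
        print(phase.name)
        phase = next_phase(phase)
```

Calling `next_phase(Phase.EVALUATION, meets_objectives=False)` returns `Phase.BUSINESS_UNDERSTANDING`, which models one more pass through the cycle.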
Slide 2, Data Mining-Phases of SEMMA.
SEMMA, which stands for Sample, Explore, Modify, Model, and Assess, is a sequence of steps developed by SAS Institute Inc. for carrying out data mining projects.
- Sample: Select a representative portion of the data for analysis.
- Explore: Conduct exploratory data analysis to discover initial insights, trends, or patterns.
- Modify: Prepare the data for modeling by selecting variables and transforming data into formats suitable for mining.
- Model: Apply various statistical and machine-learning algorithms to model the data.
- Assess: Evaluate the model for accuracy and reliability in representing the data and meeting the defined objectives.
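As a rough illustration, the five SEMMA steps can be mapped onto five small functions. The sketch below uses only the Python standard library and a synthetic dataset in which y = 2x + 1. All function names, the 50% sampling fraction, the choice of a least-squares line as the model, and mean absolute error as the assessment metric are assumptions made for this example; they are not part of SEMMA itself or of any SAS product.

```python
import random
import statistics


def sample(rows, fraction, seed=0):
    """Sample: draw a random subset of the rows."""
    rng = random.Random(seed)
    k = max(2, int(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows):
    """Explore: compute simple summary statistics."""
    return {
        "n": len(rows),
        "x_mean": statistics.mean(x for x, _ in rows),
        "y_mean": statistics.mean(y for _, y in rows),
    }


def modify(rows):
    """Modify: centre x on its mean so the data suit the model."""
    x_mean = statistics.mean(x for x, _ in rows)
    return [(x - x_mean, y) for x, y in rows]


def model(rows):
    """Model: fit y = a + b * x by ordinary least squares."""
    mx = statistics.mean(x for x, _ in rows)
    my = statistics.mean(y for _, y in rows)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    a = my - b * mx
    return a, b


def assess(rows, a, b):
    """Assess: mean absolute error of the fitted line."""
    return statistics.mean(abs(y - (a + b * x)) for x, y in rows)


def run_semma():
    """Run the five steps in order on synthetic data."""
    data = [(x, 2 * x + 1) for x in range(100)]
    subset = sample(data, fraction=0.5)
    summary = explore(subset)
    prepared = modify(subset)
    a, b = model(prepared)
    error = assess(prepared, a, b)
    return summary, b, error


if __name__ == "__main__":
    summary, slope, error = run_semma()
    print(summary["n"], round(slope, 6), round(error, 6))
```

Because the synthetic data are exactly linear, the fitted slope should come out very close to 2 and the error very close to 0. With real data, the Assess step would normally use rows that were held out from the Model step.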
Slide 3, Data Mining-Knowledge Discovery in Databases (KDD) process.
The Knowledge Discovery in Databases (KDD) process is an overarching term for the process of discovering useful knowledge from a collection of data. This is a multi-step process that includes:
- Selection: Data relevant to the analysis task are retrieved from the database.
- Preprocessing: Data are cleaned and preprocessed to remove noise and inconsistencies.
- Transformation: Data are transformed or consolidated into forms appropriate for mining.
- Data Mining: The actual process of applying algorithms to extract patterns from the data.
- Interpretation/Evaluation: The discovered patterns are interpreted to ensure they are truly interesting and can be considered new knowledge.
Once the patterns have been evaluated, the resulting knowledge can be incorporated into a system for further action, or simply documented and reported to stakeholders. This follow-up is not counted among the five KDD stages.
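The stages from selection through interpretation/evaluation can be sketched as a small pipeline. The example below is a minimal, hypothetical illustration: the `RAW` records, the function names, the choice of frequent item pairs as the "pattern", and the 0.5 support threshold are all assumptions made for demonstration purposes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records, including a missing basket and messy item names.
RAW = [
    {"id": 1, "region": "north", "basket": ["bread", "milk", "eggs"]},
    {"id": 2, "region": "north", "basket": ["bread", "milk"]},
    {"id": 3, "region": "south", "basket": ["tea", "milk"]},
    {"id": 4, "region": "north", "basket": None},
    {"id": 5, "region": "north", "basket": ["bread", "Milk ", "bread"]},
    {"id": 6, "region": "north", "basket": ["eggs", "tea"]},
]


def select(records, region):
    """Selection: keep only the records relevant to the task."""
    return [r for r in records if r["region"] == region]


def preprocess(records):
    """Preprocessing: drop empty baskets and normalise item names."""
    cleaned = []
    for r in records:
        if not r["basket"]:
            continue
        cleaned.append([item.strip().lower() for item in r["basket"]])
    return cleaned


def transform(baskets):
    """Transformation: represent each basket as a set of distinct items."""
    return [frozenset(b) for b in baskets]


def mine(itemsets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for items in itemsets:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


def interpret(counts, n, min_support):
    """Interpretation/evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def run_kdd(region="north", min_support=0.5):
    """Run the five stages in order and return the surviving patterns."""
    itemsets = transform(preprocess(select(RAW, region)))
    return interpret(mine(itemsets), len(itemsets), min_support)


if __name__ == "__main__":
    print(run_kdd())
```

With these sample records, the only pair that survives the threshold is bread and milk, which appears in three of the four cleaned northern baskets (support 0.75).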
All three of these processes aim to guide data professionals through the complex and often iterative steps involved in extracting meaningful patterns and insights from large datasets. While each has its unique framework, they share common goals: ensuring that the data used is of high quality, the methods and models are suitable for the objectives, and the results are actionable and aligned with business or research goals.
Aspect Ratio: Standard 4:3
Aspect Ratio: Widescreen 16:9