Quick Learnology

The Steps in Data Mining:

  1. Data Preparation
  2. Data Understanding
  3. Data Cleaning
  4. Missing data
  5. Coding Systems
  6. Data Transformation
  7. Univariate Data Analysis
  1. Data Preparation: converting the data into a form that the data mining tool you are going to use can read properly.
  2. Data Understanding: this starts from the well-defined problem statement you are trying to address. Data understanding means working out which data can answer that question, where the data is, and what form it is in. It may sit in multiple locations and in multiple formats, which you need to bring together, integrate, and consolidate into one unified data set on which to conduct data mining.
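As a minimal sketch of the consolidation described in step 2, the Python below merges records from two sources into one list of uniform rows. The two sources (a CSV export and a JSON export), the field names (`id`, `customer_id`, `age`, `years`), and the sample values are all assumptions made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two separate sources.
CSV_SOURCE = "id,age\n1,34\n2,51\n"
JSON_SOURCE = '[{"customer_id": 3, "years": 27}]'


def load_csv(text):
    """Read CSV rows and convert them to the unified schema (id, age)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(row["id"]), "age": int(row["age"])} for row in reader]


def load_json(text):
    """Read JSON records and rename their fields to the unified schema."""
    return [{"id": rec["customer_id"], "age": rec["years"]} for rec in json.loads(text)]


def consolidate(*sources):
    """Concatenate already-unified record lists and sort them by id."""
    merged = [row for source in sources for row in source]
    return sorted(merged, key=lambda row: row["id"])


unified = consolidate(load_csv(CSV_SOURCE), load_json(JSON_SOURCE))
```

Each loader is responsible for translating its own source format, so `consolidate` only ever sees rows that already share one schema.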
  3. Data Cleaning: removing noise and inconsistent data.
    Example: parsing the data. Parsing is performed to detect syntax errors. The parser decides whether a given string of data is acceptable within the data specification.
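The parsing check in step 3 could look roughly like the sketch below. The record specification used here (an integer id, a date written as YYYY-MM-DD, and a non-negative amount, separated by commas) is an assumption for illustration, not a format taken from the text.

```python
import re

# Assumed specification: "<integer id>,<YYYY-MM-DD>,<non-negative amount>".
RECORD_PATTERN = re.compile(r"\d+,\d{4}-\d{2}-\d{2},\d+(\.\d+)?")


def is_acceptable(record):
    """Return True if the whole string matches the assumed specification."""
    return RECORD_PATTERN.fullmatch(record) is not None


def split_by_validity(records):
    """Separate records that pass the syntax check from those that fail it."""
    accepted = [r for r in records if is_acceptable(r)]
    rejected = [r for r in records if not is_acceptable(r)]
    return accepted, rejected


accepted, rejected = split_by_validity(
    ["17,2021-03-04,19.99", "x7,2021-03-04,19.99", "18,04/03/2021,5"]
)
```

This only checks syntax: a string such as `1,2021-99-99,5` has the right shape and would pass, so checking that the date is a real calendar date would need a further step.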
  4. Missing Data 

    (i) Fill the Missing Data: Missing data can be handled by methods such as:

    • Ignoring the tuple.
    • Filling the missing value manually.
    • Using a measure of central tendency, such as the mean or median.
    • Filling in the most probable value.
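Two of the options above, ignoring the tuple and filling with a measure of central tendency, can be sketched as follows. Representing a missing value as `None`, and the sample ages, are assumptions made for this example.

```python
from statistics import median


def drop_incomplete(rows):
    """Ignore (drop) every tuple that contains a missing value (None)."""
    return [row for row in rows if None not in row]


def fill_with_median(values):
    """Replace each missing value (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


ages = [23, None, 31, 40, None]
filled = fill_with_median(ages)

rows = [(1, 23), (2, None), (3, 31)]
complete = drop_incomplete(rows)
```

Dropping tuples is the simplest option but discards the other attributes in the row, while median filling keeps every row at the cost of introducing estimated values.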

    (ii) Remove the Noisy Data: Random error in the data is called noise.

    Binning: The values are sorted and distributed into buckets, or bins. Each value is then smoothed by consulting its neighboring values, that is, the other values in the same bin. Common variants are smoothing by bin means, by bin medians, and by bin boundaries.
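A minimal sketch of smoothing by bin means follows. It assumes equal-frequency bins (each bin holds the same number of sorted values); the sample prices and the bin size of 3 are illustrative choices.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
# Bins: [4, 8, 15], [21, 21, 24], [25, 28, 34] with means 9.0, 22.0, 29.0
```

If the number of values is not a multiple of `bin_size`, the last bin is simply shorter and is averaged over the values it does contain.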

  5. Data Transformation: In this process, data is transformed into a form suitable for the data mining process. Data is consolidated so that the mining process is more efficient and the patterns are easier to understand. Data transformation involves data mapping and code generation.

    Strategies for data transformation are:

    Smoothing: Removing noise from the data using techniques such as clustering or regression.

    Aggregation: Summary operations are applied to data.

    Normalization: Scaling of data to fall within a smaller range.

    Discretization: Raw values of numeric data are replaced by intervals.
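Three of the strategies above (aggregation, normalization, and discretization) can be sketched in a few lines of Python. The choice of min-max scaling for normalization, the function names, the interval boundaries, and the sample data are all assumptions for illustration; other normalization methods exist.

```python
from collections import defaultdict


def aggregate_by_key(pairs):
    """Aggregation: sum the amounts that share the same key."""
    totals = defaultdict(float)
    for key, amount in pairs:
        totals[key] += amount
    return dict(totals)


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so map them all to the lower bound.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def discretize(value, boundaries, labels):
    """Discretization: return the label of the interval that value falls in.
    boundaries are ascending exclusive upper bounds; the last label
    covers everything at or above the final boundary."""
    for upper, label in zip(boundaries, labels):
        if value < upper:
            return label
    return labels[-1]


monthly = aggregate_by_key([("Jan", 100.0), ("Jan", 50.0), ("Feb", 70.0)])
normalized = min_max_normalize([20000, 35000, 50000, 80000])
age_groups = [discretize(a, [18, 65], ["0-17", "18-64", "65+"]) for a in [15, 34, 70]]
```

Here `labels` must have exactly one more entry than `boundaries`, since two boundaries cut the number line into three intervals.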

  6. Univariate Data Analysis

    • Univariate analysis is the simplest form of data analysis. "Uni" means one, so the data being analyzed has only one variable.
    • Univariate analysis examines each variable separately. The data is gathered to answer a question or, more specifically, a research question.
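As a rough illustration of analyzing one variable at a time, the sketch below summarizes a single numeric variable and builds a frequency table for a single categorical one. The particular statistics chosen, the variable names, and the sample values are assumptions for this example.

```python
from collections import Counter
from statistics import mean, median, pstdev


def describe(values):
    """Summarize one numeric variable with basic descriptive statistics."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "std_dev": pstdev(values),  # population standard deviation
    }


def frequency_table(values):
    """Count how often each distinct value of one variable occurs."""
    return dict(Counter(values))


heights_cm = [160, 165, 170, 175, 180]
summary = describe(heights_cm)

colors = ["red", "blue", "red", "green", "red"]
counts = frequency_table(colors)
```

Each call looks at exactly one variable; relationships between `heights_cm` and `colors` are deliberately out of scope for univariate analysis.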