Quick Learnology

Data Mining:

Data mining refers to extracting knowledge from large amounts of data.

The data mining engine is essential to the data mining system. It consists of a set of functional modules that perform the following functions −

  • Characterization
  • Association and Correlation Analysis
  • Classification
  • Prediction
  • Cluster analysis
  • Outlier analysis
  • Evolution analysis

Data Mining Process: Models, Process Steps & Challenges Involved

  1.  Data Cleaning.
  2.  Data Integration.
  3.  Data Reduction.
  4.  Data Transformation.
  5.  Data Mining.
  6.  Pattern Evaluation.
  7.  Knowledge Representation.

KDD (Knowledge Discovery from Data)

KDD defines the broad process of discovering knowledge in data and emphasizes the high-level application of particular data mining techniques.

Knowledge Discovery in Databases (KDD) Model

1. Data cleaning: To remove noise and inconsistent data.
Example: parsing the data. Cleaning is performed to detect syntax errors; a parser decides whether a given string of data is acceptable within the data specification.

2. Data integration: Where multiple data sources are combined.

3. Data Selection: Where data relevant to the analysis task are retrieved from the database.

4. Data transformation: Where data are transformed or consolidated into forms appropriate for mining, for instance by performing summary or aggregation operations.

5. Data Mining: An essential process where intelligent methods are applied in order to extract data patterns.

6. Pattern Evaluation: To identify the truly interesting patterns representing knowledge, based on some interestingness measures.

7. Knowledge Representation: Where visualization and knowledge representation techniques are used to present the mined knowledge to the user.
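
The seven steps above can be sketched end to end on a toy data set. The following is a minimal illustration, not a definitive implementation: the two sources (`store_a`, `store_b`), the record layout, the function names, and the `min_support` threshold are all made up for this example, and simple pair counting stands in for the "intelligent methods" of step 5.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw sources: each record is (transaction id, comma-separated items).
store_a = [("t1", "bread, milk"), ("t2", "bread, butter, milk"), ("t3", "")]
store_b = [("t4", "Milk,Bread"), ("t5", "butter"), ("t6", None)]


def clean(records):
    """Step 1: drop records that fail a basic check (missing or empty item string)."""
    return [(tid, raw) for tid, raw in records if isinstance(raw, str) and raw.strip()]


def integrate(*sources):
    """Step 2: combine multiple data sources into one list."""
    return [rec for src in sources for rec in src]


def select(records):
    """Step 3: keep only the field relevant to the analysis (the item string)."""
    return [raw for _tid, raw in records]


def transform(raw_rows):
    """Step 4: turn each row into a set of lower-case item names."""
    return [frozenset(item.strip().lower() for item in raw.split(",")) for raw in raw_rows]


def mine(baskets):
    """Step 5: count how often each pair of items occurs together."""
    pairs = Counter()
    for basket in baskets:
        pairs.update(combinations(sorted(basket), 2))
    return pairs


def evaluate(pairs, n_baskets, min_support=0.5):
    """Step 6: keep pairs whose support (share of baskets) meets the threshold."""
    return {pair: count / n_baskets
            for pair, count in pairs.items()
            if count / n_baskets >= min_support}


baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))

# Step 7: present the mined knowledge to the user.
for (a, b), support in sorted(patterns.items()):
    print(f"{a} and {b} appear together in {support:.0%} of baskets")
```

Here support plays the role of the interestingness measure in step 6; a real project would usually choose its measures and thresholds to suit the task.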

CRISP-DM Methodology

  • CRISP-DM stands for Cross-Industry Standard Process for Data Mining.
  • It is a cycle that describes the approaches data mining experts commonly use to tackle problems in traditional data mining.

Each step has several sub-steps:

  • Step 1 is where the business problem is defined and characterized.
  • Step 2, Data Understanding, starts from the well-defined problem statement that you are trying to address. It means identifying the data needed to answer the question: where the data is and what form it is in. The data might be in multiple locations and in multiple forms, which you need to bring together, integrate, and consolidate into a unified platform of data on which to conduct data mining.
  • Step 3, Data Preparation, converts the data into a form that the data mining tool you are going to use will understand properly.
  • Steps 2 and 3 together consume about 85% of the total project time.
  • Step 4 is where the fun starts. In this step you build the model and assess its accuracy. Here the knowledge and patterns are discovered, interpreted, analyzed, and validated once the model has been developed. When you are satisfied with the accuracy level of the model, you move on to step 5.
  • Step 5 is the validation of the findings against the business model. It is the reuse of the discovered patterns.
  • Step 6 is deploying the model (not every pattern is deployed).
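
Steps 4 to 6 can be read as a gate: build a model, measure its accuracy on held-out data, and move on only when the accuracy is acceptable. A minimal sketch, assuming a toy one-feature data set, a simple threshold classifier, and a target accuracy of 0.75; all of these are hypothetical choices made for illustration and are not part of CRISP-DM itself.

```python
# Hypothetical labelled data: (feature value, class label).
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
holdout = [(1.5, 0), (2.5, 0), (6.5, 1), (7.5, 1), (4.0, 1)]

TARGET_ACCURACY = 0.75  # assumed acceptance level, not a standard value


def class_mean(rows, label):
    """Mean feature value of the rows that carry the given label."""
    values = [x for x, y in rows if y == label]
    return sum(values) / len(values)


def build_model(rows):
    """Step 4: the 'model' is a threshold halfway between the two class means."""
    return (class_mean(rows, 0) + class_mean(rows, 1)) / 2


def predict(threshold, x):
    """Predict class 1 at or above the threshold, class 0 below it."""
    return 1 if x >= threshold else 0


def accuracy(threshold, rows):
    """Share of rows whose predicted label matches the true label."""
    hits = sum(1 for x, y in rows if predict(threshold, x) == y)
    return hits / len(rows)


model = build_model(train)
score = accuracy(model, holdout)

# Steps 5 and 6 are reached only when the accuracy is acceptable.
if score >= TARGET_ACCURACY:
    decision = "proceed to validation and deployment"
else:
    decision = "go back and rework the data or the model"

print(f"threshold={model:.2f} accuracy={score:.2f} -> {decision}")
```

Measuring accuracy on data the model was not built from is what makes the score a reasonable basis for the decision to move on.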

Knowledge Base System

  • An intelligent agent needs knowledge about the real world for taking decisions and reasoning to act efficiently.
  • Knowledge-based agents are agents that can maintain an internal state of knowledge, reason over that knowledge, update their knowledge after observations, and take actions.
  • These agents can represent the world with some formal representation and act intelligently.
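
The behaviour described above (maintain knowledge, update it after observations, reason over it, act) can be sketched as a small class. This is a minimal illustration under assumed names: `KnowledgeBasedAgent`, its `tell`, `ask`, and `act` methods, and the umbrella scenario are hypothetical and chosen only to show the loop.

```python
class KnowledgeBasedAgent:
    """Toy agent that stores facts and picks an action from them."""

    def __init__(self):
        self.knowledge = set()  # internal state of knowledge

    def tell(self, fact):
        """Update the knowledge after an observation."""
        self.knowledge.add(fact)

    def ask(self, fact):
        """Query the knowledge: is this fact currently known?"""
        return fact in self.knowledge

    def act(self):
        """Reason over the stored knowledge and choose an action."""
        if self.ask("raining") and not self.ask("indoors"):
            return "open umbrella"
        return "do nothing"


agent = KnowledgeBasedAgent()
first_action = agent.act()   # nothing is known yet
agent.tell("raining")        # new observation
second_action = agent.act()  # same agent, different action
print(first_action, "->", second_action)
```

The same agent gives a different answer once its knowledge changes, which is the point of keeping an internal state.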

Knowledge Base System Architecture

The components of KBS include −

  • Knowledge Base
  • Inference Engine
  • User Interface

The knowledge base contains domain-specific, high-quality knowledge. Knowledge is required to exhibit intelligence, and the success of any KBS depends largely on the collection of highly accurate and precise knowledge.

Components of Knowledge Base

The knowledge base is a store of both factual and heuristic knowledge.

Factual Knowledge − It is the information widely accepted by the Knowledge Engineers and scholars in the task domain.

Heuristic Knowledge − It is about practice, accurate judgement, one’s ability to evaluate, and guessing.

Examples: hypotheses, if-else rules, rules of thumb.

Inference Engine

In the field of artificial intelligence, an inference engine is a component of the system that applies logical rules to the knowledge base to deduce new information.

Example: a fuzzy logic system.
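
One common way an inference engine deduces new information is forward chaining: rules fire repeatedly on the known facts until nothing new can be added. The sketch below assumes a very simple rule format (a set of premises and one conclusion); the facts and rules are invented for illustration, and it uses crisp true/false logic, not fuzzy logic.

```python
def forward_chain(facts, rules):
    """Apply if-then rules to the facts until no rule adds anything new."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Factual knowledge (hypothetical).
facts = {"has_fever", "has_cough"}

# Heuristic knowledge written as if-then rules: (premises, conclusion).
rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu"}, "recommend_rest"),
    ({"has_rash"}, "possible_allergy"),
]

derived = forward_chain(facts, rules) - facts
print(sorted(derived))
```

The second rule fires only because the first one added `possible_flu`, which shows how deduced facts feed further deductions; the third rule never fires because its premise is not known.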
