A decision tree is a tool that supports decision-making. It represents decisions and their consequences in a tree-like graphical form. A well-known algorithm for building such trees is ID3, developed by J. R. Quinlan. In the simplest case the tree is binary, so the search branches at each step into just two possible outcomes, although a node can in general have more than two branches.

This tool is widely used in operations research to develop strategies and achieve goals, because it requires a systematic process to arrive at an analysis. It formalizes the brainstorming process in the form of a document. It is also widely used in machine learning to interpret data.

1. What is a Decision Tree?

A decision tree is a predictive model, used for classification or regression, that is represented in the form of a tree. The algorithm divides the data into progressively smaller subsets while building the associated decision tree incrementally, step by step. The tree has two kinds of nodes: decision nodes, which branch further, and leaf nodes, which do not branch but hold the final output or result. Leaf nodes are also called end nodes, and the first decision node is called the root node. Because the root is the first split, it corresponds to the best predictor in the data.
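The two kinds of nodes can be sketched as a small data structure. This is a minimal illustration, not a full learning algorithm: the class names `Leaf` and `Decision`, the `predict` helper, and the attribute names and outcomes in the hand-built tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf (end) node: holds the final result and has no branches."""
    prediction: str


@dataclass
class Decision:
    """Decision node: tests one attribute and branches on its value."""
    attribute: str
    branches: Dict[str, Union["Decision", Leaf]] = field(default_factory=dict)


def predict(node, record):
    """Walk from the root to a leaf, following the branch that matches the record."""
    while isinstance(node, Decision):
        node = node.branches[record[node.attribute]]
    return node.prediction


# Hypothetical hand-built tree; attribute names and outcomes are illustrative only.
root = Decision("outlook", {
    "sunny": Decision("windy", {"yes": Leaf("stay in"), "no": Leaf("play")}),
    "rainy": Leaf("stay in"),
})

print(predict(root, {"outlook": "sunny", "windy": "no"}))  # play
```

Prediction is a walk from the root node down to a leaf, so its cost grows with the depth of the tree rather than with the size of the training data.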

2. Example of a Decision Tree

Suppose there is a sample of 30 high school students described by three attributes: gender, height (5 to 6 ft) and class (IX or X). Of these 30 students, 15 like playing cricket. How can we develop a model that describes the students who like playing cricket?

To solve this problem, we need to segregate the students who play cricket using the most significant of the three input variables.
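One common way to decide which input variable is "most significant" is information gain, the criterion used by ID3. The sketch below assumes that criterion. Only the 15/15 split at the root comes from the example; the per-branch counts are hypothetical and serve only to show how the calculation works.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a node, given the count of each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# From the example: 15 of the 30 students play cricket and 15 do not.
root = [15, 15]
print(entropy(root))  # 1.0

# HYPOTHETICAL split on one attribute, as [plays, does not play] per branch.
# These counts are not taken from the article.
hypothetical_split = [[2, 8], [13, 7]]
print(round(information_gain(root, hypothetical_split), 3))
```

The attribute whose split yields the largest gain would be chosen for the root node, and the same calculation is then repeated within each resulting subset.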

[Figure: Decision Tree-1]

3. Types of Decision Trees

Based on the type of the target variable, decision trees are of the following two types:

  • Categorical Variable Decision Tree: The target variable takes categorical values. For example, deciding whether the answer is Yes or No in a task or a game.
  • Continuous Variable Decision Tree: The target variable takes continuous values. For example, predicting a person's age.
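In practice the two types differ mainly in what a leaf node returns. The sketch below assumes the usual convention of a majority vote for categorical targets and a mean for continuous targets; the function names and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def categorical_leaf(targets):
    """Categorical target: predict the most common class in the leaf."""
    return Counter(targets).most_common(1)[0][0]


def continuous_leaf(targets):
    """Continuous target: predict the mean of the values in the leaf."""
    return mean(targets)


# Hypothetical target values of the training records that reached one leaf.
print(categorical_leaf(["Yes", "No", "Yes"]))  # Yes
print(continuous_leaf([14, 15, 16]))           # 15
```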

4. Why Are Decision Trees Regarded as Superior to Other Algorithms?

A decision tree is an instrument for analysing multiple variables. It allows you to predict, explain, classify and describe the possible outcomes of an event. It goes beyond simple one-to-one cause-and-effect relationships. The strength of these models lies in the ease with which they handle different types of data and levels of measurement, and in their ability to find strong relationships between input and target values.

[Figure: Decision Tree-2]

5. Advantages and Disadvantages of Decision Trees

A decision tree visually depicts all the decisions together with their relevant factors and consequences, which makes analysis easier. Its advantages and disadvantages are as follows:

Advantages

  • Easy to understand and code
  • Handles skewed variables easily, because it makes no assumptions about the distribution of the variables
  • Unlike many other algorithms, it explains non-linearity in an intuitive way
  • It allows forward and backward calculation along the decision path
  • The tree helps in collapsing a set of categorical values into ranges aligned with the selected target

Disadvantages

  • It is a high-variance classifier
  • It is prone to overfitting
  • It is subject to both bias and variance

6. Application

Decision trees have been widely used as an integrity-checking mechanism to validate data supplied by providers. Many software tools can build a decision tree from data. R and Python users have access to many packages that let you develop a tree in order to arrive at a sound decision.