Understanding Decision Trees

Edna Figueira Fernandes
Oct 19, 2020

Decision trees are algorithms that partition the sample space into sets of similar points. The top node of the tree is called the root node, the nodes in between are called internal nodes, and the final nodes are called leaf nodes. Each node tests a condition on one feature and splits the observations into two subspaces according to whether they satisfy it. Finally, each leaf node represents a discrete class, as in the small example below.
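For instance, a shallow tree fit with scikit-learn makes this structure visible (the Iris dataset and the max_depth=3 setting below are just illustrative choices):

```python
# Minimal illustration: the dataset and parameters are arbitrary choices.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# Keep the tree shallow so the printed structure stays readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# The first condition printed is the root node, the nested conditions are
# internal nodes, and the "class:" lines are the leaf nodes.
print(export_text(tree, feature_names=data.feature_names))
```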

How does the decision tree determine which node should be at the top?

Different selection criteria can be used to determine which feature should be at the root node, such as entropy, information gain, and the Gini index.

- Entropy measures the uncertainty, or impurity, in a dataset.

- Information gain measures how much a feature reduces that uncertainty about the target class when the data is split on it; the feature with the highest gain is the most informative.

- The Gini index measures the probability that a randomly selected observation would be misclassified if it were labeled according to the class distribution of the set. A toy computation of all three appears after this list.
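To make these three quantities concrete, here is a rough sketch in NumPy (the helper functions and the toy labels are my own illustration, not taken from any library):

```python
import numpy as np

def entropy(labels):
    # Uncertainty/impurity of a set of class labels.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Probability that a randomly drawn observation would be misclassified
    # if it were labeled according to the class distribution of the set.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, subsets):
    # Reduction in entropy obtained by splitting the parent set into subsets.
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# Toy example: a binary target split into two groups by some feature.
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])
left, right = y[:4], y[4:]
print(entropy(y), gini(y), information_gain(y, [left, right]))
```

For a pure node both entropy and Gini impurity are 0; they are largest when the classes are evenly mixed.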

Classification and regression trees (CART) use the Gini index as the metric, while Iterative Dichotomiser 3 (ID3) uses entropy and information gain.

How does the selection criterion work?

Let’s take the Gini index as the selection criterion.

  1. Each feature is checked against the target to see how well it separates the classes.
  2. The Gini impurity of the split that each feature produces is calculated.
  3. The feature placed at the root node is the one with the lowest Gini impurity, as in the sketch after this list.
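For example, step 3 might look like the following, where the two features, the labels, and the median-threshold rule are all made up for illustration (a real CART implementation would scan many candidate thresholds per feature):

```python
import numpy as np

def gini(labels):
    # Same Gini helper as in the earlier sketch.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(feature_values, labels, threshold):
    # Impurity of the two subspaces produced by the split, weighted by size.
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data: two candidate features for a binary target.
features = {
    "feature_a": np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    "feature_b": np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0]),
}
y = np.array([0, 0, 0, 1, 1, 1])

# Split each feature at its median (a simplification) and keep the feature
# with the lowest weighted Gini impurity -- it becomes the root node.
scores = {name: weighted_gini(v, y, np.median(v)) for name, v in features.items()}
print(scores, "-> root:", min(scores, key=scores.get))
```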

The same approach is repeated to decide which features go in the internal nodes. If, at some node, every candidate split would produce a Gini impurity higher than that of the node itself, the node is not split further and becomes a leaf node.
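A rough end-to-end sketch of that recursion, including the termination rule, could look like this; it reuses the same toy data and the simplified median-threshold assumption, and is not how scikit-learn or any other library implements it:

```python
import numpy as np

def gini(labels):
    # Same Gini helper as in the earlier sketches.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def grow(X, y, feature_names):
    # Try splitting on each feature (at its median, for brevity) and keep
    # the split with the lowest weighted Gini impurity.
    best = None  # (weighted_gini, feature_index, threshold)
    for j in range(X.shape[1]):
        t = np.median(X[:, j])
        mask = X[:, j] <= t
        left, right = y[mask], y[~mask]
        if len(left) == 0 or len(right) == 0:
            continue
        w = len(left) / len(y) * gini(left) + len(right) / len(y) * gini(right)
        if best is None or w < best[0]:
            best = (w, j, t)

    # Termination: if no split lowers the impurity of the node itself,
    # the node becomes a leaf predicting its majority class.
    if best is None or best[0] >= gini(y):
        classes, counts = np.unique(y, return_counts=True)
        return {"leaf": classes[np.argmax(counts)]}

    w, j, t = best
    mask = X[:, j] <= t
    return {
        "feature": feature_names[j],
        "threshold": t,
        "left": grow(X[mask], y[mask], feature_names),
        "right": grow(X[~mask], y[~mask], feature_names),
    }

# Toy usage with the same illustrative data as above.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(grow(X, y, ["feature_a", "feature_b"]))
```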
