Luis Ashurei

Thinking in Data


Gini Impurity

Per Wikipedia:
Gini impurity measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the set.

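I_{G}(f) = \sum_{i=1}^{m} f_i (1-f_i) = \sum_{i=1}^{m} (f_i - f_i^2) = \sum_{i=1}^{m} f_i - \sum_{i=1}^{m} f_i^2
 = 1 - \sum_{i=1}^{m} f_i^2 = \sum_{i \neq k} f_i f_k

Here f_i is the fraction of items labeled with class i and m is the number of classes. The step to the second line uses the fact that the f_i sum to 1.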
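The first and last forms of the formula can be checked against each other numerically. The snippet below is a minimal sketch, not taken from the original post: the function names and the sample fractions are illustrative assumptions.

```python
import math
from itertools import permutations


def gini_closed_form(fractions):
    """Gini impurity as 1 - sum(f_i^2)."""
    return 1.0 - sum(f * f for f in fractions)


def gini_pairwise(fractions):
    """Gini impurity as the sum of f_i * f_k over all ordered pairs with i != k."""
    return sum(
        fractions[i] * fractions[k]
        for i, k in permutations(range(len(fractions)), 2)
    )


# Illustrative class fractions (an assumption for this sketch); they must sum to 1.
fractions = [0.5, 0.25, 0.25]
closed = gini_closed_form(fractions)
pairwise = gini_pairwise(fractions)
print(closed, pairwise)
```

Both functions should agree up to floating-point rounding, since the pairwise sum is just the closed form with every f_i (1 - f_i) term expanded over the other classes.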

It is another way to measure the degree of impurity, and an alternative to entropy.
It is used in decision tree learning, notably by the CART (classification and regression trees) algorithm.

Example

An example from revoledu:
Given that Prob (Bus) = 0.4, Prob (Car) = 0.3 and Prob (Train) = 0.3, we can compute the Gini index as

Gini Index = 1 - (0.4^2 + 0.3^2 + 0.3^2) = 1 - 0.34 = 0.660
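The same figure can be approached from the definition: pick an item at random, assign it a label drawn at random from the same class distribution, and count how often the label is wrong. The simulation below is a rough sketch under that reading; the function name, trial count and seed are assumptions for illustration.

```python
import random


def simulate_mislabel_rate(probs, trials=200_000, seed=0):
    """Estimate how often a random item receives a wrong random label.

    The item's true class and the assigned label are drawn
    independently from the same class distribution.
    """
    rng = random.Random(seed)
    classes = list(probs)
    weights = [probs[c] for c in classes]
    true_classes = rng.choices(classes, weights=weights, k=trials)
    labels = rng.choices(classes, weights=weights, k=trials)
    wrong = sum(t != l for t, l in zip(true_classes, labels))
    return wrong / trials


probs = {"Bus": 0.4, "Car": 0.3, "Train": 0.3}
exact = 1 - sum(p * p for p in probs.values())
estimate = simulate_mislabel_rate(probs)
print(f"exact = {exact:.3f}, simulated = {estimate:.3f}")
```

The simulated rate should land close to the exact value, with the gap shrinking as the number of trials grows.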

Calculator

Input data: each line represents the probability or frequency of one group.
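A calculator for this input format could look roughly like the sketch below. It is not the original page's implementation. The function name is hypothetical, and normalizing the values by their total (so that raw frequencies work as well as probabilities) is an assumption about how such a calculator would behave.

```python
def gini_from_text(text):
    """Parse one non-negative number per line and return the Gini impurity.

    Values are divided by their total, so both raw frequencies and
    probabilities are accepted (an assumption of this sketch).
    """
    values = [float(line) for line in text.splitlines() if line.strip()]
    if not values or any(v < 0 for v in values):
        raise ValueError("expected one non-negative number per line")
    total = sum(values)
    if total == 0:
        raise ValueError("values must not all be zero")
    return 1.0 - sum((v / total) ** 2 for v in values)


# Probabilities and the equivalent frequencies give the same result.
print(round(gini_from_text("0.4\n0.3\n0.3"), 3))
print(round(gini_from_text("4\n3\n3"), 3))
```

Blank lines are skipped, and a line that is not a number raises `ValueError` from `float`.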

09 Apr 2016