Decision Tree Classifier, Explained: A Visual Guide with Code Examples for Beginners
CLASSIFICATION ALGORITHM

⛳️ More [CLASSIFICATION ALGORITHM](https://medium.com/@samybaladram/list/classification-algorithms-b3586f0a772c), explained: · [Dummy Classifier](https://towardsdatascience.com/dummy-classifier-explained-a-visual-guide-with-code-examples-for-beginners-009ff95fc86e) · [K Nearest Neighbor Classifier](https://towardsdatascience.com/k-nearest-neighbor-classifier-explained-a-visual-guide-with-code-examples-for-beginners-a3d85cad00e1) · [Bernoulli Naive Bayes](https://towardsdatascience.com/bernoulli-naive-bayes-explained-a-visual-guide-with-code-examples-for-beginners-aec39771ddd6) · [Gaussian Naive Bayes](https://towardsdatascience.com/gaussian-naive-bayes-explained-a-visual-guide-with-code-examples-for-beginners-04949cef383c) ▶ [Decision Tree Classifier](https://towardsdatascience.com/decision-tree-classifier-explained-a-visual-guide-with-code-examples-for-beginners-7c863f06a71e) · [Logistic Regression](https://towardsdatascience.com/logistic-regression-explained-a-visual-guide-with-code-examples-for-beginners-81baf5871505) · [Support Vector Classifier](https://towardsdatascience.com/support-vector-classifier-explained-a-visual-guide-with-mini-2d-dataset-62e831e7b9e9) · [Multilayer Perceptron](https://towardsdatascience.com/multilayer-perceptron-explained-a-visual-guide-with-mini-2d-dataset-0ae8100c5d1c)
Decision Trees are everywhere in machine learning, beloved for their intuitive output. Who doesn't love a simple "if-then" flowchart? Despite their popularity, it's surprising how hard it is to find a clear, step-by-step explanation of how Decision Trees work. (I'm a bit embarrassed by how long it took me to understand how the algorithm actually works.)
So, in this breakdown, I'll be focusing on the essentials of tree construction. We'll unpack exactly what's happening in each node and why, from root to final leaves (with visuals of course).

Definition
A Decision Tree classifier makes predictions with an upside-down tree: it starts at the top with a question about an important feature in your data, then branches out based on the answers. As you follow these branches down, each stop asks another question, narrowing down the possibilities. This question-and-answer game continues until you reach the bottom – a leaf node – where you get your final prediction or classification.
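In code terms, a trained tree behaves like a chain of nested if-then checks. The tiny sketch below is purely illustrative (the feature names, thresholds, and the toy_tree_predict helper are invented for this example, not taken from the tree we will actually train later):

def toy_tree_predict(overcast, humidity):
    # Hypothetical tree: the questions and thresholds here are made up for illustration
    if overcast == 1:            # question at the root node
        return 'Yes'             # leaf node
    elif humidity <= 80:         # question at an internal node
        return 'Yes'             # leaf node
    else:
        return 'No'              # leaf node

print(toy_tree_predict(overcast=0, humidity=85))  # -> 'No'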

Dataset Used
Throughout this article, we'll use this artificial golf dataset (inspired by [1]) as an example. This dataset predicts whether a person will play golf based on weather conditions.

# Import libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
# Load data
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)
# Preprocess data
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)
# Reorder the columns
df = df[['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind', 'Play']]
# Prepare features and target
X, y = df.drop(columns='Play'), df['Play']
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
# Display results
print(pd.concat([X_train, y_train], axis=1), '\n')
print(pd.concat([X_test, y_test], axis=1))
Main Mechanism
The Decision Tree classifier operates by recursively splitting the data based on the most informative features. Here's how it works:
1. Start with the entire dataset at the root node.
2. Select the best feature to split the data (based on a measure like Gini impurity).
3. Create child nodes for the possible outcomes of that split.
4. Repeat steps 2–3 for each child node until a stopping criterion is met (e.g., maximum depth reached, minimum samples per leaf, or pure leaf nodes).
5. Assign the majority class to each leaf node.
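To make that mechanism concrete before we go step by step, here is a minimal from-scratch sketch of the recursion (binary splits on the numeric/one-hot features, Gini impurity, majority-class leaves, mirroring the list above). It is only an illustration under those assumptions, not scikit-learn's actual implementation, and gini / build_tree are hypothetical helper names:

import numpy as np

def gini(y):
    # Gini impurity of a set of class labels
    p = np.bincount(y) / len(y)
    return 1 - np.sum(p**2)

def build_tree(X, y, depth=0, max_depth=3):
    # Stopping criteria: pure node or maximum depth -> leaf labeled with the majority class
    if len(np.unique(y)) == 1 or depth >= max_depth:
        return {'leaf': True, 'class': int(np.bincount(y).argmax())}
    # Try every feature and every midpoint between adjacent unique values,
    # keeping the split with the lowest weighted Gini impurity
    best = None
    for feature in X.columns:
        values = np.unique(X[feature])
        for threshold in (values[:-1] + values[1:]) / 2:
            left, right = y[X[feature] <= threshold], y[X[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    if best is None:  # no usable split (all feature values identical) -> leaf
        return {'leaf': True, 'class': int(np.bincount(y).argmax())}
    feature, threshold, _ = best
    # Split the data on the chosen feature and recurse into the two child nodes
    mask = X[feature] <= threshold
    return {'leaf': False, 'feature': feature, 'threshold': threshold,
            'left': build_tree(X[mask], y[mask], depth + 1, max_depth),
            'right': build_tree(X[~mask], y[~mask], depth + 1, max_depth)}

# Example usage on the golf data (after the train/test split above):
# tree = build_tree(X_train, y_train)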

Training Steps
In scikit-learn, the decision tree algorithm is called CART (Classification and Regression Trees). It builds binary trees and typically follows these steps:
1. Start with all training samples in the root node.

2. For each feature: a. Sort the feature values. b. Consider all possible thresholds between adjacent values as potential split points.

def potential_split_points(attr_name, attr_values):
    sorted_attr = np.sort(attr_values)
    unique_values = np.unique(sorted_attr)
    split_points = [(unique_values[i] + unique_values[i+1]) / 2 for i in range(len(unique_values) - 1)]
    return {attr_name: split_points}

# Calculate and display potential split points for all columns
for column in X_train.columns:
    splits = potential_split_points(column, X_train[column])
    for attr, points in splits.items():
        print(f"{attr:11}: {points}")
3. For each potential split point: a. Calculate the impurity (e.g., Gini impurity) of the two resulting groups. b. Calculate the weighted average of those impurities.


def gini_impurity(y):
    p = np.bincount(y) / len(y)
    return 1 - np.sum(p**2)

def weighted_average_impurity(y, split_index):
    n = len(y)
    left_impurity = gini_impurity(y[:split_index])
    right_impurity = gini_impurity(y[split_index:])
    return (split_index * left_impurity + (n - split_index) * right_impurity) / n

# Sort 'sunny' feature and corresponding labels
sunny = X_train['sunny']
sorted_indices = np.argsort(sunny)
sorted_sunny = sunny.iloc[sorted_indices]
sorted_labels = y_train.iloc[sorted_indices]

# Find split index for 0.5
split_index = np.searchsorted(sorted_sunny, 0.5, side='right')

# Calculate impurity
impurity = weighted_average_impurity(sorted_labels, split_index)
print(f"Weighted average impurity for 'sunny' at split point 0.5: {impurity:.3f}")
4. After calculating the impurity for all features and split points, choose the split with the lowest weighted average impurity.

def calculate_split_impurities(X, y):
    split_data = []
    for feature in X.columns:
        sorted_indices = np.argsort(X[feature])
        sorted_feature = X[feature].iloc[sorted_indices]
        sorted_y = y.iloc[sorted_indices]
        unique_values = sorted_feature.unique()
        split_points = (unique_values[1:] + unique_values[:-1]) / 2
        for split in split_points:
            split_index = np.searchsorted(sorted_feature, split, side='right')
            impurity = weighted_average_impurity(sorted_y, split_index)
            split_data.append({
                'feature': feature,
                'split_point': split,
                'weighted_avg_impurity': impurity
            })
    return pd.DataFrame(split_data)
# Calculate split impurities for all features
calculate_split_impurities(X_train, y_train).round(3)
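To pick the winning split programmatically, you can take the row of that table with the smallest weighted impurity (a small sketch; split_df and best_split are just illustrative names):

# Choose the (feature, split point) pair with the lowest weighted average impurity
split_df = calculate_split_impurities(X_train, y_train)
best_split = split_df.loc[split_df['weighted_avg_impurity'].idxmin()]
print(best_split)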
5. Create two child nodes based on the chosen feature and split point (see the sketch after this list):
- Left child: samples with feature value <= split point
- Right child: samples with feature value > split point

6. Recursively repeat steps 2–5 for each child node until a stopping criterion is met (e.g., maximum depth reached, minimum number of samples per leaf node, or minimum impurity decrease).
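Here is a small sketch of how a chosen split produces the two child nodes and how the search then continues inside one of them (the 'sunny' <= 0.5 split is only an illustrative choice, not necessarily the split the tree actually picks first):

# Partition the training data on a chosen (feature, split point) pair
feature, split_point = 'sunny', 0.5
left_mask = X_train[feature] <= split_point
X_left, y_left = X_train[left_mask], y_train[left_mask]
X_right, y_right = X_train[~left_mask], y_train[~left_mask]

# Repeat the split search (steps 2-4) inside one of the child nodes
print(calculate_split_impurities(X_left, y_left).round(3))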




# Calculate split impurities for a selected subset of indices
selected_index = [4,8,3,13,7,9,10] # Change it depending on which indices you want to check
calculate_split_impurities(X_train.iloc[selected_index], y_train.iloc[selected_index]).round(3)
from sklearn.tree import DecisionTreeClassifier
# The whole training phase above is done inside scikit-learn like this:
dt_clf = DecisionTreeClassifier()
dt_clf.fit(X_train, y_train)
Final Complete Tree
The class label of a leaf node is the majority class of the training samples that reached that node.

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
# Plot the decision tree
plt.figure(figsize=(20, 10))
plot_tree(dt_clf, filled=True, feature_names=X.columns, class_names=['Not Play', 'Play'])
plt.show()
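If you prefer a plain-text view of the same tree, scikit-learn also provides export_text, which prints the learned split rules and leaf classes as indented text; a small sketch:

from sklearn.tree import export_text

# Print the trained tree's split rules and leaf classes as indented text
print(export_text(dt_clf, feature_names=list(X.columns)))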

Classification Step
Here's how the prediction process works once the decision tree has been trained:
1. Start at the root node of the trained decision tree.
2. Evaluate the feature and split condition at the current node, then follow the branch (left or right) that matches the instance's value.
3. Repeat step 2 at each subsequent node until reaching a leaf node.
4. The class label of the leaf node becomes the prediction for the new instance.

# Make predictions
y_pred = dt_clf.predict(X_test)
print(y_pred)
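To see those four steps explicitly, here is a hedged sketch that walks the first test sample down the trained tree by hand, using the arrays scikit-learn stores in dt_clf.tree_ (children_left, children_right, feature, threshold, value); it should agree with dt_clf.predict:

# Manually traverse the trained tree for the first test sample
tree = dt_clf.tree_
sample = X_test.iloc[0]
node = 0  # step 1: start at the root node
while tree.children_left[node] != tree.children_right[node]:  # leaves have no children
    feature_name = X.columns[tree.feature[node]]
    # step 2: evaluate the split condition and follow the matching branch
    if sample[feature_name] <= tree.threshold[node]:
        node = tree.children_left[node]
    else:
        node = tree.children_right[node]
# step 4: the leaf's majority class becomes the prediction
manual_prediction = tree.value[node].argmax()
print(f"Manual traversal: {manual_prediction}, sklearn: {dt_clf.predict(X_test.iloc[[0]])[0]}")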
Evaluation Step

# Evaluate the classifier
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
Key Parameters
Decision Trees have several important parameters that control their growth and complexity:
1. Max Depth: This sets the maximum depth of the tree, which can be a valuable tool in preventing overfitting.
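As a hedged sketch of how this parameter is set in practice (the value 3 below is just an illustrative choice, not a tuned one, and dt_shallow is a hypothetical name):

# Limit the tree to 3 levels so it cannot keep splitting until every leaf is pure
dt_shallow = DecisionTreeClassifier(max_depth=3, random_state=42)
dt_shallow.fit(X_train, y_train)
print(f"Accuracy with max_depth=3: {accuracy_score(y_test, dt_shallow.predict(X_test)):.3f}")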