DECISION TREE
A decision tree is a non-parametric supervised learning algorithm used for both classification and regression tasks. It uses a flowchart-like structure to show the predictions that result from a series of feature-based splits. Think of it like a game of "20 Questions." The model asks a question about the data, branches out based on the answer, and continues until it reaches a final conclusion.
1. The Anatomy of a Decision Tree
To understand how a decision tree works, you need to know the "family tree" of its components:
Root Node: The very top of the tree. It represents the entire dataset and the first decision point.
Internal (Decision) Nodes: These represent a "test" or a question on a specific attribute (e.g., "Is the temperature > 30°C?").
Branches: These are the "outcomes" of a test (e.g., Yes/No), connecting the nodes.
Leaf Nodes: The end of the line. These nodes hold the final prediction or class label and do not split further.
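The anatomy above can be sketched as plain nested conditionals. The feature names, thresholds, and labels below are made up purely for illustration:

```python
# A minimal hand-built decision tree mirroring the anatomy above.
# (Hypothetical features and thresholds, chosen only to illustrate the structure.)

def predict_activity(temperature_c: float, is_raining: bool) -> str:
    if temperature_c > 30:        # root node: first test on an attribute
        return "swim"             # leaf node: final prediction, no further splits
    else:                         # branch: the "temperature <= 30" outcome
        if is_raining:            # internal (decision) node: second test
            return "stay inside"  # leaf node
        else:
            return "hike"         # leaf node

print(predict_activity(35, False))  # swim
print(predict_activity(20, True))   # stay inside
```

Following any single path from the root to a leaf reads as a chain of questions and answers, which is exactly why decision trees are considered easy to interpret.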
2. How the Tree "Decides" to Split
The goal of a decision tree is to create the "purest" possible groups. It wants the data in each leaf node to be as similar as possible. It achieves this using specific mathematical metrics:
Entropy & Information Gain: Used primarily in classification. Entropy measures the "disorder" or randomness in the data; information gain is the reduction in entropy that a split achieves. The model chooses the split with the highest information gain.
Gini Impurity: A faster alternative to Entropy. It measures the frequency at which a randomly chosen element from the set would be incorrectly labeled.
Mean Squared Error (MSE): Used for regression trees to split nodes in a way that minimizes the variance of the target values.
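The classification metrics above are short enough to compute from scratch. Here is a sketch of entropy, Gini impurity, and information gain for small lists of class labels:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits (0 = pure node)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: chance a randomly drawn element would be mislabeled
    if labeled according to the node's class distribution."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
print(entropy(parent))   # 1.0 -> maximum disorder for two balanced classes
print(gini(parent))      # 0.5
# A split that separates the classes perfectly recovers all the entropy:
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))  # 1.0
```

A greedy tree-building algorithm evaluates candidate splits with one of these metrics at every node and keeps the best one.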
3. Pros and Cons
Advantages
Easy to interpret: Humans can follow the logic easily (White-box model).
Little data prep: No need for feature scaling (normalization).
Versatile: Handles both numerical and categorical data.
Disadvantages
Overfitting: Trees can become overly complex and "memorize" noise in the data.
Instability: Small changes in the data can result in a completely different tree structure.
Bias: Can be biased toward features with more levels/categories.
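The overfitting problem can be seen directly by comparing an unrestricted tree with a depth-limited one. This sketch assumes scikit-learn is installed; `max_depth=3` is an arbitrary pruning choice for illustration:

```python
# Taming overfitting by limiting tree depth (scikit-learn assumed available).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unrestricted tree keeps splitting until every training sample fits...
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# ...while capping max_depth acts as pre-pruning and limits complexity.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("deep    train/test:", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("shallow train/test:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```

Typically the unrestricted tree scores perfectly on the training data but no better (often worse) on the test data, which is the signature of memorizing noise.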
4. Modern Evolution
Because single decision trees are prone to errors (overfitting), they are rarely used alone in high-stakes environments. Instead, they serve as the building blocks for powerful Ensemble Methods:
Random Forests: Training many trees on random subsets of the data and features, then averaging (or voting on) their predictions.
Gradient Boosting (XGBoost/LightGBM): Building trees sequentially, where each new tree tries to fix the errors of the previous one.
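A random forest is the simpler of the two ensembles to try first. The sketch below assumes scikit-learn is available; the dataset and hyperparameters are illustrative defaults, not tuned values:

```python
# Averaging many trees with a random forest (scikit-learn assumed available).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 100 trees, each fit on a bootstrap sample with random feature subsets at
# every split; class predictions are combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```

Because each tree sees a slightly different view of the data, the ensemble averages away much of the instability and overfitting of any single tree.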
Assignment Instructions
Complete all questions provided at the end of the material clearly and accurately.
Write your answers on A4-sized paper.
Submit the assignment before the Machine Learning class begins.
The deadline for submission is Thursday, April 9, 2026.