Trying Out Both Random Forest & Decision Tree Classifier On MNIST Using Scikit-Learn
It’s just a piece of cake. Let’s go!!
In the ‘Beginner’s guide to Random Forest Algorithm Classifier’ post, I explained why Random Forest is a much better algorithm to use when you have high-dimensional data or a large dataset.
Today, I want to demonstrate how the two differ in terms of accuracy and robustness using the MNIST dataset, which contains 70,000 greyscale images of handwritten digits.
Note: Though the MNIST dataset is quite large with 784 features, it is not generally regarded as a high-dimensional dataset. Most high-dimensional datasets have a number of features greater than or equal to the number of observations.
I will train both a Decision Tree and a Random Forest on this dataset so you can appreciate the strength and robustness of Random Forest.
First, we will import the modules we need from the scikit-learn library:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
After importing the classes we need from scikit-learn, we now load the MNIST dataset into the notebook:
# loading the mnist dataset into the notebook
mnist = fetch_openml('mnist_784',version=1)
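Before splitting, it is worth sanity-checking what we loaded: 70,000 images, each flattened into 784 pixel values. A minimal sketch (depending on your scikit-learn version, mnist.data may come back as a pandas DataFrame or a NumPy array, but shape works for both):
# Inspect the shape of the features and labels
print(mnist.data.shape)    # (70000, 784)
print(mnist.target.shape)  # (70000,)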
We split our dataset into training and test sets. The X arrays are the features the model will learn from to predict the y labels.
It’s a bit like looking at a person’s physical features and guessing with near certainty who it is without seeing their face, because you have stored their features (gait, stature, head shape, etc.) in memory and can use them.
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(mnist.data, mnist.target,
test_size=0.2,
random_state=42)
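With test_size=0.2, 80% of the images (56,000) go to training and 20% (14,000) to testing. A quick check:
# Confirm the split sizes
print(X_train.shape)  # (56000, 784)
print(X_test.shape)   # (14000, 784)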
Digits From the MNIST Dataset
Training The Model Using A Random Forest
Now that we have our train and test datasets, we can instantiate the Random Forest classifier and train the model to make the right predictions.
# Instantiate a random forest classifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)
# Fit the model to the training data
rf.fit(X_train, y_train)
The ‘fit’ function instructs the model to learn the patterns in the data and adjust its parameters to minimize the error between the predicted and actual output. This process is called training the model. Here, the model is given both X_train and y_train to learn from, finding the patterns in each observation of X_train that give it its corresponding y value.
It simply learns the patterns in the X values that produce their respective y values.
The ‘predict’ function of the RandomForestClassifier class takes the X values of the test dataset and uses them to predict the corresponding y outputs. It applies the patterns it learnt from the X_train dataset during training to the unseen X_test values.
The output should be the same as or close to the values of y_test, because X_test corresponds to y_test just as X_train corresponds to y_train.
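In code, that looks like this (y_pred_rf is just a variable name I’m introducing to hold the predictions):
# Predict the digit labels of the test set
y_pred_rf = rf.predict(X_test)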
Now let us calculate the accuracy of the model
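A minimal sketch using the accuracy_score function we imported earlier (assuming the y_pred_rf predictions from the previous step):
# Compare the predictions against the true labels
rf_accuracy = accuracy_score(y_test, y_pred_rf)
print(rf_accuracy)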
The accuracy of the model using the Random Forest classifier is 96.7%.
Training The Model Using A Decision Tree
# Instantiate a decision tree classifier
dt = DecisionTreeClassifier(random_state=42)
We create a decision tree model to compare its accuracy with that of the Random Forest algorithm.
# Fit the model to the training data
dt.fit(X_train, y_train)
Here we use a Decision Tree to train the model on the train dataset. The classifier finds the patterns in X_train that produce the corresponding y_train values.
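The prediction and accuracy steps mirror the Random Forest ones (y_pred_dt is my name for the decision tree’s predictions):
# Predict with the decision tree and score it
y_pred_dt = dt.predict(X_test)
dt_accuracy = accuracy_score(y_test, y_pred_dt)
print(dt_accuracy)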
The accuracy of the Decision Tree Model is 86.9%
Both models are ultimately trying to predict the values in y_test, so let’s see which one was more accurate. Given the two accuracies above, the Random Forest classifier should do a better job at predicting than the Decision Tree.
The RF scores an accuracy of 96.7% while the DT scores an accuracy of 86.9%. This clearly shows in their predictions: the first three values in y_test (the values we are predicting) are [8, 4, 8, …]. The RF model predicted [8, 4, 8, …] while the DT predicted [8, 4, 5, …], clearly showing the RF was more accurate.
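You can peek at those first few predictions yourself. Note that fetch_openml returns the MNIST labels as strings, so they may print as '8', '4', '8' rather than integers (a small sketch, assuming the prediction arrays from earlier):
import numpy as np
# First three true labels versus each model's predictions
print(np.asarray(y_test)[:3])   # true labels
print(y_pred_rf[:3])            # random forest predictions
print(y_pred_dt[:3])            # decision tree predictions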
For small datasets, RF and DT may have the same or almost the same accuracy.
Confusion Matrix showing the Performance of Each Model
Breakdown of the Confusion Matrix of the Random Forest Algorithm
The rows of the matrix represent the true classes and the columns represent the predicted classes. Each cell counts how many times a true digit (row) was predicted as a given digit (column); from these counts you can read off the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) classifications for each class.
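The matrix itself can be computed with scikit-learn’s confusion_matrix (a minimal sketch, assuming the y_pred_rf predictions from earlier; swap in y_pred_dt for the Decision Tree’s matrix):
from sklearn.metrics import confusion_matrix
# Rows are true digits, columns are predicted digits
cm_rf = confusion_matrix(y_test, y_pred_rf)
print(cm_rf)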
- The first row and column concern the digit 0. The model correctly predicted digit 0 1325 times (TP, the diagonal entry), predicted one of the other digits as digit 0 22 times (FP, the rest of the first column: 0 + 4 + 1 + 4 + 1 + 4 + 3 + 1 + 4 = 22), and predicted digit 0 as one of the other digits 18 times (FN, the rest of the first row: 0 + 4 + 0 + 1 + 1 + 3 + 1 + 6 + 2 = 18). See the code sketch after this list for how to pull these counts out of the matrix.
- The second row and column concern the digit 1: the model correctly predicted digit 1 1573 times (TP), predicted one of the other digits as digit 1 27 times (FP = 5 + 0 + 1 + 3 + 1 + 5 + 6 + 6 = 27), and predicted digit 1 as one of the other digits 27 times (FN = 6 + 8 + 2 + 0 + 0 + 6 + 3 + 2 = 27).
- The tenth row and column concern the digit 9: the model correctly predicted digit 9 1344 times (TP), predicted one of the other digits as digit 9 60 times (FP = 2 + 2 + 2 + 8 + 22 + 2 + 0 + 15 + 7 = 60), and predicted digit 9 as one of the other digits 76 times (FN = 4 + 6 + 5 + 16 + 20 + 5 + 1 + 10 + 9 = 76).
Same applies to the confusion matrix of the Decision Tree.
NB: The true positives are always the diagonal figures in the matrix.
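Here is the sketch promised above, showing how the per-digit counts in the breakdown fall out of the matrix (assuming cm_rf from the earlier snippet):
# For a digit d: TP is the diagonal entry, FP is the rest of
# column d (other digits predicted as d), and FN is the rest of
# row d (digit d predicted as something else)
d = 0
tp = cm_rf[d, d]
fp = cm_rf[:, d].sum() - tp
fn = cm_rf[d, :].sum() - tp
print(tp, fp, fn)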
Yeah!! So that’s the end, but not everything there is to it.
Where Do Data Scientists Go Camping?
A Random Forest!! Huh!!