The Machine Learning Blog

How to train a Machine to recognize handwritten numbers in a picture (or anything else for that matter)

Recognizing handwritten numbers

In this post, I will explain how to use Machine Learning to build a piece of software that is able to recognize handwritten numbers. I will also explain what other applications this could have for you and your business. That said, I will not go into implementation details or the mathematics of how this works. The post will just explain the general idea behind the principle.

For a machine, recognizing something in a picture is not an easy task.
Machines are quite stupid and cannot understand images. They can only work with the numeric value of each pixel. We therefore have to put everything in terms they can understand if we want them to perform such a simple task...

Analyzing images pixel by pixel

If we use 20x20 pixel images, each one will have a total of 400 pixels. And if they are in black and white (grayscale), each pixel will have a level of gray, going from 0 (completely white) to 100 (completely black).
Every one of the images in my training set can therefore be represented by a series of 400 numbers, each one ranging from 0 to 100.

If we replace the image with the numeric values of its pixels, we can feed them to the machine and apply some mathematics to train our Neural Network.
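To make this concrete, here is a minimal Python sketch of turning a 20x20 image into a series of 400 numbers. The image itself is made up for illustration (a blank page with one vertical stroke), and the names image, flatten and pixels are my own choices, not part of any library.

```python
# A tiny stand-in for a 20x20 grayscale image: a list of 20 rows,
# each holding 20 gray levels from 0 (white) to 100 (black).
# The pixel values are made up for illustration.
WIDTH, HEIGHT = 20, 20
image = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]

# Draw a short black vertical stroke in the middle of the image.
for row in range(5, 15):
    image[row][10] = 100


def flatten(img):
    """Turn a 2D grid of pixels into one flat list, row after row."""
    return [pixel for row in img for pixel in row]


pixels = flatten(image)
print(len(pixels))  # 400
```

The flat list is what gets fed to the model: the machine never sees a "picture", only these 400 values.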

It takes 400 inputs to give 1 output

We will use supervised learning with a neural network that will be able to classify any given image. If you haven't read my blog on what Neural Networks are, read it here.
The idea is to create a mathematical model that will take 400 inputs (pixel values) and calculate the influence (weight) each value has on the outcome.
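As a rough sketch, such a model can be written in a few lines of Python. This assumes the simplest possible form (one weight per pixel, a bias term, and a sigmoid to squash the result between 0 and 1), which is my choice for illustration. The weights here are random and untrained, and the image is made up.

```python
import math
import random

N_PIXELS = 400  # one input per pixel of a 20x20 image


def sigmoid(z):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_is_three(pixels, weights, bias):
    """Weighted sum of the pixel values, turned into a score between 0 and 1."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return sigmoid(z)


random.seed(0)
# Untrained weights: small random numbers, so the score is meaningless for now.
weights = [random.uniform(-0.01, 0.01) for _ in range(N_PIXELS)]
bias = 0.0

# A made-up image, with gray levels rescaled from 0..100 to 0..1.
pixels = [random.randint(0, 100) / 100 for _ in range(N_PIXELS)]

score = predict_is_three(pixels, weights, bias)
print(f"Score for 'this is a 3': {score:.2f}")
```

Rescaling the gray levels to the 0..1 range is not required by the idea itself, but it keeps the weighted sum small and the numbers easy to handle.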
At this point, our Machine Learning model would look something like this:

For the moment, we only have one example, so our model will not be able to be very precise at calculating the importance of each pixel in what represents the number 3 in a given image. That is why it is very important to have large amounts of data.
This will allow us to feed our model with more training examples and different variations of the same number 3.

If we have enough examples of how the number "3" is written, we can feed them to our model and compute the impact (weight) that each pixel's level of blackness has on making a particular image a number 3.
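Here is a hedged sketch of how weights can be computed from examples. It assumes a very simple model (a weighted sum plus a sigmoid) trained with gradient descent, one example at a time, and it uses toy "images" of 4 pixels instead of 400 so it runs instantly. The data, the learning rate and the number of passes are all invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, labels, n_inputs, learning_rate=0.5, epochs=500):
    """Learn one weight per input with gradient descent, one example at a time."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in zip(examples, labels):
            prediction = sigmoid(bias + sum(w * p for w, p in zip(weights, pixels)))
            error = prediction - label  # positive if the model guessed too high
            # Nudge each weight in proportion to its pixel's darkness.
            for i, p in enumerate(pixels):
                weights[i] -= learning_rate * error * p
            bias -= learning_rate * error
    return weights, bias


# Toy "images" with 4 pixels instead of 400 (values scaled to 0..1).
# In this made-up data, a dark first pixel means "it is a 3".
examples = [
    [1.0, 0.0, 0.2, 0.1],
    [0.9, 0.1, 0.0, 0.3],
    [0.8, 0.2, 0.1, 0.0],
    [0.0, 1.0, 0.1, 0.2],
    [0.1, 0.9, 0.3, 0.0],
    [0.2, 0.8, 0.0, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(examples, labels, n_inputs=4)
print([round(w, 2) for w in weights])
```

After training, the first weight ends up positive and the second negative: the model has learned which pixels count for and against the image being a 3. With real images the principle is the same, only with 400 weights and many more examples.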

The dataset to train the model

In order to train a piece of software to recognise handwritten numbers, we first need to provide it with lots of examples. Luckily for us, the MNIST Database of handwritten digits was uploaded to the web by a group of very altruistic people who we cannot thank enough! This dataset contains 60 thousand grayscale training images of handwritten digits. In the original files, each digit is size-normalized to fit a 20x20 pixel box and centered in a 28x28 pixel frame; this post works with 20x20 pixel images.
When plotted in math software like Octave (widely used for machine learning), they look something like this.

Multilayer neural networks

In my post about neural networks, I explained the concept of multilayer neural networks with hidden layers.
If you have already read my post, you should know by now that linear models do not always fit your data, hence the importance of having hidden layers.
Concretely, this means that our 400 inputs will generate a given number of "hidden" inputs, and that these will be the ones that ultimately decide the output of my model.
Our Machine Learning model now looks something like this:

Each of the lines interconnecting the inputs is called a "weight" or "parameter".
The goal of machine learning is to use the training examples to compute the "parameters" that best predict whether a completely new image is a number 3 or not.
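The following Python sketch shows how the 400 inputs could flow through one hidden layer to a single output. The size of the hidden layer (25 units), the sigmoid function and the random, untrained weights are assumptions made for illustration; the post does not prescribe them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: every unit takes a weighted sum of all inputs, then a sigmoid."""
    return [
        sigmoid(b + sum(w * x for w, x in zip(unit_weights, inputs)))
        for unit_weights, b in zip(weights, biases)
    ]


def random_weights(n_units, n_inputs):
    """Untrained weights: one small random number per connecting line."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_units)]


random.seed(1)
N_INPUTS, N_HIDDEN, N_OUTPUTS = 400, 25, 1  # 25 hidden units is an arbitrary choice

hidden_weights = random_weights(N_HIDDEN, N_INPUTS)   # 25 x 400 lines
output_weights = random_weights(N_OUTPUTS, N_HIDDEN)  # 1 x 25 lines
hidden_biases = [0.0] * N_HIDDEN
output_biases = [0.0] * N_OUTPUTS

pixels = [random.random() for _ in range(N_INPUTS)]  # made-up image, scaled to 0..1

hidden = layer(pixels, hidden_weights, hidden_biases)  # the "hidden" inputs
output = layer(hidden, output_weights, output_biases)  # score for "is a 3"
print(len(hidden), output)
```

In this sketch there are 25 x 400 + 25 = 10,025 lines between the layers, and training means finding a good value for every one of them.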

One vs all classification

Needless to say, software that can only predict whether a number is a 3 or not is of little use. What we need is a piece of software that is able to take any new image and correctly guess what number it is.
Concretely, what we need to do is train 10 different mathematical models, one for each digit from 0 to 9.
Our Neural Network would look something like this.

Our Machine Learning model will be able to take any given new image, apply the learned weights (parameters) that we found while training it, and give us 10 different probabilities, one for each model.
For instance, an image can have:

• a 10% chance of being a 0
• a 15% chance of being a 1
• an 86% chance of being a 2
• etc

What we will do is simply pick the highest probability and make our prediction based on that information.
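In Python, picking the winner could look like the sketch below. The first three scores match the example above; the other seven are invented, and predict_digit is a name I made up for this illustration.

```python
# Made-up scores, one per digit model (index 0 is the "is it a 0?" model, etc.).
# The first three match the example above; the rest are invented.
probabilities = [0.10, 0.15, 0.86, 0.05, 0.02, 0.08, 0.01, 0.12, 0.04, 0.03]


def predict_digit(probs):
    """Return the digit whose model gave the highest score."""
    return max(range(len(probs)), key=lambda digit: probs[digit])


print(predict_digit(probabilities))  # 2
```

Note that the 10 scores do not have to add up to 100%, because each model answers its own yes/no question independently.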
That is all there is to it :-)

Why go through all this trouble?

Why go through all this trouble to make a machine perform such a simple task, one that a human can do in less than a second?
Well, you could pay a human the minimum wage to do this, but there is a limit to how many numbers a person can recognize. A machine, on the other hand, can be tasked with repeating the same boring task billions of times, without any rest or vacations. The machine will never get bored, ask for a raise or call in sick...

Are there any applications to this?

You could train a model to understand letters. Here are a couple of examples of what can be achieved by scanning a picture with handwritten text:

• Shipping companies can understand handwritten addresses on packages
• Documents can be scanned to get digital versions of them
• Salespeople can scan business cards
• The blind could "hear" written texts
• etc

Going beyond numbers and text

Numbers and letters are just human inventions. The ability to train a machine to distinguish patterns in images can be applied to virtually anything in the world. One creative application of this is autonomous driving.

In this specific case, the idea is to put a camera on a car and take images of the road every second. When training the model, for each given image, the correct "answer" will be the position of the steering wheel at that precise moment.
If the wheel can be steered through 180 degrees, we will train 180 models so that the Neural Network can predict the best position of the wheel for any new image that appears on the camera and react accordingly. This is actually one of the first ideas behind autonomous driving.
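As a small illustration of how the "answers" for this kind of training could be prepared, the Python sketch below maps each recorded wheel position to one of 180 classes, one per degree. The frame names, the angles and both helper functions are hypothetical, and I assume angles are measured from 0 to 180 degrees.

```python
N_CLASSES = 180  # one model per degree of steering, as in the text


def angle_to_class(angle_degrees):
    """Map a recorded steering angle in [0, 180) to a class label 0..179."""
    label = int(angle_degrees)
    return min(max(label, 0), N_CLASSES - 1)  # clamp out-of-range readings


def class_to_angle(label):
    """Map a predicted class back to the middle of its one-degree slot."""
    return label + 0.5


# Hypothetical recordings: (camera frame id, steering angle at that moment).
recordings = [("frame_001", 90.2), ("frame_002", 87.9), ("frame_003", 101.4)]
training_labels = [angle_to_class(angle) for _, angle in recordings]
print(training_labels)  # [90, 87, 101]
```

From there, the problem has the same shape as the digits: each image comes with one correct class, and the prediction is the class with the highest probability.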

We can train a machine to recognize anything!

Ok, so let's get creative here:

2. What if we trained a model to recognize your VIP customers' faces when they enter your store?
3. Can you imagine creating a model that is able to sort and classify pieces in an assembly line?
4. What about teaching police cameras to recognize missing people's faces in the street?

With enough training data, we can take anything in an image (or a video) and teach a Machine how to recognize it.
The possibilities are endless and the only limit is our imagination.