The Machine Learning Blog

Artificial Intelligence and Mathematics. Part II: Illustrative example


This article is part 2 of the series "Artificial Intelligence and Mathematics". To follow it, I recommend reading the first part beforehand.

In the previous article I described the neural network model, with an input layer, a series of hidden layers and an output layer.

How do we feed the model with an entity to classify, and how do we interpret its output?

Let's assume that we have a model trained to recognize images of cats and that we want to use that model to classify the image of figure 1 as a cat or not a cat.




Figure 1. Image "cat1.jpg".


Figure 1 has several characteristics:

  1. It is a small digital image of 30x30 pixels.
  2. It is a color image in .jpg format. Once decoded, each pixel has three color components or layers: Red, Green and Blue (RGB).
  3. Each component is represented by a value between 0 and 255, where 0 means the color is absent and 255 means full intensity.

According to the above, the pixel-by-pixel information of the image can be visualized like this:




Figure 2. Digital information of the file "cat1.jpg".


For illustrative purposes, the image has been reduced to 5x5 pixels.



As can be seen, the first pixel (upper left) has three layers (255, 238, 44), corresponding to its Red, Green and Blue values respectively.

Thus, if the image has 30x30 pixels, its decoded pixel data occupies 30 x 30 x 3 = 2,700 bytes: one byte per pixel per layer. (The .jpg file itself is compressed, so its size on disk will differ.)

To use this image in the model, the information must first be "vectorized", that is, converted from the original array (30x30x3) to a column vector (2700x1). See Figure 3.



Figure 3. Vectorized information of the file "cat1.jpg".
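The vectorization step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the pixel values are synthetic placeholders (only the first pixel, (255, 238, 44), is taken from the text), the helper names `make_placeholder_image` and `vectorize` are hypothetical, and the ordering used (row by row, keeping each pixel's R, G, B together) is one common convention rather than necessarily the one used in Figure 3.

```python
HEIGHT, WIDTH, CHANNELS = 30, 30, 3

def make_placeholder_image(height, width):
    """Build a height x width x 3 nested list of synthetic RGB values (0-255)."""
    image = [[[(r * 7 + c * 3 + ch * 50) % 256 for ch in range(CHANNELS)]
              for c in range(width)]
             for r in range(height)]
    image[0][0] = [255, 238, 44]  # first (upper left) pixel, as quoted in the text
    return image

def vectorize(image):
    """Flatten a H x W x C array into one list of length H*W*C, row by row."""
    return [value for row in image for pixel in row for value in pixel]

image = make_placeholder_image(HEIGHT, WIDTH)
x = vectorize(image)

print(len(x))   # 2700
print(x[:3])    # [255, 238, 44]
```

Each entry of `x` then plays the role of one node of the input layer.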



The 2,700 values obtained correspond to the input layer of the model (Input Layer); see Figure 4.




Figure 4. Information of the input layer. Compare the indicated values with the information contained in Figures 1 and 2.




The number of parameters between two layers is the product of the numbers of nodes in the layers being joined. Thus, connecting every node of the input layer to every node of the first hidden layer, which has 6 nodes, requires 6 x 2,700 = 16,200 parameters, and so on for each subsequent pair of layers.

Strictly speaking, neither Figure 4 nor the theory explained here includes the additional parameters per layer that specify the "bias" or thresholds of the model. They are omitted only to keep the general explanation of how classification works simple.
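The parameter count can be checked with a small helper. This is a sketch, not part of the original model: the function name `count_parameters` is hypothetical, and while the 2,700 inputs, the 6 nodes of the first hidden layer and the single output node come from the text, the size of the second hidden layer (4) is an assumed value chosen only to complete the example.

```python
def count_parameters(layer_sizes, include_bias=False):
    """Count the weights (and optionally the biases) of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # one weight per pair of connected nodes
        if include_bias:
            total += n_out         # one bias per node of the receiving layer
    return total

# 2700 and 6 come from the text; the 4 nodes of hidden layer 2 are hypothetical.
layer_sizes = [2700, 6, 4, 1]

print(count_parameters(layer_sizes[:2]))                  # 16200
print(count_parameters(layer_sizes))                      # 16228
print(count_parameters(layer_sizes, include_bias=True))   # 16239
```

The last line shows how few parameters the biases add compared with the weights, which is why leaving them out does not change the overall picture.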

With the input layer fed and the parameters defined, it is then possible to calculate first the values of Hidden Layer 1, then Hidden Layer 2 and finally the Output Layer, an operation known as "Forward Propagation".

Note that the output layer has only one node (Figure 4). This node is the overall output of the model: a single value that represents the classification obtained.

Once its value has been calculated, it must be transformed into a value between zero and one by means of a mathematical function called the "activation function".

Converting the output to a value between zero and one allows it to be read as a probability: if it is greater than 0.5, the image is classified as "Cat"; if it is less than or equal to 0.5, as "No Cat".
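Forward propagation and the final thresholding can be sketched as follows. Several things here are assumptions made for illustration: the sigmoid is used as the activation function and applied in every layer, biases are left out as in the explanation above, the inputs are scaled from 0-255 to 0-1, the second hidden layer has 4 nodes, and the weights are random rather than trained, so the printed classification is meaningless. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Activation function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def random_weights(n_in, n_out, rng):
    """One row of n_in weights for each node of the receiving layer."""
    return [[rng.uniform(-0.01, 0.01) for _ in range(n_in)] for _ in range(n_out)]

def forward_propagation(x, layers):
    """Compute each layer from the previous one and return the single output value."""
    activations = x
    for weights in layers:
        activations = [sigmoid(sum(w * a for w, a in zip(row, activations)))
                       for row in weights]
    return activations[0]

def classify(probability, threshold=0.5):
    """Apply the decision rule from the text."""
    return "Cat" if probability > threshold else "No Cat"

rng = random.Random(0)
sizes = [2700, 6, 4, 1]  # the 4 nodes of hidden layer 2 are a hypothetical choice
layers = [random_weights(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Synthetic stand-in for the vectorized image, scaled from 0-255 to 0-1.
x = [rng.randrange(256) / 255 for _ in range(sizes[0])]

p = forward_propagation(x, layers)
print(round(p, 3), classify(p))
```

With trained parameters in place of the random ones, the same two calls would produce the actual classification of the image.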

The classification threshold is a design choice: in some cases it is set above 0.5, in others below.

Imagine that what you are classifying is not images of cats but images of skin tumors, as malignant or benign.

In this case a false positive, that is, classifying a tumor as malignant when it is actually benign, has serious implications for the patient, so it is probably better to set the threshold around 0.6 or even higher.
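The effect of moving the threshold can be seen with a few made-up model outputs. The scores below are hypothetical values chosen for illustration, not results from any real classifier.

```python
def classify(probability, threshold):
    """Label an output as malignant only when it exceeds the chosen threshold."""
    return "malignant" if probability > threshold else "benign"

scores = [0.45, 0.55, 0.65]  # hypothetical model outputs for three images

for threshold in (0.5, 0.6):
    print(threshold, [classify(p, threshold) for p in scores])
# 0.5 ['benign', 'malignant', 'malignant']
# 0.6 ['benign', 'benign', 'malignant']
```

Raising the threshold from 0.5 to 0.6 turns the borderline case (0.55) from "malignant" into "benign", which is how a higher threshold reduces false positives.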

How many hidden layers should the model have? How many nodes in each layer? This is something the Data Scientist must define. As a rough guide, four to six hidden layers with 200 to 300 nodes per layer are commonly used to reach a classification error rate below 5%, although suitable values depend on the problem and the data.

Several concepts should be clear at this point:

  1. The model behaves like "a function of functions", with an input (the image) and an output (the classification).
  2. Each model has a characteristic parameter set, which makes up the "fingerprint of the model". Thus, a cat classifier model will have a different set of parameters than a tumor classifier model.
  3. The information is distributed throughout the entire model rather than stored in one specific place.
  4. Learning means obtaining the fingerprint that characterizes the model. Classification means using that fingerprint to identify an image.


What is training?

In general, the model is trained with images whose classification is known (the Training Set), starting from a set of randomly chosen parameters. From the classification results on the Training Set, the classification error can be calculated. This error is fed to an algorithm, Back Propagation, which recalculates the parameters so that the next run yields a better approximation (a smaller classification error).

This operation is repeated 10,000 or more times until an acceptable margin of error is reached.

Once the model is trained, you can check the quality of its fingerprint with images it has not seen, the Test Set. If everything goes well, you have a properly trained model.
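The training and testing cycle described above can be sketched on a toy problem. Everything in this example is an assumption made for illustration: the "images" are four-value vectors labelled 1 (bright) or 0 (dark), the network has a single hidden layer of 3 nodes with sigmoid activations and biases, the parameter update is plain gradient descent on the cross-entropy loss, and 2,000 passes are used instead of 10,000 to keep it fast. The names `train`, `forward` and `accuracy` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward propagation through one hidden layer and one output node."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def train(samples, n_hidden=3, epochs=2000, lr=0.5, seed=0):
    """Start from random parameters and refine them by back propagation."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []  # mean squared error per pass, kept only to monitor progress
    for _ in range(epochs):
        squared_error = 0.0
        for x, target in samples:
            h, y = forward(x, (W1, b1, W2, b2))
            squared_error += (y - target) ** 2
            # Error signal at the output node (gradient of the cross-entropy loss).
            delta_out = y - target
            for j in range(n_hidden):
                # Propagate the error back to hidden node j before changing W2[j].
                delta_h = delta_out * W2[j] * h[j] * (1.0 - h[j])
                W2[j] -= lr * delta_out * h[j]
                b1[j] -= lr * delta_h
                for i in range(n_in):
                    W1[j][i] -= lr * delta_h * x[i]
            b2 -= lr * delta_out
        history.append(squared_error / len(samples))
    return (W1, b1, W2, b2), history

def accuracy(samples, params, threshold=0.5):
    """Fraction of samples whose thresholded output matches the known label."""
    hits = sum((forward(x, params)[1] > threshold) == bool(t) for x, t in samples)
    return hits / len(samples)

training_set = [([0.1, 0.2, 0.1, 0.0], 0), ([0.2, 0.0, 0.3, 0.1], 0),
                ([0.9, 0.8, 1.0, 0.9], 1), ([0.7, 1.0, 0.8, 0.9], 1)]
test_set = [([0.0, 0.1, 0.2, 0.1], 0), ([1.0, 0.9, 0.8, 0.9], 1)]

params, history = train(training_set)
print("error on first pass:", round(history[0], 4))
print("error on last pass: ", round(history[-1], 4))
print("test set accuracy:  ", accuracy(test_set, params))
```

The tuple `params` returned by `train` is the "fingerprint" of this toy model, and the falling values in `history` are the shrinking classification error described in the text.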


And what does all this machine learning have to do with nature, the central object of my blogs?

Understanding the functioning of neural networks allows us to approach from a new perspective issues inherent in life itself:

  1. How artificial is artificial intelligence?
  2. Could the brain operate in a similar way?
  3. Can a memory be distributed in a set of neuronal synapses?
  4. Can organisms and ecosystems be modeled by neural networks?
  5. Can Neural Networks generate a measure of the complexity of ecosystems?
  6. Where can memory reside in our brain and, more importantly, what is memory?
  7. How can consciousness be understood? What can memories be?

With the next article, the last in this series, I hope to plant these questions, not the answers, in the neural network of your brain.

