In the real world, images are not 2 pixels by 2 pixels. You might not even be able to see an image that small! An Instagram profile picture – one of the smaller images you see each day – is still 360 by 360 pixels. An Instagram story is 1080 by 1920 pixels. That's 1080 x 1920 = 2,073,600 pixels. That's a lot of m's we have to learn!
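To get a feel for how quickly those m's add up, here is a quick back-of-the-envelope sketch. The hidden-layer size of 100 nodes is a hypothetical choice for illustration; the point is that every pixel needs its own weight per node:

```python
# Rough weight count for a fully connected first layer on an
# Instagram-story-sized image (illustrative arithmetic only).
pixels = 1080 * 1920      # 2,073,600 input pixels
hidden_nodes = 100        # hypothetical hidden-layer size
weights = pixels * hidden_nodes

print(pixels)   # 2073600
print(weights)  # 207360000 -- over 200 million weights in one layer
```

Even a modest 100-node layer already needs hundreds of millions of weights, which is one motivation for the approach introduced later in this chapter.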
Another issue with the methods we have covered so far is a problem called translational variance. To demonstrate this concept, consider these two images:
The two images are identical, except that the "smiley face" is in the bottom-right corner of image A and the top-left corner of image B. With a standard neural network like the one we learned about in Chapter 5, the node for pixel x8 will learn to look for the left eye in image A, but in image B the left eye falls on the node for pixel x1. The same applies to logistic regression, k-means clustering, and decision trees: the input variable that should look for the left eye differs between image A and image B. This translational variance confuses the training process and will result in bad performance unless the objects appear in exactly the same position in every input image. This is not a realistic assumption!
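The problem can be seen in a few lines of code. In this sketch (using toy 4x4 arrays in place of the smiley-face images), a set of weights trained to fire on the feature's position in image A responds not at all when the same feature moves to image B's corner:

```python
import numpy as np

# Two 4x4 "images" with the same 2x2 bright feature in different corners.
a = np.zeros((4, 4))
a[2:4, 2:4] = 1.0   # feature in the bottom-right corner (image A)
b = np.zeros((4, 4))
b[0:2, 0:2] = 1.0   # feature in the top-left corner (image B)

# A standard network flattens the image, giving each pixel its own weight.
# Suppose training on image A produced weights that match A's feature pixels:
w = a.flatten()

# The response is strong for A but zero for B, even though the feature
# itself is identical -- only its position changed.
print(w @ a.flatten())   # 4.0
print(w @ b.flatten())   # 0.0
```

Because each pixel gets an independent weight, the network has no way to know that the two corners contain the same object.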
Convolutional neural networks (CNNs) solve this issue by sliding the same small set of learned weights across the entire image, so a feature is detected no matter where it appears. For this reason, CNNs are called translation invariant.
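Continuing the toy example above, a minimal sketch of this idea (a hand-rolled valid convolution, not a production implementation) shows that one shared filter produces the same peak response whichever corner the feature sits in:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide `kernel` over `img` (valid convolution, stride 1)."""
    kh, kw = kernel.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The SAME weights are reused at every position.
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# Same toy images as before: one feature, two different corners.
a = np.zeros((4, 4))
a[2:4, 2:4] = 1.0
b = np.zeros((4, 4))
b[0:2, 0:2] = 1.0

kernel = np.ones((2, 2))   # one shared 2x2 filter

# The peak response is identical for both images: the filter finds
# the feature wherever it happens to be.
print(conv2d_valid(a, kernel).max())   # 4.0
print(conv2d_valid(b, kernel).max())   # 4.0
```

Instead of one weight per pixel, the filter's four weights are shared across all positions, which is both what makes the detection position-independent and what keeps the parameter count small.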