peter.washington

5 min read 25 views

Classification on Images

Classification on Images_image

Across all of computing, and not just in machine learning, images are represented as a matrix of pixels. A matrix is not complicated – it is just a grid of numbers.  Here is an example of a small matrix with dimensions 2x2:

In the case of computer representations of images, each number in the matrix represents a different pixel. Where the number appears in the matrix corresponds to where that pixel appears in the image.

For example, for black and white images, a pixel value of 0 represents “black” and 255 represents “white”. Pixel values from 1 to 254 represent increasingly lighter shades of gray. Let’s say we have the following (very pixelated, very small, yet very zoomed in) image:

This corresponds to the following 10x7 matrix:

In reality, most images are not black-and-white. To get color image, computers use 3 different matrices: a red matrix, a blue matrix, and a green matrix. These 3 matrices all have the same dimensions, and each represents a “color channel”.

This is similar to how the human brain represents color: we have light receptors which are sensitive to blue light, red light, and green light. Different combinations and intensities of red, blue, and green light can produce any color on the color wheel, and this is why we as humans can see all sorts of colors even though we only have receptors for red, blue, and green light.

We call a multi-dimensional matrix a “tensor”. Here is an example of an RGB (red, green, blue) tensor for a tiny 2x2 image:

Classification and regression with matrices representing images using the methods we have learned about so far works the same as before, with one slight modification: since the inputs to those methods are 1-dimensional vectors, we must transform the 2-dimensional (black and white) or 3-dimensionaal (color) image representation into a 1-dimensional vector. This process is called “flattening”.

The above RGB image represented as a flattened vector is as follows:

Each pixel value above represents a different input variable (or “feature”). If we wanted to run logistic regression with the above representation, then we just do regular logistic regression with each pixel in the image as a different input variable from x1 to x12.