The basic structure for classification is this:
We take input data, feed it into one of many possible classification methods, and get a predicted category as output. In actuality, the output of the machine learning model is a probability score that the classifier assigns to each possible category, like this:
The final category is chosen as the one with the highest probability score (in this case, “dog”).
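This "pick the highest score" step can be sketched in a few lines of Python; the category names and probability scores below are made up for illustration:

```python
# Hypothetical probability scores assigned by a classifier to each category.
scores = {"dog": 0.72, "cat": 0.18, "rabbit": 0.10}

# The final prediction is simply the category with the highest score.
prediction = max(scores, key=scores.get)
print(prediction)  # prints "dog"
```

Note that the scores form a probability distribution over the categories (they sum to 1), but only their relative order matters for choosing the final prediction.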
The underlying mathematical task for classification is a bit different from the one for regression in Chapter 1. Instead of learning a line or other function that can be used to predict the value of one variable from a set of other variables, we want to learn a line (or other function) that separates categories.
Let’s visualize this with an example:
In the plot above, each data point is a different person: the average hours of exercise that person gets per day is on the x1 axis, and the average calories that person consumes per day is on the x2 axis. If a person is clinically overweight, their data point is drawn as a circle; if they are not, it is drawn as a square.
We want to predict whether a person is overweight based on their daily exercise and calorie intake. The mathematical goal here is to learn the line (or other function) that best separates these categories:
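One simple way to learn such a separating line is the classic perceptron update rule; this is just one of many possible methods, and the tiny dataset below (with calories rescaled to thousands) is invented purely for illustration:

```python
# Invented training data: (exercise hours/day, calories/day in thousands, label).
# Label 1 = clinically overweight (circle), label 0 = not overweight (square).
data = [
    (0.5, 3.2, 1),
    (0.2, 3.5, 1),
    (2.0, 2.0, 0),
    (1.5, 1.8, 0),
]

# Learn a separating line w1*x1 + w2*x2 + b = 0 with the perceptron rule:
# whenever a point is misclassified, nudge the line toward classifying it correctly.
w1, w2, b = 0.0, 0.0, 0.0
for _ in range(200):                      # passes over the training set
    for x1, x2, y in data:
        pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        error = y - pred                  # 0 when the point is already correct
        w1 += error * x1
        w2 += error * x2
        b += error

print(w1, w2, b)  # coefficients of the learned separating line
```

Because this toy dataset is linearly separable, the loop settles on a line that puts every training point on its correct side.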
After training the model with the training set, if we get a new person (data point), we simply see which side of the line they fall on:
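In code, "which side of the line?" is just a sign test on the line's equation. The coefficients and the two new people below are hypothetical stand-ins (with calories in thousands), not values learned from real data:

```python
# Hypothetical learned line: w1*x1 + w2*x2 + b = 0.
w1, w2, b = -1.0, 1.0, -1.5   # invented coefficients for illustration

def predict(exercise_hours, calories_thousands):
    """Classify a new person by which side of the line they fall on."""
    score = w1 * exercise_hours + w2 * calories_thousands + b
    return "overweight" if score > 0 else "not overweight"

# Two made-up new people:
print(predict(0.3, 3.4))  # little exercise, high calories -> "overweight"
print(predict(1.8, 2.0))  # lots of exercise, moderate calories -> "not overweight"
```

The sign of the score tells us which side of the boundary the point lies on; its magnitude says how far from the boundary the point is.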
Try to guess what the predictions for each of the question marks are before looking at the next page.
Did you guess the following?
Great work! That’s the main idea of classification. There are many different methods for doing it.