Imagine that you have recorded the hours spent studying for an exam and the score on the exam for several students. There is a single independent variable (*x*), the hours spent studying, and a single dependent variable (*y*), the student's score on the exam. We can plot all of these data points on a two-axis plot, where each point represents an individual student:

We call these points our *training data* because they are used to train our model. This model will be used to make predictions about other students who may take this exam in the future. Linear regression works by finding *the line of best fit*. We will talk in more detail soon about how this line of best fit is determined, but for now, just think about it as the line that best represents the data points.

In our exam example, the line of best fit would be the following:

To make a prediction, all you need to do is draw a vertical line from your input value on the x-axis up to the line of best fit. The y-axis value at that point on the line is the model's prediction:

If we input *x = 4* hours to the model, we see that the linear regression model predicts an exam score of *y = 65*. We can input any number of hours to the model, and it will predict an exam score.
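This prediction step can be sketched in a few lines of code. The slope and intercept below are hypothetical values (not given in the text), chosen only so that an input of 4 hours produces the score of 65 from the example:

```python
# Hypothetical parameters of a fitted line (assumed for illustration):
# chosen so that predict(4) matches the example's score of 65.
m = 7.5   # slope: points gained per hour of study
b = 35.0  # y-intercept: predicted score with zero hours of study

def predict(hours: float) -> float:
    """Predict an exam score from hours studied using y = m*x + b."""
    return m * hours + b

print(predict(4))  # 7.5 * 4 + 35 = 65.0
```

Any input works the same way: `predict(2.5)` simply evaluates the line at *x = 2.5*.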

And that’s really the big idea of machine learning! You *fit* a model, in this case a line, to the training data, and then use that line to make predictions on new, unseen test data.

The math behind linear regression is centered on the equation for a line: *y = mx + b*. In this equation, *x* represents the data used to make a prediction and *y* represents the model’s prediction. The slope (*m*) and y-intercept (*b*) are called the *parameters* of the model, because the entire learning process amounts to finding the values of *m* and *b* that define the line of best fit.
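As a minimal sketch of how those parameters are learned, the snippet below fits a least-squares line to some made-up hours/scores data (the data points are assumptions, not values from the text) using NumPy's `polyfit`:

```python
import numpy as np

# Hypothetical training data: hours studied (x) and exam scores (y).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([45.0, 51.0, 58.0, 64.0, 72.0])

# np.polyfit with deg=1 returns the least-squares slope m and intercept b,
# i.e. the parameters of the line of best fit y = m*x + b.
m, b = np.polyfit(hours, scores, deg=1)
print(f"line of best fit: y = {m:.2f}x + {b:.2f}")

# The fitted line can now predict a score for any number of hours.
predicted = m * 4.0 + b
```

Swapping in a different dataset changes only `hours` and `scores`; the fitting call stays the same.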