
Measuring your Regression Model's Performance

Aug 18 2021

Hi, I am a data scientist who believes that 'Torture the data and it'll confess everything'. I am working on image processing and Natural Language Processing.


When you develop any machine learning model, it is crucial to measure its performance. Several different methods are used for this; two of the most popular for regression problems are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). In this article, we'll get some insight into root mean squared error and how it is calculated.

First, let's understand what RMSE is and why it matters. RMSE is one of the most widely used metrics for judging the quality of predictions: it measures how far the predicted values are from the actual ones. In machine learning, it is important to have a single number that summarizes the quality of your model at any phase of learning, for example while the model is being trained with gradient descent. RMSE is expressed in the same units as the target variable, and it is the natural error measure when the errors are assumed to be normally distributed, which is one of the most common statistical assumptions.
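
To make the "single number during learning" idea concrete, here is a minimal sketch, assuming a toy one-weight linear model `y_hat = w * x` trained with plain gradient descent on the mean squared error. The data, learning rate, and step count are arbitrary choices for illustration only. Minimizing MSE and minimizing RMSE lead to the same weight, because the square root is monotonic, so the sketch optimizes MSE (whose gradient is simpler) and records RMSE at every step as the quality number.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2 * x plus a little Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + rng.normal(scale=0.1, size=x.shape)

w = 0.0          # the single weight of the model y_hat = w * x
lr = 0.5         # learning rate (arbitrary, small enough to converge here)
history = []     # RMSE recorded before each update

for step in range(20):
    y_hat = w * x
    history.append(np.sqrt(np.mean((y - y_hat) ** 2)))  # current RMSE
    grad = -2.0 * np.mean((y - y_hat) * x)              # d(MSE)/dw
    w -= lr * grad                                     # gradient-descent step

print(f"RMSE at step 0: {history[0]:.3f}, at step 19: {history[-1]:.3f}, w = {w:.3f}")
```

Watching the recorded RMSE shrink from step to step is one simple way to check that training is making progress.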

Let's see how it is calculated. First, the difference between the predicted value and the actual value is computed for each data point in the dataset. Next, the squared norm of each difference is taken (for a single-valued target this is simply the squared difference). These squared errors are then averaged over the number of data points, and finally the square root of that average is taken. Mathematically:

\(\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \lVert y(i) - x(i) \rVert^2}\)

Here y(i) is the actual value of the i-th measurement, x(i) is the prediction for the corresponding measurement, and N is the number of data points. The sum under the square root is the squared Euclidean distance between the vector of actual values and the vector of predictions, so RMSE is that distance scaled by \(1/\sqrt{N}\). This gives a basic idea of how RMSE works: it is a typical (root-mean-square) magnitude of the difference between the predicted and the actual values.
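
The calculation steps and the formula translate almost line by line into code. The sketch below uses NumPy; the helper name `rmse` and the example arrays are hypothetical, with `y_true` playing the role of y(i) and `y_pred` the role of x(i). For vector-valued targets (2-D arrays, one row per data point) it follows the formula in this article: squared Euclidean norm per data point, averaged over N. The last lines check the distance interpretation.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, following the steps described above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred                      # step 1: error for every data point
    if diff.ndim == 1:
        sq_norm = diff ** 2                     # step 2 (scalar targets): squared error
    else:
        sq_norm = np.sum(diff ** 2, axis=1)     # step 2 (vector targets): squared norm per row
    mean_sq = np.mean(sq_norm)                  # step 3: average over the N data points
    return np.sqrt(mean_sq)                     # step 4: square root of the average

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(rmse(y_true, y_pred))                     # sqrt(0.375), about 0.6124

# Distance view: Euclidean distance between the two vectors, scaled by 1/sqrt(N)
distance = np.linalg.norm(np.subtract(y_true, y_pred))
print(distance / np.sqrt(len(y_true)))          # same value as rmse(y_true, y_pred)
```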

RMSE also has some disadvantages. Because each error is squared before averaging, it puts more weight on large errors than on small ones. It is also more sensitive than MAE to false data (outliers or corrupted values). Both of these can distort your model's results, so RMSE should be used with care. It can easily be computed in Python using scikit-learn (sklearn).
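
As a hedged sketch of the scikit-learn route, RMSE can be taken as the square root of `sklearn.metrics.mean_squared_error`, which works across library versions; recent releases (1.4 and later) also provide `root_mean_squared_error` directly, and the `squared=False` option seen in older tutorials has been deprecated. The arrays and the "corrupted" label below are made-up numbers chosen only to illustrate the two disadvantages above, next to MAE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def rmse(y_true, y_pred):
    # Square root of scikit-learn's MSE; works on old and new scikit-learn versions
    return np.sqrt(mean_squared_error(y_true, y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])

# 1) Larger errors weigh more: both prediction sets have the same MAE (1.0)
spread_out = y_true + np.array([1.0, -1.0, 1.0, -1.0])   # four errors of size 1
one_big = y_true + np.array([0.0, 0.0, 0.0, 4.0])        # one error of size 4
for name, preds in [("spread_out", spread_out), ("one_big", one_big)]:
    print(name, "MAE =", mean_absolute_error(y_true, preds), "RMSE =", rmse(y_true, preds))

# 2) Sensitivity to false data: corrupt a single (hypothetical) label
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
y_bad = y_true.copy()
y_bad[3] = 70.0   # hypothetical data-entry error: 70 instead of 7
print("clean     MAE =", mean_absolute_error(y_true, y_pred), "RMSE =", rmse(y_true, y_pred))
print("corrupted MAE =", mean_absolute_error(y_bad, y_pred), "RMSE =", rmse(y_bad, y_pred))
```

With these numbers, both prediction sets share an MAE of 1.0, but RMSE is 1.0 for the first and 2.0 for the second. The single corrupted label raises MAE from 0.5 to 15.75, while RMSE jumps from about 0.61 to about 31.0.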

