
Dimensionality Reduction Intuition

peterwashington, Nov 02 2021
Dimensionality Reduction

Sometimes a classifier has many input variables. Most of the examples we have shown so far predict from only one or two variables, but many real-world scenarios involve hundreds, thousands, or even millions of input variables.

For example, let’s say you want to predict something about someone based on their answers to a 100-question multiple choice exam. We could create a classifier with 100 different inputs. Another approach, however, is to reduce these 100 variables to a smaller number of variables that preserves, as far as possible, the separation between data points in the higher-dimensional space.
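To make the exam example concrete, here is a minimal sketch that reduces 100 answer variables to 2. It assumes principal component analysis (PCA) as the reduction technique, which the text above does not specify, and it uses synthetic answers rather than real exam data. The helper name `reduce_dimensions` and the two respondent groups are hypothetical.

```python
import numpy as np

def reduce_dimensions(X, n_components):
    """Project the rows of X onto their top principal components (PCA via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
n_questions = 100

# Synthetic stand-in for exam data (an assumption, not real data):
# two groups of 50 respondents, answers coded 0-3.
group_a = rng.integers(0, 4, size=(50, n_questions)).astype(float)
group_b = rng.integers(0, 4, size=(50, n_questions)).astype(float)
# Hypothetical difference: group B always picks option 3 on the first 30 questions.
group_b[:, :30] = 3.0

answers = np.vstack([group_a, group_b])          # shape (100, 100)
reduced = reduce_dimensions(answers, n_components=2)

# Gap between the two group means along the first reduced variable.
gap = abs(reduced[:50, 0].mean() - reduced[50:, 0].mean())
print(answers.shape, "->", reduced.shape)
print(f"gap between group means on the first component: {gap:.2f}")
```

If the two groups really do answer differently, they should remain clearly apart along the first reduced variable, even though each respondent is now described by 2 numbers instead of 100.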

To see this, let’s imagine that we want to perform dimensionality reduction going from 2 variables to 1 variable. The data may look like this in 2 dimensions:

After doing dimensionality reduction, the data may look like this:

Notice how the relative distances between the data points in the 2-dimensional space are approximately preserved in the 1-dimensional space. Dimensionality reduction techniques can do this from any number of starting dimensions (such as 1,000 dimensions) down to a small number of dimensions (such as 3 dimensions), although some information is usually lost along the way.
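The 2-to-1 case can be checked numerically. The sketch below again assumes PCA as the technique and uses made-up points scattered around a straight line. It compares every pairwise distance before and after the reduction. The names `points_2d`, `points_1d`, and `pairwise_distances` are illustrative only.

```python
import numpy as np

# Hypothetical 2-D data: points scattered closely around a straight line.
rng = np.random.default_rng(42)
t = np.linspace(-5.0, 5.0, 40)
points_2d = np.column_stack([t, 0.5 * t]) + rng.normal(scale=0.1, size=(40, 2))

# PCA down to one dimension: center the data, then project each point
# onto the direction of greatest variance.
centered = points_2d - points_2d.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
points_1d = centered @ Vt[0]                      # shape (40,)

def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    X = X.reshape(len(X), -1)                     # treat 1-D input as one column
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d2 = pairwise_distances(points_2d)
d1 = pairwise_distances(points_1d)

# Correlate original and reduced distances over each unordered pair once.
upper = np.triu_indices(len(t), k=1)
corr = np.corrcoef(d2[upper], d1[upper])[0, 1]
print(f"correlation of pairwise distances: {corr:.4f}")
```

A correlation close to 1 means points that were far apart in 2 dimensions are still far apart in 1 dimension. Because the reduction here is a projection, no distance can grow, so the reduced distances are never larger than the original ones.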

A major use for this is visualizing the data points. Other use cases include faster and more efficient model training, because there are fewer variables, and removing unnecessary aspects of the input.