Dimensionality Reduction

 

Feature Selection Techniques

A Machine Learning (ML) model is trained on features (predictors). These features should be chosen carefully, since inappropriate features, or features with an insignificant effect on the response variable, can cause the model to overfit or degrade its performance. Building a good model therefore depends on selecting the right variables: those that genuinely influence the response variable.

 

This is why we use feature elimination: removing a subset of detrimental features from the feature list to obtain a better feature set, and with it better performance. Each feature carries a degree of importance and affects the model's predictive power to a corresponding extent, positively or negatively. It is therefore important to rank features by how valuable they are to the model and, if necessary, remove those that harm its performance.

Optimizing the feature set can vastly improve the predictive power of the model, which is why it is a key aspect of any ML project.
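To make this concrete, here is a minimal sketch of ranking features by importance and dropping the weakest ones. It assumes scikit-learn and a synthetic dataset; the tree-based model, the mean-importance threshold, and all names are illustrative choices, not the only way to do this.

```python
# Illustrative sketch: rank features by importance, then keep only the strong ones.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 10 features, only 4 of which are informative.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Rank features by the model's impurity-based importance scores.
ranking = sorted(enumerate(model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for idx, score in ranking:
    print(f"feature {idx}: importance {score:.3f}")

# Keep only the features whose importance exceeds the mean importance.
selector = SelectFromModel(model, threshold="mean", prefit=True)
X_reduced = selector.transform(X)
print("reduced shape:", X_reduced.shape)
```

Any estimator that exposes feature importances or coefficients could stand in for the random forest here; the point is simply that ranking makes the elimination decision explicit.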

 

There are two types of dimensionality reduction (a brief sketch of each follows the list):

  • linear transformations

  • non-linear transformations
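As a quick contrast between the two, the sketch below projects the same data with PCA (a linear map) and with kernel PCA using an RBF kernel (a non-linear map). It assumes scikit-learn; the concentric-circles dataset and the parameter values are illustrative, chosen because such data is a standard case where a linear map struggles.

```python
# Illustrative contrast: a linear vs. a non-linear dimensionality reduction.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Concentric circles cannot be separated by any linear projection.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear transformation: PCA applies a linear map to the data.
X_linear = PCA(n_components=2).fit_transform(X)

# Non-linear transformation: kernel PCA with an RBF kernel (gamma is illustrative).
X_nonlinear = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print("linear projection shape:", X_linear.shape)
print("non-linear projection shape:", X_nonlinear.shape)
```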

 

A linear transformation between two vector spaces V and W is a map T: V → W such that the following hold: