Feature Reduction #
Why? #
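Too many features in a dataset complicate the model’s prediction strategy. Most clustering models rely on some sort of distance measure, and in high-dimensional spaces the points become sparse and the distances between them become less informative, so the model tends to produce many small, isolated clusters.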
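To make the distance argument concrete, here is a minimal sketch using only the Python standard library. It assumes uniformly random points in a unit hypercube, and the helper name `distance_contrast` is made up for this illustration. It measures how much the largest pairwise distance differs from the smallest, relative to the smallest.

```python
import math
import random


def distance_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min over all pairwise Euclidean distances.

    Hypothetical helper for illustration: points are drawn uniformly
    at random from the unit hypercube [0, 1)^n_dims.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = [
        math.dist(points[i], points[j])
        for i in range(n_points)
        for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Compare the contrast for the same number of points in more and more dimensions.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(distance_contrast(50, n_dims), 2))
```

The contrast should shrink as the number of dimensions grows: the nearest and farthest points end up almost equally far away, which leaves a distance-based clustering model with little signal for grouping points.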
How many is too many? #
One indication is when there are more features than observations.
How? #
- Principal Component Analysis
- Non-Negative Matrix Factorization
- Linear discriminant analysis
- t-SNE
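As a rough sketch of how the four techniques listed above might be applied, the example below uses scikit-learn, which is an assumption here since the text does not name a library. The Iris dataset, the choice of two output components, and the `random_state` values are also picked purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import NMF, PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import TSNE

# 150 observations, 4 features; all values are non-negative, which NMF requires.
X, y = load_iris(return_X_y=True)

reducers = {
    "PCA": PCA(n_components=2),
    "NMF": NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=0),
    # LDA can keep at most (number of classes - 1) components: 2 for Iris.
    "LDA": LinearDiscriminantAnalysis(n_components=2),
    "t-SNE": TSNE(n_components=2, random_state=0),
}

reduced = {}
for name, reducer in reducers.items():
    if name == "LDA":
        # LDA is supervised, so it also needs the class labels.
        reduced[name] = reducer.fit_transform(X, y)
    else:
        reduced[name] = reducer.fit_transform(X)
    print(name, reduced[name].shape)
```

Each method maps the four original features to two. PCA, NMF and t-SNE use only the features, while LDA also uses the class labels, so it applies only when labels are available. t-SNE is generally used for visualization, since scikit-learn's `TSNE` has no `transform` method for projecting new observations.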