1. Unsupervised feature selection
FastCan can be used for unsupervised feature selection.
The basic idea is to take learned features, such as the principal components
from PCA (principal component analysis), or hand-crafted features, such as those
obtained by a Fourier transform, as the targets, and then to select the features
that are most correlated with those targets.
In the PCA case, the unsupervised application of FastCan tries to select the
features that maximize the sum of squared canonical correlations (SSC) with
the principal components (PCs) obtained from PCA of the feature matrix \(X\) [1].
See the example below.
>>> from sklearn.decomposition import PCA
>>> from sklearn import datasets
>>> from fastcan import FastCan
>>> iris = datasets.load_iris()
>>> X = iris["data"]
>>> pca = PCA(n_components=2)
>>> X_pcs = pca.fit_transform(X)
>>> selector = FastCan(n_features_to_select=2, verbose=0).fit(X, X_pcs[:, :2])
>>> selector.indices_
array([2, 1], dtype=int32)
Note
There is no guarantee that this unsupervised FastCan will select
the optimal subset of features, i.e. the subset with the highest SSC with the PCs.
This is because FastCan selects features in a greedy manner, which may lead to
suboptimal results.
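To make the SSC objective concrete, the sketch below computes it with plain NumPy and finds the best pair of features by exhaustive search, which is the reference against which a greedy result can be compared. It is not FastCan's implementation: the helper `ssc`, the random toy matrix, and the variable names are assumptions made for illustration, and the QR-based computation assumes the columns are linearly independent.

```python
from itertools import combinations

import numpy as np


def ssc(X_sub, Y):
    """Sum of squared canonical correlations between two column sets.

    Hypothetical helper: centers both matrices, builds orthonormal bases
    with QR, and sums the squared entries of Qx.T @ Qy, which equals the
    sum of the squared canonical correlations for full-column-rank inputs.
    """
    Xc = X_sub - X_sub.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return float(np.sum((Qx.T @ Qy) ** 2))


# Toy data (assumption): 200 samples, 5 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# First two principal components via SVD of the centered matrix.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt[:2].T

# Exhaustive search over all feature pairs: feasible only for small problems.
scores = {
    pair: ssc(X[:, list(pair)], pcs)
    for pair in combinations(range(X.shape[1]), 2)
}
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```

The same helper could be applied to the columns chosen by a fitted selector (for example `X[:, selector.indices_]`) to check how close the greedy choice is to the exhaustive optimum. Exhaustive search grows combinatorially with the number of features, which is why a greedy strategy is attractive in practice.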
PCA is a linear method and does not take nonlinearity into consideration.
To address this, the targets (learned features) can instead be generated by
manifold learning [2].
FastCan is then used to select features in the same way as above.
References