2. Multioutput feature selection
FastCan can handle multioutput feature selection, which means the
target y can be a matrix. For regression, FastCan can be used for
MIMO (Multi-Input Multi-Output) data. For classification, it can be used for
multilabel data. Multioutput feature selection can also be useful for multiclass
classification, which has a single output with multiple categories: the multiclass
problem can be converted to a multilabel one by one-hot encoding the
target y. The canonical correlation coefficient between the features X and the
one-hot encoded target y is related one-to-one to Fisher’s criterion in
LDA (Linear Discriminant Analysis) [1]. Applying FastCan to the converted
multioutput data may give better accuracy in the subsequent classification task
than applying it directly to the original single-label data. See Figure 5 in [2].
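The one-hot conversion mentioned above can be sketched with plain NumPy. This is a minimal illustration, not part of FastCan itself; the label values and the variable names `classes`, `y_idx`, and `Y` are made up for the example.

```python
import numpy as np

# Hypothetical single-label multiclass target with three categories.
y = np.array(["cat", "dog", "bird", "dog", "cat"])

# Map each label to an integer index; `classes` is sorted alphabetically.
classes, y_idx = np.unique(y, return_inverse=True)

# One row per sample, one column per class, a single 1 in each row.
Y = np.eye(len(classes))[y_idx]
```

The resulting matrix `Y` has one column per class and can be used as the multioutput target in place of the original vector `y`.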
2.1. Relationship on multiclass data
Assume the feature matrix is \(X \in \mathbb{R}^{N\times n}\), the multiclass target vector is \(y \in \mathbb{R}^{N\times 1}\), and the one-hot encoded target matrix is \(Y \in \mathbb{R}^{N\times m}\). Let \(J\) denote Fisher’s criterion for \(X\) and \(y\), and let \(R\) denote the canonical correlation coefficient between \(X\) and \(Y\). The relationship between \(J\) and \(R\) is given by
\[J = \frac{R^2}{1 - R^2}\]
or
\[R^2 = \frac{J}{1 + J}\]
Note that there is generally more than one Fisher’s criterion value and more than one canonical correlation coefficient. The number of non-zero canonical correlation coefficients is at most \(\min (n, m)\), and each canonical correlation coefficient corresponds one-to-one to a Fisher’s criterion value.
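The correspondence can be checked numerically. The sketch below uses only NumPy on synthetic data and assumes the standard definitions: Fisher’s criteria are taken as the eigenvalues of \(S_w^{-1} S_b\) (within-class and between-class scatter), and canonical correlations as the singular values of \(Q_X^\top Q_Y\), where \(Q_X\) and \(Q_Y\) are orthonormal bases of the centered \(X\) and \(Y\). Under those assumptions it compares \(J\) with \(R^2 / (1 - R^2)\). The data, sizes, and variable names are illustrative and are not taken from FastCan.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 300, 5, 3  # samples, features, classes

# Synthetic multiclass data: class-dependent mean shift plus noise.
y = rng.integers(0, m, size=N)
X = rng.normal(size=(m, n))[y] + rng.normal(size=(N, n))
Y = np.eye(m)[y]  # one-hot encoded target, shape (N, m)

# Canonical correlations: singular values of Qx^T Qy.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Qx, _ = np.linalg.qr(Xc)
# Centered one-hot columns sum to zero, so Yc has rank m - 1;
# dropping one column keeps the same column space.
Qy, _ = np.linalg.qr(Yc[:, :-1])
R = np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # descending order

# Fisher's criteria: non-zero eigenvalues of Sw^{-1} Sb.
counts = np.bincount(y, minlength=m)
class_means = np.vstack([X[y == k].mean(axis=0) for k in range(m)])
diff = class_means - X.mean(axis=0)
Sb = (diff * counts[:, None]).T @ diff  # between-class scatter
Sw = Xc.T @ Xc - Sb                     # within-class scatter
eigvals = np.linalg.eigvals(np.linalg.solve(Sw, Sb)).real
J = np.sort(eigvals)[::-1][: m - 1]     # keep the m - 1 non-zero values

# Values predicted from the canonical correlations.
J_from_R = R**2 / (1 - R**2)
print(np.allclose(J, J_from_R))
```

In this setup the centered one-hot matrix has rank \(m - 1\), so only \(m - 1\) canonical correlations and Fisher’s criterion values are compared, which is consistent with the upper bound stated above.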
References
Examples
See Fisher’s criterion in LDA for an example of the equivalent relationship between CCA and LDA on multiclass data.