2. Multioutput feature selection

FastCan supports multioutput feature selection, in which the target y is a matrix rather than a vector. For regression, this covers MIMO (Multi-Input Multi-Output) data; for classification, it covers multilabel data. Multioutput feature selection can also be useful for multiclass classification, which has a single output with multiple categories: one-hot encoding the target y converts the multiclass problem into a multilabel one. The canonical correlation coefficient between the features X and the one-hot encoded target y is equivalent to Fisher’s criterion in LDA (Linear Discriminant Analysis) [1]. Applying FastCan to the converted multioutput data may yield better accuracy in the subsequent classification task than applying it directly to the original single-label data (see Figure 5 in [2]).
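The one-hot conversion mentioned above can be sketched with NumPy alone. This is a minimal illustration, not part of the library: the label vector and the variable names (`classes`, `y_idx`, `Y`) are made up for the example.

```python
import numpy as np

# Hypothetical multiclass target: 5 samples, 3 categories.
y = np.array([2, 0, 1, 2, 1])

# Map each label to a column index, then pick the matching rows of an
# identity matrix to build the one-hot (multilabel-style) target matrix.
classes, y_idx = np.unique(y, return_inverse=True)
Y = np.eye(len(classes), dtype=int)[y_idx]

print(Y.shape)  # (number of samples, number of classes)
```

Each row of `Y` contains exactly one 1, located in the column of that sample's class. The matrix `Y` can then be used as the target in place of the vector `y`.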

2.1. Relationship on multiclass data

Assume the feature matrix is \(X \in \mathbb{R}^{N\times n}\), the multiclass target vector is \(y \in \mathbb{R}^{N\times 1}\), and the one-hot encoded target matrix is \(Y \in \mathbb{R}^{N\times m}\). Let \(J\) denote Fisher’s criterion for \(X\) and \(y\), and let \(R\) denote the canonical correlation coefficient between \(X\) and \(Y\). The relationship between \(J\) and \(R\) is given by

\[J = \frac{R^2}{1-R^2}\]

or

\[R^2 = \frac{J}{1+J}\]

Note that there is generally more than one Fisher’s criterion value and more than one canonical correlation coefficient. The number of non-zero canonical correlation coefficients is at most \(\min (n, m)\), and each canonical correlation coefficient corresponds one-to-one to a Fisher’s criterion value.
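The relationship between \(J\) and \(R\) can be checked numerically. The sketch below uses only NumPy and synthetic data. It makes two assumptions that the text above does not spell out: the Fisher’s criterion values are taken as the eigenvalues of \(S_w^{-1} S_b\) (within-class and between-class scatter matrices), and the canonical correlations are computed on column-centered \(X\) and \(Y\). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 300, 4, 3  # samples, features, classes (illustrative sizes)

# Synthetic multiclass data: each class has its own random mean vector.
y = rng.integers(0, m, size=N)
class_means = 2.0 * rng.normal(size=(m, n))
X = rng.normal(size=(N, n)) + class_means[y]
Y = np.eye(m)[y]  # one-hot encoded target

# Canonical correlations between centered X and centered Y.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Qx, _ = np.linalg.qr(Xc)  # orthonormal basis of span(Xc)
U, s, _ = np.linalg.svd(Yc, full_matrices=False)
Qy = U[:, s > 1e-10 * s.max()]  # centered one-hot Y is rank-deficient
R = np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # sorted in descending order

# Fisher's criterion values: eigenvalues of S_w^{-1} S_b (assumed definition).
S_w = np.zeros((n, n))
S_b = np.zeros((n, n))
overall_mean = X.mean(axis=0)
for k in range(m):
    Xk = X[y == k]
    mean_k = Xk.mean(axis=0)
    S_w += (Xk - mean_k).T @ (Xk - mean_k)
    d = mean_k - overall_mean
    S_b += len(Xk) * np.outer(d, d)
J = np.linalg.eigvals(np.linalg.solve(S_w, S_b)).real
J = np.sort(J)[::-1][: len(R)]  # keep the leading (non-zero) values

print(np.allclose(J, R**2 / (1 - R**2)))
print(np.allclose(R**2, J / (1 + J)))
```

Sorting both sets of values in descending order pairs each canonical correlation coefficient with its Fisher’s criterion value, which is the one-to-one correspondence described above. In this sketch the number of non-zero coefficients is limited by the rank of the centered one-hot matrix, which is consistent with the \(\min (n, m)\) bound.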

References

Examples