minibatch

fastcan.minibatch(X, y, n_features_to_select=1, batch_size=1, tol=0.01, verbose=1)

Feature selection using fastcan.FastCan with mini batches.

It is suitable for selecting a very large number of features, even more features than there are samples.

The function splits n_features_to_select into n_outputs parts and selects features for each part separately, ignoring the redundancy among outputs. Within each part, features are selected batch by batch, and each batch contains at most batch_size features. Like correlation filters, which select features one by one without considering the redundancy between any two features, the function ignores the redundancy between any two mini-batches.
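The splitting described above can be pictured with a small sketch. The helper `plan_batches` below is hypothetical and not part of fastcan; it assumes the budget is divided into near-equal integer parts per output and that each part is then cut into batches of at most batch_size. The library's actual splitting rule may differ.

```python
import numpy as np


def plan_batches(n_features_to_select, n_outputs, batch_size):
    """Hypothetical helper: list the mini-batch sizes planned for each output."""
    # Assumption: the total budget is split into n_outputs near-equal integer parts.
    edges = np.linspace(0, n_features_to_select, num=n_outputs + 1, dtype=int)
    per_output = np.diff(edges)

    plan = []
    for budget in per_output:
        # Cut this output's budget into full batches plus one smaller remainder batch.
        n_full, remainder = divmod(int(budget), batch_size)
        sizes = [batch_size] * n_full + ([remainder] if remainder else [])
        plan.append(sizes)
    return plan


# 7 features over 2 outputs with batches of at most 2 features each.
print(plan_batches(7, 2, 2))  # [[2, 1], [2, 2]]
```

Under these assumptions, every batch respects the batch_size bound and the batch sizes add up to n_features_to_select.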

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples, n_outputs)) – Target matrix.

  • n_features_to_select (int, default=1) – The absolute number of features to select.

  • batch_size (int, default=1) – The upper bound of the number of features in a mini-batch. It is recommended that batch_size be less than n_samples.

  • tol (float, default=0.01) – Tolerance for linear dependence check.

  • verbose (int, default=1) – The verbosity level.

Returns:

indices – The indices of the selected features.

Return type:

ndarray of shape (n_features_to_select,), dtype=int

Examples

>>> from fastcan import minibatch
>>> X = [[1, 1, 0], [0.01, 0, 0], [-1, 0, 1], [0, 0, 0]]
>>> y = [1, 0, -1, 0]
>>> indices = minibatch(X, y, 3, batch_size=2, verbose=0)
>>> print(f"Indices: {indices}")
Indices: [0 1 2]
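Because the return value is an integer index array, it can be used to pick the selected columns out of the feature matrix. The sketch below uses plain NumPy and a hard-coded, hypothetical `indices` array standing in for the result of a `minibatch` call, so it runs without fastcan installed.

```python
import numpy as np

X = np.array([[1, 1, 0], [0.01, 0, 0], [-1, 0, 1], [0, 0, 0]])

# Hypothetical result standing in for minibatch(X, y, 2, ...); not computed here.
indices = np.array([2, 0])

# Integer-array indexing keeps the columns in the order given by indices.
X_selected = X[:, indices]
print(X_selected.shape)  # (4, 2)
```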