
fastcan: A fast canonical-correlation-based search algorithm#


fastcan is a search algorithm that supports:

  1. Feature selection

    • Supervised

    • Unsupervised

    • Multioutput

  2. Term selection for time series regressors (e.g., NARX models)

  3. Data pruning (i.e., sample selection)

Key advantages:

  1. Extremely fast – Designed for high performance, even with large datasets

  2. Redundancy-aware – Effectively handles feature or sample redundancy to select the most informative subset

  3. Multioutput – Natively supports matrix-valued targets for multioutput tasks

See the project home page for more information.

Installation#

Install fastcan via PyPI:

  • Run pip install fastcan

Or via conda-forge:

  • Run conda install -c conda-forge fastcan

Getting Started#

>>> from fastcan import FastCan
>>> X = [[ 0.87, -1.34,  0.31 ],
...     [-2.79, -0.02, -0.85 ],
...     [-1.34, -0.48, -2.55 ],
...     [ 1.92,  1.48,  0.65 ]]
>>> # Multioutput feature selection
>>> y = [[0, 0], [1, 1], [0, 0], [1, 0]]
>>> selector = FastCan(
...     n_features_to_select=2, verbose=0
... ).fit(X, y)
>>> selector.get_support()
array([ True,  True, False])
>>> # Sorted indices
>>> selector.get_support(indices=True)
array([0, 1])
>>> # Indices in selection order
>>> selector.indices_
array([1, 0], dtype=int32)
>>> # Scores for selected features in selection order
>>> selector.scores_
array([0.91162413, 0.71089547])
>>> # Here Feature 2 must be included
>>> selector = FastCan(
...     n_features_to_select=2, indices_include=[2], verbose=0
... ).fit(X, y)
>>> # The feature which is useful when working with Feature 2
>>> selector.indices_
array([2, 0], dtype=int32)
>>> selector.scores_
array([0.34617598, 0.95815008])
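The redundancy-aware behaviour listed under the key advantages can be illustrated with a small greedy forward search. The sketch below is a simplified, single-target illustration in plain Python of orthogonal-least-squares-style selection. It is not fastcan's implementation, and the helper names (`greedy_select`, `_center`, `_dot`) are hypothetical. Each round picks the feature whose remaining (orthogonalised) part has the largest squared correlation with the target, then removes that direction from all other features, so an exact duplicate of a selected feature has nothing left to contribute.

```python
def _center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def greedy_select(columns, target, n_select, tol=1e-12):
    """Greedy forward selection with orthogonalisation (single target).

    columns: list of feature columns, each a list of floats.
    Returns (indices, scores) in selection order. `tol` is an absolute
    threshold below which a residual is treated as zero (an assumption
    made for this sketch).
    """
    residuals = [_center(c) for c in columns]
    y = _center(target)
    yy = _dot(y, y)
    selected, scores = [], []
    while len(selected) < n_select:
        best, best_score = None, 0.0
        for j, w in enumerate(residuals):
            ww = _dot(w, w)
            if j in selected or ww <= tol:
                continue  # already chosen, or fully explained by chosen features
            score = _dot(w, y) ** 2 / (ww * yy)
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break  # nothing informative is left
        selected.append(best)
        scores.append(best_score)
        # Remove the chosen direction from every other feature.
        w_best = residuals[best]
        norm = _dot(w_best, w_best)
        for j, w in enumerate(residuals):
            if j == best:
                continue
            coef = _dot(w, w_best) / norm
            residuals[j] = [a - coef * b for a, b in zip(w, w_best)]
    return selected, scores


f0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
f1 = [2.0 * v for v in f0]              # exact duplicate of f0 (rescaled)
f2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [a + 2.0 * b for a, b in zip(f0, f2)]

# Three features are requested, but only two are returned: indices [2, 0].
selected, scores = greedy_select([f0, f1, f2], y, n_select=3)
print(selected, scores)
```

The duplicate column `f1` is never selected, because its residual is zero once `f0` has been chosen.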

NARX Time Series Modelling#

fastcan can be used for system identification. In particular, the submodule fastcan.narx builds Nonlinear AutoRegressive eXogenous (NARX) models. For more information, see the NARX model example.
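To make the idea of NARX term selection concrete, the sketch below builds a small library of candidate terms from time-shifted inputs and outputs and their polynomial products. It uses only the Python standard library and does not call fastcan.narx. The function names `lagged_rows` and `poly_terms` are hypothetical and chosen for illustration. A selector would then pick a few of these candidate columns as the model terms.

```python
from itertools import combinations_with_replacement


def lagged_rows(series_list, max_delay):
    """For each time step k with full history, collect s(k - d) for
    every series s and every delay d = 1..max_delay."""
    n = len(series_list[0])
    return [
        [s[k - d] for s in series_list for d in range(1, max_delay + 1)]
        for k in range(max_delay, n)
    ]


def poly_terms(row, degree):
    """All monomials of the row entries from degree 1 up to `degree`."""
    terms = []
    for deg in range(1, degree + 1):
        for combo in combinations_with_replacement(range(len(row)), deg):
            value = 1.0
            for i in combo:
                value *= row[i]
            terms.append(value)
    return terms


u = [1.0, 2.0, 3.0, 4.0]  # exogenous input
y = [0.0, 1.0, 3.0, 6.0]  # measured output
max_delay = 1

rows = lagged_rows([u, y], max_delay)            # [u(k-1), y(k-1)]
candidates = [poly_terms(r, 2) for r in rows]    # u, y, u^2, u*y, y^2
targets = y[max_delay:]                          # y(k) for the same k

for terms, target in zip(candidates, targets):
    print(terms, "->", target)
```

With two lagged variables and degree 2 there are five candidate terms per time step. The library grows quickly with more delays or higher degrees, which is why a fast term-selection step is useful.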

Support WASM Wheels#

fastcan is compiled to WebAssembly (WASM) wheels using pyodide, so you can try it in a REPL directly in a browser without installing anything. However, the fastcan version bundled with pyodide may lag behind the latest release. If you need the latest WASM wheels, download them from the assets of the GitHub releases and install them manually with

>>> import micropip
>>> await micropip.install('URL of the wasm wheel (end with _wasm32.whl)')

📝 Note: Due to the Cross-Origin Resource Sharing (CORS) block in web browsers, you may need the Allow CORS: Access-Control-Allow-Origin Chrome extension.

📝 Note: Nightly WASM wheels of fastcan’s dependency (i.e., scikit-learn) can be found in Scientific Python Nightly Wheels.

Citation#

fastcan is a Python implementation of the following papers.

If you use the h-correlation method in your work, please cite the following reference:

@article{ZHANG2022108419,
   title = {Orthogonal least squares based fast feature selection for linear classification},
   journal = {Pattern Recognition},
   volume = {123},
   pages = {108419},
   year = {2022},
   issn = {0031-3203},
   doi = {https://doi.org/10.1016/j.patcog.2021.108419},
   url = {https://www.sciencedirect.com/science/article/pii/S0031320321005951},
   author = {Sikai Zhang and Zi-Qiang Lang},
   keywords = {Feature selection, Orthogonal least squares, Canonical correlation analysis, Linear discriminant analysis, Multi-label, Multivariate time series, Feature interaction},
}

If you use the eta-cosine method in your work, please cite the following reference:

@article{ZHANG2025111895,
   title = {Canonical-correlation-based fast feature selection for structural health monitoring},
   journal = {Mechanical Systems and Signal Processing},
   volume = {223},
   pages = {111895},
   year = {2025},
   issn = {0888-3270},
   doi = {https://doi.org/10.1016/j.ymssp.2024.111895},
   url = {https://www.sciencedirect.com/science/article/pii/S0888327024007933},
   author = {Sikai Zhang and Tingna Wang and Keith Worden and Limin Sun and Elizabeth J. Cross},
   keywords = {Multivariate feature selection, Filter method, Canonical correlation analysis, Feature interaction, Feature redundancy, Structural health monitoring},
}

If you just want to cite the fastcan software, please use the following reference:

@article{WANG2026102598,
   title = {fastcan: A fast canonical-correlation-based searching algorithm},
   journal = {SoftwareX},
   volume = {34},
   pages = {102598},
   year = {2026},
   issn = {2352-7110},
   doi = {https://doi.org/10.1016/j.softx.2026.102598},
   url = {https://www.sciencedirect.com/science/article/pii/S2352711026000919},
   author = {Tingna Wang and Sikai Zhang and Lin Chen and Limin Sun},
   keywords = {Machine learning, Scikit-learn, Feature selection, Data pruning, Time series, System identification, NARX},
}

Architecture Diagram#

@startuml fastcan
skinparam backgroundColor transparent
!theme C4_blue_new from <C4/themes>
!include <C4/C4_Component>
!include <logos/numpy>
!include <logos/python>
!include https://raw.githubusercontent.com/MatthewSZhang/gilbarbara-plantuml-sprites/refs/heads/master/sprites/sklearn-icon.puml
!include https://raw.githubusercontent.com/MatthewSZhang/gilbarbara-plantuml-sprites/refs/heads/master/sprites/cython-icon.puml
!include https://raw.githubusercontent.com/MatthewSZhang/gilbarbara-plantuml-sprites/refs/heads/master/sprites/scipy.puml

AddContainerTag("module", $legendText="module")
AddContainerTag("data", $legendText="input/output", $sprite="numpy", $bgColor="gray", $fontColor="white", $borderColor="gray")
AddComponentTag("python", $legendText="Python code", $sprite="python")
AddComponentTag("cython", $legendText="Cython code", $sprite="cython-icon", $bgColor="gold", $fontColor="brown", $borderColor="gold")
AddComponentTag("sklearn", $legendText="sklearn Estimator", $sprite="sklearn-icon", $bgColor="orange", $fontColor="black", $borderColor="orange")
AddComponentTag("scipy", $legendText="SciPy", $sprite="scipy", $bgColor="#2253A1", $fontColor="white", $borderColor="#2253A1")
UpdateContainerBoundaryStyle($type="module", $legendText="module boundary")


Container_Boundary(fastcan, "fastcan", $descr="A library for fast feature engineering and data preprocessing"){
    Component(cancorr_fast, "cancorr_fast", $tags="cython", $descr="Fast canonical correlation based forward search")
    Component(FastCan, "FastCan", $tags="sklearn", $descr="Feature selector")
    Component(minibatch, "minibatch", $tags="python", $descr="Prunes samples batch-wise")
    Component(refine, "refine", $tags="python", $descr="Refines selection of FastCan")


    Rel(FastCan, cancorr_fast, "Sends features to", "arrays")
    Rel(minibatch, cancorr_fast, "Sends samples to", "arrays")
    Rel(FastCan, refine, "Sends selected features to", "arrays")
    Rel(refine, cancorr_fast, "Sends features to", "arrays")
}

Container_Boundary(narx, "narx", $descr="A submodule for NARX modelling"){
    Component(make_narx, "make_narx", $tags="python", $descr="Builder for NARX model instances")
    Component(narx_fast, "narx_fast", $tags="cython", $descr="Fast computation of gradient and prediction for NARX models")
    Component(NARX, "NARX", $tags="sklearn", $descr="NARX model")
    Component(time_shift, "time_shift", $tags="python", $descr="Transforms time series into time-shifted features")
    Component(poly, "poly", $tags="python", $descr="Nonlinearises features with polynomial basis functions")
    ' Component(tp2fd, "tp2fd", $tags="python", $descr="Converts time_shift ids and poly ids to feat ids and delay ids")
    Component(print_narx, "print_narx", $tags="python", $descr="Prints NARX model summary")

    Rel(NARX, print_narx, "Sends NARX model to", "NARX model")
    ' Rel(make_narx, poly, "Makes polynomial features using", "unique id numbers")
    Rel(make_narx, time_shift, "Sends time series to", "arrays")
    ' Rel(make_narx, tp2fd, "Sends time_shift ids and poly ids to", "unique id numbers")
    Rel(time_shift, poly, "Sends time-shifted features to", "arrays")
    Rel(poly, FastCan, "Sends polynomial features to", "arrays")
    ' Rel(tp2fd, NARX, "Sends feat ids and delay ids to", "unique id numbers")
    Rel(NARX, narx_fast, "Sends initial conditions of inputs, prediction and gradients to", "fit, predict")

}

Person(person, "User", $descr="A data scientist or developer using NumPy, SciPy, and scikit-learn")
ContainerDb(output, "fastcan output", $tags="data", $techn="indices", $descr="Selected indices of features or samples")
ContainerDb(input, "fastcan input", $tags="data", $techn="arrays, allow multi-output", $descr="Input data")
ContainerDb(narx_output, "narx output", $tags="data", $techn="arrays", $descr="Prediction and gradients of NARX model")
ContainerDb(narx_input, "narx input", $tags="data", $techn="arrays, allow nan, allow multi-output", $descr="Time-series data")

Rel(input, FastCan, "Sends features to", "arrays")
Rel(input, minibatch, "Sends samples to", "arrays")
Rel(narx_input, make_narx, "Sends time series to", "arrays")
Rel(narx_input, NARX, "Sends time series to", "arrays")

Rel(cancorr_fast, output, "Sends selected indices to", "indices")
Rel(output, refine, "Sends selected indices to", "indices")
Rel(narx_fast, narx_output, "Sends prediction and gradients to", "arrays")
Rel(output, NARX, "Sends selected polynomial features to", "indices")

Rel(person, input, "Processes arrays using", "NumPy, scikit-learn pipeline")
Rel(person, narx_input, "Processes time series using", "NumPy, scikit-learn pipeline")

Container(optimizer, "SciPy Optimiser", "module", $tags="scipy", $descr="Minimises objective functions using prediction errors and gradients")
Rel(narx_output, optimizer, "Sends prediction and gradients to", "arrays")
Rel(optimizer, NARX, "Updates coefficients for", "arrays")

SHOW_LEGEND()
@enduml

API Reference#

fastcan

Fast canonical correlation analysis based search algorithm.

fastcan.narx

Nonlinear autoregressive exogenous (NARX) model for system identification.

fastcan.utils

Utility functions.

API Compatibility#

The API of this library is aligned with scikit-learn.
