5. Reduced polynomial NARX model for system identification#

NARX is a class to build a reduced polynomial NARX (Nonlinear AutoRegressive eXogenous) model for system identification. The structure of a polynomial NARX model is shown below:

\[y(k) = 0.5y(k-1) + 0.3u_0(k)^2 + 2u_0(k-1)u_0(k-3) + 1.5u_0(k-2)u_1(k-3) + 1\]

where \(k\) is the time index, \(u_0\) and \(u_1\) are input signals, and \(y\) is the output signal.

The NARX model is composed of three parts:

  1. Terms: \(y(k-1)\), \(u_0(k)^2\), \(u_0(k-1)u_0(k-3)\), and \(u_0(k-2)u_1(k-3)\)

  2. Coefficients: 0.5, 0.3, 2, and 1.5

  3. Intercept: 1

To build a reduced polynomial NARX model, we normally need the following steps:

  1. Build term library
    1. Get time shifted variables: e.g., \(u(t-2)\) and \(y(t-1)\)

    2. Get terms by nonlinearising the time shifted variables with polynomial basis, e.g., \(u_0(k-1)u_0(k-3)\) and \(u_0(k-2)u_1(k-3)\)

  2. Search useful terms by feature selection, i.e., Reduced Order Modelling (ROM)

  3. Learn the coefficients by least-squares

Examples

5.1. Multiple time series and missing values#

For time series modelling, the continuity of the data is important. The continuity means all neighbouring samples have the same time interval. A discontinuous time series can be treated as multiple pieces of continuous time series. The discontinuity can be caused by

  1. Different measurement sessions, e.g., one measurement session is carried out at Monday and lasts for one hour at 1 Hz, and another measurement is carried out at Tuesday and lasts for two hours at 1 Hz

  2. Missing values

As a result, there are two ways to inform NARX the training data from different measurement sessions. From version 0.5, the difference measurement sessions can be set by session_sizes in NARX.fit() and make_narx(). Alternatively, different measurement sessions can also be notified by inserting np.nan to break the data. The following examples show how to notify NARX the training data from different measurement sessions.

>>> import numpy as np
>>> x0 = np.zeros((3, 2)) # First measurement session
>>> x1 = np.ones((5, 2)) # Second measurement session
>>> max_delay = 2 # Assume the maximum delay for NARX model is 2
>>> # Insert (at least max_delay number of) np.nan to break the two measurement sessions
>>> u = np.r_[x0, [[np.nan, np.nan]]*max_delay, x1]
>>> # Or use session_sizes to break the two measurement sessions
>>> session_sizes = [3, 5]

Note

It is important to separate different measurement sessions using either method. Otherwise, the model will incorrectly assume that the time interval between two measurement sessions is the same as the sampling interval within a session.

Actually, the two methods will result in the exact same models when X is not None or multiple-step-ahead prediction is not used for training. If X is None, i.e. Auto-Regression models, and multiple-step-ahead prediction is used for training, only session_sizes can separate the two measurement sessions. Because when using multiple-step-ahead prediction, the optimiser relies on the missing values in inputs X rather than output y, while Auto-Regression models do not have input X.

5.2. One-step-ahead and multiple-step-ahead prediction#

In system identification, two types of predictions can be made by NARX models:

  1. One-step-ahead prediction: predict the output at the next time step given the measured input and the measured output at the past time steps, e.g., \(\hat{y}(k) = 0.5y(k-1) + 0.3u_0(k-1)^2\), where \(\hat{y}(k)\) is the predicted output at time step \(k\)

  2. Multiple-step-ahead prediction: predict the output at the next time step given the measured input and the predicted output at the past time steps, e.g., \(\hat{y}(k) = 0.5\hat{y}(k-1) + 0.3u_0(k-1)^2\)

Obviously, the multiple-step-ahead prediction is more challenging, because the predicted output is used as the input to predict the output at the next step, which may accumulate the error. The prediction method of NARX gives the multiple-step-ahead prediction.

It should also be noted the different types of predictions in model training.

  1. If assume the NARX model is one-step-ahead prediction structure, we know all input data for training in advance, and the model can be trained by the simple ordinary least-squares (OLS) method

  2. If assume the NARX model is a multiple-step-ahead prediction structure, the input data, like \(\hat{y}(k-1)\) is unknown in advance. Therefore, the training data must first be generated by the multiple-step-ahead prediction with the initial model coefficients, and then the coefficients can be updated recursively

5.2.1. ARX and OE model#

To better understand the two types of training, it is helpful to know two linear time series model structures, i.e., ARX (AutoRegressive eXogenous) model and OE (output error) model.

The ARX model structure is

\[(1-A(z))y(t) = B(z)u(t) + e(t)\]

where \(A(z)\) and \(B(z)\) are polynomials of \(z\), e.g., \(A(z) = 1.5 z^{-1} - 0.7 z^{-2}\), \(z\) is a complex number from Z-transformation and \(z^{-1}\) can be view as a time-shift operator, and \(e(t)\) is the white noise.

The OE model structure is

\[y(t) = \frac{B(z)}{(1-A(z))} u(t) + e(t)\]

The ARX model and OE model is very similar. If we rewrite ARX model to \(y(t) = \frac{B(z)}{(1-A(z))} u(t) + \frac{1}{(1-A(z))} e(t)\), we can see that both the ARX model and the OE model have the same transfer function \(\frac{B(z)}{(1-A(z))}\). The two key differences are:

  1. noise assumption

  2. best linear unbiased estimator (BLUE)

For noise assumption, the noise of the ARX model passes through a filter \(\frac{1}{(1-A(z))}\) while the noise of the OE model not. The structure of ARX models implies that the measurement noise is a very special colored noise, which is not common in practice.

For difference in BLUE, to get BLUE of models, we need to rewrite models to the form of

\[e(t) = y(t) - \hat{y}(t)\]

where \(\hat{y}(t)\) is a predictor, which describes how to compute the predicted output to make the error white. It is easy to find that for a ARX model

\[\hat{y}(t) = A(z)y(t) + B(z)u(t)\]

while for an OE model

\[\hat{y}(t) = \frac{B(z)}{(1-A(z))} u(t) = A(z)\hat{y}(t) + B(z)u(t)\]

The difference between the two predictors is the one for ARX models uses both measured input and output to make prediction, while the one for OE models only uses measured input to make prediction. In other words, the predictor of the BLUE for ARX computes one-step-ahead prediction, while the predictor of the BLUE for OE computes multiple-step-ahead prediction.

In summary, using multiple-step-ahead prediction in model training is more aligned with the real noise condition, but harder to train. However, using one-step-ahead prediction in model training is easier to train, but the noise assumption is too strong to be true. The fit method of NARX uses the one-step-ahead prediction for training by default, but the multiple-step-ahead prediction can be used by setting the parameter coef_init.

Examples