The Kalman Gain

The final equation is the Kalman Gain Equation.

The Kalman Gain in matrix notation is given by:

\[ \boldsymbol{K}_{n} = \boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}\left(\boldsymbol{HP}_{n,n-1}\boldsymbol{H}^{T} + \boldsymbol{R}_{n} \right)^{-1} \]
Where:

  • \( \boldsymbol{K}_{n} \) is the Kalman Gain
  • \( \boldsymbol{P}_{n,n-1} \) is the prior estimate covariance matrix of the current state (predicted at the previous step)
  • \( \boldsymbol{H} \) is the observation matrix
  • \( \boldsymbol{R}_{n} \) is the measurement noise covariance matrix

Intuitive Explanation of the Kalman Gain

Let's start by examining the 1D state update equation:

\[ \hat{x}_{n,n} = \hat{x}_{n,n-1} + {k}_{n}({z}_{n} - \hat{x}_{n,n-1}) \]

This equation shows how the updated state estimate \( \hat{x}_{n,n} \) is obtained by adjusting the prior estimate \( \hat{x}_{n,n-1} \) using the innovation (the difference between the measurement \( {z}_{n} \) and the prior estimate). The Kalman Gain \( {k}_{n} \) determines how much we trust the measurement versus the prior estimate.
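To make this concrete, here is a minimal Python sketch of the 1D update; the numeric values for the prior, the measurement, and the gain are assumed for illustration only:

```python
# A minimal sketch of the 1D state update (all numeric values are assumed).
def update_1d(x_prior, z, k):
    # Adjust the prior estimate by a fraction k of the innovation.
    return x_prior + k * (z - x_prior)

x_prior = 50.0   # prior estimate, e.g., a predicted height in meters
z = 48.5         # noisy measurement
print(update_1d(x_prior, z, k=0.2))  # about 49.7 - low gain favors the prior
print(update_1d(x_prior, z, k=0.8))  # about 48.8 - high gain favors the measurement
```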

Extending to the Multivariate Case

In the one-dimensional case, the system state \( x_{n} \) and the measurement \( z_{n} \) lie in the same domain. For example, if \( x_{n} \) represents the true height of a building, then \( z_{n} \) is a direct measurement of that same height.

In the multivariate case, things are a bit different. The measurement vector \( \boldsymbol{{z}}_{n} \) and the system state vector \( \boldsymbol{x}_{n} \) often exist in different domains. Thus, we cannot directly subtract the two to compute the innovation. Instead, we must project the system state into the measurement domain using the observation matrix \( \boldsymbol{H} \):

\[ \text{innovation} = (\boldsymbol{{z}}_{n} - \boldsymbol{H}\boldsymbol{\hat{x}}_{n,n-1}) \]

As discussed in the Measurement Equation chapter, the observation matrix \( \boldsymbol{H} \) maps the system state vector \( \boldsymbol{{x}}_{n} \) from the state domain into the measurement domain.

For example, consider a vertically launched rocket whose state is defined by its altitude \( h_{n} \) and vertical velocity \( v_{h_{n}} \). The system state vector is:

\[ \boldsymbol{x}_n = \begin{bmatrix} h_n \\ v_{h_n} \end{bmatrix} \]

Suppose that only the altitude is measured. In this case, the measurement \( \boldsymbol{z}_{n} \) is related to the system state by:

\[ \boldsymbol{z}_{n} = \boldsymbol{H} \boldsymbol{x}_{n} + v_n = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} h_n \\ v_{h_n} \end{bmatrix} + v_n \]

Where:

  • \( \boldsymbol{H} = \begin{bmatrix} 1 & 0 \end{bmatrix} \)
  • \( v_{n} \) is the measurement noise.
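
A short numpy sketch of this projection, with assumed numbers for the rocket's prior state and the altimeter reading:

```python
import numpy as np

# Rocket example: state = [altitude; vertical velocity], only altitude is measured.
# All numeric values below are assumed for illustration.
H = np.array([[1.0, 0.0]])               # observation matrix (1x2)
x_prior = np.array([[1000.0], [50.0]])   # prior state estimate (2x1)
z = np.array([[1012.0]])                 # altimeter measurement (1x1)

# H maps the state into the measurement domain, so the subtraction is valid.
innovation = z - H @ x_prior             # [[12.0]]
print(innovation)
```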

Now let us take a closer look at the state update equation in the multidimensional case, as discussed in the State Update chapter.

\[ \boldsymbol{\hat{x}}_{n,n} = \boldsymbol{\hat{x}}_{n,n-1} + \boldsymbol{K}_{n}(\boldsymbol{{z}}_{n} - \boldsymbol{H}\boldsymbol{\hat{x}}_{n,n-1}) \]

Since the system state \( \boldsymbol{\hat{x}}_{n,n-1} \) and the measurement \( \boldsymbol{z}_{n} \) may lie in different domains, we project the state estimate into the measurement domain using the observation matrix \( \boldsymbol{H} \).

Let us examine the Kalman Gain in the one-dimensional case, as derived in the Kalman Filter in One Dimension chapter:

\[ k_{n} = \frac{p_{n,n-1}}{p_{n,n-1} + r_{n}} \]

Where:

  • \( p_{n,n-1} \) is the variance of the prior estimate \( \hat{x}_{n,n-1} \)
  • \( r_{n} \) is the variance of the measurement noise

This expression shows that the Kalman Gain \( k_{n} \) balances the uncertainty in the prior estimate against the uncertainty in the measurement. When the prior is more uncertain (i.e., \( p_{n,n-1} \) is large), the gain shifts more weight to the measurement. Conversely, when the measurement is noisy (i.e., \( r_{n} \) is large), the gain favors the prior estimate.
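
The following sketch, with assumed variances, shows how the gain shifts between the two extremes:

```python
# k = p / (p + r): the gain reflects the relative uncertainties (values assumed).
def gain_1d(p_prior, r):
    return p_prior / (p_prior + r)

print(gain_1d(p_prior=100.0, r=1.0))   # ~0.99: uncertain prior, trust the measurement
print(gain_1d(p_prior=1.0, r=100.0))   # ~0.01: noisy measurement, trust the prior
```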

To make the filter optimal, we aim to minimize the variance \( p_{n,n} \) of the current (posterior) estimate \( \hat{x}_{n,n} \), which is computed from the predicted (prior) estimate \( \hat{x}_{n,n-1} \) and the measurement \( z_{n} \). The Kalman Gain \( k_{n} \) minimizes this posterior variance. The detailed derivation is provided in the Kalman Filter in One Dimension chapter.

Let us rewrite the one-dimensional Kalman Gain as follows:

\[ k_{n} = p_{n,n-1}(p_{n,n-1} + r_{n})^{-1} \]

In the multivariate case, we work with the prior estimate uncertainty (covariance) matrix \( \boldsymbol{P}_{n,n-1} \) and the measurement noise uncertainty (covariance) matrix \( \boldsymbol{R}_{n} \). We must project \( \boldsymbol{P}_{n,n-1} \) into the measurement domain before it can be combined with the measurement noise covariance \( \boldsymbol{R}_{n} \). This projection is achieved using the observation matrix \( \boldsymbol{H} \).

Since \( \boldsymbol{P}_{n,n-1} \) is a covariance matrix, and variance is a squared quantity, it must be projected with the observation matrix applied on both sides:

\[ \boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T} \]

(See the proof in Chapter 7.5 of the book.)

This term represents the uncertainty of the predicted measurement, analogous to \( p_{n,n-1} \) in the one-dimensional case.
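
Continuing the rocket example with an assumed prior covariance, the projection picks out exactly the part of \( \boldsymbol{P}_{n,n-1} \) that is visible to the altimeter:

```python
import numpy as np

# Project the prior covariance into the measurement domain (values assumed).
H = np.array([[1.0, 0.0]])
P_prior = np.array([[25.0, 5.0],
                    [5.0,  4.0]])   # prior estimate covariance (2x2)

# For H = [1 0], H P H^T selects the altitude variance P[0, 0].
print(H @ P_prior @ H.T)            # [[25.]]
```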

After projecting the prior estimate uncertainty into the measurement domain, the Kalman Gain* in the multivariate case takes the form:

\[ \color{green}{\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}}(\color{green}{\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}} + \color{blue}{\boldsymbol{R}_{n}})^{-1} \]

(* This is not the final form of the Kalman Gain)

The Kalman Gain* multiplied by the innovation is:

\[ \color{green}{\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}}(\color{green}{\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}} + \color{blue}{\boldsymbol{R}_{n}})^{-1}(\boldsymbol{{z}}_{n} - \boldsymbol{H}\boldsymbol{\hat{x}}_{n,n-1}) \]

The expression above is in the measurement domain, but in the State Update Equation, we need to update the system state estimate \( \boldsymbol{\hat{x}}_{n,n-1} \), which resides in the state domain:

\[ \boldsymbol{\hat{x}}_{n,n} = \boldsymbol{\hat{x}}_{n,n-1} + \boldsymbol{K}_{n}(\boldsymbol{{z}}_{n} - \boldsymbol{H}\boldsymbol{\hat{x}}_{n,n-1}) \]

By removing the leading \( \boldsymbol{H} \) from the Kalman Gain* expression, we shift the result back into the state domain. The updated expression of the Kalman Gain multiplied by the innovation becomes:

\[ \color{green}{\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}}(\color{green}{\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}} + \color{blue}{\boldsymbol{R}_{n}})^{-1}(\boldsymbol{{z}}_{n} - \boldsymbol{H}\boldsymbol{\hat{x}}_{n,n-1}) \]

Now we can express the state update equation in its full multivariate form:

\[ \boldsymbol{\hat{x}}_{n,n} = \boldsymbol{\hat{x}}_{n,n-1} + \color{red}{\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}(\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T} + \boldsymbol{R}_{n})^{-1}}(\boldsymbol{{z}}_{n} - \boldsymbol{H}\boldsymbol{\hat{x}}_{n,n-1}) \]

The corresponding final form of the Kalman Gain is:

\[ \color{red}{\boldsymbol{K}_{n} = \boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}(\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T} + \boldsymbol{R}_{n})^{-1}} \]

This form ensures that the innovation term is properly mapped from the measurement domain back into the state domain, allowing for a consistent and optimal update of the system state estimate.
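
Putting the pieces together, here is a minimal sketch of one full update step for the rocket example; all numeric values are assumed for illustration:

```python
import numpy as np

# One multivariate Kalman update step (rocket example, values assumed).
H = np.array([[1.0, 0.0]])               # observation matrix
P_prior = np.array([[25.0, 5.0],
                    [5.0,  4.0]])        # prior estimate covariance
R = np.array([[10.0]])                   # measurement noise covariance
x_prior = np.array([[1000.0], [50.0]])   # prior state estimate
z = np.array([[1012.0]])                 # altimeter measurement

# Kalman Gain: K = P H^T (H P H^T + R)^-1
S = H @ P_prior @ H.T + R                # innovation covariance (1x1)
K = P_prior @ H.T @ np.linalg.inv(S)     # gain (2x1), maps the residual to the state domain

x_post = x_prior + K @ (z - H @ x_prior) # state update
print(K)       # [[~0.714], [~0.143]]
print(x_post)  # [[~1008.57], [~51.71]]
```

Note that although only the altitude is measured, the gain also corrects the velocity estimate through the off-diagonal terms of \( \boldsymbol{P}_{n,n-1} \).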

This section provides an intuitive understanding of the multivariate Kalman Gain. The following section presents its formal derivation.

Kalman Gain Equation Derivation

This section presents the derivation of the Kalman Gain Equation. You can skip to the next topic if you are not interested in the derivation.

First, let's rearrange the Covariance Update Equation:

\( \boldsymbol{P}_{n,n} = \left(\boldsymbol{I} - \boldsymbol{K}_{n}\boldsymbol{H} \right) \boldsymbol{P}_{n,n-1} \color{blue}{\left(\boldsymbol{I} - \boldsymbol{K}_{n}\boldsymbol{H} \right)^{T}} + \boldsymbol{K}_{n} \boldsymbol{R}_{n}\boldsymbol{K}_{n}^{T} \) (the Covariance Update Equation)

\( \boldsymbol{P}_{n,n} = \left(\boldsymbol{I} - \boldsymbol{K}_{n}\boldsymbol{H} \right) \boldsymbol{P}_{n,n-1} \color{blue}{\left(\boldsymbol{I} - \left(\boldsymbol{K}_{n}\boldsymbol{H}\right)^{T}\right)} + \boldsymbol{K}_{n} \boldsymbol{R}_{n} \boldsymbol{K}_{n}^{T} \) (since \( \boldsymbol{I}^{T} = \boldsymbol{I} \))

\( \boldsymbol{P}_{n,n} = \color{green}{\left(\boldsymbol{I} - \boldsymbol{K}_{n}\boldsymbol{H} \right) \boldsymbol{P}_{n,n-1}} \color{blue}{\left(\boldsymbol{I} - \boldsymbol{H}^{T}\boldsymbol{K}_{n}^{T}\right)} + \boldsymbol{K}_{n} \boldsymbol{R}_{n} \boldsymbol{K}_{n}^{T} \) (apply the matrix transpose property \( (\boldsymbol{AB})^{T} = \boldsymbol{B}^{T}\boldsymbol{A}^{T} \))

\( \boldsymbol{P}_{n,n} = \color{green}{\left(\boldsymbol{P}_{n,n-1} - \boldsymbol{K}_{n}\boldsymbol{H}\boldsymbol{P}_{n,n-1} \right)} \left(\boldsymbol{I} - \boldsymbol{H}^{T}\boldsymbol{K}_{n}^{T}\right) + \boldsymbol{K}_{n} \boldsymbol{R}_{n} \boldsymbol{K}_{n}^{T} \) (distribute the first factor)

\( \boldsymbol{P}_{n,n} = \boldsymbol{P}_{n,n-1} - \boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}\boldsymbol{K}_{n}^{T} - \boldsymbol{K}_{n}\boldsymbol{H}\boldsymbol{P}_{n,n-1} + \color{#7030A0}{\boldsymbol{K}_{n}\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}\boldsymbol{K}_{n}^{T} + \boldsymbol{K}_{n} \boldsymbol{R}_{n} \boldsymbol{K}_{n}^{T}} \) (expand)

\( \boldsymbol{P}_{n,n} = \boldsymbol{P}_{n,n-1} - \boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}\boldsymbol{K}_{n}^{T} - \boldsymbol{K}_{n}\boldsymbol{H}\boldsymbol{P}_{n,n-1} + \color{#7030A0}{\boldsymbol{K}_{n} \left( \boldsymbol{H} \boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T} + \boldsymbol{R}_{n} \right) \boldsymbol{K}_{n}^{T}} \) (group the last two terms)

The Kalman Filter is an optimal filter. Thus, we seek a Kalman Gain that minimizes the estimate variance.

To minimize the estimate variance, we need to minimize the main diagonal (from the upper left to the lower right) of the covariance matrix \( \boldsymbol{P}_{n,n} \).

The sum of the elements on the main diagonal of a square matrix is called the trace of the matrix. Thus, we need to minimize \( tr(\boldsymbol{P}_{n,n}) \). To find the conditions that produce a minimum, we differentiate the trace of \( \boldsymbol{P}_{n,n} \) with respect to \( \boldsymbol{K}_{n} \) and set the result to zero.
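
Carrying out this differentiation, a sketch using the standard matrix-calculus identities \( \frac{\partial}{\partial \boldsymbol{A}} tr(\boldsymbol{AB}) = \boldsymbol{B}^{T} \) and \( \frac{\partial}{\partial \boldsymbol{A}} tr(\boldsymbol{ACA}^{T}) = 2\boldsymbol{AC} \) (for symmetric \( \boldsymbol{C} \)), together with the symmetry of \( \boldsymbol{P}_{n,n-1} \), yields:

\[ \frac{\partial \, tr(\boldsymbol{P}_{n,n})}{\partial \boldsymbol{K}_{n}} = -2\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T} + 2\boldsymbol{K}_{n}\left(\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T} + \boldsymbol{R}_{n}\right) = 0 \]

Solving for \( \boldsymbol{K}_{n} \) recovers the Kalman Gain Equation stated at the beginning of this chapter:

\[ \boldsymbol{K}_{n} = \boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T}\left(\boldsymbol{H}\boldsymbol{P}_{n,n-1}\boldsymbol{H}^{T} + \boldsymbol{R}_{n}\right)^{-1} \]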
