Matrix product trace differentiation

In this appendix, I prove the following two statements:

\( (1) \) \( \frac{\partial}{\partial\boldsymbol{A}} \left( tr\left( \boldsymbol{AB} \right) \right) = \boldsymbol{B}^{T} \)
\( (2) \) \( \frac{\partial}{\partial\boldsymbol{A}} \left( tr\left( \boldsymbol{ABA}^{T} \right) \right) = 2\boldsymbol{AB} \) (for symmetric \( \boldsymbol{B} \))

Statement 1

Let \( \boldsymbol{A} \) be an \( m\times n \) matrix and \( \boldsymbol{B} \) an \( n\times m \) matrix. Their product is:

\[ \boldsymbol{AB}= \left[ \begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \\ \end{matrix} \right] \left[ \begin{matrix} b_{11} & \cdots & b_{1m} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nm} \\ \end{matrix} \right] = \left[ \begin{matrix} \sum_{i=1}^{n}a_{1i}b_{i1} & \cdots & \sum_{i=1}^{n}a_{1i}b_{im} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{mi}b_{i1} & \cdots & \sum_{i=1}^{n}a_{mi}b_{im} \\ \end{matrix} \right] \]

The trace of \( \boldsymbol{AB} \) is the sum of its main-diagonal entries:

\[ tr(\boldsymbol{AB}) = \sum_{i=1}^{n}a_{1i}b_{i1} + \cdots + \sum_{i=1}^{n}a_{mi}b_{im} = \sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij} \]
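As a quick sanity check of the index bookkeeping, the double sum can be compared against the diagonal of \( \boldsymbol{AB} \) computed directly. This is a minimal plain-Python sketch; the function names and the example matrices are my own, chosen only for illustration, and the code uses 0-based indices where the formulas use 1-based ones.

```python
def trace_ab_double_sum(A, B):
    """tr(AB) as the double sum over i = 1..n, j = 1..m of a_ji * b_ij (0-based here)."""
    m, n = len(A), len(A[0])
    return sum(A[j][i] * B[i][j] for i in range(n) for j in range(m))


def trace_ab_direct(A, B):
    """tr(AB) as the sum of the diagonal entries (AB)_rr = sum_i a_ri * b_ir."""
    m, n = len(A), len(A[0])
    return sum(sum(A[r][i] * B[i][r] for i in range(n)) for r in range(m))


A = [[1, 2, 3],
     [4, 5, 6]]          # m x n = 2 x 3 (illustrative values)
B = [[7, 8],
     [9, 10],
     [11, 12]]           # n x m = 3 x 2 (illustrative values)

print(trace_ab_double_sum(A, B), trace_ab_direct(A, B))  # 212 212
```

Both routes visit the same \( m \cdot n \) products \( a_{ji}b_{ij} \); only the order of summation differs.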

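To differentiate, use the definition of the gradient of a scalar function \( f(\boldsymbol{X}) \) with respect to an \( m\times n \) matrix \( \boldsymbol{X} \), namely the matrix of partial derivatives with respect to each entry, and apply it to \( tr(\boldsymbol{AB}) \):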

\[ \frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}} = \left[ \begin{matrix} \frac{\partial f(\boldsymbol{X})}{\partial x_{11}} & \cdots & \frac{\partial f(\boldsymbol{X})}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(\boldsymbol{X})}{\partial x_{m1}} & \cdots & \frac{\partial f(\boldsymbol{X})}{\partial x_{mn}} \\ \end{matrix} \right] \]

\[ \frac{\partial tr(\boldsymbol{AB})}{\partial\boldsymbol{A}} = \left[ \begin{matrix} \frac{\partial (\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{11}} & \cdots & \frac{\partial (\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial (\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{m1}} & \cdots & \frac{\partial (\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ji}b_{ij})}{\partial a_{mn}} \\ \end{matrix} \right] = \]

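\[ = \left[ \begin{matrix} b_{11} & \cdots & b_{n1} \\ \vdots & \ddots & \vdots \\ b_{1m} & \cdots & b_{nm} \\ \end{matrix} \right] = \boldsymbol{B}^{T} \]

Each entry follows because \( a_{pq} \) appears in the double sum only in the term with \( j = p \) and \( i = q \), so \( \partial\, tr(\boldsymbol{AB}) / \partial a_{pq} = b_{qp} \), which is the \( (p,q) \) entry of \( \boldsymbol{B}^{T} \).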
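The result can also be checked numerically by building the gradient matrix entry by entry from central differences. The sketch below is plain Python with matrices stored as lists of rows; the helper names (`matmul`, `transpose`, `trace`, `numerical_gradient`) and the example matrices are illustrative assumptions, not part of the proof. Because \( tr(\boldsymbol{AB}) \) is linear in every entry of \( \boldsymbol{A} \), a central difference is exact for any step, so integer matrices with step \( h = 1 \) reproduce \( \boldsymbol{B}^{T} \) with no rounding error.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def numerical_gradient(f, X, h=1):
    """Entry (p, q) is (f(X + h*E_pq) - f(X - h*E_pq)) / (2h)."""
    grad = []
    for p in range(len(X)):
        row = []
        for q in range(len(X[0])):
            plus = [r[:] for r in X]
            minus = [r[:] for r in X]
            plus[p][q] += h
            minus[p][q] -= h
            row.append((f(plus) - f(minus)) / (2 * h))
        grad.append(row)
    return grad


A = [[1, 2, 3],
     [4, 5, 6]]          # m x n = 2 x 3 (illustrative values)
B = [[7, 8],
     [9, 10],
     [11, 12]]           # n x m = 3 x 2 (illustrative values)

grad = numerical_gradient(lambda X: trace(matmul(X, B)), A)
print(grad)          # [[7.0, 9.0, 11.0], [8.0, 10.0, 12.0]]
print(transpose(B))  # [[7, 9, 11], [8, 10, 12]]
```

For a function that is not polynomial of degree at most two in each entry, a small floating-point step and a tolerance-based comparison would be needed instead.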

Statement 2

Let \( \boldsymbol{A} \) be an \( m\times n \) matrix and \( \boldsymbol{B} \) an \( n\times n \) matrix. Then:

\[ \boldsymbol{ABA}^{T} = \left[ \begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \\ \end{matrix} \right] \left[ \begin{matrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn} \\ \end{matrix} \right] \left[ \begin{matrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \\ \end{matrix} \right] = \]

\[ = \left[ \begin{matrix} \sum_{i=1}^{n}a_{1i}b_{i1} & \cdots & \sum_{i=1}^{n}a_{1i}b_{in} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{mi}b_{i1} & \cdots & \sum_{i=1}^{n}a_{mi}b_{in} \\ \end{matrix} \right] \left[ \begin{matrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \\ \end{matrix} \right] = \]

\[ = \left[ \begin{matrix} \sum_{j=1}^{n}\sum_{i=1}^{n}a_{1i}b_{ij}a_{1j} & \cdots & \sum_{j=1}^{n}\sum_{i=1}^{n}a_{1i}b_{ij}a_{mj} \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^{n}\sum_{i=1}^{n}a_{mi}b_{ij}a_{1j} & \cdots & \sum_{j=1}^{n}\sum_{i=1}^{n}a_{mi}b_{ij}a_{mj} \\ \end{matrix} \right] \]

The trace of \( \boldsymbol{ABA}^{T} \) is the sum of its main-diagonal entries:

\[ tr(\boldsymbol{ABA}^{T}) = \sum_{j=1}^{n}\sum_{i=1}^{n}a_{1i}b_{ij}a_{1j} + \cdots + \sum_{j=1}^{n}\sum_{i=1}^{n}a_{mi}b_{ij}a_{mj} = \sum_{k=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{n}a_{ki}b_{ij}a_{kj} \]
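The triple sum can be checked the same way against a direct computation of \( tr(\boldsymbol{ABA}^{T}) \). This is a plain-Python sketch with illustrative matrices and helper names of my own; \( \boldsymbol{B} \) is deliberately not symmetric, because the trace formula itself does not rely on symmetry.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def trace_triple_sum(A, B):
    """tr(ABA^T) as the sum over k, j, i of a_ki * b_ij * a_kj (0-based here)."""
    m, n = len(A), len(A[0])
    return sum(A[k][i] * B[i][j] * A[k][j]
               for k in range(m) for j in range(n) for i in range(n))


A = [[1, 2, 3],
     [4, 5, 6]]          # m x n = 2 x 3 (illustrative values)
B = [[1, 0, 2],
     [3, 1, 0],
     [0, 4, 1]]          # n x n = 3 x 3, not symmetric

direct = trace(matmul(matmul(A, B), transpose(A)))
print(trace_triple_sum(A, B), direct)  # 355 355
```

Note that the outer index \( k \) runs over the \( m \) rows of \( \boldsymbol{A} \), while \( i \) and \( j \) run over its \( n \) columns.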

Differentiating entry by entry, as in Statement 1:

\[ \frac{\partial tr(\boldsymbol{ABA}^{T})}{\partial\boldsymbol{A}} = \left[ \begin{matrix} \frac{\partial (\sum_{k=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{n}a_{ki}b_{ij}a_{kj})}{\partial a_{11}} & \cdots & \frac{\partial (\sum_{k=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{n}a_{ki}b_{ij}a_{kj})}{\partial a_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial (\sum_{k=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{n}a_{ki}b_{ij}a_{kj})}{\partial a_{m1}} & \cdots & \frac{\partial (\sum_{k=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{n}a_{ki}b_{ij}a_{kj})}{\partial a_{mn}} \\ \end{matrix} \right] = \]

\[ = \left[ \begin{matrix} \sum_{j=1}^{n}b_{1j}a_{1j} + \sum_{i=1}^{n}a_{1i}b_{i1} & \cdots & \sum_{j=1}^{n}b_{nj}a_{1j} + \sum_{i=1}^{n}a_{1i}b_{in} \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^{n}b_{1j}a_{mj} + \sum_{i=1}^{n}a_{mi}b_{i1} & \cdots & \sum_{j=1}^{n}b_{nj}a_{mj} + \sum_{i=1}^{n}a_{mi}b_{in} \\ \end{matrix} \right] = \]

\[ = \left[ \begin{matrix} \sum_{j=1}^{n}a_{1j}b_{1j} & \cdots & \sum_{j=1}^{n}a_{1j}b_{nj} \\ \vdots & \ddots & \vdots \\ \sum_{j=1}^{n}a_{mj}b_{1j} & \cdots & \sum_{j=1}^{n}a_{mj}b_{nj} \\ \end{matrix} \right] + \left[ \begin{matrix} \sum_{i=1}^{n}a_{1i}b_{i1} & \cdots & \sum_{i=1}^{n}a_{1i}b_{in} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n}a_{mi}b_{i1} & \cdots & \sum_{i=1}^{n}a_{mi}b_{in} \\ \end{matrix} \right] = \]

\[ = \left[ \begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \\ \end{matrix} \right] \left[ \begin{matrix} b_{11} & \cdots & b_{n1} \\ \vdots & \ddots & \vdots \\ b_{1n} & \cdots & b_{nn} \\ \end{matrix} \right] + \left[ \begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \\ \end{matrix} \right] \left[ \begin{matrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{n1} & \cdots & b_{nn} \\ \end{matrix} \right] = \]

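\[ = \boldsymbol{AB}^{T} + \boldsymbol{AB} \]

The entry-wise derivatives follow because \( a_{pq} \) appears in the triple sum in two groups of terms: those with \( k = p \), \( i = q \), which contribute \( \sum_{j=1}^{n}b_{qj}a_{pj} \), and those with \( k = p \), \( j = q \), which contribute \( \sum_{i=1}^{n}a_{pi}b_{iq} \). The term \( a_{pq}b_{qq}a_{pq} \) belongs to both groups, which correctly accounts for the factor 2 in the derivative of \( a_{pq}^{2} \).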
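The general result holds for any square \( \boldsymbol{B} \), which a finite-difference check with a non-symmetric \( \boldsymbol{B} \) can illustrate. Since \( tr(\boldsymbol{ABA}^{T}) \) is at most quadratic in each individual entry of \( \boldsymbol{A} \), the central difference \( (f(\boldsymbol{A} + h\boldsymbol{E}_{pq}) - f(\boldsymbol{A} - h\boldsymbol{E}_{pq}))/(2h) \) is exact, so \( h = 1 \) and integer matrices give exact values. As before, this is a plain-Python sketch; the helper names and matrices are illustrative assumptions.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def numerical_gradient(f, X, h=1):
    """Entry (p, q) is (f(X + h*E_pq) - f(X - h*E_pq)) / (2h)."""
    grad = []
    for p in range(len(X)):
        row = []
        for q in range(len(X[0])):
            plus = [r[:] for r in X]
            minus = [r[:] for r in X]
            plus[p][q] += h
            minus[p][q] -= h
            row.append((f(plus) - f(minus)) / (2 * h))
        grad.append(row)
    return grad


A = [[1, 2, 3],
     [4, 5, 6]]          # m x n = 2 x 3 (illustrative values)
B = [[1, 0, 2],
     [3, 1, 0],
     [0, 4, 1]]          # n x n = 3 x 3, not symmetric


def f(X):
    """tr(X B X^T) as a function of X, with B held fixed."""
    return trace(matmul(matmul(X, B), transpose(X)))


grad = numerical_gradient(f, A)
expected = add(matmul(A, transpose(B)), matmul(A, B))  # AB^T + AB
print(grad)      # [[14.0, 19.0, 16.0], [35.0, 46.0, 40.0]]
print(expected)  # [[14, 19, 16], [35, 46, 40]]
```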

If \( \boldsymbol{B} \) is symmetric, then \( \boldsymbol{B}^{T} = \boldsymbol{B} \), and the result simplifies to:

\[ \frac{\partial tr(\boldsymbol{ABA}^{T})}{\partial\boldsymbol{A}} = \boldsymbol{AB}^{T} + \boldsymbol{AB} = \boldsymbol{AB} + \boldsymbol{AB} = 2\boldsymbol{AB} \]
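The symmetry assumption matters: with a symmetric \( \boldsymbol{B} \) the gradient equals \( 2\boldsymbol{AB} \), while with a non-symmetric \( \boldsymbol{B} \) the shortcut \( 2\boldsymbol{AB} \) is wrong. The plain-Python sketch below compares both cases against an exact central-difference gradient (exact for the same reason as in the general case); the helper names and matrices are illustrative assumptions.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def twice(X):
    return [[2 * x for x in row] for row in X]


def numerical_gradient(f, X, h=1):
    """Entry (p, q) is (f(X + h*E_pq) - f(X - h*E_pq)) / (2h)."""
    grad = []
    for p in range(len(X)):
        row = []
        for q in range(len(X[0])):
            plus = [r[:] for r in X]
            minus = [r[:] for r in X]
            plus[p][q] += h
            minus[p][q] -= h
            row.append((f(plus) - f(minus)) / (2 * h))
        grad.append(row)
    return grad


def gradient_of_trace_ABAt(A, B):
    """Numerical gradient of tr(A B A^T) with respect to A."""
    return numerical_gradient(lambda X: trace(matmul(matmul(X, B), transpose(X))), A)


A = [[1, 2, 3],
     [4, 5, 6]]                                # m x n = 2 x 3 (illustrative values)
B_sym = [[2, 1, 0], [1, 3, 4], [0, 4, 5]]      # symmetric: equal to its transpose
B_nonsym = [[1, 0, 2], [3, 1, 0], [0, 4, 1]]   # not symmetric

print(gradient_of_trace_ABAt(A, B_sym) == twice(matmul(A, B_sym)))        # True
print(gradient_of_trace_ABAt(A, B_nonsym) == twice(matmul(A, B_nonsym)))  # False
```

In the non-symmetric case the correct gradient is still \( \boldsymbol{AB}^{T} + \boldsymbol{AB} \); only the simplification to \( 2\boldsymbol{AB} \) fails.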
