Perturbation Theory

In DESC, we employ perturbation techniques to find neighboring solutions. The basic ideas and an outline of the relevant equations are given below; for full details and examples, see our paper.

Suppose we have a vector-valued function \(f(x,c)\), where we can interpret \(f\) as a measure of equilibrium, \(x\) as the optimization vector, and \(c\) as some additional parameter.

Further suppose that we have \(f(x,c) = 0\), so that we are at an equilibrium. We can then ask “if we change the parameter \(c\), how does \(x\) need to change to maintain equilibrium?”

We can expand \(f\) in a Taylor series:

$ \begin{eqnarray} f(x+\Delta x, c+\Delta c) &= f(x,c) + \frac{df}{dx}\Delta x + \frac{df}{dc}\Delta c \\ &+ \frac{1}{2}\frac{d^2f}{dx^2}\Delta x \Delta x + \frac{1}{2}\frac{d^2f}{dc^2}\Delta c \Delta c + \frac{d^2f}{dxdc} \Delta x \Delta c \\ &+ \frac{1}{6}\frac{d^3f}{dx^3}\Delta x \Delta x \Delta x + \frac{1}{6}\frac{d^3f}{dc^3}\Delta c \Delta c \Delta c + \frac{1}{2}\frac{d^3f}{dx^2 dc}\Delta x \Delta x \Delta c + \frac{1}{2} \frac{d^3f}{dxdc^2}\Delta x \Delta c \Delta c + ... \end{eqnarray} $

Here we assume that we know \(\Delta c\), the change in the parameters, and we want to find \(\Delta x\), the corresponding change in the optimization variables.

In general, this is a tensor polynomial equation for \(\Delta x\) which is difficult or impossible to solve directly (e.g., just storing \(\frac{d^2f}{dx^2}\) and \(\frac{d^3f}{dx^3}\), whose sizes scale like \(n^3\) and \(n^4\) for \(n\) optimization variables, could take hundreds or thousands of GB of memory). To get around this, we can further expand \(\Delta x\) and \(\Delta c\) in a perturbation series in powers of \(\epsilon\):

\(\Delta x = \epsilon x_1 + \epsilon^2 x_2 + \epsilon^3 x_3\)

\(\Delta c = \epsilon c_1\)

Plugging this expansion in we get:

$ \begin{eqnarray} f(x+\Delta x, c+\Delta c) &= f(x,c) + \frac{df}{dx}\epsilon x_1 + \frac{df}{dx}\epsilon^2 x_2 + \frac{df}{dx}\epsilon^3 x_3 + \frac{df}{dc}\epsilon c_1 \\ &+ \frac{1}{2}\frac{d^2f}{dx^2}(\epsilon^2 x_1^2 + 2 \epsilon^3 x_1 x_2 + 2 \epsilon^4 x_1 x_3) + \frac{1}{2}\frac{d^2f}{dc^2} (\epsilon^2 c_1^2) + \frac{d^2f}{dxdc} (\epsilon^2 x_1 c_1 + \epsilon^3 x_2 c_1 + \epsilon^4 x_3 c_1) \\ &+ \frac{1}{6}\frac{d^3f}{dx^3}(\epsilon^3 x_1^3 + 3 \epsilon^4 x_1^2 x_2 + 3 \epsilon^5 x_1^2 x_3) + \frac{1}{6}\frac{d^3f}{dc^3}\epsilon^3 c_1^3 + \frac{1}{2}\frac{d^3f}{dx^2 dc}(\epsilon^3 x_1^2 c_1 + 2 \epsilon^4 x_1 x_2 c_1 + 2 \epsilon^5 x_1 x_3 c_1) + \frac{1}{2}\frac{d^3f}{dxdc^2}(\epsilon^3 x_1 c_1^2 + \epsilon^4 x_2 c_1^2 + \epsilon^5 x_3 c_1^2) \end{eqnarray} $

Collecting terms of order \(\epsilon\) and setting them equal to zero (recalling that \(f(x,c) = 0\)) gives the first order term \(x_1\):

\(0 = f(x,c) + \frac{df}{dx}\epsilon x_1 + \frac{df}{dc}\epsilon c_1\)

\(x_1 = - \left(\frac{df}{dx}\right)^{-1} \left(\frac{df}{dc}c_1 \right)\)

Where the inverse is meant in the least squares sense.
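As a concrete illustration, here is a minimal sketch of the first order step written with JAX (which DESC is built on). The helper name and signature are assumptions for illustration, not DESC's actual API; in DESC the Jacobian \(\frac{df}{dx}\) would already be available from the optimizer.

```python
import jax
import jax.numpy as jnp

def first_order_step(f, x, c, c1):
    """Hypothetical helper: solve df/dx @ x1 = -df/dc @ c1 in the least squares sense."""
    # df/dx, the regular Jacobian of the objective (recomputed here for clarity;
    # in practice it is already available from the optimizer)
    J = jax.jacfwd(f, argnums=0)(x, c)
    # df/dc @ c1 as a Jacobian-vector product, without forming df/dc explicitly
    _, dfdc_c1 = jax.jvp(lambda cc: f(x, cc), (c,), (c1,))
    # "inverse" of df/dx in the least squares sense
    x1, *_ = jnp.linalg.lstsq(J, -dfdc_c1, rcond=None)
    return x1, J
```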

Similarly, collecting terms of order \(\epsilon^2\) gives the second order term \(x_2\):

\(0 = \frac{df}{dx}\epsilon^2 x_2 + \frac{1}{2}\frac{d^2f}{dx^2}\epsilon^2 x_1^2 + \frac{1}{2}\frac{d^2f}{dc^2}\epsilon^2 c_1^2 + \frac{d^2f}{dxdc}\epsilon^2 x_1 c_1\)

\(x_2 = -\left( \frac{df}{dx} \right)^{-1} \left(\frac{1}{2}\frac{d^2f}{dx^2} x_1^2 + \frac{1}{2}\frac{d^2f}{dc^2} c_1^2 + \frac{d^2f}{dxdc} x_1 c_1 \right)\)
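The second order term can be assembled the same way. The sketch below (with the same caveats as above, reusing the hypothetical `first_order_step`) builds each second derivative term as a nested Jacobian-vector product, so the full derivative tensors never need to be formed:

```python
def second_order_step(f, x, c, c1, x1, J):
    """Hypothetical helper: solve for x2 given the first order solution x1."""
    # d2f/dx2 @ x1 @ x1 as a nested JVP (a directional second derivative)
    d2f_xx = jax.jvp(
        lambda xx: jax.jvp(lambda y: f(y, c), (xx,), (x1,))[1], (x,), (x1,)
    )[1]
    # d2f/dc2 @ c1 @ c1
    d2f_cc = jax.jvp(
        lambda cc: jax.jvp(lambda y: f(x, y), (cc,), (c1,))[1], (c,), (c1,)
    )[1]
    # d2f/dxdc @ x1 @ c1 (mixed term: differentiate df/dx @ x1 along c1)
    d2f_xc = jax.jvp(
        lambda cc: jax.jvp(lambda y: f(y, cc), (x,), (x1,))[1], (c,), (c1,)
    )[1]
    rhs = 0.5 * d2f_xx + 0.5 * d2f_cc + d2f_xc
    x2, *_ = jnp.linalg.lstsq(J, -rhs, rcond=None)
    return x2
```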

Collecting terms of order \(\epsilon^3\) gives the third order term \(x_3\):

\(0 = \frac{df}{dx}\epsilon^3 x_3 + \frac{d^2f}{dx^2} \epsilon^3 x_1 x_2 + \frac{d^2f}{dxdc}\epsilon^3 x_2 c_1 + \frac{1}{6}\frac{d^3f}{dx^3}\epsilon^3 x_1^3 + \frac{1}{6}\frac{d^3f}{dc^3}\epsilon^3 c_1^3 + \frac{1}{2}\frac{d^3f}{dx^2 dc}\epsilon^3 x_1^2 c_1 + \frac{1}{2}\frac{d^3f}{dx dc^2}\epsilon^3 x_1 c_1^2\)

\(x_3 = - \left( \frac{df}{dx} \right) ^{-1} \left( \frac{d^2f}{dx^2} x_1 x_2 + \frac{d^2f}{dxdc}x_2 c_1 + \frac{1}{6}\frac{d^3f}{dx^3}x_1^3 + \frac{1}{6}\frac{d^3f}{dc^3} c_1^3 + \frac{1}{2}\frac{d^3f}{dx^2 dc} x_1^2 c_1 + \frac{1}{2}\frac{d^3f}{dx dc^2} x_1 c_1^2 \right)\)

The advantage of this approach is that the higher order derivatives only ever appear multiplied by known quantities, so those terms can be computed efficiently as Jacobian-vector products (effectively directional derivatives) without needing to calculate the full derivative tensors. Additionally, the only operator that needs to be inverted is \(\frac{df}{dx}\), which is the regular Jacobian of the objective and is already computed and inverted during optimization.
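For example, a toy usage of the hypothetical helpers sketched above (the residual `f` and the numbers here are made up purely for illustration) might look like:

```python
# toy "equilibrium" residual: f(x, c) = 0 at the starting point chosen below
f = lambda x, c: jnp.array([x[0] ** 2 + c[0] * x[1] - 1.0,
                            x[0] * x[1] - c[1]])
x = jnp.array([1.0, 0.5])
c = jnp.array([0.0, 0.5])   # current parameters, with f(x, c) = 0
c1 = jnp.array([0.1, 0.0])  # desired change in the parameters

x1, J = first_order_step(f, x, c, c1)
x2 = second_order_step(f, x, c, c1, x1, J)
x_new = x + x1 + x2         # perturbed solution, taking epsilon = 1 so Delta c = c1
```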