# The Newton Schulz iteration for matrix inversion

The Newton Schulz method is well-known, and the proof of convergence is widely available on the internet. Yet the derivation of the method itself is more obscure. Here it is:

We seek the zero of $f: \mathbb{R}^2 \rightarrow \mathbb{R}^2$, defined as follows: $\begin{array}{l} f(X) = X^{-1} - A. \end{array}$

The derivative of $f$ at $X$, applied to matrix $B$, operates on $B$ as follows: $\begin{array}{l} f'(X)[B] = -X^{-1} B X^{-1}. \end{array}$

We can prove that $f'^{-1}$ at $X$, applied to matrix $B$, operates on $B$ as follows: $\begin{array}{l} f'^{-1}(X)[B] = -X B X. \end{array}$

To see this, notice that $\begin{array}{l} B \\ = f'^{-1}(X)\Big[f'(X)[B]\Big] \\ = -X \Big[-X^{-1} B X^{-1}\Big] X \\ = B. \end{array}$

The Newton method for root finding has at each iterate: $\begin{array}{l} X_{t+1} \\ = X_t - f'^{-1}(X_t)\Big[f(X_t)\Big] \\ = X_t - f'^{-1}(X_t)\Big[X^{-1} - A\Big] \\ = X_t - X_t[X^{-1}_t-A] X_t \\ = X_t - [-X_t + X_t A X_t] \\ = 2 X_t - X_t A X_t \end{array}$