# Mapping between two Gaussians using optimal transport and the KL-divergence

Suppose you have two multivariate Gaussian distributions $S$ and $T$, parameterized as $N(\mu_S, \Sigma_S)$ and $N(\mu_T, \Sigma_T)$. How do you linearly transform $x \sim S$ so that the transformed vectors have distribution $T$? Is there an optimal way to do this? The field of optimal transport (OT) provides an answer. If we measure transport cost by the squared 2-Wasserstein distance $W_2^2$ between probability measures (i.e., a quadratic ground cost), then the optimal map is the affine function $x \rightarrow \mu_T + A(x-\mu_S) = Ax + (\mu_T - A\mu_S)$

where $A =\Sigma^{-1/2}_S (\Sigma^{1/2}_S \Sigma_T \Sigma^{1/2}_S )^{1/2}\Sigma^{-1/2}_S = A^\top.$

For more details, see Remark 2.31 in “Computational Optimal Transport” by Peyré & Cuturi (available on arXiv).
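As a sketch of how this map could be computed numerically (assuming NumPy and SciPy are available; `ot_gaussian_map` is a hypothetical helper name, not from the references above), the matrix square roots can be taken with `scipy.linalg.sqrtm`:

```python
import numpy as np
from scipy.linalg import sqrtm

def ot_gaussian_map(mu_S, Sigma_S, mu_T, Sigma_T):
    """Affine map x -> A @ x + b pushing N(mu_S, Sigma_S) onto N(mu_T, Sigma_T).

    A = Sigma_S^{-1/2} (Sigma_S^{1/2} Sigma_T Sigma_S^{1/2})^{1/2} Sigma_S^{-1/2}
    """
    S_half = sqrtm(Sigma_S).real        # Sigma_S^{1/2}
    S_half_inv = np.linalg.inv(S_half)  # Sigma_S^{-1/2}
    A = S_half_inv @ sqrtm(S_half @ Sigma_T @ S_half).real @ S_half_inv
    b = mu_T - A @ mu_S
    return A, b

# Illustration on arbitrary well-conditioned SPD covariances:
rng = np.random.default_rng(0)
B, C = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
Sigma_S = B @ B.T + 3 * np.eye(3)
Sigma_T = C @ C.T + 3 * np.eye(3)
mu_S, mu_T = rng.standard_normal(3), rng.standard_normal(3)

A, b = ot_gaussian_map(mu_S, Sigma_S, mu_T, Sigma_T)
# A Sigma_S A^T should match Sigma_T up to numerical error:
print(np.abs(A @ Sigma_S @ A.T - Sigma_T).max())
```

For dense SPD matrices this is direct; in higher dimensions an eigendecomposition of $\Sigma_S$ would be a more stable way to form the square roots.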

But we might instead want to find the transformation which minimizes the Kullback-Leibler divergence between $T$ and the transformed $S$. We will use the fact that the transformed vector will also come from a Gaussian distribution, with mean and covariance given by $E[Mx+b] = ME[x]+b = M\mu_S + b$ and $Cov[Mx+b] = M Cov[x] M^\top = M\Sigma_S M^\top$.
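These two identities are easy to check empirically. A minimal Monte Carlo sketch using NumPy (the matrices `M`, `Sigma` and vectors `mu`, `b` below are arbitrary illustration values):

```python
import numpy as np

# Check that an affine map of Gaussian samples has mean M @ mu + b
# and covariance M @ Sigma @ M^T.
rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])
b = np.array([0.5, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are samples
y = x @ M.T + b                                       # apply x -> M x + b row-wise

print(y.mean(axis=0), M @ mu + b)          # should agree up to sampling noise
print(np.cov(y.T), M @ Sigma @ M.T, sep="\n")
```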

We then set up an optimization problem: $\min_{M, b} D_{KL}(N(\mu_T, \Sigma_T) || N(M\mu_S + b, M\Sigma_S M^\top))$

This leads to the following nasty-looking objective (after dropping the overall factor of $1/2$ and the additive constant $-d$, neither of which affects the minimizer): $\min_{M, b} \log(|M \Sigma_S M^\top|) - \log(|\Sigma_T|) + tr([M \Sigma_S M^\top]^{-1} \Sigma_T) + (M\mu_S + b - \mu_T)^\top [M \Sigma_S M^\top]^{-1} (M\mu_S + b - \mu_T)$
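The full closed-form Gaussian KL (with the $1/2$ and $-d$ restored) is straightforward to implement; here is a small sketch in NumPy, where `gaussian_kl` is a hypothetical helper name:

```python
import numpy as np

def gaussian_kl(mu_p, Sigma_p, mu_q, Sigma_q):
    """Closed-form D_KL( N(mu_p, Sigma_p) || N(mu_q, Sigma_q) ) in nats."""
    d = len(mu_p)
    Sigma_q_inv = np.linalg.inv(Sigma_q)
    diff = mu_q - mu_p
    # slogdet is numerically safer than log(det(...)) for large/ill-scaled matrices
    _, logdet_p = np.linalg.slogdet(Sigma_p)
    _, logdet_q = np.linalg.slogdet(Sigma_q)
    return 0.5 * (logdet_q - logdet_p - d
                  + np.trace(Sigma_q_inv @ Sigma_p)
                  + diff @ Sigma_q_inv @ diff)
```

The KL of a distribution against itself is 0, and it is strictly positive otherwise, which is the property the next paragraph exploits.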

But we don’t actually need to work through all this algebra, because the optimal transport solution also minimizes the KL-divergence. The KL-divergence $D_{KL}(P \| Q)$ attains its minimum of 0 exactly when $P = Q$, so we only need to verify that the optimal transport map above produces samples with distribution $T$.

First, checking the mean, we verify that $E[\mu_T + A(x-\mu_S)] = \mu_T + A(\mu_S-\mu_S) = \mu_T$. Next, checking the covariance and using the symmetry $A = A^\top$, we have

$$
\begin{aligned}
Cov[\mu_T + A(x-\mu_S)] &= A\, Cov[x]\, A^\top = A \Sigma_S A \\
&= \Sigma^{-1/2}_S (\Sigma^{1/2}_S \Sigma_T \Sigma^{1/2}_S )^{1/2}\Sigma^{-1/2}_S \,\Sigma_S\, \Sigma^{-1/2}_S (\Sigma^{1/2}_S \Sigma_T \Sigma^{1/2}_S )^{1/2}\Sigma^{-1/2}_S \\
&= \Sigma^{-1/2}_S (\Sigma^{1/2}_S \Sigma_T \Sigma^{1/2}_S )^{1/2} (\Sigma^{1/2}_S \Sigma_T \Sigma^{1/2}_S )^{1/2}\Sigma^{-1/2}_S \\
&= \Sigma^{-1/2}_S \,\Sigma^{1/2}_S \Sigma_T \Sigma^{1/2}_S\, \Sigma^{-1/2}_S = \Sigma_T,
\end{aligned}
$$

where the middle factor cancels because $\Sigma^{-1/2}_S \Sigma_S \Sigma^{-1/2}_S = I$.

We’ve verified that $\mu_T + A(x-\mu_S) \sim N(\mu_T, \Sigma_T)$, which means that our optimal transport solution also gives us the KL-divergence minimizer.
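As a numerical sanity check (a sketch assuming NumPy and SciPy; the covariances below are arbitrary illustration values), we can plug the pushforward parameters into the closed-form Gaussian KL and confirm it vanishes:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
d = 4
B, C = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Sigma_S = B @ B.T + d * np.eye(d)
Sigma_T = C @ C.T + d * np.eye(d)
mu_S, mu_T = rng.standard_normal(d), rng.standard_normal(d)

# The OT map A from the text
S_half = sqrtm(Sigma_S).real
S_half_inv = np.linalg.inv(S_half)
A = S_half_inv @ sqrtm(S_half @ Sigma_T @ S_half).real @ S_half_inv

# Parameters of the pushforward A x + (mu_T - A mu_S)
mu_push = mu_T            # the mean is matched exactly by construction
Sigma_push = A @ Sigma_S @ A.T

# Closed-form D_KL( N(mu_T, Sigma_T) || pushforward )
diff = mu_push - mu_T
Sp_inv = np.linalg.inv(Sigma_push)
_, ld_push = np.linalg.slogdet(Sigma_push)
_, ld_T = np.linalg.slogdet(Sigma_T)
kl = 0.5 * (ld_push - ld_T - d
            + np.trace(Sp_inv @ Sigma_T)
            + diff @ Sp_inv @ diff)
print(kl)  # ~0 up to floating-point error
```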

I’m using this fact in my ongoing research on domain adaptation under confounding; see the arXiv preprint.