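Suppose you have two multivariate Gaussian distributions $P_S$ and $P_T$, parameterized as $P_S = \mathcal{N}(\mu_S, \Sigma_S)$ and $P_T = \mathcal{N}(\mu_T, \Sigma_T)$, with both covariance matrices positive definite. How do you linearly transform $x \sim P_S$ so that the transformed vectors have distribution $P_T$? Is there an optimal way to do this? The field of optimal transport (OT) provides an answer. If we measure transport cost by squared Euclidean distance, so that the minimal expected cost is the squared type-2 Wasserstein distance between the two probability measures, then we apply the following affine function:

$$x \mapsto \mu_T + A(x - \mu_S) = Ax + b, \qquad b = \mu_T - A\mu_S,$$

where

$$A = \Sigma_S^{-1/2}\left(\Sigma_S^{1/2}\,\Sigma_T\,\Sigma_S^{1/2}\right)^{1/2}\Sigma_S^{-1/2} = A^\top.$$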

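As a quick sanity check on this formula, consider one dimension, with $\Sigma_S = \sigma_S^2$, $\Sigma_T = \sigma_T^2$ and $\sigma_S, \sigma_T > 0$. Every factor in $A = \Sigma_S^{-1/2}(\Sigma_S^{1/2}\Sigma_T\Sigma_S^{1/2})^{1/2}\Sigma_S^{-1/2}$ is then a positive scalar, and the map reduces to standardizing with the source parameters and rescaling with the target ones:

$$A = \sigma_S^{-1}\left(\sigma_S\,\sigma_T^2\,\sigma_S\right)^{1/2}\sigma_S^{-1} = \sigma_S^{-1}\,(\sigma_S\sigma_T)\,\sigma_S^{-1} = \frac{\sigma_T}{\sigma_S}, \qquad x \mapsto \mu_T + \frac{\sigma_T}{\sigma_S}\,(x - \mu_S).$$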
For more details, see Remark 2.31 in “Computational Optimal Transport” by Peyré & Cuturi (available on arXiv here).
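The map can be computed directly from the four Gaussian parameters. Below is a minimal NumPy sketch, assuming both covariances are symmetric positive definite. The helper names `sqrtm_spd` and `gaussian_ot_map` are hypothetical (not from the post or from Peyré & Cuturi), and the matrix square roots are taken through `numpy.linalg.eigh`, which for a symmetric positive-definite input gives a real, symmetric square root directly.

```python
import numpy as np


def sqrtm_spd(M):
    """Symmetric square root of a symmetric positive-definite matrix."""
    M = (M + M.T) / 2                      # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)   # M = V diag(w) V^T
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T   # V diag(sqrt(w)) V^T


def gaussian_ot_map(mu_s, sigma_s, mu_t, sigma_t):
    """Return (A, b) such that x -> A @ x + b pushes N(mu_s, sigma_s) onto
    N(mu_t, sigma_t) with minimal expected squared Euclidean displacement."""
    s_half = sqrtm_spd(sigma_s)
    s_inv_half = np.linalg.inv(s_half)
    middle = sqrtm_spd(s_half @ sigma_t @ s_half)
    A = s_inv_half @ middle @ s_inv_half
    b = mu_t - A @ mu_s
    return A, b


# Example with made-up 2-D parameters
mu_s = np.array([0.0, 0.0])
sigma_s = np.array([[2.0, 0.5], [0.5, 1.0]])
mu_t = np.array([1.0, -1.0])
sigma_t = np.array([[1.0, -0.3], [-0.3, 3.0]])

A, b = gaussian_ot_map(mu_s, sigma_s, mu_t, sigma_t)
print(A)
print(b)
```

The returned `A` should be symmetric, and `A @ sigma_s @ A.T` should reproduce `sigma_t` up to floating-point round-off.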

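But we might instead want the transformation that minimizes the Kullback-Leibler (KL) divergence between the transformed $P_S$ and $P_T$. We will use the fact that an affine transformation of a Gaussian vector is again Gaussian: if $x \sim P_S$, then $Ax + b$ is Gaussian with mean and covariance given by

$$\mathbb{E}[Ax + b] = A\mu_S + b \quad\text{and}\quad \operatorname{Cov}(Ax + b) = A\Sigma_S A^\top.$$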
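A quick Monte Carlo check of these two moment formulas, assuming NumPy. The coefficients `A` and `b` below are arbitrary illustrative values (hypothetical, not an optimal map); the sample mean and covariance of the transformed draws should land close to $A\mu_S + b$ and $A\Sigma_S A^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up source Gaussian
mu_s = np.array([1.0, -2.0])
sigma_s = np.array([[2.0, 0.6], [0.6, 1.0]])

# An arbitrary (hypothetical) affine map x -> A x + b
A = np.array([[1.5, -0.4], [0.3, 0.8]])
b = np.array([0.5, 2.0])

x = rng.multivariate_normal(mu_s, sigma_s, size=200_000)
y = x @ A.T + b    # apply the map to every sample (one sample per row)

print("sample mean:", y.mean(axis=0), " formula:", A @ mu_s + b)
print("sample cov:\n", np.cov(y, rowvar=False))
print("formula cov:\n", A @ sigma_s @ A.T)
```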

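We then set up an optimization problem over the map parameters $A$ and $b$, taking the transformed source as the first argument of the KL divergence:

$$\min_{A,\,b}\; D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(A\mu_S + b,\; A\Sigma_S A^\top\right) \,\middle\|\, \mathcal{N}(\mu_T, \Sigma_T)\right)$$

Using the closed-form KL divergence between two Gaussians in $d$ dimensions, this leads to the following nasty-looking objective:

$$\frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_T^{-1}A\Sigma_S A^\top\right) + \left(\mu_T - A\mu_S - b\right)^\top \Sigma_T^{-1}\left(\mu_T - A\mu_S - b\right) - d + \ln\frac{\det\Sigma_T}{\det\!\left(A\Sigma_S A^\top\right)}\right]$$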

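Unpleasant as it is to minimize by hand, this objective is mechanical to evaluate. A minimal NumPy sketch, assuming nonsingular covariances and taking the transformed source as the first argument of the KL divergence; `gaussian_kl` and `kl_objective` are hypothetical helper names, not from the post.

```python
import numpy as np


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0) || N(mu1, sigma1) ), assuming both covariances are nonsingular."""
    d = mu0.shape[0]
    sigma1_inv = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(sigma0)   # log-determinants avoid overflow/underflow
    _, logdet1 = np.linalg.slogdet(sigma1)
    return 0.5 * (
        np.trace(sigma1_inv @ sigma0)
        + diff @ sigma1_inv @ diff
        - d
        + logdet1 - logdet0
    )


def kl_objective(A, b, mu_s, sigma_s, mu_t, sigma_t):
    """KL between the pushforward N(A mu_s + b, A sigma_s A^T) of the source and the target."""
    return gaussian_kl(A @ mu_s + b, A @ sigma_s @ A.T, mu_t, sigma_t)


# Made-up 2-D parameters
mu_s = np.array([0.0, 0.0])
sigma_s = np.array([[2.0, 0.5], [0.5, 1.0]])
mu_t = np.array([1.0, -1.0])
sigma_t = np.array([[1.0, -0.3], [-0.3, 3.0]])

# The identity map leaves the source unchanged, so this is KL(P_S || P_T), positive here
print(kl_objective(np.eye(2), np.zeros(2), mu_s, sigma_s, mu_t, sigma_t))
```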
But we don’t need to work through all this algebra, because the optimal transport map also minimizes the KL divergence. The KL divergence is always nonnegative and equals 0 exactly when its two arguments are the same distribution, so we only need to verify that the optimal transport map $x \mapsto Ax + b$ from the beginning of this post produces samples with distribution $P_T$.

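First checking the mean, we verify that $A\mu_S + b = A\mu_S + (\mu_T - A\mu_S) = \mu_T$. Next, checking the covariance, write $M = \Sigma_S^{1/2}\Sigma_T\Sigma_S^{1/2}$, so that $A = \Sigma_S^{-1/2}M^{1/2}\Sigma_S^{-1/2}$ is symmetric and $A^\top = A$. Then we have

$$A\Sigma_S A^\top = \Sigma_S^{-1/2}M^{1/2}\underbrace{\Sigma_S^{-1/2}\,\Sigma_S\,\Sigma_S^{-1/2}}_{=\,I}M^{1/2}\Sigma_S^{-1/2} = \Sigma_S^{-1/2}M\,\Sigma_S^{-1/2} = \Sigma_S^{-1/2}\Sigma_S^{1/2}\Sigma_T\Sigma_S^{1/2}\Sigma_S^{-1/2} = \Sigma_T.$$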

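Since the transformed vector is Gaussian with mean $\mu_T$ and covariance $\Sigma_T$, we’ve verified that $Ax + b \sim P_T$ exactly. The KL divergence therefore reaches its minimum value of 0, which means that our optimal transport solution is also a KL-divergence minimizer.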
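The same check can be run numerically. This is a NumPy sketch under the same positive-definite assumption, with all helper names hypothetical. It also illustrates that zero KL divergence does not single out the OT map: a hypothetical alternative built from Cholesky factors ($\Sigma = LL^\top$), namely $A = L_T L_S^{-1}$ with $b = \mu_T - A\mu_S$, also satisfies $A\Sigma_S A^\top = \Sigma_T$ and so also reaches KL 0. What separates the two is the expected squared displacement $\mathbb{E}\|Ax + b - x\|^2$, computed here in closed form.

```python
import numpy as np


def sqrtm_spd(M):
    """Symmetric square root of a symmetric positive-definite matrix."""
    M = (M + M.T) / 2
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T


def gaussian_kl(mu0, sigma0, mu1, sigma1):
    """KL( N(mu0, sigma0) || N(mu1, sigma1) ) for nonsingular covariances."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(sigma1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ sigma0) + diff @ inv1 @ diff - d
                  + np.linalg.slogdet(sigma1)[1] - np.linalg.slogdet(sigma0)[1])


def transport_cost(A, b, mu_s, sigma_s):
    """E||A x + b - x||^2 for x ~ N(mu_s, sigma_s): trace of covariance plus squared mean."""
    D = A - np.eye(len(mu_s))
    shift = D @ mu_s + b
    return np.trace(D @ sigma_s @ D.T) + shift @ shift


# Made-up 2-D parameters
mu_s = np.array([0.0, 1.0])
sigma_s = np.array([[2.0, 0.5], [0.5, 1.0]])
mu_t = np.array([1.0, -1.0])
sigma_t = np.array([[1.0, -0.3], [-0.3, 3.0]])

# Optimal transport map
s_half = sqrtm_spd(sigma_s)
s_inv_half = np.linalg.inv(s_half)
A_ot = s_inv_half @ sqrtm_spd(s_half @ sigma_t @ s_half) @ s_inv_half
b_ot = mu_t - A_ot @ mu_s

# Alternative map from Cholesky factors: also pushes P_S onto P_T
L_s = np.linalg.cholesky(sigma_s)
L_t = np.linalg.cholesky(sigma_t)
A_ch = L_t @ np.linalg.inv(L_s)
b_ch = mu_t - A_ch @ mu_s

for name, A, b in [("OT", A_ot, b_ot), ("Cholesky", A_ch, b_ch)]:
    kl = gaussian_kl(A @ mu_s + b, A @ sigma_s @ A.T, mu_t, sigma_t)
    cost = transport_cost(A, b, mu_s, sigma_s)
    print(f"{name}: KL = {kl:.2e}, transport cost = {cost:.4f}")
```

Both maps should report a KL divergence at round-off level. Because the OT map minimizes expected squared displacement among all maps that push $P_S$ onto $P_T$, its transport cost can never exceed the Cholesky map's, while the KL divergence cannot tell the two apart.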

I’m using this fact in my ongoing research on domain adaptation under confounding. See the arXiv preprint here.