Suppose you have two multivariate Gaussian distributions $p_1$ and $p_2$, parameterized as $p_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $p_2 = \mathcal{N}(\mu_2, \Sigma_2)$. How do you linearly transform $x \sim p_1$ so that the transformed vectors have distribution $p_2$? Is there an optimal way to do this? The field of optimal transport (OT) provides an answer. If we choose the transport cost to be the type-2 Wasserstein distance $W_2$ between probability measures, then we apply the following linear (affine) function:

$$T(x) = \mu_2 + A(x - \mu_1),$$

where

$$A = \Sigma_1^{-1/2}\left(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\right)^{1/2}\Sigma_1^{-1/2}.$$
For more details, see Remark 2.31 in “Computational Optimal Transport” by Peyré & Cuturi (available on arXiv here).
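As a quick sanity check, here is a minimal NumPy/SciPy sketch of this map; the means, covariances, and variable names are made up for illustration, and `scipy.linalg.sqrtm` stands in for the symmetric matrix square root.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)

# Two illustrative Gaussians p1 = N(mu1, Sigma1) and p2 = N(mu2, Sigma2).
mu1, mu2 = np.array([0.0, 1.0]), np.array([3.0, -2.0])
B1, B2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
Sigma1 = B1 @ B1.T + np.eye(2)
Sigma2 = B2 @ B2.T + np.eye(2)

# A = Sigma1^{-1/2} (Sigma1^{1/2} Sigma2 Sigma1^{1/2})^{1/2} Sigma1^{-1/2}
# (sqrtm can return a tiny imaginary part for symmetric PSD inputs; drop it)
S1_half = np.real(sqrtm(Sigma1))
S1_half_inv = np.linalg.inv(S1_half)
A = S1_half_inv @ np.real(sqrtm(S1_half @ Sigma2 @ S1_half)) @ S1_half_inv

# Transport samples from p1: T(x) = mu2 + A (x - mu1)
x = rng.multivariate_normal(mu1, Sigma1, size=200_000)
Tx = mu2 + (x - mu1) @ A.T

# The transported samples should match p2 empirically.
print(np.round(Tx.mean(axis=0), 2))   # ~ mu2
print(np.round(np.cov(Tx.T), 2))      # ~ Sigma2
```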
But we might instead want to find the affine transformation $x \mapsto Mx + b$ which minimizes the Kullback-Leibler divergence between $p_2$ and the distribution of the transformed $x \sim p_1$. We will use the fact that the transformed vector will also come from a Gaussian distribution, with mean and covariance given by

$$\mathbb{E}[Mx + b] = M\mu_1 + b \quad\text{and}\quad \mathrm{Cov}[Mx + b] = M\Sigma_1 M^\top.$$
We then set up an optimization problem:

$$\min_{M,\,b}\; D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(M\mu_1 + b,\; M\Sigma_1 M^\top\right)\,\big\|\,\mathcal{N}(\mu_2, \Sigma_2)\right).$$

This leads to the following nasty-looking objective:

$$\frac{1}{2}\left[\log\frac{\det \Sigma_2}{\det\!\left(M\Sigma_1 M^\top\right)} - d + \operatorname{tr}\!\left(\Sigma_2^{-1} M\Sigma_1 M^\top\right) + \left(\mu_2 - M\mu_1 - b\right)^\top \Sigma_2^{-1} \left(\mu_2 - M\mu_1 - b\right)\right],$$

where $d$ is the dimension of $x$.
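If you did want to evaluate that objective directly, the closed-form KL between two Gaussians is straightforward to code up. Here is a small sketch; the helper name `gaussian_kl` is just illustrative.

```python
import numpy as np

def gaussian_kl(mu_q, Sigma_q, mu_p, Sigma_p):
    """Closed-form D_KL( N(mu_q, Sigma_q) || N(mu_p, Sigma_p) )."""
    d = mu_q.shape[0]
    Sigma_p_inv = np.linalg.inv(Sigma_p)
    diff = mu_p - mu_q
    _, logdet_q = np.linalg.slogdet(Sigma_q)
    _, logdet_p = np.linalg.slogdet(Sigma_p)
    return 0.5 * (logdet_p - logdet_q - d
                  + np.trace(Sigma_p_inv @ Sigma_q)
                  + diff @ Sigma_p_inv @ diff)

# The objective above is gaussian_kl(M @ mu1 + b, M @ Sigma1 @ M.T, mu2, Sigma2),
# to be minimized over M and b.
```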
But we don’t actually need to work through all this algebra, because the optimal transport solution also minimizes the KL-divergence. The KL-divergence reaches its minimum of 0 exactly when the two distributions are equal, so we only need to verify that the optimal transport map $T$ above produces samples with distribution $p_2 = \mathcal{N}(\mu_2, \Sigma_2)$.
First checking the mean, we verify that

$$\mathbb{E}[T(x)] = \mu_2 + A\left(\mathbb{E}[x] - \mu_1\right) = \mu_2 + A\left(\mu_1 - \mu_1\right) = \mu_2.$$

Next, checking the covariance, we have

$$\mathrm{Cov}[T(x)] = A\Sigma_1 A^\top = \Sigma_1^{-1/2}\left(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\right)^{1/2}\Sigma_1^{-1/2}\,\Sigma_1\,\Sigma_1^{-1/2}\left(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\right)^{1/2}\Sigma_1^{-1/2} = \Sigma_1^{-1/2}\left(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\right)\Sigma_1^{-1/2} = \Sigma_2,$$

using the fact that $A$ is symmetric and that the inner factors $\Sigma_1^{-1/2}\Sigma_1\Sigma_1^{-1/2}$ collapse to the identity.
We’ve verified that $T(x) \sim \mathcal{N}(\mu_2, \Sigma_2) = p_2$, which means that our optimal transport solution also gives us the KL-divergence minimizer.
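A short numerical check of this identity, again with made-up covariances:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)

# Random illustrative covariances (3-dimensional).
B1, B2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
Sigma1 = B1 @ B1.T + np.eye(3)
Sigma2 = B2 @ B2.T + np.eye(3)

S1h = np.real(sqrtm(Sigma1))
S1h_inv = np.linalg.inv(S1h)
A = S1h_inv @ np.real(sqrtm(S1h @ Sigma2 @ S1h)) @ S1h_inv

# The transported covariance A Sigma1 A^T should equal Sigma2 up to floating-point
# error, so the KL objective above is 0 at the optimal transport solution.
print(np.allclose(A @ Sigma1 @ A.T, Sigma2))  # True
```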
I’m using this fact in my ongoing research on domain adaptation under confounding. See the arXiv preprint here.