Stochastic neighbor embedding (SNE) is a method of dimensionality reduction that involves softmax similarities measured between all pairs of data points. To build a suitable embedding, SNE tries to reproduce in a low-dimensional space the similarities that are observed in the high-dimensional data space. Previous work has investigated the immunity of such similarities to norm concentration, as well as enhanced cost functions. This paper proposes an additional refinement, in the form of multiscale similarities, namely a weighted sum of softmax similarities with growing bandwidths. The objective is to maximize the embedding quality at all scales, with a better preservation of both local and global neighborhoods, and to exempt the user from having to fix a scale arbitrarily.
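For illustration, under assumed notation that is not fixed by this abstract (data points $\mathbf{x}_i$, per-point bandwidths $\sigma_{i,h}$ at scale $h$, and scale weights $w_h$), such a multiscale similarity can be sketched as a weighted sum of single-scale softmax similarities:
\[
p_{ij,h} = \frac{\exp\!\big(-\|\mathbf{x}_i - \mathbf{x}_j\|^2 / (2\sigma_{i,h}^2)\big)}{\sum_{k \neq i} \exp\!\big(-\|\mathbf{x}_i - \mathbf{x}_k\|^2 / (2\sigma_{i,h}^2)\big)},
\qquad
p_{ij} = \sum_{h=1}^{H} w_h\, p_{ij,h},
\qquad
\sum_{h=1}^{H} w_h = 1,
\]
where the bandwidths $\sigma_{i,h}$ grow with $h$, so that small scales capture local neighborhoods and large scales capture global structure; the paper's exact parameterization of the weights and bandwidths may differ from this sketch.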
Experiments on several data sets show that this multiscale version of SNE, combined with an appropriate cost function (sum of Jensen-Shannon divergences), outperforms all previous variants of SNE.
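As a hedged sketch of such a cost function (symbols assumed, not taken from this abstract), with $P_i$ and $Q_i$ denoting the high- and low-dimensional similarity distributions around point $i$, a sum of Jensen-Shannon divergences can be written as
\[
E = \sum_i \mathrm{JS}(P_i \,\|\, Q_i),
\qquad
\mathrm{JS}(P_i \,\|\, Q_i) = \tfrac{1}{2}\,\mathrm{KL}(P_i \,\|\, M_i) + \tfrac{1}{2}\,\mathrm{KL}(Q_i \,\|\, M_i),
\qquad
M_i = \tfrac{1}{2}\,(P_i + Q_i),
\]
where $\mathrm{KL}$ denotes the Kullback-Leibler divergence; the paper may use a generalized, asymmetrically weighted mixture rather than this standard symmetric form.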