A standard problem in computational biology is the visualization of high-dimensional data for making medical decisions. This is typically done by embedding the data into a two-dimensional space, and t-SNE is the standard method for doing so. We show on artificial and natural data that t-SNE has a number of problems; notably, it does not exhibit basic invariances that any good embedding method should have. We describe a new method, called t-ETE, for finding a low-dimensional embedding. We formulate the embedding problem as a joint ranking problem over a set of triplets, where each triplet captures the relative similarities among three objects in the set. Using recent advances in robust ranking, t-ETE produces high-quality embeddings even in the presence of a significant amount of noise and outliers, and it better preserves local scale and basic invariance properties. In particular, our method produces significantly better results than t-SNE on a wide range of signature datasets while also being faster to compute. Joint work with Ehsan Amid, Nikos Vlassis and John Vivian.
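To make the triplet formulation concrete, here is a minimal sketch of a triplet-based embedding objective. It is not the speakers' implementation, and the abstract does not give the exact t-ETE loss. The sketch assumes a heavy-tailed Student-t similarity between embedded points, as in t-SNE. It also assumes the tempered logarithm log_t(x) = (x^(1-t) - 1)/(1-t) as a stand-in for the robust per-triplet loss: for t > 1 it is bounded by 1/(t - 1), so a single noisy or outlier triplet cannot dominate the objective. The function names, the triplet convention (i, j, k) meaning "i is more similar to j than to k", and the default t = 2.0 are illustrative assumptions.

```python
import numpy as np


def student_t_similarity(y_a, y_b):
    """Heavy-tailed Student-t (Cauchy) kernel between two embedded points."""
    return 1.0 / (1.0 + np.sum((y_a - y_b) ** 2))


def tempered_log(x, t):
    """Tempered logarithm (x**(1-t) - 1) / (1-t); reduces to ln(x) when t == 1.
    For t > 1 it is bounded above by 1 / (t - 1)."""
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)


def triplet_loss(Y, triplets, t=2.0):
    """Hypothetical robust triplet objective over an embedding Y (n x d).

    Each triplet (i, j, k) asserts that object i is more similar to j than to k.
    Its cost is log_t(1 + q_ik / q_ij): close to 0 when the embedding respects
    the triplet (q_ij >> q_ik), and capped at 1 / (t - 1) when it is violated.
    """
    total = 0.0
    for i, j, k in triplets:
        q_ij = student_t_similarity(Y[i], Y[j])
        q_ik = student_t_similarity(Y[i], Y[k])
        total += tempered_log(1.0 + q_ik / q_ij, t)
    return total


# Usage: point 1 lies near point 0, point 2 far away.
Y = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0]])
respected = triplet_loss(Y, [(0, 1, 2)])  # "0 is closer to 1 than to 2": small cost
violated = triplet_loss(Y, [(0, 2, 1)])   # "0 is closer to 2 than to 1": larger, but bounded
print(respected, violated)
```

An embedding would then be found by minimizing this objective over Y with a gradient-based optimizer, which the sketch omits. Setting t = 1.0 recovers an ordinary logarithm, whose cost for a badly violated triplet grows without bound; the bounded t > 1 variant is one way to obtain the robustness to noisy triplets that the abstract describes.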
Manfred K. Warmuth (UC Santa Cruz)
Professor