Stefan Steinerberger, Yale University, Parabolic equations, expanders, t-SNE and data analysis
Language of the title:
English
Original abstract:
I will discuss two recent, widely used algorithms in
data analysis (both will be introduced; no prior knowledge is necessary).
The emphasis is on mathematical ideas, not on algorithms
or applications.
(1) Spectral Clustering is based on building graphs on the data
and using the Laplacian eigenfunctions as intrinsic coordinates.
One problem in practice is that building a graph is expensive.
We discuss novel probabilistic/combinatorial techniques that
relate to expander graphs and percolation theory and that yield
much ``better'' graphs than commonly used constructions.
Joint work with G. Linderman, G. Mishne and Y. Kluger.
(2) t-SNE is a way to visualize massive amounts of data as
nice little clouds in $\mathbf R^2$. It is THE standard visualization technique
in the biosciences (citation count $>$ 3000). Despite this, no mathematical
theory existed. I will present an interpretation of the algorithm
as the evolution of a uniformly parabolic discrete system with
large noise -- this proves convergence and, what's really nice,
tells you how to make the algorithm much better. Joint work
with G. Linderman.
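
As a rough illustration of the pipeline described in (1), the following minimal sketch builds a graph on the data and uses low-frequency eigenvectors of its Laplacian as coordinates. It assumes a brute-force k-nearest-neighbour graph and the unnormalised Laplacian $L = D - W$, i.e. one of the commonly used constructions, not the expander-type graphs from the talk. The function name `spectral_embedding` and the parameters `k` and `dim` are hypothetical and chosen only for this example.

```python
import numpy as np


def spectral_embedding(points, k=5, dim=2):
    """Embed points via eigenvectors of a k-NN graph Laplacian (illustrative sketch)."""
    n = len(points)
    # Dense pairwise squared distances: the O(n^2) graph-building cost.
    diffs = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diffs ** 2, axis=-1)
    # Indices of the k nearest neighbours; column 0 is the point itself
    # (assuming no duplicate points), so it is skipped.
    neighbours = np.argsort(dist2, axis=1)[:, 1:k + 1]
    adjacency = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adjacency[rows, neighbours.ravel()] = 1.0
    # Symmetrise: keep an edge if either endpoint selected the other.
    adjacency = np.maximum(adjacency, adjacency.T)
    # Unnormalised graph Laplacian L = D - W.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # Drop the first eigenvector (constant if the graph is connected)
    # and use the next `dim` eigenvectors as coordinates.
    return eigenvalues, eigenvectors[:, 1:dim + 1]
```

The dense distance matrix in this sketch is exactly the expensive step mentioned in (1); it is used here only to keep the example short.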