Stefan Rass, Sandra König, Shahzad Ahmad, Maksim Goman, "Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds", arXiv.org, Ithaca, New York, United States, 11-2022
Original title:
Metricizing the Euclidean Space towards Desired Distance Relations in Point Clouds
Language of the title:
English
Original abstract:
Given a set of points in the Euclidean space $\R^\ell$ with $\ell>1$, the pairwise distances between the points are determined by their spatial location and the metric $d$ that we endow $\R^\ell$ with. Hence, the distance $d(\vec x,\vec y)=\delta$ between two points is fixed by the choice of $\vec x$, $\vec y$, and $d$. We study the related problem of fixing the value $\delta$ and the points $\vec x,\vec y$, and ask whether there is a topological metric $d$ that computes the desired distance $\delta$. We demonstrate this problem to be solvable by constructing a metric that simultaneously gives desired pairwise distances between up to $O(\sqrt\ell)$ many points in $\R^\ell$. In particular, these distances can be chosen independently of any ``natural'' distance between the given points, such as Euclidean or others. Towards dropping the limit on how many points (at fixed locations) we can place at desired distances from one another, we then introduce the notion of an \emph{$\eps$-semimetric} $\tilde{d}$. This function has all properties of a metric, but allows violations of the triangle inequality up to an additive error $<\eps$. With this (mild) generalization of a topological metric, we formulate our main result: for all $\eps>0$, for all $m\geq 1$, for any choice of $m$ points $\vec y_1,\ldots,\vec y_m\in\R^\ell$, and all chosen sets of values $\set{\delta_{ij}\geq 0: 1\leq i<j\leq m}$, there exists an $\eps$-semimetric $\tilde{d}:\R^\ell\times \R^\ell\to\R$ such that $\tilde{d}(\vec y_i,\vec y_j)=\delta_{ij}$, i.e., the desired distances are accomplished, irrespective of the topology that the Euclidean or other norms would induce. The order of quantifiers is important here: we can first choose the accuracy $\eps$ up to which our semimetric may violate the triangle inequality (while leaving the other metric axioms to hold as usual for $\tilde{d}$), then fix the spatial locations of the points, and after that step, choose the distances that we wish between our points.
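One plausible way to write down the relaxed triangle inequality described above is sketched here. This is an assumed reading of the abstract's wording ("violations up to an additive error $<\eps$"), not a definition quoted from the paper; `\eps` and `\R` are the abstract's own macros.

```latex
% Assumed reading: the triangle inequality may fail, but by less than \eps.
\[
  \tilde{d}(\vec x,\vec z) \;<\; \tilde{d}(\vec x,\vec y) + \tilde{d}(\vec y,\vec z) + \eps
  \qquad \text{for all } \vec x,\vec y,\vec z\in\R^\ell .
\]
```

Under this reading, an ordinary metric satisfies the condition for every $\eps>0$, since its triangle inequality holds with no additive slack at all.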
We showcase our results by using them to ``attack'' unsupervised learning algorithms, specifically $k$-Means and density-based (DBSCAN) clustering algorithms. These have manifold applications in artificial intelligence, and running them with externally provided distance measures, constructed in the way shown here, can make clustering algorithms produce results that are predetermined and hence malleable. This demonstrates that the results of clustering algorithms may not generally be trustworthy, unless there is a standardized and fixed prescription to use a specific distance function.
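The effect described above can be illustrated with a small toy sketch. It is not the paper's construction, which builds a distance function on all of $\R^\ell$; here the attacker simply supplies a table of desired distances $\delta_{ij}$ for four fixed points. The clustering routine is a simplified density-style stand-in (connected components of a neighbourhood graph), and all names (`threshold_clusters`, `max_triangle_violation`, `desired`) are hypothetical and chosen for illustration only.

```python
import math
from itertools import permutations

def euclidean_matrix(points):
    """Pairwise Euclidean distances between the given points."""
    return [[math.dist(p, q) for q in points] for p in points]

def max_triangle_violation(D):
    """Largest D[i][k] - (D[i][j] + D[j][k]) over distinct i, j, k.
    A value <= 0 means the triangle inequality holds on this finite set."""
    n = len(D)
    return max(D[i][k] - D[i][j] - D[j][k]
               for i, j, k in permutations(range(n), 3))

def threshold_clusters(D, radius):
    """Group indices into connected components of the graph that links
    i and j whenever D[i][j] <= radius (a simplified density-style rule)."""
    unseen = set(range(len(D)))
    clusters = []
    while unseen:
        start = min(unseen)
        unseen.remove(start)
        stack, component = [start], [start]
        while stack:
            i = stack.pop()
            for j in sorted(unseen):  # sorted() copies, so removal is safe
                if D[i][j] <= radius:
                    unseen.remove(j)
                    stack.append(j)
                    component.append(j)
        clusters.append(sorted(component))
    return clusters

# Four fixed points: two spatially tight pairs, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Attacker-chosen distances (assumed values): pair up 0 with 2 and 1 with 3.
far, near = 5.0, 1.0
desired = [[0.0, far, near, far],
           [far, 0.0, far, near],
           [near, far, 0.0, far],
           [far, near, far, 0.0]]

print(threshold_clusters(euclidean_matrix(points), 2.0))  # [[0, 1], [2, 3]]
print(threshold_clusters(desired, 2.0))                   # [[0, 2], [1, 3]]
print(max_triangle_violation(desired))                    # -1.0
```

With the same point locations and the same radius, the grouping changes from the spatially obvious pairs to the pairs the supplied distances dictate, even though the supplied table still satisfies the triangle inequality on these four points.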