Hubness has recently been identified as a general problem of high-dimensional data spaces, manifesting
itself in the emergence of objects, so-called hubs, which tend to be among the k nearest
neighbors of a large number of data items. As a consequence, many nearest neighbor relations in
the distance space are asymmetric, that is, object y is among the nearest neighbors of x but not
vice versa. The work presented here discusses two classes of methods that try to symmetrize nearest
neighbor relations and investigates to what extent they can mitigate the negative effects of hubs.
We evaluate local distance scaling and propose a global variant, which has the advantage of being
easy to approximate for large data sets and of having a probabilistic interpretation. Both local and
global approaches are shown to be effective, especially for high-dimensional data sets, which are
affected by high hubness. Both methods strongly decrease hubness in these data sets
while at the same time improving measures such as classification accuracy. We evaluate the methods
on a large number of public machine learning data sets and on synthetic data. Finally, we present a
real-world application in which we achieve significantly higher retrieval quality.
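The abstract does not spell out the transformations themselves. As a hedged sketch of the general local scaling idea (in the style of Zelnik-Manor and Perona, not necessarily the exact variant evaluated here), each distance d(x, y) can be rescaled by the distances of x and y to their own k-th nearest neighbors, so a pair counts as close only from both points of view; the function name and the choice of k below are illustrative:

```python
import numpy as np

def local_scaling(D, k=7):
    """Symmetrize a distance matrix via local scaling (sketch).

    D is an (n, n) symmetric distance matrix with zero diagonal.
    Each entry d(x, y) is rescaled by sigma_x * sigma_y, where
    sigma_i is the distance of point i to its k-th nearest neighbor.
    """
    # sigma[i] = distance from point i to its k-th nearest neighbor;
    # index k (not k-1) because index 0 is the self-distance 0
    sigma = np.sort(D, axis=1)[:, k]
    # rescaled, symmetric dissimilarity in [0, 1)
    LS = 1.0 - np.exp(-D**2 / (sigma[:, None] * sigma[None, :]))
    np.fill_diagonal(LS, 0.0)
    return LS

# toy usage: three points on a line, with one distant outlier
X = np.array([[0.0], [1.0], [10.0]])
D = np.abs(X - X.T)
LS = local_scaling(D, k=1)
```

Because sigma_x * sigma_y is symmetric in x and y, the rescaled matrix is symmetric by construction, which is the symmetrization the abstract refers to.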