For unit-length vectors, cosine similarity and Euclidean distance produce the same ranking order. In fact, you can convert directly between the two.
This is a pretty simple and well-known property that is used in many machine learning packages that work with embeddings, which are just vectors. My own package Magnitude uses it as well when ranking results retrieved with k-NN. It's intuitive if you are comfortable with manipulating vectors and can visualize what cosine similarity and Euclidean distance mean for unit-length normalized vectors.
Every now and then I am asked about this, and surprisingly I am never able to point someone who might not understand this property to a web page describing it, as Google gives no good search results for an explainer.
Below, I’ll describe what the relationship is and how to convert from Euclidean distance to cosine similarity (and vice versa). Hopefully it’ll rank on Google so others searching for it can find a useful resource.
Intuition
The Euclidean distance is simply the distance between two vectors $x_1$ and $x_2$ in some $k$-dimensional hyperspace. It is the $2\text{D}$-distance formula, typically taught in an algebra class, generalized to $k$ dimensions.
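Written out for $k$-dimensional vectors $x_1$ and $x_2$, that is:

$$ d(x_1, x_2) = \lVert x_1 - x_2 \rVert = \sqrt{\sum_{i=1}^{k} \left(x_{1,i} - x_{2,i}\right)^2} $$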
Cosine similarity can be thought of as measuring how similar two vectors are based on the angle $\theta$ between them (the dot-product formula after the list below makes this precise). The similarity is measured by $\cos(\theta)$, which is:
- $1$ when $\theta = 0^{\circ}$
- $0$ when $\theta = 90^{\circ}$
- $-1$ when $\theta = 180^{\circ}$
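In terms of the vectors themselves, $\cos(\theta)$ comes from the dot product:

$$ \cos(\theta) = \frac{x_1 \cdot x_2}{\lVert x_1 \rVert \, \lVert x_2 \rVert} $$

For unit-length vectors the denominator is $1$, so the cosine similarity reduces to the dot product $x_1 \cdot x_2$.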
From these two definitions it should now make intuitive sense why, for vectors that are not unit-length normalized, these two measures can give different relative rank orderings.
Cosine similarity completely ignores the magnitudes of vectors and only looks at the angle between them. If a vector with a large magnitude pointed in the same direction as a vector with a small magnitude, they would have a cosine similarity of $1$, even though the Euclidean distance between them could be quite large.
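A quick sketch of that situation, using NumPy (the example vectors are just made up for illustration):

import numpy as np

# Two vectors pointing in the same direction, but with very different magnitudes.
a = np.array([1.0, 1.0])
b = np.array([100.0, 100.0])

cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean_dist = np.linalg.norm(a - b)

print(cos_sim)         # ~1.0: identical direction, so maximal similarity
print(euclidean_dist)  # ~140.0: yet the vectors are far apart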
But this says nothing about their relationship for unit-length vectors.
Conversion
Imagining unit-length vectors, we know that the Euclidean distance will be $0$ when $\theta = 0^{\circ}$ and that the Euclidean distance will be $2$ when $\theta = 180^{\circ}$.
Here is how you convert from Euclidean distance $d$ ($0$ to $2$) to cosine similarity $s$ ($-1$ to $1$):
$$ s = 1 - \left(\frac{d^2}{2}\right) $$
Here is how you convert from cosine similarity $s$ ($-1$ to $1$) to Euclidean distance $d$ ($0$ to $2$):
$$ d = \sqrt{2 - 2s} $$
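One way to see where these formulas come from: for unit-length vectors, expanding the squared Euclidean distance gives

$$ d^2 = \lVert x_1 - x_2 \rVert^2 = \lVert x_1 \rVert^2 + \lVert x_2 \rVert^2 - 2\,x_1 \cdot x_2 = 2 - 2\cos(\theta) = 2 - 2s $$

Solving for $s$ or for $d$ yields the two conversions above.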
Because the transformation between these two measures is monotonic (distance decreases as similarity increases), they will both give the same ordering when used to rank: the nearest neighbors by Euclidean distance are exactly the most similar vectors by cosine similarity.
Finally, here is how you convert between them in Python:
def convert_to_sim(dist):
    # Euclidean distance (0 to 2) between unit vectors -> cosine similarity (-1 to 1).
    return 1 - ((dist ** 2) / 2.0)

def convert_to_dist(sim):
    # Cosine similarity (-1 to 1) between unit vectors -> Euclidean distance (0 to 2).
    return (2 - (2 * sim)) ** .5
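As a quick sanity check of the round trip and of the ranking claim, here is a small sketch using NumPy and the two functions above (the sample vectors are arbitrary):

import numpy as np

np.random.seed(0)

# A query vector and some candidate vectors, all unit-length normalized.
query = np.random.randn(8)
query /= np.linalg.norm(query)
candidates = np.random.randn(5, 8)
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

cos_sims = candidates @ query                       # dot products = cosine similarities for unit vectors
dists = np.linalg.norm(candidates - query, axis=1)  # Euclidean distances

# Ranking by decreasing similarity matches ranking by increasing distance.
assert np.array_equal(np.argsort(-cos_sims), np.argsort(dists))

# The conversion formulas agree with the directly computed values.
assert np.allclose(convert_to_sim(dists), cos_sims)
assert np.allclose(convert_to_dist(cos_sims), dists)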