Common measures of distance (e.g., between images in nearest neighbors) include the $L_p$ distances.
From real analysis, we know a metric is a bivariate operator $d: X \times X \to \mathbb{R}$ that is nonnegative, satisfies the identity of indiscernibles, is symmetric, and obeys the triangle inequality. That is,
$d(a,b) \geq 0$
$d(a,b) = 0 \iff a = b$
$d(a,b) = d(b,a)$
$d(a,b) \leq d(a,c) + d(c,b)$
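The four axioms above can be sanity-checked numerically. This is an illustrative sketch, not a proof: it checks the axioms for the $L_1$ distance on a handful of random vectors.

```python
# Numerical sanity check of the four metric axioms for the L1 distance
# on random vectors (illustrative only -- a proof requires algebra).
import numpy as np

rng = np.random.default_rng(0)

def d(a, b):
    """L1 distance: sum of absolute componentwise differences."""
    return np.sum(np.abs(a - b))

for _ in range(100):
    a, b, c = rng.normal(size=(3, 5))
    assert d(a, b) >= 0                           # nonnegativity
    assert d(a, a) == 0                           # identity
    assert np.isclose(d(a, b), d(b, a))           # symmetry
    assert d(a, b) <= d(a, c) + d(c, b) + 1e-12   # triangle inequality
print("all four axioms hold on the sampled vectors")
```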
We call $d$ a pseudometric if it is nonnegative, symmetric, and obeys the triangle inequality (it may have $d(a,b) = 0$ for some $a \neq b$).
We call $d$ a quasimetric if it obeys all conditions except symmetry.
$L_p$ Distances
The distances we often use in vector math are the $L_p$ distances.
You can prove that $L_p$ distances are valid metrics for $p \geq 1$, like this:
Notice that for any real $p \geq 1$ that you substitute into the formula for $L_p$, nonnegativity, identity, and symmetry follow directly from the absolute values, and the triangle inequality follows from the Minkowski inequality (it fails for $p < 1$).
So, $L_p := d(a,b)_p = \lVert a - b \rVert_p = \left( \sum_{i=1}^{n} |a_i - b_i|^p \right)^{1/p}$
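The general formula can be sketched directly in NumPy. The function name `lp_distance` and the sample vectors are made up for illustration; the formula assumes $p \geq 1$.

```python
# Sketch of the general L_p distance ||a - b||_p, assuming p >= 1.
import numpy as np

def lp_distance(a, b, p):
    """(sum_i |a_i - b_i|^p)^(1/p), the L_p (Minkowski) distance."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return np.sum(diff ** p) ** (1.0 / p)

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
print(lp_distance(a, b, 1))  # L1: 3 + 4 + 0 = 7.0
print(lp_distance(a, b, 2))  # L2: sqrt(9 + 16) = 5.0
```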
Distance Example 01
If we have two images and we want to perform a nearest-neighbors classification on them, we can use the $L_1$ distance, treating the images as matrices with pixels as entries. Then
$L_1 = d(a,b)_{1}$, so
$$L_1 = \lVert a - b \rVert_1 = \sum_{i=1}^{n}{|a_i - b_i|}$$
where $a_i$ and $b_i$ are the pixel values of image 1 and image 2, respectively.
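As a concrete sketch, here are two tiny made-up grayscale "images" as arrays and their entrywise $L_1$ distance:

```python
# L1 distance between two toy 2x2 grayscale "images"
# (pixel values are made up for illustration).
import numpy as np

img1 = np.array([[10, 20], [30, 40]], dtype=np.int64)
img2 = np.array([[12, 18], [33, 40]], dtype=np.int64)

# |10-12| + |20-18| + |30-33| + |40-40| = 2 + 2 + 3 + 0
l1 = np.sum(np.abs(img1 - img2))
print(l1)  # 7
```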
Distance Example 02
The Euclidean distance is $L_2$, the vector norm from linear algebra: $L_2 = \lVert a - b \rVert_2 = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$.
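NumPy exposes this norm directly as `np.linalg.norm`, so the $L_2$ distance between two vectors is just the norm of their difference (sample vectors are made up):

```python
# Euclidean (L2) distance via the vector 2-norm of the difference.
import numpy as np

a = np.array([0.0, 3.0])
b = np.array([4.0, 0.0])
print(np.linalg.norm(a - b))  # sqrt(4^2 + 3^2) = 5.0
```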
Hyperparameter Tuning
When training something like a $k$-nearest neighbors classifier, we have a choice for what $k$ is.
Tuning this $k$, as well as tuning the choice of distance metric $L_p$, is called tuning a hyperparameter.
Usually, selecting the best hyperparameter is a matter of trial and error: you try training the classifier on several hyperparameter combinations before deciding which configuration is best.
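The trial-and-error loop above can be sketched as a small grid search over $k$ and $p$. Everything here is made up for illustration: the toy dataset, the train/validation split, and the minimal `knn_predict` helper.

```python
# Grid-search sketch for k-NN hyperparameters (k and p) on a toy
# dataset with a held-out validation split. All names and data are
# illustrative, not a production implementation.
import numpy as np

def knn_predict(X_train, y_train, x, k, p):
    """Predict the majority label among the k nearest neighbors of x
    under the L_p distance."""
    dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(dists)[:k]
    return np.bincount(y_train[nearest]).argmax()

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy labels
X_tr, y_tr = X[:40], y[:40]                      # training split
X_val, y_val = X[40:], y[40:]                    # validation split

best = None
for k in (1, 3, 5):
    for p in (1, 2):
        preds = np.array([knn_predict(X_tr, y_tr, x, k, p) for x in X_val])
        acc = np.mean(preds == y_val)
        if best is None or acc > best[0]:
            best = (acc, k, p)
print("best (accuracy, k, p):", best)
```

In practice you would use cross-validation rather than a single split, but the shape of the loop is the same.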
Function Gradients
Recall from multivariable calculus that the gradient of a multivariate function $f$ at a point $a$ is the vector $\nabla f(a)$ whose components are the partial derivatives of $f$ with respect to each variable:
$$\nabla f(a) = \left( \frac{\partial f}{\partial x_1}(a), \dots, \frac{\partial f}{\partial x_n}(a) \right)$$
We can also define the gradient in an alternative way (this one was popular in problem sets for my real analysis class) that more clearly shows the gradient as a derivative.
Say $f$ maps a column vector $a = (a_1, \dots, a_n)^{\intercal}$ from $\mathbb{R}^n$ to $\mathbb{R}$.
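The componentwise definition can be checked numerically. As an illustrative sketch (the function $f(a) = a \cdot a$, whose gradient is $2a$, is chosen here as an example), a central-difference estimate of each partial derivative should match the analytic gradient:

```python
# Compare the analytic gradient of f(a) = a . a (which is 2a) with a
# central-difference numerical estimate, component by component.
import numpy as np

def f(a):
    return np.dot(a, a)

def numerical_grad(f, a, h=1e-6):
    """Estimate each partial derivative of f at a via (f(a + h e_i) -
    f(a - h e_i)) / (2h)."""
    g = np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = h
        g[i] = (f(a + e) - f(a - e)) / (2 * h)
    return g

a = np.array([1.0, -2.0, 0.5])
print(numerical_grad(f, a))  # approximately 2a = [2, -4, 1]
```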