Introduction
Data can be interpreted as vectors. Vectors allow us to talk about geometric concepts, such as lengths, distances, and angles, to characterise similarity between vectors. This will become important later in the course when we discuss PCA. In this module, we introduce and practice the concept of an inner product. Inner products allow us to talk about geometric concepts in vector spaces. More specifically, we start with the dot product (which we may still know from school) as a special case of an inner product, and then move toward the more general concept of an inner product, which plays an integral part in some areas of machine learning, such as kernel machines (this includes support vector machines and Gaussian processes). This module has many exercises to practice and understand the concept of inner products.
Learning objectives
- Explain inner products, and compute angles and distances using inner products
- Write code that computes distances and angles between images
- Demonstrate an understanding of properties of inner products
- Discover that orthogonality depends on the inner product
Dot product $\ne$ inner product: the former is just one instance of the latter.
The inner product is a generalization of the dot product.
Inner products allow us to define and describe geometric concepts.
Dot Product
Two vectors $\mathbf x, \mathbf y \in \mathbb R^n$.
Definition 1
$\color{blue}{\mathbf x \cdot \mathbf y = len(\mathbf x)len(\mathbf y)cos(\theta)}$
In this definition, lengths and angles are given, and we use them to define the dot product.
Definition 2
$\color{blue}{\mathbf x \cdot \mathbf y = \sum_{i=1}^n x_iy_i = \mathbf x^T \mathbf y}$
This definition involves no lengths or angles: it uses only the components of the vectors and two operations, multiplication and addition. Equivalently, it can be defined in matrix form, using a transpose and a matrix multiplication.
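As a minimal sketch of Definition 2, the snippet below computes the dot product both as an explicit sum and in matrix form. The helper name `dot_sum` and the sample vectors are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def dot_sum(x, y):
    """Dot product as the explicit sum of componentwise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

# Sample vectors chosen only for illustration.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -5.0, 6.0])

explicit = dot_sum(x, y)   # sum_i x_i * y_i
matrix_form = x @ y        # x^T y

print(explicit, matrix_form)
```

Both expressions should agree up to floating-point rounding, since they describe the same operation.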
Length of a vector
$len(\mathbf x) = \Vert \mathbf x \Vert = \sqrt { \Vert \mathbf x \Vert ^2} $
$ \Vert \mathbf x \Vert ^2 =\mathbf x \cdot \mathbf x =\mathbf x^T \mathbf x$.
Note that we are not defining the length here; we derive this property from the definition of the dot product: $\mathbf x \cdot \mathbf x = len(\mathbf x)len(\mathbf x)cos(0) =len^2(\mathbf x)$, since the angle between $\mathbf x$ and itself is $0$.
Distance between two vectors
$\mathbf d = \mathbf x - \mathbf y$
$\Vert \mathbf d \Vert = \sqrt{\mathbf d^T \mathbf d}$
Angle between two vectors.
$cos(\theta) = \frac{\mathbf x \cdot \mathbf y}{\Vert \mathbf x\Vert \Vert \mathbf y \Vert} = \frac{\mathbf x^T \mathbf y}{\Vert \mathbf x\Vert \Vert \mathbf y\Vert}$
We are not defining the angle either; we derive this property from the definition of the dot product.
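The three formulas above (length, distance, angle) can be sketched in NumPy as follows. The function names and the sample vectors are assumptions made for this illustration.

```python
import numpy as np

def length(x):
    """||x|| = sqrt(x^T x)."""
    return np.sqrt(x @ x)

def distance(x, y):
    """||x - y|| = sqrt(d^T d) with d = x - y."""
    d = x - y
    return np.sqrt(d @ d)

def angle(x, y):
    """Angle in radians from cos(theta) = x^T y / (||x|| ||y||)."""
    cos_theta = (x @ y) / (length(x) * length(y))
    # Clip guards against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

print(length(y), distance(x, y), np.degrees(angle(x, y)))
```

For these two vectors the angle is $45^\circ$, which matches the picture of $\mathbf y$ lying on the diagonal of the unit square.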
Inner Product
Definition
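The inner product is a generalization of the dot product. It can be used to express geometric concepts such as length, distance, and angle.
A vector space equipped with the additional structure of an inner product is called an inner product space.
From this point on, we first define the inner product and then describe every geometric concept and operation in terms of it. The familiar dot product is just one valid inner product.
From here on, geometric concepts (e.g. length, distance, and angle) are defined and described by the inner product. In the linear algebra that follows, the inner product comes first and the geometry second.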
Definition
Formally, an inner product space is a vector space $V$ over the field $\mathbb R$ together with an inner product which is a map
$\color{blue}{\langle \cdot,\cdot \rangle: V\times V \rightarrow \mathbb R}$.
$\color{purple}{\text{i.e. it is a function that takes two vectors as input and outputs a scalar,} }$
that satisfies the following three axioms for all vectors $\mathbf x,\mathbf y,\mathbf z \in V$ and all scalars $\alpha \in \mathbb R$:
Commutativity
$\langle \mathbf x, \mathbf y \rangle = \langle \mathbf y,\mathbf x \rangle$
This property is also called symmetry.
Linearity
$\langle \alpha \mathbf x, \mathbf y \rangle=\alpha \langle \mathbf x, \mathbf y \rangle$
$\langle \mathbf x+\mathbf y, \mathbf z \rangle=\langle \mathbf x, \mathbf z \rangle + \langle \mathbf y, \mathbf z \rangle$
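In fact, combining this axiom with the first one gives $\langle \mathbf x,\mathbf y+\mathbf z \rangle = \langle \mathbf y+\mathbf z,\mathbf x\rangle= \langle \mathbf y,\mathbf x\rangle + \langle \mathbf z ,\mathbf x\rangle = \langle \mathbf x,\mathbf y\rangle + \langle \mathbf x ,\mathbf z\rangle$, so the inner product is linear in the second argument as well. The linearity axiom therefore amounts to bilinearity.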
Positive-definite
$\langle \mathbf x,\mathbf x \rangle > 0$ , $\text{for any}$ $\mathbf x \ne \mathbf 0$.
$\langle \mathbf x, \mathbf x \rangle=0$, $\text{if and only if }$ $\mathbf x =\mathbf 0$.
Example 1:
$\color{blue}{\langle \mathbf x,\mathbf y \rangle=\mathbf x^T \mathbf y}$, which is the dot product. It is a valid inner product according to the three axioms.
Example 2:
$\color{blue}{\langle \mathbf x, \mathbf y\rangle = \mathbf x^T\mathbf A \mathbf y}$, where $\mathbf A$ is a symmetric positive definite matrix, i.e. $\mathbf A\in \mathbb S_{++}^n$. We now check the three axioms:
- $\langle \mathbf y, \mathbf x\rangle = \mathbf y^T\mathbf A \mathbf x = (\mathbf y^T\mathbf A \mathbf x)^T=\mathbf x^T\mathbf A^T \mathbf y= \mathbf x^T\mathbf A \mathbf y = \langle \mathbf x, \mathbf y \rangle$.
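- $\langle \mathbf x+ \mathbf y,\mathbf z\rangle = (\mathbf x+\mathbf y)^T\mathbf A \mathbf z = \mathbf x^T \mathbf A \mathbf z + \mathbf y^T \mathbf A \mathbf z = \langle \mathbf x, \mathbf z \rangle + \langle \mathbf y, \mathbf z \rangle$, and $\langle \alpha \mathbf x,\mathbf y\rangle = (\alpha \mathbf x)^T\mathbf A \mathbf y = \alpha\, \mathbf x^T \mathbf A \mathbf y = \alpha \langle \mathbf x, \mathbf y \rangle$.
- $\langle \mathbf x, \mathbf x \rangle= \mathbf x^T \mathbf A \mathbf x > 0$ for any $\mathbf x\ne \mathbf0$, because $\mathbf A$ is positive definite, and $\langle \mathbf 0, \mathbf 0 \rangle = 0$.
Having passed all three axioms, $\langle \mathbf x, \mathbf y \rangle= \mathbf x^T\mathbf A \mathbf y$ is a valid inner product.
This shows that the dot product is just one special case of a valid inner product (take $\mathbf A = \mathbf I$).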
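The axiom check for $\mathbf x^T \mathbf A \mathbf y$ can also be spot-checked numerically. This is only a sketch: the matrix `A`, the random vectors, and the helper `inner` are assumptions chosen for illustration, and a numerical check on a few vectors is of course not a proof.

```python
import numpy as np

def inner(x, y, A):
    """Candidate inner product <x, y> = x^T A y."""
    return x @ A @ y

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# A must be symmetric with strictly positive eigenvalues.
is_symmetric = np.allclose(A, A.T)
is_pos_def = bool(np.all(np.linalg.eigvalsh(A) > 0))

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 2))   # three random 2D vectors
alpha = 1.7

symmetry_ok = np.isclose(inner(x, y, A), inner(y, x, A))
additivity_ok = np.isclose(inner(x + y, z, A),
                           inner(x, z, A) + inner(y, z, A))
homogeneity_ok = np.isclose(inner(alpha * x, y, A),
                            alpha * inner(x, y, A))
positivity_ok = inner(x, x, A) > 0

print(is_symmetric, is_pos_def,
      symmetry_ok, additivity_ok, homogeneity_ok, positivity_ok)
```

Replacing `A` with a matrix that is not symmetric, or that has a non-positive eigenvalue, should make at least one of these checks fail for suitable vectors.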
Length of vectors
Here we start from the definition of the inner product and use it to define the length of a vector. Note that this order is the reverse of the one used for the dot product.
Definition
$\color{blue}{len(\mathbf x) = \Vert \mathbf x\Vert = \sqrt{\langle \mathbf x,\mathbf x \rangle}}$. ($norm(\mathbf x)$)
- Here, the concept of length depends on the inner product.
- The length of a given vector is not fixed: it depends on the specific inner product, and it may change when the inner product changes.
Example 3: $\mathbf x =\begin{bmatrix} 1 \\1\end{bmatrix}$, and then $len(\mathbf x) =?$
- $\langle \mathbf x,\mathbf y \rangle = \mathbf x^T\mathbf y$ : $len(\mathbf x) = \sqrt{\langle \mathbf x, \mathbf x\rangle}= \sqrt{2}$.
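- $\langle \mathbf x, \mathbf y\rangle = \mathbf x^T\begin{bmatrix} 2& -1\\-1 & 2 \end{bmatrix}\mathbf y $ : $\langle \mathbf x, \mathbf x\rangle = 2 - 1 - 1 + 2 = 2$, so $len(\mathbf x) = \sqrt{\langle \mathbf x, \mathbf x\rangle} = \sqrt{2}$. For this particular $\mathbf x$ the two lengths happen to coincide; for $\mathbf e_1 = \begin{bmatrix}1\\0\end{bmatrix}$ the dot product gives length $1$, while this inner product gives $\sqrt{2}$.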
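A small sketch for computing the length of a vector under an inner product of the form $\mathbf x^T \mathbf A \mathbf y$. The function name `norm`, its optional `A` argument, and the extra test vector `e1` are assumptions introduced here for illustration.

```python
import numpy as np

def norm(x, A=None):
    """Length induced by <x, y> = x^T A y; A=None means the dot product."""
    if A is None:
        A = np.eye(len(x))
    return np.sqrt(x @ A @ x)

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
x = np.array([1.0, 1.0])
e1 = np.array([1.0, 0.0])   # extra vector, not in the original example

len_x_dot, len_x_A = norm(x), norm(x, A)
len_e1_dot, len_e1_A = norm(e1), norm(e1, A)

print(len_x_dot, len_x_A)
print(len_e1_dot, len_e1_A)
```

Comparing the printed pairs shows directly which vectors change length when the inner product changes.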
Properties
$\text{for any }\mathbf x,\mathbf y\in \mathbb R^n \text{ and any } \lambda \in \mathbb R$
- $\Vert \mathbf x\Vert \ge 0 $.
- $\Vert \lambda \mathbf x \Vert = \vert \lambda \vert \Vert \mathbf x \Vert$. (homogeneity)
- $\Vert \mathbf x + \mathbf y \Vert \le \Vert \mathbf x \Vert + \Vert \mathbf y \Vert.$ (triangle inequality)
- $\vert \langle \mathbf x , \mathbf y\rangle\vert \le \sqrt{\langle \mathbf x, \mathbf x \rangle}\sqrt{\langle \mathbf y,\mathbf y \rangle} = \Vert \mathbf x \Vert \Vert \mathbf y \Vert $. (Cauchy-Schwarz inequality)
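The properties above can be spot-checked on random vectors. The sketch below assumes a particular positive definite matrix `A` and uses a small tolerance for floating-point rounding; it illustrates the properties and does not prove them.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])   # assumed symmetric positive definite matrix

def inner(x, y):
    return x @ A @ y

def norm(x):
    return np.sqrt(inner(x, x))

rng = np.random.default_rng(1)
tol = 1e-12
checks = []
for _ in range(1000):
    x, y = rng.normal(size=(2, 2))
    lam = rng.normal()
    checks.append((
        np.isclose(norm(lam * x), abs(lam) * norm(x)),   # homogeneity
        norm(x + y) <= norm(x) + norm(y) + tol,          # triangle inequality
        abs(inner(x, y)) <= norm(x) * norm(y) + tol,     # Cauchy-Schwarz
    ))

all_hold = all(all(c) for c in checks)
print(all_hold)
```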
Distance between two vectors
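Distance is likewise defined through the inner product. In other words, the distance between two vectors depends on the specific form of the inner product, and different inner products lead to different distances.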
$d(\mathbf x, \mathbf y)=\Vert \mathbf{x-y} \Vert = \sqrt{\langle (\mathbf{x-y}),(\mathbf {x -y}) \rangle}$.
Example 4:
$\mathbf x = \begin{bmatrix}2 \\3 \end{bmatrix}, \mathbf y = \begin{bmatrix}4\\1 \end{bmatrix}$.
$\langle \mathbf x, \mathbf y\rangle = \mathbf x^T \mathbf y$ : $d(\mathbf x, \mathbf y) =\sqrt {(\mathbf{x-y})^T(\mathbf {x-y})}=\sqrt{8}$.
$\langle \mathbf x, \mathbf y\rangle =\mathbf x^T \mathbf A \mathbf y$ , where $\mathbf A = \begin{bmatrix}2 & -1\\-1&2 \end{bmatrix}$ :
$d(\mathbf x, \mathbf y) = \sqrt{(\mathbf {x-y})^T\mathbf A (\mathbf {x-y})} = \sqrt{24}$.
We can see that the distance between the same two vectors differs under different inner products.
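Example 4 can be reproduced with a few lines of NumPy. The function name `dist` is an assumption for this sketch; passing the identity matrix recovers the dot-product distance.

```python
import numpy as np

def dist(x, y, A):
    """Distance induced by <x, y> = x^T A y."""
    d = x - y
    return np.sqrt(d @ A @ d)

x = np.array([2.0, 3.0])
y = np.array([4.0, 1.0])
A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

d_dot = dist(x, y, np.eye(2))   # dot product: sqrt(8)
d_A = dist(x, y, A)             # x^T A y:     sqrt(24)

print(d_dot, d_A)
```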
Angles and orthogonality
Angles between two vectors.
$cos(\theta) =\frac{\langle \mathbf x, \mathbf y \rangle}{\sqrt{\langle \mathbf x, \mathbf x\rangle}\sqrt{\langle \mathbf y, \mathbf y\rangle}} = \frac{\langle \mathbf x, \mathbf y \rangle}{\Vert \mathbf x \Vert \Vert \mathbf y \Vert}$.
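Note the difference between this definition and the angle formula at the start of this lesson. The earlier formula was a property of the dot product; here we use the inner product to define the angle. Before, the angle came first and the dot product was built from it; now the inner product comes first and the angle is defined from it.
Since angles are defined by the inner product, the same holds as for length and distance: $cos(\theta)$ depends on the choice of inner product, and different inner products can give different angles.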
Orthogonality
The definition of orthogonality depends on the inner product:
$ \langle \mathbf x, \mathbf y \rangle = 0 \Leftrightarrow $ $\mathbf x$ and $\mathbf y$ are orthogonal $\Leftrightarrow \theta =\frac{\pi}{2}\,rad =90^\circ.$
Once again: previously, perpendicularity came first and a zero dot product followed. Now the order is reversed: a zero inner product comes first, and orthogonality follows from it. This shows which concept is primary and which is derived. Here, the inner product is clearly the primary concept.
Example 5:
$\mathbf x = \begin{bmatrix}1\\1 \end{bmatrix}, \mathbf y=\begin{bmatrix} -1\\1 \end{bmatrix}.$
$\langle \mathbf x, \mathbf y\rangle = \mathbf x^T \mathbf y$ : $cos(\theta) = 0 \Rightarrow \theta = 90^\circ$, i.e. $\mathbf x$ and $\mathbf y$ are orthogonal.
$\langle \mathbf x, \mathbf y\rangle = \mathbf x^T \begin{bmatrix}2 & 0\\ 0 & 1 \end{bmatrix} \mathbf y = -1\ne 0 $ : i.e. $\mathbf x$ and $\mathbf y$ are not orthogonal.
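Whether the same two vectors are orthogonal can differ from one inner product to another.
Geometric view:
Geometrically, two orthogonal vectors are as dissimilar as possible: they have nothing in common except the origin.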
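Example 5 can be checked numerically with the sketch below. The function name `angle` is an assumption; the identity matrix stands in for the dot product.

```python
import numpy as np

def angle(x, y, A):
    """Angle in radians under the inner product <x, y> = x^T A y."""
    cos_theta = (x @ A @ y) / np.sqrt((x @ A @ x) * (y @ A @ y))
    # Clip guards against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

x = np.array([1.0, 1.0])
y = np.array([-1.0, 1.0])
A = np.diag([2.0, 1.0])

theta_dot = angle(x, y, np.eye(2))   # orthogonal under the dot product
theta_A = angle(x, y, A)             # not orthogonal under x^T A y

print(np.degrees(theta_dot), np.degrees(theta_A))
```

Under the second inner product, $\langle \mathbf x, \mathbf y\rangle = -1$ and $\langle \mathbf x, \mathbf x\rangle = \langle \mathbf y, \mathbf y\rangle = 3$, so $cos(\theta) = -\frac{1}{3}$ and the angle is obtuse.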
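Inner product of functions and random variables
Recall the definition of the inner product: in essence it is a function that takes two vectors as input and outputs a scalar. If the vectors are themselves functions, what operation fits this pattern? Integration!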
Inner product for functions
Definition
Now $V$ is a function space, which means that each vector in this vector space is a function.
$\color{blue}{\langle f,g\rangle = \int_a^b f(x)g(x)dx}$
The inner product defined above satisfies the three axioms, as follows:
- Commutativity
- $\langle f,g\rangle = \int_a^b g(x)f(x)dx = \langle g,f \rangle$.
- Linearity
- $\langle f+h,g \rangle = \int_a^b(f(x)+h(x))g(x)dx = \int_a^bf(x)g(x)dx + \int_a^b h(x)g(x)dx = \langle f,g\rangle + \langle h,g \rangle$
- $\langle\alpha f,g \rangle = \int_a^b \alpha f(x)g(x)dx = \alpha \int_a^b f(x)g(x)dx = \alpha \langle f,g \rangle$
- Positive-definite
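- $\langle f,f\rangle = \int_a^b f^2(x)dx >0$ for any $f$ that is not identically zero on $[a,b]$ (assuming $V$ consists of continuous functions).
- $\langle f,f\rangle = \int_a^b f^2(x)dx =0$ if and only if $f(x) = 0$ for all $x\in [a,b]$.
Orthogonality/angle
By analogy with ordinary vector spaces, function spaces have similar concepts: the orthogonality of two functions, the angle between them, and so on.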
Example 6:
$f(x) = sin(x), g(x) = cos(x), a=-\pi,b = \pi$ ,so $\langle f,g\rangle = \int_{-\pi}^{\pi} sin(x)cos(x)dx$.
$\langle sin(x), cos(x)\rangle = 0 \Rightarrow \color{purple}{\text{sin(x) and cos(x) are orthogonal.}}$
Example 7:
The functions in the set $\lbrace 1, cosx, cos2x,cos3x,\dots\rbrace$ are pairwise orthogonal on $[-\pi,\pi]$ under this inner product.
Norm/length of function
$len(f(x))=norm(f(x)) = \Vert f(x) \Vert=\sqrt{\langle f,f\rangle}$.
Example 8: the inner product is defined as in Example 6.
$len(sin(x)) = \sqrt{\int_{-\pi}^{\pi} sin(x)sin(x)dx} = \sqrt{\pi}$.
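The function inner product can be approximated numerically. The sketch below uses a hand-written trapezoidal rule; the helper name `inner_fn` and the grid size `n` are assumptions for illustration, and the results are approximations of the integrals, not exact values.

```python
import numpy as np

def inner_fn(f, g, a, b, n=100_001):
    """Approximate <f, g> = integral_a^b f(x) g(x) dx (trapezoidal rule)."""
    xs = np.linspace(a, b, n)
    vals = f(xs) * g(xs)
    dx = (b - a) / (n - 1)
    return dx * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

a, b = -np.pi, np.pi

sin_cos = inner_fn(np.sin, np.cos, a, b)                        # Example 6
cos1_cos2 = inner_fn(np.cos, lambda x: np.cos(2 * x), a, b)     # Example 7
sin_norm = np.sqrt(inner_fn(np.sin, np.sin, a, b))              # Example 8

print(sin_cos, cos1_cos2, sin_norm)
```

The first two values should be close to zero (orthogonality), and the last one should be close to $\sqrt{\pi}$, since $\int_{-\pi}^{\pi} sin^2(x)dx = \pi$.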
Inner product for random variable
Definition
If two random variables are uncorrelated, then $var[X+Y] = var[X] + var[Y]$.
Suppose we define the inner product as
$\color{blue}{\langle X,Y\rangle =cov[X,Y]}$. We now check whether it is a valid inner product.
- Commutativity
- $\langle X,Y\rangle = cov[X,Y] = cov[Y,X] = \langle Y,X\rangle$.
- Linearity
- $\langle X+Z,Y\rangle = cov[X+Z,Y]= cov[X,Y] + cov[Z,Y] = \langle X,Y \rangle + \langle Z,Y \rangle$.
- $\langle \alpha X,Y\rangle = cov[\alpha X,Y]=\alpha cov[X,Y]=\alpha \langle X,Y \rangle$.
- Positive-definite
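- $\langle X,X \rangle = cov[X,X]=var[X] \ge 0$, with $var[X] > 0$ for any random variable with $\sigma(X) \ne 0$. Note that $var[X] = 0$ for every constant random variable, so strict positive definiteness holds only if random variables that differ by a constant are identified, e.g. by restricting to zero-mean random variables.
So, under this convention, covariance is a valid inner product.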
Norm/length
$len(X)=norm(X) = \sqrt{\langle X,X\rangle} = \sqrt{cov[X,X]} = \sqrt{var[X]}=\sigma(X)$
The length of a random variable is its standard deviation.
Angle
$cos(\theta) = \frac{\langle X,Y\rangle}{\sqrt{\langle X,X \rangle}\sqrt{\langle Y,Y \rangle}}= \frac{cov[X,Y]}{\sqrt{var[X]}\sqrt{var[Y]}}$, which is just the correlation coefficient.
The cosine of the angle between two random variables is their correlation coefficient. Moreover:
$\theta = \frac{\pi}{2} \Leftrightarrow cov[X,Y] = 0 \Leftrightarrow$ $X$ and $Y$ are uncorrelated $\Leftrightarrow$ $X$ and $Y$ are orthogonal.
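To illustrate that $cos(\theta)$ under the covariance inner product is the correlation coefficient, the sketch below uses simulated samples. The data-generating model (with true correlation $0.6$), the sample size, and the helper `cov` are assumptions for this example; sample estimates only approximate the population quantities.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed toy model: Y = 0.6 X + 0.8 E with X, E independent standard normals,
# so var[Y] = 1 and the true correlation is 0.6.
X = rng.normal(size=n)
Y = 0.6 * X + 0.8 * rng.normal(size=n)

def cov(u, v):
    """Sample covariance, used here as the inner product <u, v>."""
    return np.mean((u - u.mean()) * (v - v.mean()))

cos_theta = cov(X, Y) / np.sqrt(cov(X, X) * cov(Y, Y))
corr = np.corrcoef(X, Y)[0, 1]   # NumPy's correlation coefficient

print(cos_theta, corr)
```

The two printed values should agree up to rounding, and both should be near $0.6$ for this toy model.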
Distance
$$
\begin{align}
\color{blue}{d(X,Y)} &= \Vert X-Y \Vert \\
&= \sqrt{\langle X-Y,X-Y\rangle} \\
&= \sqrt{cov[X-Y,X-Y]}\\
&=\sqrt{var[X-Y]} \\
&= \color{blue}{\sigma(X-Y)}
\end{align}
$$
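The distance formula, and the variance identity for uncorrelated random variables, can be illustrated on simulated data. The distributions, the sample size, and the helper `cov` are assumptions made for this sketch; with finite samples the identities hold only approximately where sample noise is involved.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Assumed toy data: X and Y are drawn independently, hence uncorrelated.
X = rng.normal(loc=1.0, scale=2.0, size=n)
Y = rng.normal(loc=-3.0, scale=1.5, size=n)

def cov(u, v):
    """Sample covariance, used here as the inner product <u, v>."""
    return np.mean((u - u.mean()) * (v - v.mean()))

d_inner = np.sqrt(cov(X - Y, X - Y))   # sqrt(<X - Y, X - Y>)
d_std = np.std(X - Y)                  # sigma(X - Y)

var_sum = np.var(X + Y)                # var[X + Y]
var_parts = np.var(X) + np.var(Y)      # var[X] + var[Y]

print(d_inner, d_std)
print(var_sum, var_parts)
```

The identity $var[X+Y] = var[X] + var[Y]$ is the Pythagorean theorem in this setting: uncorrelated random variables are orthogonal, so their squared lengths add.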
Basis vectors
A separate document describes basis vectors clearly; you can download it here.
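Summary
- In this lesson, we first defined the inner product: a function that takes two vectors as input and outputs a scalar.
- On that basis, we defined length (norm), angle, and distance. In other words, the inner product directly determines these concepts, and their values may change when a different inner product is chosen.
- We extended these concepts to function spaces and to spaces of random variables.
- We revisited basis vectors.
An online exercise consolidates the key concepts and uses them to implement a classic classification algorithm, KNN classification: link.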