K-Means算法是较为经典的聚类算法,假设训练数据集XXX为:{x1,x2,⋯,xn}{x1,x2,⋯,xn}\left \{ \mathbf{x}_1,\mathbf{x}_2,\cdots , \mathbf{x}_n \right \},其中,每一个样本xjxj \mathbf{x}_j为mmm维的向量。此时的样本为一个m×nm×nm\times n的矩阵:
Xm×n=(x1x2⋯xn)=⎛⎝⎜⎜⎜⎜x1,1x2,1⋮xm,1x1,2x2,2⋮xm,2⋯⋯⋯x1,nx2,n⋮xm,n⎞⎠⎟⎟⎟⎟m×nXm×n=(x1x2⋯xn)=(x1,1x1,2⋯x1,nx2,1x2,2⋯x2,n⋮⋮⋮xm,1xm,2⋯xm,n)m×n
X_{m\times n}=\begin{pmatrix} \mathbf{x}_{1} & \mathbf{x}_{2} & \cdots & \mathbf{x}_{n} \end{pmatrix}=\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n}\\ x_{2,1} & x_{2,2} & \cdots & x_{2,n}\\ \vdots & \vdots & &\vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{pmatrix}_{m\times n}
假设有kkk个类,分别为:{C1,⋯,Ck}{C1,⋯,Ck}\left \{ C_1,\cdots ,C_k \right \}。k-Means算法通过欧式距离的度量方法计算每一个样本xjxj\mathbf{x}_{j}到质心之间的距离,并将其划分到较近的质心所属的类别中并重新计算质心,重复以上的过程,直到质心不再改变为止,上述的过程可以总结为:
K-Means的目标使得每一个样本xjxj\mathbf{x}_{j}被划分到离质心uiui\mathbf{u}_i最近的类别中,而质心为:
ui=∑xj∈Cixj#(xj∈Ci)ui=∑xj∈Cixj#(xj∈Ci)
\mathbf{u}_i=\frac{\sum_{\mathbf{x}_j \in C_i}\mathbf{x}_j}{\# \left ( \mathbf{x}_j \in C_i \right )}
其中,∑xj∈Cixj∑xj∈Cixj\sum_{\mathbf{x}_j \in C_i}\mathbf{x}_j表示的是所有CiCiC_i类中的所有的样本的和,#(xj∈Ci)#(xj∈Ci)\# \left ( \mathbf{x}_j \in C_i \right )表示的是类别CiCiC_i中的样本的个数。
最终使得质心不再改变,这就意味着每一个样本被划分到了最近的质心所属的类别中,即:
min∑i=1k∑j=1nzij‖‖xj−ui‖‖2min∑i=1k∑j=1nzij‖xj−ui‖2
min\; \sum_{i=1}^{k}\sum_{j=1}^{n}z_{ij}\left \| \mathbf{x}_j-\mathbf{u}_i \right \|^2
其中,样本xjxj\mathbf{x}_j是数据集Xm×nXm×nX_{m\times n}的第jjj列。uiui\mathbf{u}_i表示的是第iii个类别的聚类中心。假设Mm×kMm×kM_{m\times k}为聚类中心构成的矩阵。矩阵Zk×nZk×nZ_{k\times n}是由zijzijz_{ij}构成的0-1矩阵,zijzijz_{ij}为:
zij={10 if xi∈Ci otherwise zij={1 if xi∈Ci0 otherwise
z_{ij}=\begin{cases} 1 & \text{ if } \mathbf{x}_i\in C_i \\ 0 & \text{ otherwise } \end{cases}
上述的优化目标可以表示成:(在下面会做证明)
min‖X−MZ‖2min‖X−MZ‖2
min\; \left \| X-MZ\right \|^2
对于上述的最小化问题:
min∑i=1k∑j=1nzij‖‖xj−ui‖‖2min∑i=1k∑j=1nzij‖xj−ui‖2
min\; \sum_{i=1}^{k}\sum_{j=1}^{n}z_{ij}\left \| \mathbf{x}_j-\mathbf{u}_i \right \|^2
则有:
∑i,jzij‖‖xj−ui‖‖2=∑i,jzij(xTjxj−2xTjui+uTiui)=∑i,jzijxTjxj−2∑i,jzijxTjui+∑i,jzijuTiui∑i,jzij‖xj−ui‖2=∑i,jzij(xjTxj−2xjTui+uiTui)=∑i,jzijxjTxj−2∑i,jzijxjTui+∑i,jzijuiTui
\begin{matrix} \sum_{i,j}z_{ij}\left \| \mathbf{x}_j-\mathbf{u}_i \right \|^2\\ =\sum_{i,j}z_{ij}\left ( \mathbf{x}_j^T\mathbf{x}_j-2\mathbf{x}_j^T\mathbf{u}_i+\mathbf{u}_i^T\mathbf{u}_i \right )\\ =\sum_{i,j}z_{ij}\mathbf{x}_j^T\mathbf{x}_j-2\sum_{i,j}z_{ij}\mathbf{x}_j^T\mathbf{u}_i+\sum_{i,j}z_{ij}\mathbf{u}_i^T\mathbf{u}_i \end{matrix}
下面分别对上式中的三项进行计算:
∑i,jzijxTjxj=∑i,jzij‖‖xj‖‖2=∑j‖‖xj‖‖2=tr[XTX]∑i,jzijxjTxj=∑i,jzij‖xj‖2=∑j‖xj‖2=tr[XTX]
\begin{align*} \sum_{i,j}z_{ij}\mathbf{x}_j^T\mathbf{x}_j &= \sum_{i,j}z_{ij}\left \| \mathbf{x}_j \right \|^2\\ &= \sum_{j}\left \| \mathbf{x}_j \right \|^2\\ &= tr\left [ X^TX \right ] \end{align*}
已知:∑izij=1∑izij=1\sum_{i}z_{ij}=1。
∑i,jzijxTjui=∑i,jzij∑lxljuli=∑j,lxlj∑iulizij=∑j,lxlj(MZ)lj=∑j∑l(XT)jl(MZ)lj=∑j(XTMZ)jj=tr[XTMZ]∑i,jzijxjTui=∑i,jzij∑lxljuli=∑j,lxlj∑iulizij=∑j,lxlj(MZ)lj=∑j∑l(XT)jl(MZ)lj=∑j(XTMZ)jj=tr[XTMZ]
\begin{align*} \sum_{i,j}z_{ij}\mathbf{x}_j^T\mathbf{u}_i &= \sum_{i,j}z_{ij}\sum_{l}x_{lj}u_{li}\\ &= \sum_{j,l}x_{lj}\sum_{i}u_{li}z_{ij}\\ &= \sum_{j,l}x_{lj}\left ( MZ \right )_{lj}\\ &= \sum_{j}\sum_{l}\left ( X^T \right )_{jl}\left ( MZ \right )_{lj}\\ &= \sum_{j}\left ( X^TMZ \right )_{jj}\\ &= tr\left [ X^TMZ \right ] \end{align*}
∑i,jzijuTiui=∑i,jzij‖ui‖2=∑i‖ui‖2ni∑i,jzijuiTui=∑i,jzij‖ui‖2=∑i‖ui‖2ni
\begin{align*} \sum_{i,j}z_{ij}\mathbf{u}_i^T\mathbf{u}_i &= \sum_{i,j}z_{ij}\left \| \mathbf{u}_i \right \|^2\\ &= \sum_{i} \left \| \mathbf{u}_i \right \|^2n_i \end{align*}
最终:
∑i,jzij‖‖xj−ui‖‖2=tr[XTX]−2tr[XTMZ]+∑i‖ui‖2ni∑i,jzij‖xj−ui‖2=tr[XTX]−2tr[XTMZ]+∑i‖ui‖2ni
\sum_{i,j}z_{ij}\left \| \mathbf{x}_j-\mathbf{u}_i \right \|^2=tr\left [ X^TX \right ]-2tr\left [ X^TMZ \right ]+\sum_{i} \left \| \mathbf{u}_i \right \|^2n_i
对于上述的优化目标的矩阵写法:
min‖X−MZ‖2min‖X−MZ‖2
min\; \left \| X-MZ\right \|^2
则有:
‖X−MZ‖2=tr[(X−MZ)T(X−MZ)]=tr[XTX]−2tr[XTMZ]+tr[ZTMTMZ]‖X−MZ‖2=tr[(X−MZ)T(X−MZ)]=tr[XTX]−2tr[XTMZ]+tr[ZTMTMZ]
\begin{align*} \left \| X-MZ\right \|^2 &= tr\left [ \left ( X-MZ \right )^T\left ( X-MZ \right ) \right ]\\ &= tr\left [ X^TX \right ]-2tr\left [ X^TMZ \right ]+tr\left [ Z^TM^TMZ \right ] \end{align*}
对于tr[ZTMTMZ]tr[ZTMTMZ]tr\left [ Z^TM^TMZ \right ]:
tr[ZTMTMZ]=tr[MTMZZT]=∑i(MTMZZT)ii=∑i∑l(MTM)il(ZZT)li=∑i(MTM)ii(ZZT)ii=∑i‖ui‖2nitr[ZTMTMZ]=tr[MTMZZT]=∑i(MTMZZT)ii=∑i∑l(MTM)il(ZZT)li=∑i(MTM)ii(ZZT)ii=∑i‖ui‖2ni
\begin{align*} tr\left [ Z^TM^TMZ \right ] &= tr\left [ M^TMZZ^T \right ]\\ &= \sum_{i}\left ( M^TMZZ^T \right )_{ii}\\ &= \sum_{i}\sum_{l}\left ( M^TM \right )_{il}\left ( ZZ^T \right )_{li}\\ &= \sum_{i}\left ( M^TM \right )_{ii}\left ( ZZ^T \right )_{ii}\\ &= \sum_{i}\left \| \mathbf{u}_i \right \|^2n_{i} \end{align*}
因此得证,两种优化目标等价。
最终的目标是求得聚类中心,因此,对矩阵MMM求偏导数:
∂∂M‖X−MZ‖2=∂∂M[tr[XTX]−2tr[XTMZ]+tr[ZTMTMZ]]=2(MZZT−XZT)∂∂M‖X−MZ‖2=∂∂M[tr[XTX]−2tr[XTMZ]+tr[ZTMTMZ]]=2(MZZT−XZT)
\begin{align*} \frac{\partial }{\partial M}\left \| X-MZ\right \|^2 &= \frac{\partial }{\partial M}\left [ tr\left [ X^TX \right ]-2tr\left [ X^TMZ \right ]+tr\left [ Z^TM^TMZ \right ] \right ]\\ &=2\left ( MZZ^T-XZ^T \right ) \end{align*}
令其为000:
M=XZT(ZZT)−1M=XZT(ZZT)−1
M=XZ^T\left ( ZZ^T \right )^{-1}
即可得:
ui=∑jzijxj∑jzij=1ni∑xj∈Cixjui=∑jzijxj∑jzij=1ni∑xj∈Cixj
\mathbf{u}_i=\frac{\sum_{j}z_{ij}\mathbf{x}_j}{\sum_{j}z_{ij}}=\frac{1}{n_i}\sum_{\mathbf{x}_j\in C_i}\mathbf{x}_j
K-Means算法等价于求下述问题的最小值:
minZ‖‖X−XZT(ZZT)−1Z‖‖2minZ‖X−XZT(ZZT)−1Z‖2
\underset{Z}{min}\left \| X-XZ^T\left ( ZZ^T \right )^{-1}Z \right \|^2
s.t.zij∈{0,1},∑jzij=1s.t.zij∈{0,1},∑jzij=1
s.t.\; z_{ij}\in \left \{ 0,1 \right \},\; \sum_{j}z_{ij}=1