
Derivation Details of the Bayesian Curve Fitting Result in PRML

Mezereon
Published 2022-09-26 16:16:25
Collected in the column: MyBlog


Let the training set be $(X,T)$. For a new point $x$, we want the predictive distribution $p(t|x,X,T)$, obtained by marginalizing over the parameters $w$:

$$p(t|x,X,T) = \int p(t,w|x,X,T)\,dw = \int p(t|x,w)\,p(w|X,T)\,dw$$

where $w=[w_0,w_1,...,w_M]^T$ are the parameters of the polynomial of order $M$. PRML states the following result directly, without derivation.

The predictive distribution is the Gaussian $\mathcal N(t|m(x),s^2(x))$, with mean

$$m(x) = \beta\phi(x)^TS\sum_{n=1}^{N}\phi(x_n)t_n$$

and variance

$$s^2(x) = \beta^{-1} + \phi(x)^TS\phi(x)$$

where

$$S^{-1} = \alpha I +\beta\sum_{n=1}^{N}\phi(x_n)\phi(x_n)^T, \qquad \phi(x)=[1,x,x^2,...,x^M]^T$$
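To make the quoted result concrete, here is a minimal NumPy sketch that evaluates $m(x)$ and $s^2(x)$ exactly as written above. The function names `poly_features` and `predictive`, and the toy data, are illustrative assumptions and not part of PRML or the original post.

```python
import numpy as np

def poly_features(x, M):
    """Rows are phi(x) = [1, x, ..., x^M] for each entry of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(x, M + 1, increasing=True)

def predictive(x_new, X, T, M, alpha, beta):
    """Predictive mean m(x) and variance s^2(x) at each point of x_new."""
    Phi = poly_features(X, M)                                  # N x (M+1)
    S = np.linalg.inv(alpha * np.eye(M + 1) + beta * Phi.T @ Phi)
    phi = poly_features(x_new, M)                              # K x (M+1)
    mean = beta * phi @ S @ Phi.T @ np.asarray(T, dtype=float)
    var = 1.0 / beta + np.sum((phi @ S) * phi, axis=1)         # phi^T S phi per row
    return mean, var

# Toy data, made up for illustration
X = np.array([0.0, 1.0, 2.0])
T = np.array([1.0, 2.0, 3.0])
mean, var = predictive(0.5, X, T, M=0, alpha=1.0, beta=2.0)
```

With $M=0$ the feature vector is just $[1]$, so $S = 1/(\alpha + \beta N)$ and the result can be checked by hand: $m = \beta\sum_n t_n/(\alpha+\beta N)$ and $s^2 = \beta^{-1} + 1/(\alpha+\beta N)$.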

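This result appears rather abruptly and is hard to follow, so I decided to spend some time working through it and to record the steps here for reference.

As noted above, the predictive distribution can be written as an integral. We analyze it using the following facts.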

  1. Given $w$, the target $t$ follows a Gaussian distribution:

$$p(t|x,w) = \mathcal N(t|y(x,w), \beta^{-1})$$

  2. The posterior $p(w|X,T)$ is proportional to the product of the likelihood and the prior:

$$p(w|X,T, \alpha, \beta) \propto p(T|X, w, \beta)\,p(w|\alpha)$$

If $\alpha, \beta$ are known, this can be written as:

$$p(w|X,T) = \frac{p(T|X,w)p(X,w)}{p(X,T)} = \frac{p(T|X,w)p(w|X)p(X)}{p(T|X)p(X)} = \frac{p(T|X,w)p(w|X)}{p(T|X)}$$

So we can rewrite the integral as:

$$p(t|x,X,T) = \int \mathcal N(t|y(x,w), \beta^{-1}) \prod_{n=1}^N\mathcal N(t_n|y(x_n, w), \beta^{-1})\frac{p(w|X)}{p(T|X)}\,dw$$

where the polynomial is linear in $w$:

$$\begin{aligned} y(x,w) &= w_0 + w_1x + w_2x^2 + \cdots + w_Mx^M = \phi(x)^T[w_0, w_1, ..., w_M]^T = \phi(x)^Tw\\ \phi(x) &= [1,x,x^2,...,x^M]^T \end{aligned}$$
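The integral can also be approximated by sampling, which gives an independent check on the closed form quoted from PRML. The sketch below assumes the standard result (not derived in this post) that the posterior $p(w|X,T)$ is Gaussian with covariance $S$ and mean $\beta S\sum_n\phi(x_n)t_n$; the data, hyperparameters and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, M = 2.0, 25.0, 3                      # illustrative values
X = np.linspace(0.0, 1.0, 10)
T = np.sin(2 * np.pi * X) + rng.normal(0.0, beta ** -0.5, X.size)

Phi = np.vander(X, M + 1, increasing=True)         # rows are phi(x_n)^T
S = np.linalg.inv(alpha * np.eye(M + 1) + beta * Phi.T @ Phi)
S = (S + S.T) / 2                                  # remove round-off asymmetry
m_N = beta * S @ Phi.T @ T                         # assumed posterior mean of w

x_new = 0.3
phi = np.vander([x_new], M + 1, increasing=True)[0]

# Monte Carlo version of the integral: w ~ p(w|X,T), then t ~ N(phi^T w, 1/beta)
n = 200_000
w = rng.multivariate_normal(m_N, S, size=n)
t = w @ phi + rng.normal(0.0, beta ** -0.5, n)
mc_mean, mc_var = t.mean(), t.var()

# Closed form from PRML
closed_mean = beta * phi @ S @ Phi.T @ T
closed_var = 1.0 / beta + phi @ S @ phi
```

The sample mean and variance of `t` should agree with `closed_mean` and `closed_var` up to Monte Carlo noise.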

Hence the Gaussian $\mathcal N(t|y(x,w), \beta^{-1})$ can be written as:

$$\mathcal N(t|y(x,w), \beta^{-1}) = \frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\Big(-\frac{(t-y(x,w))^2}{2\beta^{-1}}\Big) = \frac{1}{\sqrt{2\pi\beta^{-1}}}\exp\Big(-\frac{(t-\phi(x)^Tw)^2}{2\beta^{-1}}\Big)$$

Similarly, the product of the $N$ Gaussians over the training points is:

$$\prod_{n=1}^N\mathcal N(t_n|y(x_n, w), \beta^{-1})=\Big(\frac{1}{2\pi\beta^{-1}}\Big)^{N/2}\exp\Big(-\frac{\beta}{2}\sum_{n=1}^N(t_n-\phi(x_n)^Tw)^2\Big)$$

Multiplying the two gives:

$$\mathcal N(t|y(x,w), \beta^{-1}) \prod_{n=1}^N\mathcal N(t_n|y(x_n, w), \beta^{-1}) = \Big(\frac{1}{2\pi\beta^{-1}}\Big)^{(N+1)/2}\exp\Big(-\frac{\beta}{2}\Big((t-\phi(x)^Tw)^2 + \sum_{n=1}^N(t_n-\phi(x_n)^Tw)^2\Big)\Big)$$

Take the prior to be

$$p(w|X) = p(w|\alpha) = \Big(\frac{\alpha}{2\pi}\Big)^{(M+1)/2}\exp\Big(-\frac{\alpha}{2}w^Tw\Big)$$

The evidence $p(T|X)$ does not depend on $w$ or $t$, so it only rescales the result. For brevity we set $p(T|X)=1$ in the formulas that follow; since only the dependence on $t$ matters for identifying the mean and variance, this does not affect the conclusion. Then

$$p(t|x,X,T) = \Big(\frac{\beta}{2\pi }\Big)^{\frac{N+1}{2}} \Big(\frac{\alpha}{2\pi}\Big)^{ \frac{M+1}{2}}\int \exp\Big(-\frac{\alpha}{2}w^Tw -\frac{\beta}{2}\Big((t-\phi(x)^Tw)^2 + \sum_{n=1}^N(t_n-\phi(x_n)^Tw)^2\Big) \Big )\, dw$$

Recall the Gaussian integral

$$\int_{-\infty}^{+\infty} e^{ -a(x+b)^2 }\, dx = \sqrt{ \frac{ \pi }{ a } }$$

Therefore,

$$\begin{aligned} \int \exp\Big(-\frac{\alpha}{2}w^Tw -\frac{\beta}{2}\Big((t-\phi(x)^Tw)^2 + \sum_{n=1}^N(t_n-\phi(x_n)^Tw)^2\Big) \Big )\, dw &= \int\exp(-k(w+b)^2+u)\,dw\\ k &= \frac{\alpha}{2} + \frac{\beta}{2}\Big(\phi(x)^T\phi(x) + \sum_{n=1}^{N}\phi(x_n)^T\phi(x_n)\Big) \end{aligned}$$
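The one-dimensional Gaussian integral used here is easy to confirm numerically. The following is a small sketch using a midpoint rule; the helper name `gaussian_integral` and the values of `a` and `b` are arbitrary choices for illustration.

```python
import math

def gaussian_integral(a, b, half_width=20.0, steps=200_000):
    """Midpoint-rule estimate of the integral of exp(-a (x + b)^2) over
    [-b - half_width, -b + half_width], wide enough that the tails are negligible."""
    lo = -b - half_width
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx          # midpoint of the i-th cell
        total += math.exp(-a * (x + b) ** 2)
    return total * dx

a, b = 1.5, 0.7
numeric = gaussian_integral(a, b)
exact = math.sqrt(math.pi / a)
```

The shift $b$ moves the peak but does not change the area, which is why only $k$ and the constant $u$ survive the integration over $w$.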

This amounts to completing the square of a quadratic in $w$. Treating $w$ as a scalar for brevity (the vector case follows the same steps, with matrices in place of the scalar coefficients), we write the exponent as:

$$\begin{aligned} &-m_1w^2-m_2(m_3w^2+m_4w+m_5)\\ m_1 &= \frac{\alpha}{2}, \quad m_2=\frac{\beta}{2}\\ m_3 &= \phi(x)^T\phi(x) + \sum_{n=1}^N\phi(x_n)^T\phi(x_n)\\ m_4 &=-2\Big(t\phi(x)^T + \sum_{n=1}^Nt_n\phi(x_n)^T\Big)\\ m_5 &= t^2 + \sum_{n=1}^Nt_n^2 \end{aligned}$$

Completing the square gives:

$$\begin{aligned} -(m_1+m_2m_3)w^2 - m_2m_4w - m_2m_5 &=-(m_1+m_2m_3)\Big(w^2+\frac{m_2m_4}{m_1+m_2m_3}w + \frac{m_2m_5}{m_1+m_2m_3}\Big)\\ &=-(m_1+m_2m_3)\Big[\Big(w+\frac{m_2m_4}{2(m_1+m_2m_3)}\Big)^2 + \frac{m_2m_5}{m_1+m_2m_3} - \frac{m_2^2m_4^2}{4(m_1+m_2m_3)^2}\Big]\\ &=-(m_1+m_2m_3)\Big[\Big(w+\frac{m_2m_4}{2(m_1+m_2m_3)}\Big)^2 + \frac{4(m_1+m_2m_3)m_2m_5 -m_2^2m_4^2}{4(m_1+m_2m_3)^2}\Big]\\ &=-(m_1+m_2m_3)\Big(w+\frac{m_2m_4}{2(m_1+m_2m_3)}\Big)^2 - \frac{4(m_1+m_2m_3)m_2m_5 -m_2^2m_4^2}{4(m_1+m_2m_3)} \end{aligned}$$

Note the minus sign in front of the last fraction: it comes from multiplying the bracket by $-(m_1+m_2m_3)$.
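A quick numerical spot check of this completed square, with the constant term entering with a minus sign, can be sketched as follows. The random ranges are arbitrary; $m_1, m_2, m_3$ are drawn positive because they stand for $\alpha/2$, $\beta/2$ and a sum of squares.

```python
import random

random.seed(0)
max_gap = 0.0
for _ in range(1000):
    m1, m2, m3 = (random.uniform(0.1, 3.0) for _ in range(3))
    m4, m5, w = (random.uniform(-3.0, 3.0) for _ in range(3))
    K = m1 + m2 * m3                                   # coefficient of w^2
    lhs = -m1 * w**2 - m2 * (m3 * w**2 + m4 * w + m5)
    rhs = (-K * (w + m2 * m4 / (2 * K)) ** 2
           - (4 * K * m2 * m5 - (m2 * m4) ** 2) / (4 * K))
    max_gap = max(max_gap, abs(lhs - rhs))
```

If the identity holds, `max_gap` stays at the level of floating-point round-off.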

The completed square leaves a term that does not depend on $w$, namely the constant $u$, which enters the exponent with a minus sign:

$$u = -\frac{4(m_1+m_2m_3)m_2m_5 -m_2^2m_4^2}{4(m_1+m_2m_3)} = -\frac{\beta(\alpha + \beta m_3)m_5 - \frac{1}{4}\beta^2m_4^2}{2(\alpha + \beta m_3)}$$

Hence,

$$\int \exp\Big(-\frac{\alpha}{2}w^Tw -\frac{\beta}{2}\Big((t-\phi(x)^Tw)^2 + \sum_{n=1}^N(t_n-\phi(x_n)^Tw)^2\Big) \Big )\, dw = \sqrt{\frac{\pi}{k}}\exp\Big( -\frac{\beta(\alpha + \beta m_3)m_5 - \frac{1}{4}\beta^2m_4^2}{2(\alpha + \beta m_3)}\Big)$$

Substituting into the original expression gives

$$p(t|x,X,T)= \Big(\frac{\beta}{2\pi }\Big)^{\frac{N+1}{2}} \Big(\frac{\alpha}{2\pi}\Big)^{ \frac{M+1}{2}}\sqrt{\frac{\pi}{k}}\exp\Big( -\frac{\beta(\alpha + \beta m_3)m_5 - \frac{1}{4}\beta^2m_4^2}{2(\alpha + \beta m_3)}\Big)$$
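As a sanity check on this step, the integral over $w$ can be evaluated on a grid for a scalar $w$ and the resulting density in $t$ inspected directly. The sketch below assumes a toy model $y = w\,\phi$ with scalar features; all numbers are made up. The closed forms it compares against are the scalar versions of PRML's $m(x)$ and $s^2(x)$.

```python
import numpy as np

alpha, beta = 1.0, 4.0                       # illustrative hyperparameters
phi_n = np.array([-1.0, 0.5, 2.0])           # phi(x_n), scalar features (toy assumption)
t_n = np.array([-0.8, 0.7, 1.9])             # made-up targets
phi = 1.5                                    # phi(x) at the new point

w = np.linspace(-5.0, 5.0, 2001)[:, None]    # integration grid for w (column)
t = np.linspace(-4.0, 7.0, 1101)[None, :]    # evaluation grid for t (row)
dw, dt = w[1, 0] - w[0, 0], t[0, 1] - t[0, 0]

# Exponent of the integrand: prior term, new-point term, training-data term
data_term = np.sum((t_n[None, :] - phi_n[None, :] * w) ** 2, axis=1, keepdims=True)
exponent = -0.5 * alpha * w**2 - 0.5 * beta * ((t - phi * w) ** 2 + data_term)

p_t = np.exp(exponent).sum(axis=0) * dw      # integrate out w (Riemann sum)
p_t /= p_t.sum() * dt                        # normalise in t
t_row = t[0]
num_mean = (t_row * p_t).sum() * dt
num_var = ((t_row - num_mean) ** 2 * p_t).sum() * dt

S = 1.0 / (alpha + beta * np.sum(phi_n**2))  # scalar S
closed_mean = beta * phi * S * np.sum(phi_n * t_n)
closed_var = 1.0 / beta + phi * S * phi
```

Normalising on the grid makes the check independent of the prefactor, so any constant dropped along the way (such as $p(T|X)$) does not matter here.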

The prefactor of the exponential does not depend on $t$:

$$\Big(\frac{\beta}{2\pi }\Big)^{\frac{N+1}{2}} \Big(\frac{\alpha}{2\pi}\Big)^{ \frac{M+1}{2}}\sqrt{\frac{\pi}{k}}=\Big(\frac{\beta^{N+1}\alpha^{M+1}}{(2\pi)^{N+M+2}} \cdot \frac{\pi}{\frac{\alpha}{2}+\frac{\beta}{2}\big(\phi(x)^T\phi(x) + \sum_{n=1}^{N}\phi(x_n)^T\phi(x_n)\big)}\Big)^{1/2}$$

The exponent, on the other hand, is a quadratic in $t$, and we complete the square once more:

$$\begin{aligned} -\frac{\beta(\alpha + \beta m_3)m_5 - \frac{1}{4}\beta^2m_4^2}{2(\alpha + \beta m_3)} &= -\frac{\beta(\alpha + \beta m_3)(t^2 + t_{sum}) - \frac{1}{4}\beta^2\big(-2(t\phi(x)^T + \sum_{n=1}^Nt_n\phi(x_n)^T)\big)^2}{2(\alpha +\beta m_3)}\\ &= -\frac{\beta(\alpha + \beta m_3)(t^2 + t_{sum}) - \beta^2(t\phi(x)^T + q)^2}{2(\alpha +\beta m_3)}\\ &= -\frac{[\alpha \beta + \beta^2 m_3 - \beta^2\phi(x)^T\phi(x)]\, t^2 -2\beta^2 \phi(x)^Tq^T t + (\alpha\beta+\beta^2 m_3)t_{sum}-\beta^2qq^T}{2(\alpha +\beta m_3)} \\ &= -\frac{[\alpha \beta + \beta^2 v]\, t^2 -2\beta^2 \phi(x)^Tq^T t + (\alpha\beta+\beta^2 m_3)t_{sum}-\beta^2qq^T}{2(\alpha +\beta m_3)}\\ &= -\frac{\alpha \beta + \beta^2 v}{2(\alpha +\beta m_3)}\Big[t^2 -2\frac{\beta^2 \phi(x)^Tq^T}{\alpha \beta + \beta^2 v}\, t + \frac{(\alpha\beta+\beta^2 m_3)t_{sum}-\beta^2qq^T}{\alpha \beta + \beta^2 v}\Big]\\ &= -\frac{\alpha \beta + \beta^2 v}{2(\alpha +\beta m_3)}\Big[\Big(t -\frac{\beta^2 \phi(x)^Tq^T}{\alpha \beta + \beta^2 v}\Big)^2 -\frac{(\beta^2 \phi(x)^Tq^T)^2}{(\alpha \beta + \beta^2 v)^2} + \frac{(\alpha\beta+\beta^2 m_3)t_{sum}-\beta^2qq^T}{\alpha \beta + \beta^2 v}\Big] \end{aligned}$$

where

$$t_{sum} = \sum_{n=1}^Nt_n^2, \qquad q = \sum_{n=1}^Nt_n\phi(x_n)^T, \qquad v = \sum_{n=1}^N\phi(x_n)^T\phi(x_n)$$

Comparing with the exponent $-(t-m(x))^2/(2s^2(x))$ of a Gaussian in $t$, we can read off the mean directly:

$$\begin{aligned} m(x)&=\frac{\beta^2 \phi(x)^Tq^T}{\alpha \beta + \beta^2 v} = \frac{\beta \phi(x)^Tq^T}{\alpha + \beta v} = \Big(\alpha+\beta \sum_{n=1}^N\phi(x_n)^T\phi(x_n)\Big)^{-1}\Big(\beta\phi(x)^T\sum_{n=1}^{N}\phi(x_n)t_n\Big) = \beta \phi(x)^TS\sum_{n=1}^{N}\phi(x_n)t_n\\ S^{-1} &= \alpha+\beta \sum_{n=1}^N\phi(x_n)^T\phi(x_n) \end{aligned}$$

Here $S^{-1}$ is written in the scalar shorthand used throughout the derivation; its matrix form is $S^{-1} = \alpha I +\beta\sum_{n=1}^{N}\phi(x_n)\phi(x_n)^T$, as in the result quoted from PRML.

Provided the completion of the square above goes through, the variance is the reciprocal of the coefficient multiplying $-\frac{1}{2}(t-m(x))^2$:

$$\begin{aligned} s^2(x)&=\frac{\alpha + \beta m_3}{\alpha \beta + \beta^2 v} = \frac{\alpha+\beta\big(\phi(x)^T\phi(x) + \sum_{n=1}^N\phi(x_n)^T\phi(x_n)\big)}{\alpha\beta + \beta^2\sum_{n=1}^N\phi(x_n)^T\phi(x_n) }\\ &=\frac{\alpha+\beta\sum_{n=1}^N\phi(x_n)^T\phi(x_n) + \beta \phi(x)^T\phi(x)}{\alpha\beta + \beta^2\sum_{n=1}^N\phi(x_n)^T\phi(x_n) }\\ &=\frac{1}{\beta} + \frac{\phi(x)^T\phi(x)}{\alpha + \beta \sum_{n=1}^N\phi(x_n)^T\phi(x_n) }\\ &= \beta^{-1} + \phi(x)^TS\phi(x) \end{aligned}$$
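
The derivation above manipulates $w$ as if it were a scalar. For completeness, here is a sketch of the same argument with $w$ a vector. The symbols $A$, $r$ and $c$ are introduced here for illustration and do not appear in the original post ($r$ is the column-vector counterpart of $q$), and the matrix identity used is the Sherman-Morrison formula. Proportionality signs drop factors that do not depend on $t$.

```latex
\begin{aligned}
A &= \alpha I + \beta\Big(\phi(x)\phi(x)^T + \sum_{n=1}^N \phi(x_n)\phi(x_n)^T\Big)
   = S^{-1} + \beta\,\phi(x)\phi(x)^T,
   \qquad r = \sum_{n=1}^N t_n\phi(x_n) \\
\text{exponent} &= -\tfrac12 w^TAw + \beta\,(t\phi(x) + r)^Tw
   - \tfrac{\beta}{2}\Big(t^2 + \sum_{n=1}^N t_n^2\Big) \\
\int \exp(\text{exponent})\,dw
  &\propto \exp\Big(\tfrac{\beta^2}{2}(t\phi(x)+r)^TA^{-1}(t\phi(x)+r) - \tfrac{\beta}{2}t^2\Big) \\
  &\propto \exp\Big(-\tfrac{\beta}{2}\big(1-\beta\,\phi(x)^TA^{-1}\phi(x)\big)t^2
   + \beta^2\,\phi(x)^TA^{-1}r\;t\Big) \\
\phi(x)^TA^{-1} &= \frac{\phi(x)^TS}{1+\beta c},
   \qquad c = \phi(x)^TS\phi(x) \quad \text{(Sherman--Morrison)} \\
\frac{1}{s^2(x)} &= \beta\Big(1-\frac{\beta c}{1+\beta c}\Big) = \frac{\beta}{1+\beta c}
   \;\Rightarrow\; s^2(x) = \beta^{-1} + \phi(x)^TS\phi(x) \\
m(x) &= s^2(x)\,\beta^2\,\phi(x)^TA^{-1}r = \beta\,\phi(x)^TS\sum_{n=1}^N\phi(x_n)t_n
\end{aligned}
```

Both results agree with the formulas quoted from PRML, so the scalar shorthand leads to the right answer even though the intermediate expressions are only literally valid for scalar $w$.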

Originally published 2022-08-26 on the author's personal site/blog.
