
Directly applying Bayesian ridge regression

In the Using ridge regression to overcome linear regression's shortfalls recipe, we discussed the constraints that ridge regression imposes, from an optimization standpoint. We also discussed the Bayesian interpretation of those constraints as priors on the coefficients; these priors pull the mass of the density towards the prior mean, which is often 0.

So, now we'll look at how we can directly apply this interpretation through scikit-learn.

Getting ready

Ridge and lasso regression can both be understood through a Bayesian lens, as opposed to an optimization lens. Only Bayesian ridge regression is implemented in scikit-learn, but in the How it works... section, we'll look at both cases.

First, as usual, let's create some regression data:

from sklearn.datasets import make_regression

# 1,000 samples, 10 features, only 2 of which are informative
X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=20)

How to do it...

We can just "throw" ridge regression at the problem with a few simple steps:

from sklearn.linear_model import BayesianRidge

# The default gamma hyperpriors: alpha_1 = alpha_2 = lambda_1 = lambda_2 = 1e-06
br = BayesianRidge()

The two pairs of hyperparameters of interest are alpha_1/alpha_2 and lambda_1/lambda_2. The alphas are the shape and rate hyperparameters of the gamma prior over the alpha parameter (the precision of the noise), and the lambdas are the shape and rate hyperparameters of the gamma prior over the lambda parameter (the precision of the weights).

First, let's fit a model without any modification to the hyperparameters:

br.fit(X, y)
br.coef_
array([-0.37073297,  0.16745965, -0.77672044, 29.24241894, -0.69319217,
        0.64905847, 86.9454228 , -0.24738249, -1.63909699,  1.43038709])
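
Beyond the coefficients, the fitted model exposes the precisions it estimated, along with predictive uncertainty. A minimal sketch using BayesianRidge's alpha_ and lambda_ attributes and the return_std option of predict:

# Estimated precision of the noise and of the weights
print(br.alpha_, br.lambda_)

# Predictions come with standard deviations if requested
mean, std = br.predict(X[:5], return_std=True)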

Now, if we modify the hyperparameters, notice the slight changes in the coefficients:

br_alphas = BayesianRidge(alpha_1=10, lambda_1=10)
br_alphas.fit(X, y)
br_alphas.coef_
array([-0.36917484,  0.16682313, -0.77961059, 29.21596299, -0.69730227,
        0.64425288, 86.86658136, -0.2477023 , -1.63266313,  1.42687844])

How it works...

For Bayesian ridge regression, we assume priors over alpha (the precision of the noise) and lambda (the precision of the weights). Both of these priors are gamma distributions. The gamma distribution is a very flexible distribution, and it can take quite different shapes depending on the shape and rate parameters. 1e-06 is the default value for all four of these hyperparameters in scikit-learn's BayesianRidge.
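
A quick way to visualize these shapes is to plot a few gamma densities yourself. This is just an illustrative sketch using scipy.stats.gamma (which takes a shape parameter a and a scale, where scale = 1/rate); the parameter settings are arbitrary choices, not anything prescribed by scikit-learn:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma

x = np.linspace(0.01, 5, 500)

# A few shape/rate combinations, including something close to the
# 1e-06 default that BayesianRidge uses for its hyperpriors
for shape, rate in [(1e-06, 1e-06), (0.5, 1.0), (1.0, 1.0), (2.0, 2.0)]:
    plt.plot(x, gamma.pdf(x, a=shape, scale=1.0 / rate),
             label="shape=%g, rate=%g" % (shape, rate))

plt.legend()
plt.title("Gamma densities under different shape/rate settings")
plt.show()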

As you can see, the coefficients are naturally shrunk towards 0, especially with a very small shape parameter.

There's more...

Like I mentioned earlier, there's also a Bayesian interpretation of lasso regression. Imagine we set priors over the coefficients; remember that they are random variables themselves.

For lasso regression, we will choose a prior that naturally produces 0s, for example, the double-exponential (Laplace) distribution.
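
To see why this prior encourages exact zeros, it helps to plot its density next to the Gaussian prior that underlies ridge regression. A minimal sketch using scipy.stats (the unit-scale parameterization is an arbitrary choice for illustration):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import laplace, norm

x = np.linspace(-4, 4, 500)

# The double exponential (Laplace) prior behind lasso is much more
# sharply peaked at 0 than the Gaussian prior behind ridge
plt.plot(x, laplace.pdf(x), label="Laplace (lasso prior)")
plt.plot(x, norm.pdf(x), label="Gaussian (ridge prior)")
plt.legend()
plt.show()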

Notice the peak around 0. This is what naturally leads to the zero coefficients in lasso regression. By tuning the hyperparameters, it's also possible to produce more or fewer zero coefficients, depending on the setup of the problem.
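
Although scikit-learn doesn't implement Bayesian lasso, its ordinary Lasso estimator shows the consequence of this prior: on the data we generated above, most of the uninformative coefficients come out as exactly 0. A quick sketch (the alpha value here is an arbitrary regularization strength, not a fitted hyperparameter):

from sklearn.linear_model import Lasso

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Most of the 8 uninformative features get exactly-zero weights
print(lasso.coef_)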

Source: http://www.packtpub.com

Author: Trent Hauck
