
Using Gaussian processes for regression

到不了的都叫做远方

In this recipe, we'll use the Gaussian process for regression. In the linear models section, we saw how representing prior information on the coefficients was possible using Bayesian Ridge Regression.

With a Gaussian process, it's about the variance and not the mean. However, with a Gaussian process, we assume the mean is 0, so it's the covariance function we'll need to specify.

The basic setup is similar to how a prior can be put on the coefficients in a typical regression problem. With a GP, a prior can be put on the functional form of the data: it's the covariance between the data points that is used to model the data, and it therefore must be fit from the data.
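To make "a prior on the functional form of the data" concrete, here is a minimal sketch (not part of the recipe) that draws sample functions from a zero-mean GP prior with a squared exponential covariance; the length scale is an arbitrary illustrative choice:

import numpy as np

# Points at which to evaluate the sampled functions
X = np.linspace(0, 10, 100)

# Squared exponential covariance between every pair of points
# (the length scale is an arbitrary illustrative choice)
length_scale = 1.0
diff = X[:, None] - X[None, :]
K = np.exp(-diff ** 2 / (2 * length_scale ** 2))

# Each draw from N(0, K) is one "function" sampled from the prior;
# the jitter keeps the covariance numerically positive definite
samples = np.random.multivariate_normal(
    np.zeros(len(X)), K + 1e-10 * np.eye(len(X)), size=3)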

Getting ready

So, let's use some regression data and walk through how Gaussian processes work in scikit-learn:

import numpy as np
from sklearn.datasets import load_boston

boston = load_boston()
boston_X = boston.data
boston_y = boston.target

# Randomly assign roughly 75% of the points to the training set
train_set = np.random.choice([True, False], len(boston_y), p=[.75, .25])
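A note on reproducing this today: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. If you're on a recent release, a substitute dataset such as fetch_california_housing keeps the rest of the recipe's structure intact:

from sklearn.datasets import fetch_california_housing

# On scikit-learn >= 1.2, swap in the California housing data
housing = fetch_california_housing()
boston_X = housing.data
boston_y = housing.target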

How to do it…

Now that we have the data, we'll create a scikit-learn GaussianProcess object. By default, it uses a constant regression function and a squared exponential correlation, which is one of the more common choices:

>>> from sklearn.gaussian_process import GaussianProcess
>>> gp = GaussianProcess()
>>> gp.fit(boston_X[train_set], boston_y[train_set])
GaussianProcess(beta0=None, corr=<function squared_exponential at ...>,
                normalize=True, nugget=array(2.220446049250313e-15),
                optimizer='fmin_cobyla', random_start=1,
                regr=<function constant at ...>, storage_mode='full',
                theta0=array([[0.1]]), thetaL=None, thetaU=None,
                verbose=False)
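The GaussianProcess class used here comes from older scikit-learn releases; it was later replaced by GaussianProcessRegressor. As a rough modern equivalent of the default setup above, a sketch like the following uses an RBF (squared exponential) kernel; it mirrors the old default in spirit rather than reproducing it exactly:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# A constant amplitude times a squared exponential (RBF) kernel,
# loosely mirroring the old constant-regression / squared-exponential default
kernel = ConstantKernel() * RBF()
gp_modern = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp_modern.fit(boston_X[train_set], boston_y[train_set])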

That's a formidable object definition. The following are a few things to point out; a sketch that sets these parameters explicitly follows the list:

1. beta0: This is the regression weight. It defaults in such a way that MLE (maximum likelihood estimation) is used for estimation.

2. corr: This is the correlation function. There are several built-in correlation functions; we'll look at more of them in the following How it works... section.

3. regr: This is the constant regression function.

4. nugget: This is the regularization parameter. It defaults to a very small number. You can pass either one value to be used for each data point or a single value to be applied uniformly.

5. normalize: This defaults to True, and it will center and scale the features. This would be scale in R.
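Putting the list together, a call that sets each of these parameters explicitly might look like the following sketch (the values are illustrative, not recommendations):

# Illustrative values only; these are the old-API parameter names
gp = GaussianProcess(regr='constant',             # constant regression function
                     corr='squared_exponential',  # correlation (covariance) function
                     theta0=1e-1,                 # starting point for parameter estimation
                     nugget=1e-10,                # regularization; scalar or per data point
                     normalize=True)              # center and scale the features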

Okay, so now that we've fit the object, let's look at its performance against the test set:

test_preds = gp.predict(boston_X[~train_set])

Let's plot the predicted values versus the actual values; then, because we're doing regression, it's probably a good idea to look at the plotted residuals and a histogram of the residuals:

>>> from matplotlib import pyplot as plt
>>> f, ax = plt.subplots(figsize=(10, 7), nrows=3)
>>> f.tight_layout()
>>> ax[0].plot(range(len(test_preds)), test_preds, label='Predicted Values')
>>> ax[0].plot(range(len(test_preds)), boston_y[~train_set], label='Actual Values')
>>> ax[0].set_title("Predicted vs Actuals")
>>> ax[0].legend(loc='best')
>>> ax[1].plot(range(len(test_preds)), test_preds - boston_y[~train_set])
>>> ax[1].set_title("Plotted Residuals")
>>> ax[2].hist(test_preds - boston_y[~train_set])
>>> ax[2].set_title("Histogram of Residuals")

The output is as follows: [Figure: predicted vs. actual values, plotted residuals, and a histogram of the residuals]

How it works…

Now that we've worked through a very quick example, let's look a little more at what some of the parameters do and how we can tune them based on the model we're trying to fit.

First, let's try to understand what's going on with the corr function. This function describes the relationship between the different pairs of X. The following five correlation functions are offered by scikit-learn:

1. absolute_exponential
2. squared_exponential
3. generalized_exponential
4. cubic
5. linear

For example, the squared exponential has the following form, with length-scale parameters $\theta_i$ per dimension:

$$K(x, x') = \exp\left(-\sum_i \theta_i (x_i - x_i')^2\right)$$

Linear, on the other hand, is just the dot product of the two points in question:

$$K(x, x') = x^{T} x'$$
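Both forms are one-liners in NumPy. Here is a minimal sketch of the two correlation functions exactly as written above (theta is an illustrative parameter, with one value shared across dimensions):

import numpy as np

def squared_exponential(x1, x2, theta=1.0):
    # exp(-sum_i theta * (x1_i - x2_i)^2)
    return np.exp(-np.sum(theta * (x1 - x2) ** 2))

def linear(x1, x2):
    # The dot product of the two points
    return np.dot(x1, x2)

x1, x2 = np.array([1.0, 2.0]), np.array([1.5, 1.0])
print(squared_exponential(x1, x2), linear(x1, x2))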

Another parameter of interest is theta0. This represents the starting point in the estimation of the parameters.

Once we have an estimation of K and the mean, the process is fully specified, due to it being a Gaussian process; the emphasis is on Gaussian, a reason it's so popular for general machine learning work.
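To see what "fully specified" means in practice, here is a minimal NumPy sketch of the standard GP posterior equations: given the covariance matrix K on the training points, the predictive mean and covariance at new points follow in closed form (the rbf helper and the noise value are illustrative assumptions, not the recipe's code):

import numpy as np

def rbf(A, B, theta=1.0):
    # Squared exponential covariance matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-theta * d2)

def gp_posterior(X_train, y_train, X_test, noise=1e-8):
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf(X_train, X_test)
    K_ss = rbf(X_test, X_test)
    # Posterior mean: K_s^T K^{-1} y
    mean = K_s.T @ np.linalg.solve(K, y_train)
    # Posterior covariance: K_ss - K_s^T K^{-1} K_s
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov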

Let's use a different regr function, apply a different theta0, and look at how the predictions differ:

gp = GaussianProcess(regr='linear', theta0=5e-1)
gp.fit(boston_X[train_set], boston_y[train_set]);
linear_preds = gp.predict(boston_X[~train_set])
f, ax = plt.subplots(figsize=(7, 5))

Let's have a look at the fit:

>>> f.tight_layout()
>>> ax.hist(test_preds - boston_y[~train_set], label='Residuals Original', color='b', alpha=.5)
>>> ax.hist(linear_preds - boston_y[~train_set], label='Residuals Linear', color='r', alpha=.5)
>>> ax.set_title("Residuals")
>>> ax.legend(loc='best')

The following is the output: [Figure: overlaid histograms of the original and linear-regression residuals]

Clearly, the second model's predictions are slightly better for the most part. If we want to sum this up, we can look at the MSE of the predictions:

>>> np.power(test_preds - boston_y[~train_set], 2).mean()
26.254844099612455
>>> np.power(linear_preds - boston_y[~train_set], 2).mean()
21.938924337056068
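Equivalently, scikit-learn's built-in metric computes the same quantity:

>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(boston_y[~train_set], test_preds)
26.254844099612455
>>> mean_squared_error(boston_y[~train_set], linear_preds)
21.938924337056068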

There's more…

We might want to understand the uncertainty in our estimates. When we make the predictions, if we pass the eval_MSE argument as True, we'll get the MSE values along with the predicted values. From a mechanics standpoint, a tuple of predictions and MSEs is returned:

>>> test_preds, MSE = gp.predict(boston_X[~train_set], eval_MSE=True)
>>> MSE[:5]
array([ 11.95314572,  8.48397825,  6.0287539 , 29.20844347,  0.36427829])
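For reference, eval_MSE is part of the old API. On the modern GaussianProcessRegressor (the gp_modern sketch from earlier), the analogous option is return_std, which returns the predictive standard deviation, roughly the square root of what eval_MSE reports:

# Modern API: predictive standard deviation instead of MSE
test_preds, std = gp_modern.predict(boston_X[~train_set], return_std=True)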

So, now that we have errors in the estimates (unfortunately), let's plot the first few to get an indication of accuracy:

>>> f, ax = plt.subplots(figsize=(7, 5))
>>> n = 20
>>> rng = range(n)
>>> ax.scatter(rng, test_preds[:n])
>>> ax.errorbar(rng, test_preds[:n], yerr=1.96 * np.sqrt(MSE[:n]))  # MSE is a variance; take the square root for a 95% band
>>> ax.set_title("Predictions with Error Bars")
>>> ax.set_xlim((-1, 21))

The following is the output: [Figure: predictions for the first 20 test points with 95% error bars]

As you can see, there's quite a bit of variance in the estimates for a lot of these points. On the other hand, our overall error isn't too bad.
