Automatic cross validation

到不了的都叫做远方

Last modified: 2020-05-06 11:46:24


Published in the column: scikit-learn Cookbook translation

We've looked at using the cross validation iterators that scikit-learn comes with, but we can also use a helper function to perform cross validation for us automatically. This is similar to how other objects in scikit-learn are wrapped by helper functions, pipeline for instance.

Getting ready

First, we'll need to create a sample model; this can really be anything: a decision tree, a random forest, whatever. For us, it'll be a random forest regressor. We'll then create a dataset and use the cross validation functions.

How to do it...

First, import the ensemble module and we'll get started:

from sklearn import ensemble
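# max_features='auto' was removed in recent scikit-learn releases; 1.0 uses all features, as 'auto' did for regressors
rf = ensemble.RandomForestRegressor(max_features=1.0)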

Okay, so now, let's create some regression data:

from sklearn import datasets
X, y = datasets.make_regression(10000, 10)

Now that we have the data, we can import cross_val_score from the model_selection module and put it to use:

from sklearn.model_selection import cross_val_score
# cv=3 reproduces the three folds shown below; recent scikit-learn versions default to 5
scores = cross_val_score(rf, X, y.astype('int'), cv=3)
print(scores)
[ 0.86823874 0.86763225 0.86986129]
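
To make the call above less of a black box, here is a minimal sketch of roughly what cross_val_score does for a regressor when cv is an integer: split the data with KFold, fit a fresh clone of the model on each training fold, and score it on the held-out fold. The smaller dataset, n_estimators=10, and random_state=0 are assumptions chosen so the example runs quickly and reproducibly; they are not part of the original recipe.

```python
import numpy as np
from sklearn import datasets, ensemble
from sklearn.base import clone
from sklearn.model_selection import KFold, cross_val_score

# Small, seeded problem (assumed values, chosen only for speed and repeatability).
X_demo, y_demo = datasets.make_regression(n_samples=300, n_features=10, random_state=0)
rf_demo = ensemble.RandomForestRegressor(n_estimators=10, random_state=0)

# The helper: one call, one score per fold.
auto_scores = cross_val_score(rf_demo, X_demo, y_demo, cv=3)

# Roughly the same thing by hand.
manual_scores = []
for train_idx, test_idx in KFold(n_splits=3).split(X_demo):
    model = clone(rf_demo)  # fresh, unfitted copy for every fold
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # With no scoring argument, the estimator's own score method (R^2 here) is used.
    manual_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

print(auto_scores)
print(manual_scores)
```

Because the forest is seeded, both arrays should agree, which is a quick way to check that the helper is doing nothing more mysterious than the loop.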

How it works...

For the most part, this delegates to the cross validation objects. One nice thing is that the function can handle performing the cross validation in parallel. We can turn on verbose mode to follow along play by play:

>>> scores = cross_val_score(rf, X, y, verbose=3, cv=4)
[CV] no parameters to be set
[CV] no parameters to be set, score=0.872866 - 0.7s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.873679 - 0.6s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.878018 - 0.7s
[CV] no parameters to be set
[CV] no parameters to be set, score=0.871598 - 0.6s
[Parallel(n_jobs=1)]: Done 1 jobs | elapsed: 0.7s
[Parallel(n_jobs=1)]: Done 4 out of 4 | elapsed: 2.6s finished

As we can see, the model is scored during each iteration. We also get an idea of how long each fit takes.
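
If the timings are needed as data rather than as log lines, one option is cross_validate, a sibling of cross_val_score in sklearn.model_selection that returns fit and score times alongside the test scores. The sketch below also passes n_jobs=2 to spread the folds over two worker processes; the dataset size, seed, and n_jobs value are illustrative assumptions.

```python
from sklearn import datasets, ensemble
from sklearn.model_selection import cross_validate

# Small, seeded problem (assumed values, chosen only for speed and repeatability).
X_demo, y_demo = datasets.make_regression(n_samples=300, n_features=10, random_state=0)
rf_demo = ensemble.RandomForestRegressor(n_estimators=10, random_state=0)

# n_jobs=2 runs the folds in two workers; n_jobs=-1 would use all available cores.
results = cross_validate(rf_demo, X_demo, y_demo, cv=4, n_jobs=2)

# results is a dict of arrays with one entry per fold.
for fold, (score, fit_time) in enumerate(zip(results["test_score"], results["fit_time"])):
    print(f"fold {fold}: score={score:.3f}, fit_time={fit_time:.2f}s")
```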

It's also worth knowing that we can choose the scoring function based on the kind of model we're trying to fit. In other recipes, we've discussed how to create your own scoring function.
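
As a hedged illustration of swapping the metric, the scoring parameter of cross_val_score accepts either a built-in scorer name or a callable built with make_scorer. The helper median_abs_error below is a hypothetical custom metric written for this example, not something taken from the original recipe; the data size and seeds are likewise assumptions.

```python
import numpy as np
from sklearn import datasets, ensemble
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

# Small, seeded problem (assumed values, chosen only for speed and repeatability).
X_demo, y_demo = datasets.make_regression(n_samples=300, n_features=10, random_state=0)
rf_demo = ensemble.RandomForestRegressor(n_estimators=10, random_state=0)

# Built-in scorer, selected by name. Error metrics are negated so that higher is always better.
mae_scores = cross_val_score(rf_demo, X_demo, y_demo, cv=3,
                             scoring="neg_mean_absolute_error")


def median_abs_error(y_true, y_pred):
    """Hypothetical custom metric: median of the absolute residuals."""
    return np.median(np.abs(y_true - y_pred))


# greater_is_better=False tells scikit-learn to flip the sign, so scores come back negative.
custom_scorer = make_scorer(median_abs_error, greater_is_better=False)
custom_scores = cross_val_score(rf_demo, X_demo, y_demo, cv=3, scoring=custom_scorer)

print(mae_scores)
print(custom_scores)
```

The sign flip is worth remembering when reading the output: a score of -12.3 from an error-based scorer means an error of 12.3.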

