The scikit-learn framework provides the ability to search over combinations of parameters. This capability is provided in the GridSearchCV class and can be used to discover the best way to configure a model for top performance. For example, we can define a grid over the number of trees (n_estimators) and tree size (max_depth) as follows:
n_estimators = [50, 100, 150, 200]
max_depth = [2, 4, 6, 8]
param_grid = dict(max_depth=max_depth, n_estimators=n_estimators)
Each parameter combination is then evaluated using 10-fold cross-validation:
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold, verbose=1)
result = grid_search.fit(X, label_encoded_y)
We can then review the results to determine the best combination, as well as the general trend as each parameter value varies. This is a best practice when applying XGBoost to your own problems. Key parameters to consider tuning include the number and size of the trees (n_estimators and max_depth) and the learning rate (learning_rate).
Below is a complete example of tuning learning_rate on the Pima Indians Onset of Diabetes dataset.
# Tune learning_rate
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# grid search
model = XGBClassifier()
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]
param_grid = dict(learning_rate=learning_rate)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
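Once the search has finished, the tuned model itself is available as best_estimator_ (GridSearchCV refits it on the full dataset by default), so you can make predictions without retraining. Here is a minimal, self-contained sketch of that pattern on synthetic data; it uses scikit-learn's GradientBoostingClassifier as a stand-in so the snippet runs without xgboost installed, but the GridSearchCV usage is identical with XGBClassifier.

```python
# Minimal sketch: grid search, then predict with the refit best model.
# GradientBoostingClassifier stands in for XGBClassifier here (assumption
# for portability); the GridSearchCV pattern is the same for both.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# small synthetic binary-classification dataset
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

param_grid = dict(learning_rate=[0.01, 0.1, 0.2])
kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=7)
grid = GridSearchCV(GradientBoostingClassifier(random_state=7),
                    param_grid, scoring="neg_log_loss", cv=kfold)
grid_result = grid.fit(X, y)

print("Best params:", grid_result.best_params_)
# best_estimator_ is already refit on all of X, y and ready to use
preds = grid_result.best_estimator_.predict(X[:5])
print("Predictions for first 5 rows:", preds)
```

The same idea applies to the Pima Indians example above: replace grid_search with grid_result.best_estimator_ wherever you need predictions from the winning configuration.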
That completes the course. Take a moment to look back at how far you have come: don't take it lightly, you have made a lot of progress in a short time. This is only the beginning of your journey with XGBoost in Python. Keep practicing and developing your skills.