I program in R, and I am trying to determine the optimal hyperparameters for an xgboost model I want to run. I have a dataset with about 700 variables (some numeric, others one-hot encoded) and about 25,000 observations. I am trying to predict whether each observation is large (prediction = 1) or small (prediction = 0). The problem is that when I run the xgb.cv function, the train-error and test-error do not change from one iteration to the next. Below are my code and the resulting printout. Can anyone explain why the error stays constant? Thanks a lot!
My R code:
dtrain <- xgb.DMatrix(data = pred[train, ], label = resp[train])
xgb.cv(data = dtrain,
       params = list(objective = "binary:logistic",
                     eta = 0.01,
                     max_depth = 10,
                     min_child_weight = 20,
                     colsample_bytree = 0.2),
       nfold = 5,
       nrounds = 100,
       verbose = TRUE,
       early_stopping_rounds = 8,
       maximize = FALSE)
Console printout:
[1] train-error:0.014422+0.000491 test-error:0.014422+0.001965
Multiple eval metrics are present. Will use test_error for early stopping.
Will train until test_error hasn't improved in 8 rounds.
[2] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[3] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[4] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[5] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[6] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[7] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[8] train-error:0.014422+0.000491 test-error:0.014422+0.001965
[9] train-error:0.014422+0.000491 test-error:0.014422+0.001965
Stopping. Best iteration:
[1] train-error:0.014422+0.000491 test-error:0.014422+0.001965
Thanks again for your help!
Answered on 2018-12-12 14:47:42
You have to supply multiple values in the params list! Use c():
params = list(objective = "binary:logistic",
              eta = c(0.01, 0.05, 0.1, 0.5, 1),
              max_depth = 10,
              min_child_weight = 20,
              colsample_bytree = c(0.1, 0.2, 0.5, 1))
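Note that a single xgb.cv call evaluates one parameter set, so in practice you would build the grid with expand.grid() and loop over the combinations yourself, keeping the best cross-validated error. A minimal sketch reusing dtrain from the question (the grid values and the explicit eval_metric = "error" are illustrative choices, not the only ones):

library(xgboost)

# Illustrative grid; extend with whatever values you want to test
grid <- expand.grid(eta = c(0.01, 0.05, 0.1, 0.5, 1),
                    colsample_bytree = c(0.1, 0.2, 0.5, 1))

grid$test_error <- sapply(seq_len(nrow(grid)), function(i) {
  cv <- xgb.cv(data = dtrain,
               params = list(objective = "binary:logistic",
                             eval_metric = "error",
                             eta = grid$eta[i],
                             max_depth = 10,
                             min_child_weight = 20,
                             colsample_bytree = grid$colsample_bytree[i]),
               nfold = 5,
               nrounds = 100,
               verbose = FALSE,
               early_stopping_rounds = 8,
               maximize = FALSE)
  # Best cross-validated test error reached by this combination
  min(cv$evaluation_log$test_error_mean)
})

grid[which.min(grid$test_error), ]  # best combination found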
Note: you can also find the best tuning parameters with the caret and mlr packages.
library(caret)
modelLookup(model = "xgbTree")
#     model        parameter                          label forReg forClass probModel
# 1 xgbTree          nrounds          # Boosting Iterations   TRUE     TRUE      TRUE
# 2 xgbTree        max_depth                 Max Tree Depth   TRUE     TRUE      TRUE
# 3 xgbTree              eta                      Shrinkage   TRUE     TRUE      TRUE
# 4 xgbTree            gamma         Minimum Loss Reduction   TRUE     TRUE      TRUE
# 5 xgbTree colsample_bytree     Subsample Ratio of Columns   TRUE     TRUE      TRUE
# 6 xgbTree min_child_weight Minimum Sum of Instance Weight   TRUE     TRUE      TRUE
# 7 xgbTree        subsample           Subsample Percentage   TRUE     TRUE      TRUE
# Computation time is very long
tuneGrid <- expand.grid(nrounds = 1000,
                        max_depth = 2:14,
                        eta = c(0.01, 0.1),           # 0.01:0.1 would yield only 0.01
                        gamma = c(0, 1),
                        colsample_bytree = c(0.5, 1), # must be > 0
                        min_child_weight = c(0, 1),
                        subsample = c(0.5, 1))        # must be > 0
set.seed(1)
# df is your training data as a data.frame; caret's train() cannot use an xgb.DMatrix
model <- train(form = factor(categ) ~ .,
               data = df,
               method = "xgbTree",
               verbose = TRUE,
               metric = "Accuracy",
               nthread = 3,
               tuneGrid = tuneGrid)
# NB: categ is your categorical (target) feature
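To keep the runtime down, you can also hand train() a trControl object that swaps caret's default bootstrap resampling for, say, 5-fold cross-validation. A minimal sketch under the same assumptions (df as the training data.frame, categ as the target):

library(caret)

ctrl <- trainControl(method = "cv", number = 5)  # 5-fold CV instead of 25 bootstraps

set.seed(1)
model <- train(form = factor(categ) ~ .,
               data = df,
               method = "xgbTree",
               trControl = ctrl,
               metric = "Accuracy",
               tuneGrid = tuneGrid)

model$bestTune  # the winning hyperparameter combination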
mlr:
library(mlr)
lrn <- makeLearner(cl = "classif.xgboost", nrounds = 10)
# Get all of the learner's parameters
getParamSet(x = lrn)  # or use lrn$par.set
# As with caret, the data must be a data.frame, not an xgb.DMatrix
tsk <- makeClassifTask(data = df, target = "categ")
ps <- makeParamSet(makeNumericParam(id = "eta", lower = 0, upper = 1),
                   makeNumericParam(id = "lambda", lower = 0, upper = 200),
                   makeIntegerParam(id = "max_depth", lower = 1, upper = 20))
# Model-based optimization of the search space (requires the mlrMBO package)
control <- makeTuneControlMBO(budget = 100)
cv10 <- makeResampleDesc(method = "CV", iters = 10, stratify = TRUE)
# Search for optimal parameters
set.seed(1)
tr <- tuneParams(learner = lrn,
                 task = tsk,
                 resampling = cv10,
                 measures = acc,
                 par.set = ps,
                 control = control)
# Replace the defaults with the optimized parameters
lrn <- setHyperPars(learner = lrn, par.vals = tr$x)
# Refit on the full task and inspect the fitted xgboost model
model <- mlr::train(learner = lrn, task = tsk)
model$learner.model
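To sanity-check the tuned learner, mlr's predict() and performance() work directly on the task. The lines below evaluate on the training task purely for illustration; in practice you would use a held-out set:

tr$y  # cross-validated accuracy found during tuning

pred <- predict(model, task = tsk)
performance(pred, measures = acc)  # accuracy of the refit model on the task
head(as.data.frame(pred))          # per-observation truth and response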