mlr3, benchmarking and nested resampling: how to extract the tuned model from a benchmark object to compute feature importance

Asked by a Stack Overflow user on 2021-11-03 15:40:39
1 answer · 449 views · 1 follower · 0 votes

I am using the benchmark() function in mlr3 to compare several ML algorithms. One of them is XGB with hyperparameter tuning. So I have an outer resampling to evaluate overall performance (a hold-out sample) and an inner resampling for hyperparameter tuning (5-fold cross-validation). Besides estimating the accuracy of all the ML algorithms, I would also like to see the feature importance of the tuned XGB. For that I need to access the tuned model (inside the benchmark object), and I don't know how to do it. The object returned by benchmark() is a deeply nested list, and I do not understand its structure.

This answer on Stack Overflow did not help me, because it uses a different setup (a learner in a pipeline rather than a benchmark object).

This answer on GitHub did not help me either, because it shows how to extract all the information about a benchmark at once, not how to extract the one (tuned) model of one learner in the benchmark.

Below is the code I use to perform the nested resampling. After benchmarking, I would like to estimate feature importance as described here, which requires access to the tuned XGB model.

require(mlr3verse)

### Parameters

## Tuning

n_folds = 5

grid_search_resolution = 2

measure = msr("classif.acc")

task = tsk("iris")

# Messages mlr3
# https://stackoverflow.com/a/69336802/7219311
options("mlr3.debug" = TRUE)

### Set up hyperparameter tuning
# AutoTuner for the inner resampling

## inner-resampling design
inner_resampling = rsmp("cv", folds = n_folds)
terminator = trm("none")
 
## XGB: no Hyperparameter Tuning
xgb_no_tuning = lrn("classif.xgboost", eval_metric = "mlogloss")
set_threads(xgb_no_tuning, n = 6)

## XGB: AutoTuner
# Setting up Hyperparameter Tuning

xgb_learner_tuning = lrn("classif.xgboost", eval_metric = "mlogloss")
xgb_search_space = ps(nrounds = p_int(lower = 100, upper = 500),
                      max_depth = p_int(lower = 3, upper = 10),
                      colsample_bytree = p_dbl(lower = 0.6, upper = 1)
                  )
xgb_tuner = tnr("grid_search", resolution = grid_search_resolution)

# implicit parallelisation
set_threads(xgb_learner_tuning, n = 6)

xgb_tuned = AutoTuner$new(xgb_learner_tuning, inner_resampling, measure, terminator, xgb_tuner, xgb_search_space, store_tuning_instance = TRUE)

## Outer re-sampling: hold-out
outer_resampling = rsmp("holdout")
outer_resampling$instantiate(task)

bm_design = benchmark_grid(
  tasks = task,
  learners = c(lrn("classif.featureless"), 
               xgb_no_tuning,
               xgb_tuned 
  ),
  resamplings = outer_resampling
)

begin_time = Sys.time()
bmr = benchmark(bm_design, store_models = TRUE)
duration = Sys.time() - begin_time

print(duration)

## Results of benchmarking
benchmark_results = bmr$aggregate(measure)
print(benchmark_results)


## Overview

mlr3misc::map(as.data.table(bmr)$learner, "model")

## Detailed results

# Specification of learners
print(bmr$learners$learner)

Solution

Based on be-marc's comment:

require(mlr3verse)
require(mlr3tuning)
require(mlr3misc)

### Parameters

## Tuning

n_folds = 5

grid_search_resolution = 2

measure = msr("classif.acc")

task = tsk("iris")

# Messages mlr3
# https://stackoverflow.com/a/69336802/7219311
options("mlr3.debug" = TRUE)

### Set up hyperparameter tuning
# AutoTuner for the inner resampling

## inner-resampling design
inner_resampling = rsmp("cv", folds = n_folds)
terminator = trm("none")
 
## XGB: no Hyperparameter Tuning
xgb_no_tuning = lrn("classif.xgboost", eval_metric = "mlogloss")
set_threads(xgb_no_tuning, n = 6)

## XGB: AutoTuner
# Setting up Hyperparameter Tuning

xgb_learner_tuning = lrn("classif.xgboost", eval_metric = "mlogloss")
xgb_search_space = ps(nrounds = p_int(lower = 100, upper = 500),
                      max_depth = p_int(lower = 3, upper = 10),
                      colsample_bytree = p_dbl(lower = 0.6, upper = 1)
                  )
xgb_tuner = tnr("grid_search", resolution = grid_search_resolution)

# implicit parallelisation
set_threads(xgb_learner_tuning, n = 6)

xgb_tuned = AutoTuner$new(xgb_learner_tuning, inner_resampling, measure, terminator, xgb_tuner, xgb_search_space, store_tuning_instance = TRUE)

## Outer re-sampling: hold-out
outer_resampling = rsmp("holdout")
outer_resampling$instantiate(task)

bm_design = benchmark_grid(
  tasks = task,
  learners = c(lrn("classif.featureless"), 
               xgb_no_tuning,
               xgb_tuned 
  ),
  resamplings = outer_resampling
)

begin_time = Sys.time()
bmr = benchmark(bm_design, store_models = TRUE)
duration = Sys.time() - begin_time

print(duration)

## Results of benchmarking
benchmark_results = bmr$aggregate(measure)
print(benchmark_results)


## Overview

mlr3misc::map(as.data.table(bmr)$learner, "model")

## Detailed results

# Specification of learners
print(bmr$learners$learner)

## Feature Importance

# extract models from outer sampling
# https://stackoverflow.com/a/69828801

data = as.data.table(bmr)
outer_learners = map(data$learner, "learner")

xgb_tuned_model = outer_learners[[3]]

print(xgb_tuned_model)

# print feature importance 
# (presumably gain - mlr3 documentation not clear)
print(xgb_tuned_model$importance())
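
Since the comment above notes that the mlr3 documentation does not say which importance metric `$importance()` reports, one way to cross-check is to call the xgboost package directly. This is a sketch, assuming the extracted learner's `$model` slot holds the fitted `xgb.Booster`; `xgb.importance()` reports Gain, Cover and Frequency per feature explicitly:

```r
# Cross-check feature importance against the xgboost package directly
# (assumes xgb_tuned_model$model contains the fitted xgb.Booster)
library(xgboost)

booster = xgb_tuned_model$model
importance_table = xgb.importance(model = booster)

# Columns: Feature, Gain, Cover, Frequency
print(importance_table)
```

Comparing the Gain column with the output of `$importance()` shows which metric mlr3 returns.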

1 Answer

Answered by a Stack Overflow user (accepted) on 2021-11-03 16:54:55

library(mlr3tuning)
library(mlr3learners)
library(mlr3misc)

learner = lrn("classif.xgboost", nrounds = to_tune(100, 500), eval_metric = "logloss")

at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 5),
  tuner = tnr("random_search"),
  store_models = TRUE
)

design = benchmark_grid(task = tsk("pima"), learner = at, resampling = rsmp("cv", folds = 5))
bmr = benchmark(design, store_models = TRUE)

Extract the learners fitted in the outer loop:

data = as.data.table(bmr)
outer_learners = map(data$learner, "learner")

Extract the learners fitted in the inner loop:

archives = extract_inner_tuning_archives(bmr)
inner_learners = map(archives$resample_result, "learners")
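
Besides the archives, mlr3tuning also provides `extract_inner_tuning_results()`, which returns one row per outer resampling iteration with the hyperparameter configuration the inner CV selected; a quick sketch:

```r
# One row per outer fold: the hyperparameters chosen by the inner tuning
inner_results = extract_inner_tuning_results(bmr)
print(inner_results)
```

This is often a more convenient view than the full archives when you only want to see which configurations won.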
3 votes
Original source:

https://stackoverflow.com/questions/69827716
