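Benchmarking is generally used to compare the performance of multiple learners across multiple tasks and resampling schemes. In mlr3 it is carried out with the benchmark() function.
To run a benchmark in mlr3 you must supply a design: a table (a data.table) in which each row is one combination of a task, a learner, and a resampling strategy.
The example here builds a design from a single task, two learners, and one resampling method.
The benchmark_grid() function generates all of the combinations: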
library("data.table")
library("mlr3")
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = list(lrn("classif.rpart"), lrn("classif.featureless")),
  resamplings = rsmp("holdout")
)
print(design)
## task learner resampling
## 1: <TaskClassif[45]> <LearnerClassifRpart[34]> <ResamplingHoldout[19]>
## 2: <TaskClassif[45]> <LearnerClassifFeatureless[34]> <ResamplingHoldout[19]>
# Run the benchmark on the design
bmr = benchmark(design)
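Conceptually, the design is the exhaustive cross product of the tasks, learners, and resamplings. The following is a rough base R sketch of that idea only: plain strings stand in for the mlr3 objects, the name design_sketch is hypothetical, and this is not how benchmark_grid() is implemented.

```r
# Plain strings standing in for the objects passed to benchmark_grid() above.
tasks = c("iris")
learners = c("classif.rpart", "classif.featureless")
resamplings = c("holdout")

# Every combination of task, learner, and resampling becomes one row.
design_sketch = expand.grid(
  task = tasks,
  learner = learners,
  resampling = resamplings,
  stringsAsFactors = FALSE
)
print(design_sketch)
```

One task times two learners times one resampling gives two rows, the same number of rows as the design printed above.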
Next, create a more complex design:
# Retrieve several tasks
tasks = lapply(c("german_credit", "sonar"), tsk)
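# Several learners (mlr3learners provides classif.ranger and classif.kknn)
library("mlr3learners")
learners = c("classif.featureless", "classif.rpart", "classif.ranger", "classif.kknn")
# Predict probabilities, and keep predictions for both the train and test sets
learners = lapply(learners, lrn,
  predict_type = "prob", predict_sets = c("train", "test"))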
# Resample with 3-fold cross-validation
resamplings = rsmp("cv", folds = 3)
# Build the benchmark design
design = benchmark_grid(tasks, learners, resamplings)
print(design)
Result:
## task learner resampling
## 1: <TaskClassif[45]> <LearnerClassifFeatureless[34]> <ResamplingCV[19]>
## 2: <TaskClassif[45]> <LearnerClassifRpart[34]> <ResamplingCV[19]>
## 3: <TaskClassif[45]> <LearnerClassifRanger[34]> <ResamplingCV[19]>
## 4: <TaskClassif[45]> <LearnerClassifKKNN[32]> <ResamplingCV[19]>
## 5: <TaskClassif[45]> <LearnerClassifFeatureless[34]> <ResamplingCV[19]>
## 6: <TaskClassif[45]> <LearnerClassifRpart[34]> <ResamplingCV[19]>
## 7: <TaskClassif[45]> <LearnerClassifRanger[34]> <ResamplingCV[19]>
## 8: <TaskClassif[45]> <LearnerClassifKKNN[32]> <ResamplingCV[19]>
Once the design is built, run it with benchmark():
bmr = benchmark(design)
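Note that we did not instantiate the resampling ourselves. By default, benchmark_grid() instantiates it once per task, so all learners on a given task are evaluated on the same train/test splits.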
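To make "instantiating" more concrete, here is a small base R sketch of what fixing the splits of a 3-fold cross-validation amounts to. It is a conceptual illustration only, not mlr3's implementation; the row count n and the seed are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible
n = 12        # hypothetical number of rows in a task
k = 3         # number of folds, as in rsmp("cv", folds = 3)

# "Instantiating" fixes which row lands in which fold.
fold = sample(rep_len(seq_len(k), n))

# Each fold serves once as the test set; the remaining rows form the train set.
test_sets = split(seq_len(n), fold)
train_sets = lapply(test_sets, function(test) setdiff(seq_len(n), test))
str(test_sets)
```

Because the fold assignment is fixed once, every learner that reuses it sees exactly the same train and test rows, which is what makes their scores comparable.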
After the benchmark has finished, use the $aggregate() method to aggregate the results:
# Evaluate with AUC, on the train set and on the test set
measures = list(
  msr("classif.auc", id = "auc_train", predict_sets = "train"),
  msr("classif.auc", id = "auc_test")
)
tab = bmr$aggregate(measures)
print(tab)
## nr resample_result task_id learner_id resampling_id
## 1: 1 <ResampleResult[21]> german_credit classif.featureless cv
## 2: 2 <ResampleResult[21]> german_credit classif.rpart cv
## 3: 3 <ResampleResult[21]> german_credit classif.ranger cv
## 4: 4 <ResampleResult[21]> german_credit classif.kknn cv
## 5: 5 <ResampleResult[21]> sonar classif.featureless cv
## 6: 6 <ResampleResult[21]> sonar classif.rpart cv
## 7: 7 <ResampleResult[21]> sonar classif.ranger cv
## 8: 8 <ResampleResult[21]> sonar classif.kknn cv
## iters auc_train auc_test
## 1: 3 0.5000 0.5000
## 2: 3 0.8042 0.7062
## 3: 3 0.9985 0.7961
## 4: 3 0.9887 0.7179
## 5: 3 0.5000 0.5000
## 6: 3 0.9343 0.7455
## 7: 3 1.0000 0.8946
## 8: 3 0.9973 0.9265
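Rank the learners within each task to find the best one. This uses some data.table syntax; the AUC is negated so that the highest AUC receives rank 1.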
ranks = tab[, .(learner_id, rank_train = rank(-auc_train), rank_test = rank(-auc_test)), by = task_id]
print(ranks)
## task_id learner_id rank_train rank_test
## 1: german_credit classif.featureless 4 4
## 2: german_credit classif.rpart 3 3
## 3: german_credit classif.ranger 1 1
## 4: german_credit classif.kknn 2 2
## 5: sonar classif.featureless 4 4
## 6: sonar classif.rpart 3 3
## 7: sonar classif.ranger 1 2
## 8: sonar classif.kknn 2 1
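For readers who have not used data.table yet, the same per-task ranking can be sketched in base R. The AUC values are copied from the aggregated table above; the names scores and best are hypothetical, and this is an illustration of the idea rather than a replacement for the data.table code.

```r
# Test-set AUC values copied from the aggregated table above.
scores = data.frame(
  task_id = rep(c("german_credit", "sonar"), each = 4),
  learner_id = rep(c("classif.featureless", "classif.rpart",
                     "classif.ranger", "classif.kknn"), times = 2),
  auc_test = c(0.5000, 0.7062, 0.7961, 0.7179,
               0.5000, 0.7455, 0.8946, 0.9265),
  stringsAsFactors = FALSE
)

# rank() gives rank 1 to the smallest value, so negate the AUC first.
# ave() applies rank() separately within each task.
scores$rank_test = ave(-scores$auc_test, scores$task_id, FUN = rank)

# Keep the top-ranked learner of each task.
best = scores[scores$rank_test == 1, c("task_id", "learner_id", "auc_test")]
print(best)
```

Without the negation, the featureless baseline (AUC 0.5) would come out as rank 1 in both tasks.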
As before, plotting is done with the mlr3viz package:
library("mlr3viz")
library("ggplot2")
autoplot(bmr) + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Plot the ROC curves for a single task:
# Deep clone first, so that filter() cannot alter bmr itself
autoplot(bmr$clone(deep = TRUE)$filter(task_id = "german_credit"), type = "roc")
Extracting a single resample result is essentially the same as in the earlier code, although it does require some data.table syntax:
tab = bmr$aggregate(measures)
rr = tab[task_id == "german_credit" & learner_id == "classif.ranger"]$resample_result[[1]]
print(rr)
## <ResampleResult> of 3 iterations
## * Task: german_credit
## * Learner: classif.ranger
## * Warnings: 0 in 0 iterations
## * Errors: 0 in 0 iterations
The rr above is the resample result extracted for a single task and a single learner. It can be inspected with the following code:
measure = msr("classif.auc")
rr$aggregate(measure)
## classif.auc
## 0.7961
# Find the iteration with the lowest AUC
perf = rr$score(measure)
i = which.min(perf$classif.auc)
# Get the learner and the data set for that iteration
print(rr$learners[[i]])
## <LearnerClassifRanger:classif.ranger>
## * Model: -
## * Parameters: list()
## * Packages: ranger
## * Predict Type: prob
## * Feature types: logical, integer, numeric, character, factor, ordered
## * Properties: importance, multiclass, oob_error, twoclass, weights
# Get the row ids of the corresponding training set
head(rr$resampling$train_set(i))
## [1] 3 7 12 16 18 22
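The relationship between $score() and $aggregate() can be checked directly: assuming the measure's default (macro) averaging, the aggregated AUC should be the mean of the per-iteration AUCs. The sketch below uses classif.rpart on the sonar task instead of ranger to keep the dependencies small; the name rr_demo and the seed are hypothetical choices.

```r
library("mlr3")

set.seed(1)  # arbitrary seed for a reproducible split
task = tsk("sonar")
learner = lrn("classif.rpart", predict_type = "prob")
rr_demo = resample(task, learner, rsmp("cv", folds = 3))

measure = msr("classif.auc")
per_fold = rr_demo$score(measure)$classif.auc  # one AUC per fold
overall = rr_demo$aggregate(measure)           # single aggregated AUC

# With the default settings the two values should agree.
all.equal(mean(per_fold), unname(overall))
```

This is why looking at the individual iterations, as done above with which.min(), can reveal a weak fold that the aggregated number hides.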
as_benchmark_result(): converts an individual resample result into a benchmark result, so that several of them can be combined
task = tsk("iris")
resampling = rsmp("holdout")$instantiate(task)
# First resampling
rr1 = resample(task, lrn("classif.rpart"), resampling)
# Second resampling
rr2 = resample(task, lrn("classif.featureless"), resampling)
# Convert both resample results into benchmark results
bmr1 = as_benchmark_result(rr1)
bmr2 = as_benchmark_result(rr2)
# Combine the two results (this modifies bmr1 in place)
bmr1$combine(bmr2)
bmr1
## <BenchmarkResult> of 2 rows with 2 resampling runs
## nr task_id learner_id resampling_id iters warnings errors
## 1 iris classif.rpart holdout 1 0 0
## 2 iris classif.featureless holdout 1 0 0
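Because $combine() changes the object it is called on, bmr1 no longer holds only its own result afterwards. If the original should stay untouched, one option is to combine into a deep copy. This is a sketch under the assumption that a deep clone fully separates the two objects; the name combined is hypothetical.

```r
library("mlr3")

task = tsk("iris")
resampling = rsmp("holdout")$instantiate(task)
rr1 = resample(task, lrn("classif.rpart"), resampling)
rr2 = resample(task, lrn("classif.featureless"), resampling)

bmr1 = as_benchmark_result(rr1)
bmr2 = as_benchmark_result(rr2)

# Work on a deep copy so that bmr1 keeps only its own result.
combined = bmr1$clone(deep = TRUE)
combined$combine(bmr2)

nrow(bmr1$aggregate())      # rows in the untouched original
nrow(combined$aggregate())  # rows after combining
```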
That concludes the mlr3 basics series.
love&peace