I am trying to implement some functions to compare five different machine learning models for predicting values in a regression problem.
My intention is to write a set of functions that train the different models and collect their results in a list. The models chosen for this example are: a lasso model, a random forest, an SVM, a linear model, and a neural network. To tune some of them I intend to use the reference by Max Kuhn: https://topepo.github.io/caret/available-models.html. However, since each model requires different tuning parameters, I am unsure how to set them up:
First, I set up the grid for tuning the 'nnet' model, choosing different numbers of nodes in the hidden layer and different values of the decay coefficient:
library(caret)  # needed for trainControl() and train() below

my.grid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                       decay = seq(from = 0.1, to = 0.5, by = 0.1))
Then I built the function that runs the five models 5 times in a 6-fold cross-validation configuration:
my_list_model <- function(model) {
  set.seed(1)
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                repeats = 5,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(ST1 ~ .,
                 data = train,        # my original data frame, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,          # linear activation function at the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = my.grid)  # here is how I pass the 'nnet' tuning parameters
  return(fit_m)
}
Finally, I run the five models:
lapply(list(Lass = "lasso",
            RF   = "rf",
            SVM  = "svmLinear",
            OLS  = "lm",
            NN   = "nnet"),
       my_list_model) -> model_list
However, when I run this, it shows:
Error: The tuning parameter grid should have columns fraction
As far as I can tell, I don't know how to specify the tuning parameters properly. If I drop the 'nnet' model and replace it with an XGBoost model in the second-to-last line, everything seems to work fine and the results are computed. So the problem appears to lie in the 'nnet' tuning parameters.
I think my real question is therefore: how do I configure the parameters of these different models, in particular those of the 'nnet' model? Also, since I do not need to set up parameters for the lasso, random forest, svmLinear, and linear models, how are they tuned by the caret package?
Answer (posted 2019-07-04 09:29:22):
my_list_model <- function(model, grd = NULL) {
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configurations of the machine learning models:
  set.seed(1)
  fit_m <- train(Y ~ .,
                 data = df,          # my original data frame, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,         # linear activation function at the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = grd)     # here is how the model-specific grid is passed in
  return(fit_m)
}
First, run the code below to see all of the tuning parameters relevant to a model:
modelLookup('rf')
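As a side note (my addition, not part of the original answer), the same lookup can be repeated for the other models used in the question; the parameter column(s) reported by modelLookup() are exactly the column names the corresponding tuneGrid must use:

modelLookup('nnet')       # parameters: size, decay
modelLookup('lasso')      # parameter:  fraction
modelLookup('svmLinear')  # parameter:  C
modelLookup('lm')         # parameter:  intercept (nothing is really tuned)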
Now build a grid for each model according to the lookup above:
svmGrid <- expand.grid(C = c(3, 2, 1))
rfGrid <- expand.grid(mtry = c(5, 10, 15))
Create a list of all the model grids, and make sure the names in the list are the same as the model names:
grd_all <- list(svmLinear = svmGrid,
                rf = rfGrid)
model_list <- lapply(c("rf", "svmLinear"),
                     function(x) { my_list_model(x, grd_all[[x]]) })
model_list
[[1]]
Random Forest
17 samples
3 predictor
Pre-processing: scaled (3)
Resampling: Cross-Validated (6 fold, repeated 1 times)
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ...
Resampling results across tuning parameters:
mtry RMSE Rsquared MAE
5 63.54864 0.5247415 55.72074
10 63.70247 0.5255311 55.35263
15 62.13805 0.5765130 54.53411
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 15.
[[2]]
Support Vector Machines with Linear Kernel
17 samples
3 predictor
Pre-processing: scaled (3)
Resampling: Cross-Validated (6 fold, repeated 1 times)
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
1 59.83309 0.5879396 52.26890
2 66.45247 0.5621379 58.74603
3 67.28742 0.5576000 59.55334
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 1.
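The answer above only demonstrates "rf" and "svmLinear". As a hedged sketch of my own (not from the original answer), the same pattern could be extended to all five models from the question. It assumes the asker's data frame `train` with target `ST1`, that the elasticnet, randomForest, kernlab and nnet packages are installed, and it forwards the nnet-specific arguments (linout, trace, maxit) only when method = "nnet", because other fitting functions such as elasticnet::enet() do not accept them:

library(caret)   # also requires elasticnet, randomForest, kernlab and nnet installed

# Per-model tuning grids; column names must match what modelLookup() reports.
grd_all <- list(lasso     = expand.grid(fraction = seq(0.1, 0.9, by = 0.2)),
                rf        = expand.grid(mtry = c(5, 10, 15)),
                svmLinear = expand.grid(C = c(1, 2, 3)),
                lm        = NULL,    # no real tuning parameter; let caret use its default
                nnet      = expand.grid(size  = 1:10,
                                        decay = seq(0.1, 0.5, by = 0.1)))

my_list_model <- function(model, grd = NULL) {
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                repeats = 5,
                                returnResamp = "all",
                                savePredictions = "all")
  set.seed(1)
  if (model == "nnet") {
    # 'data = train' is the asker's data frame (caret::train is still found because R
    # only considers functions when resolving a call); linout/trace/maxit are
    # nnet::nnet() arguments that other fitting functions would reject.
    train(ST1 ~ ., data = train, method = model, metric = "RMSE",
          preProcess = "scale", trControl = train.control, tuneGrid = grd,
          linout = 1, trace = FALSE, maxit = 1000)
  } else {
    train(ST1 ~ ., data = train, method = model, metric = "RMSE",
          preProcess = "scale", trControl = train.control, tuneGrid = grd)
  }
}

model_list <- lapply(names(grd_all), function(x) my_list_model(x, grd_all[[x]]))
names(model_list) <- names(grd_all)

If tuneGrid is left NULL, caret builds a default grid on its own; its size can be controlled with the tuneLength argument of train().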
https://stackoverflow.com/questions/54637027