I am running 5-fold cross-validation with glm to fit a logistic regression. Below is a reproducible example using the built-in mtcars dataset:
library(caret)
data("mtcars")
str(mtcars)
mtcars$vs<-as.factor(mtcars$vs)
df0<-na.omit(mtcars)
set.seed(123)
train.control <- trainControl(method = "cv", number = 5)
# Train the model
model <- train(vs ~., data = mtcars, method = "glm",
trControl = train.control)
print(model)
summary(model)
model$resample
confusionMatrix(model)
pred.mod <- predict(model)
confusionMatrix(data=pred.mod, reference=mtcars$vs)
Output:
> print(model)
Generalized Linear Model
32 samples
10 predictors
2 classes: '0', '1'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 25, 26, 25, 27, 25
Resampling results:
Accuracy Kappa
0.9095238 0.8164638
> summary(model)
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-1.181e-05 -2.110e-08 -2.110e-08 2.110e-08 1.181e-05
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.117e+01 1.589e+07 0 1
mpg 2.451e+00 5.979e+04 0 1
cyl -3.908e+01 2.947e+05 0 1
disp -1.927e-02 8.518e+03 0 1
hp 3.129e-01 2.283e+04 0 1
drat -2.735e+01 9.696e+05 0 1
wt -1.248e+01 6.437e+05 0 1
qsec 1.565e+01 3.845e+05 0 1
am -4.562e+01 3.632e+05 0 1
gear -2.835e+01 5.448e+05 0 1
carb 1.788e+01 2.971e+05 0 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 4.3860e+01 on 31 degrees of freedom
Residual deviance: 7.2154e-10 on 21 degrees of freedom
AIC: 22
Number of Fisher Scoring iterations: 25
> model$resample
Accuracy Kappa Resample
1 0.8571429 0.6956522 Fold1
2 0.8333333 0.6666667 Fold2
3 0.8571429 0.7200000 Fold3
4 1.0000000 1.0000000 Fold4
5 1.0000000 1.0000000 Fold5
> confusionMatrix(model)
Cross-Validated (5 fold) Confusion Matrix
(entries are percentual average cell counts across resamples)
Reference
Prediction 0 1
0 50.0 3.1
1 6.2 40.6
Accuracy (average) : 0.9062
> pred.mod <- predict(model)
> confusionMatrix(data=pred.mod, reference=mtcars$vs)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 18 0
1 0 14
Accuracy : 1
95% CI : (0.8911, 1)
No Information Rate : 0.5625
P-Value [Acc > NIR] : 1.009e-08
Kappa : 1
Mcnemar's Test P-Value : NA
Sensitivity : 1.0000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 1.0000
Prevalence : 0.5625
Detection Rate : 0.5625
Detection Prevalence : 0.5625
Balanced Accuracy : 1.0000
'Positive' Class : 0
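This all works well, but I would like to get the summary(model) information for each fold (i.e. the coefficients, p-values, z-scores, etc. that summary() returns), as well as the sensitivity and specificity for each fold, if possible. Can anyone help?
Posted on 2021-04-21 17:21:32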
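That is an interesting question. The values you are looking for cannot be obtained directly from the model object, but they can be recomputed once you know which observations of the training data belong to which fold. This information can be extracted from model if you specify savePredictions = "all" in the trainControl function. With the held-out predictions for each of the k folds, you can do this: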
# First, save the held-out predictions from every fold
set.seed(123)
train.control <- trainControl(method = "cv", number = 5,
                              savePredictions = "all")
# Train the model (vs must already be a factor, as in the question)
model <- train(vs ~ ., data = mtcars, method = "glm",
               trControl = train.control)
# Now we can extract the statistics you are looking for
fold <- unique(model$pred$Resample)
mystat <- function(model, x){
  pred <- model$pred
  # Held-out predictions for fold x
  df <- pred[pred$Resample == x, ]
  # Confusion matrix for this fold; sensitivity and specificity are in cm$byClass
  cm <- confusionMatrix(df$pred, df$obs)
  control <- trainControl(method = "none")
  # The training rows for fold x are all rows that were not held out
  newdat <- mtcars[-df$rowIndex, ]
  fit <- train(vs ~ ., data = newdat, method = "glm", trControl = control)
  summ <- summary(fit)
  # Columns 3 and 4 of the coefficient table hold the z values and p-values
  z_p <- summ$coefficients[, 3:4]
  return(list(cm, z_p))
}
stat <- lapply(fold, mystat, model = model)
names(stat) <- fold
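Note that specifying method = "none" in trainControl forces train to fit the model to the entire training set it is given, without any resampling or parameter tuning. In this form it is not a pretty function, but it does what you want, and you can always modify it to make it more general.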
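For comparison, the same per-fold idea can be sketched in base R without caret. This is only an illustration under stated assumptions: the folds are assigned by a simple unstratified random split (so they will not match caret's folds), the formula vs ~ mpg + wt is a reduced model chosen here to limit separation problems, the 0.5 cut-off is arbitrary, and the names fold_id and fold_stats are made up for this example. As in the caret output above, class "0" is treated as the positive class.

```r
# Base-R sketch: 5-fold CV for a logistic regression, keeping per-fold results.
data("mtcars")
dat <- mtcars
dat$vs <- factor(dat$vs)

set.seed(123)
k <- 5
# Assumption: simple unstratified random assignment of rows to k folds
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

fold_stats <- lapply(seq_len(k), function(i) {
  train_dat <- dat[fold_id != i, ]   # rows used to fit the model
  test_dat  <- dat[fold_id == i, ]   # held-out rows for this fold

  # Assumption: reduced formula instead of vs ~ .
  fit <- glm(vs ~ mpg + wt, data = train_dat, family = binomial)

  # Predicted probability of vs == "1", turned into a class with a 0.5 cut-off
  prob <- predict(fit, newdata = test_dat, type = "response")
  pred <- factor(ifelse(prob > 0.5, "1", "0"), levels = levels(dat$vs))

  tab <- table(Prediction = pred, Reference = test_dat$vs)
  list(
    # Estimate, Std. Error, z value and p-value for each term
    coefficients = summary(fit)$coefficients,
    # "0" is the positive class; NaN if a fold holds no cases of a class
    sensitivity = tab["0", "0"] / sum(tab[, "0"]),
    specificity = tab["1", "1"] / sum(tab[, "1"])
  )
})
names(fold_stats) <- paste0("Fold", seq_len(k))
```

After running it, fold_stats$Fold1$coefficients gives the coefficient table for the first fold, and sapply(fold_stats, function(s) s$sensitivity) collects the per-fold sensitivities. With only 32 observations, glm may still warn about fitted probabilities of 0 or 1 in some folds.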
https://stackoverflow.com/questions/67167865