caret is a general-purpose package for building machine learning workflows. Basic usage is as follows:
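library(caret)
library(pls)   # backend for method = "pls"; kernlab is not needed for this example
# Load the built-in iris data set and inspect the first rows
data(iris)
head(iris)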
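Use createDataPartition to create the training/testing split. It takes the outcome column and the proportion of the data to use for training. A common choice is three quarters for training and one quarter for testing, i.e. p = 0.75. Because the outcome here is a factor, sampling is done within each class, so the class proportions are preserved in both sets.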
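# list = FALSE returns a matrix of row indices instead of a list
intrain <- createDataPartition(iris$Species, p = 0.75, list = FALSE)
head(intrain)
# Rows selected by intrain form the training set; the remaining rows form the test set
training <- iris[intrain, ]
testing <- iris[-intrain, ]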
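To make the idea of a stratified split concrete, here is a minimal base R sketch that samples 75% of the rows within each species by hand. It is an illustration of the concept, not caret's actual implementation; the seed value and the names idx_by_class, train_idx, training_manual and testing_manual are assumptions introduced for this example.

```r
# Base R sketch of a stratified 75/25 split (illustrative only)
data(iris)
set.seed(123)  # assumed seed, used only so the split is reproducible

p <- 0.75
# Group the row indices by species
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Within each species, draw ceiling(p * n) rows for training
train_idx <- unlist(lapply(idx_by_class, function(rows) {
  rows[sample.int(length(rows), size = ceiling(p * length(rows)))]
}))
train_idx <- sort(unname(train_idx))

training_manual <- iris[train_idx, ]
testing_manual  <- iris[-train_idx, ]

table(training_manual$Species)  # 38 rows per species
table(testing_manual$Species)   # 12 rows per species
```

With 50 rows per species this gives 114 training rows and 36 test rows, which matches the 36 test observations in the confusion matrix shown later.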
Choose a model and fit it on the training set
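# Fit a partial least squares (PLS) classifier; predictors are centered and scaled first
modelFit <- train(Species ~ ., data = training, method = "pls", preProcess = c("center", "scale"))
# Inspect the final fitted model
modelFit$finalModel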
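The "center" and "scale" preprocessing steps subtract each predictor's mean and divide by its standard deviation. The base R sketch below shows the arithmetic, with the statistics estimated on the training rows only and then reused on the test rows. The odd/even row split and the names train_x, test_x, centers and scales are assumptions made for illustration; they are not part of the example above.

```r
# Base R sketch of centering and scaling (illustrative only)
data(iris)
train_x <- iris[seq(1, 150, by = 2), 1:4]  # hypothetical training rows
test_x  <- iris[seq(2, 150, by = 2), 1:4]  # hypothetical test rows

# Estimate the mean and standard deviation on the training rows only
centers <- colMeans(train_x)
scales  <- apply(train_x, 2, sd)

# Apply the same training-set statistics to both sets
train_std <- scale(train_x, center = centers, scale = scales)
test_std  <- scale(test_x,  center = centers, scale = scales)

round(colMeans(train_std), 10)  # training columns now have mean 0
apply(train_std, 2, sd)         # and standard deviation 1
colMeans(test_std)              # test columns are generally not exactly 0
```

Reusing the training statistics on new data keeps information from the test set out of the fitting process.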
Use the model to predict on the test set
predictions <- predict(modelFit, newdata = testing)
predictions
# Compare the predicted classes with the true labels
confusionMatrix(predictions, testing$Species)
# Confusion Matrix and Statistics
#
#             Reference
# Prediction   setosa versicolor virginica
#   setosa         12          0         0
#   versicolor      0          6         2
#   virginica       0          6        10
#
# Overall Statistics
#
#                Accuracy : 0.7778
#                  95% CI : (0.6085, 0.8988)
#     No Information Rate : 0.3333
#     P-Value [Acc > NIR] : 5.965e-08
#
#                   Kappa : 0.6667
#
#  Mcnemar's Test P-Value : NA
#
# Statistics by Class:
#
#                      Class: setosa Class: versicolor Class: virginica
# Sensitivity                 1.0000            0.5000           0.8333
# Specificity                 1.0000            0.9167           0.7500
# Pos Pred Value              1.0000            0.7500           0.6250
# Neg Pred Value              1.0000            0.7857           0.9000
# Prevalence                  0.3333            0.3333           0.3333
# Detection Rate              0.3333            0.1667           0.2778
# Detection Prevalence        0.3333            0.2222           0.4444
# Balanced Accuracy           1.0000            0.7083           0.7917
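The headline numbers in this output can be recomputed by hand from the confusion matrix counts. The following base R sketch does so for accuracy, Cohen's kappa, sensitivity and specificity. The counts are copied from the output above; the variable names (cm, accuracy, cohen_kappa and so on) are chosen for illustration.

```r
# Recompute summary statistics from the confusion matrix counts (rows = predictions)
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(12, 0,  0,
                0, 6,  2,
                0, 6, 10),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(cm)) / n

# Cohen's kappa: agreement beyond what the row and column totals imply by chance
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

# Per-class sensitivity: true positives over all true members of the class
sensitivity <- diag(cm) / colSums(cm)

# Per-class specificity: true negatives over all non-members of the class
true_neg    <- n - rowSums(cm) - colSums(cm) + diag(cm)
specificity <- true_neg / (n - colSums(cm))

round(accuracy, 4)      # 0.7778
round(cohen_kappa, 4)   # 0.6667
round(sensitivity, 4)   # 1.0000 0.5000 0.8333
round(specificity, 4)   # 1.0000 0.9167 0.7500
```

The results agree with the Accuracy, Kappa, Sensitivity and Specificity rows reported by confusionMatrix. Most of the errors come from versicolor and virginica being confused with each other, while setosa is classified perfectly.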