A decision tree learns classification rules from training samples and then uses those rules to predict the class of new samples; it is a supervised learning method. Advantages: decision trees are easy to understand and implement, and they can handle both numeric and categorical data.
Conditional inference trees are provided by the party package:

```r
install.packages("party")
ctree(formula, data)
```
predict() makes predictions on new data: `predict(model, newdata = data.test)`
Code implementation:
```r
# install.packages("party")
library(party)

data <- read.csv("data.csv")
formula <- CollegePlans ~ Gender + ParentIncome + IQ + ParentEncouragement  # or: CollegePlans ~ .
CollegePlansTree <- ctree(formula, data = data)

plot(CollegePlansTree)
plot(CollegePlansTree, type = "simple")
```
```r
# Hold-out validation: a random 70/30 train/test split
# (the original comment says "cross-validation", but this is a single split)
total <- nrow(data)
index <- sample(1:total, total * 0.7)
data.train <- data[index, ]
data.test <- data[-index, ]

CollegePlansTree <- ctree(formula, data = data.train)
data.test.predict <- predict(CollegePlansTree, newdata = data.test)

# Row-normalized confusion table (rows = true class)
prop.table(table(data.test$CollegePlans, data.test.predict), 1)
```

Output:

```
                          data.test.predict
                           Does not plan to attend Plans to attend
  Does not plan to attend               0.91242236      0.08757764
  Plans to attend                       0.32531646      0.67468354
```
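The `1` in `prop.table(table(...), 1)` normalizes the confusion table by row, so each cell becomes the fraction of a true class that received a given prediction (per-class recall). A minimal standalone sketch with made-up counts, not the real data:

```r
# prop.table(x, 1) divides each cell by its row total, turning raw
# counts into per-class rates. Illustrative counts only:
m <- matrix(c(90, 10,
              30, 70),
            nrow = 2, byrow = TRUE,
            dimnames = list(truth = c("No", "Yes"),
                            predicted = c("No", "Yes")))
prop.table(m, 1)  # each row now sums to 1
```

Using margin `2` instead would normalize by column, giving per-prediction precision rather than recall.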
As the table shows, the tree correctly identifies about 91% of students who do not plan to attend college, but only about 70% (0.675) of those who do; there is room for improvement.
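The row-normalized rates above do not directly give a single overall accuracy figure; that comes from comparing predictions against the true labels, e.g. `mean(data.test$CollegePlans == data.test.predict)`. A self-contained sketch with hypothetical labels, since the original data.csv is not available:

```r
# Overall accuracy = fraction of test cases predicted correctly.
# These short vectors are hypothetical stand-ins for
# data.test$CollegePlans and data.test.predict.
truth <- c("No", "No", "No", "No", "Yes", "Yes")
pred  <- c("No", "No", "No", "Yes", "Yes", "No")
accuracy <- mean(truth == pred)
print(accuracy)  # 4 of 6 correct
```

The same one-liner works on the factor columns returned by read.csv and predict.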