A decision tree learns classification rules from training samples and then applies those rules to predict the class of new samples; it is a form of supervised learning. Advantages: decision trees are easy to understand and implement, and they can handle both numeric and categorical data.
Conditional inference trees are provided by the party package: install.packages("party"), then ctree(formula, data)
Prediction on new data is done with predict: predict(model, newdata=data.test)
Code:
#install.packages("party")
library(party)
data <- read.csv("data.csv")
formula <- CollegePlans ~ Gender+ParentIncome+IQ+ParentEncouragement
#CollegePlans ~ .
CollegePlansTree <- ctree(formula, data=data)
plot(CollegePlansTree)
plot(CollegePlansTree, type="simple")
#Hold-out validation: random 70/30 train/test split
total <- nrow(data)
index <- sample(1:total, floor(total*0.7))
data.train <- data[index, ]
data.test <- data[-index, ]
CollegePlansTree <- ctree(formula, data=data.train)
data.test.predict <- predict(CollegePlansTree, newdata=data.test)
prop.table(table(data.test$CollegePlans, data.test.predict), 1)
                          data.test.predict
                           Does not plan to attend  Plans to attend
  Does not plan to attend               0.91242236       0.08757764
  Plans to attend                       0.32531646       0.67468354
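Since prop.table above only shows row-normalized rates, the overall accuracy can be computed directly by comparing predictions against the true labels. A minimal sketch, assuming `data.test` and `data.test.predict` from the split above:

```r
# Overall accuracy: fraction of test samples predicted correctly
accuracy <- mean(data.test.predict == data.test$CollegePlans)
print(accuracy)

# Raw confusion counts (unnormalized), useful alongside prop.table
table(actual = data.test$CollegePlans, predicted = data.test.predict)
```

Because the split in sample() is random, the accuracy will vary slightly from run to run; use set.seed() before sampling for a reproducible result.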
The row-normalized table shows the tree correctly classifies about 91% of students who do not plan to attend college but only about 67% of those who do, giving an overall accuracy of roughly 70%, which leaves room for improvement.