
Overlaying decision boundaries for random forest and boosting

Stack Overflow user
Asked 2019-04-19 22:33:58

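I generated some random data and am trying to overlay decision boundaries from random forest and boosting fits. The problem is reproduced below. I generate the data, fit a regression tree, and can easily overlay its decision boundary with the following code: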

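library(tidyverse)
library(rsample)       # initial_split(), training(), testing()
library(tree)          # tree()
library(randomForest)  # randomForest()
library(gbm)           # gbm()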
# set seed and generate some random data
set.seed(123)
Dat <- tibble(
    x1 = rnorm(100),
    x2 = rnorm(100)
) %>% mutate(y = as_factor(ifelse(x1^2 + x2^2 > 1.39, "A", "B")))

circlepts <- tibble(theta = seq(0, 2*pi, length = 100)) %>%
    mutate(x = sqrt(1.39) * sin(theta), y = sqrt(1.39) * cos(theta))

# graph the data and draw the boundary

p <- ggplot(Dat) + geom_point(aes(x1, x2, color = y)) + coord_fixed() +
    geom_polygon(data = circlepts, aes(x, y), color = "blue", fill = NA)



# recode the factor response as numeric 0/1 ("A" -> 1, "B" -> 0)
Dat$y <- as.numeric(Dat$y == "A")


# split the data up
datasplit <- initial_split(Dat, prop = 0.7)
training_set <- as_tibble(training(datasplit))
testing_set <- as_tibble(testing(datasplit))

tree_fit <- tree(y~ ., training_set)
# build the prediction grid over the range of each predictor
grid <- crossing(x1 = modelr::seq_range(testing_set$x1, 50), x2 = modelr::seq_range(testing_set$x2, 50))  %>% 
    modelr::add_predictions(tree_fit)

# plot the data with the decision overlay of the tree fit
p + geom_contour(data = grid, aes(x1, x2, z = as.numeric(pred)), binwidth = 1)

Now, if I try the same thing with a random forest or gradient boosting, add_predictions does not cooperate:

rf_fit <- randomForest(y ~ ., data=training_set, mtry = 2, ntree=500)



grid <- crossing(x1 = modelr::seq_range(testing_set$x1, 50), x2 = modelr::seq_range(testing_set$x2, 50))  %>% 
    modelr::add_predictions(rf_fit)

p + geom_contour(data = grid, aes(x1, x2, z = as.numeric(pred)), binwidth = 1)
##ERROR: Error in if (is.na(out.type)) stop("type must be one of 'response', 'prob', 'vote'") : argument is of length zero

For gradient boosting:

fitBoost <- gbm(y ~ ., data= Dat, distribution = "gaussian",
                 n.trees = 1000)

pred <- predict(fitBoost, newdata=training_set, n.trees=1000)

grid <- crossing(x1 = modelr::seq_range(testing_set$x1, 50), x2 = modelr::seq_range(testing_set$x2, 50))  %>% 
    modelr::add_predictions(fitBoost)
### ERROR: Error in paste("Using", n.trees, "trees...\n") : argument "n.trees" is missing, with no default

This seems like it should be a very simple problem. Can anyone help?


1 Answer

Stack Overflow user

Answered 2019-04-20 00:47:53

The following code works for your random forest:

# convert the response to a factor so that a classification forest is fitted
training_set$y <- factor(training_set$y)
rf_fit <- randomForest(y ~ ., data=training_set, mtry=2, ntree=500)

grid <- crossing(x1 = modelr::seq_range(testing_set$x1, 50), 
                 x2 = modelr::seq_range(testing_set$x2, 50))  %>% 
        modelr::add_predictions(rf_fit)

p + geom_contour(data = grid, aes(x1, x2, z = as.numeric(pred)), binwidth = 1)
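
As a rough cross-check of the factor conversion above, the sketch below fits the forest on a factor response and calls predict() directly instead of going through modelr. It is only an illustration under assumptions: the data frame `dat` and the use of expand.grid() are introduced here and are not part of the original answer.

```r
library(randomForest)

set.seed(123)
# Illustrative data with the same circular rule as in the question.
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))

# A factor response makes randomForest() fit a classification forest;
# a numeric 0/1 response would be treated as a regression target.
dat$y <- factor(ifelse(dat$x1^2 + dat$x2^2 > 1.39, "A", "B"))

rf_fit <- randomForest(y ~ x1 + x2, data = dat, mtry = 2, ntree = 500)

# Grid spanning the observed range of each predictor.
grid <- expand.grid(
  x1 = seq(min(dat$x1), max(dat$x1), length.out = 50),
  x2 = seq(min(dat$x2), max(dat$x2), length.out = 50)
)

# type = "response" returns the predicted class for every grid point.
grid$pred <- predict(rf_fit, newdata = grid, type = "response")
```

Inspecting `rf_fit$type` is a quick way to confirm which kind of forest was fitted before plotting the grid.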

And here is the code for the gradient boosting machine:

fitBoost <- gbm(y ~ ., data=Dat, distribution="gaussian",  n.trees=1000)

pred <- predict(fitBoost, newdata=training_set, n.trees=1000)

# variants of modelr's add_predictions()/predict2() that always pass n.trees to predict()
add_predictions2 <- function (data, model, var = "pred", type = NULL) 
{
    data[[var]] <- predict2(model, data, type = type)
    data
}
predict2 <- function (model, data, type = NULL) 
{
    if (is.null(type)) {
        stats::predict(model, data, n.trees=1000)
    }  else {
        stats::predict(model, data, type = type, n.trees=1000)
    }
}

grid <- crossing(x1 = modelr::seq_range(testing_set$x1, 50), 
                 x2 = modelr::seq_range(testing_set$x2, 50))  %>% 
        add_predictions2(fitBoost)

p + geom_contour(data = grid, aes(x1, x2, z = as.numeric(pred)), binwidth = 1)
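
The wrapper above hard-codes n.trees=1000. A slightly more general variant could forward arbitrary extra arguments to predict() instead. The sketch below is a minimal illustration of that idea, not part of modelr or of the original answer: `add_predictions_with` is a hypothetical helper name, and a base-R linear model stands in for the gbm fit so that the example runs without extra packages.

```r
# Hypothetical helper (not part of modelr): works like add_predictions(),
# but forwards any extra arguments in ... to predict().
add_predictions_with <- function(data, model, var = "pred", ...) {
  data[[var]] <- stats::predict(model, newdata = data, ...)
  data
}

# Illustrative data and a stand-in model (an lm fit instead of gbm).
set.seed(123)
toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y <- as.numeric(toy$x1^2 + toy$x2^2 > 1.39)
lm_fit <- lm(y ~ x1 + x2, data = toy)

# Grid spanning the observed range of each predictor.
grid <- expand.grid(
  x1 = seq(min(toy$x1), max(toy$x1), length.out = 50),
  x2 = seq(min(toy$x2), max(toy$x2), length.out = 50)
)

# Extra arguments are passed straight through to predict().
grid <- add_predictions_with(grid, lm_fit, type = "response")

# With the gbm fit from the answer, the equivalent call would be:
# grid <- add_predictions_with(grid, fitBoost, n.trees = 1000)
```

Forwarding `...` keeps the helper independent of any one model class, at the cost of leaving it to the caller to supply arguments such as n.trees that a particular predict() method requires.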

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/55763476
