R - Cross-validation error handling - "dims product do not match the length of object"?

Content sourced from Stack Overflow, translated and used under the CC BY-SA 3.0 license

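I have been working through some examples of statistical learning models using the ISLR package. The code is here (https://rpubs.com/davoodastaraky/subset), so anyone can see it. I have also included it below for convenience.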

library(ISLR)
library(leaps)
data(Hitters)
Hitters
regfit.full = regsubsets(Salary ~ ., data = Hitters, nvmax = 19)
reg.summary = summary(regfit.full)
#plot rss
library(ggvis)
rsq <- as.data.frame(reg.summary$rsq)
names(rsq) <- "R2"
rsq %>% 
  ggvis(x=~ c(1:nrow(rsq)), y=~R2 ) %>%
  layer_points(fill = ~ R2 ) %>%
  add_axis("y", title = "R2") %>% 
  add_axis("x", title = "Number of variables")

par(mfrow=c(2,2))
plot(reg.summary$rss ,xlab="Number of Variables ",ylab="RSS",type="l")
plot(reg.summary$adjr2 ,xlab="Number of Variables ", ylab="Adjusted RSq",type="l")
# which.max(reg.summary$adjr2)
points(11,reg.summary$adjr2[11], col="red",cex=2,pch=20)
plot(reg.summary$cp ,xlab="Number of Variables ",ylab="Cp", type='l')
# which.min(reg.summary$cp )
points(10,reg.summary$cp [10],col="red",cex=2,pch=20)
plot(reg.summary$bic ,xlab="Number of Variables ",ylab="BIC",type='l')
# which.min(reg.summary$bic )
points(6,reg.summary$bic [6],col="red",cex=2,pch=20)

plot(regfit.full,scale="bic")

set.seed (1)
train = sample(c(TRUE,FALSE), nrow(Hitters),rep=TRUE)
test =(! train )

predict.regsubsets =function (object ,newdata ,id ,...){
  form=as.formula(object$call [[2]])
  mat=model.matrix(form,newdata)
  coefi=coef(object ,id=id)
  xvars=names(coefi)
  mat[,xvars]%*%coefi
}

regfit.best=regsubsets(Salary~.,data=Hitters ,nvmax=19)
coef(regfit.best ,10)

k = 10
set.seed(1)
folds = sample(1:k,nrow(Hitters),replace=TRUE)
table(folds)

The code runs fine until I reach the following part:

for(j in 1:k){
  best.fit = regsubsets(Salary ~., data=Hitters[folds != j,], nvmax = 19)

  for (i in 1:19){
    pred = predict.regsubsets(best.fit, Hitters[folds == j, ], id = i)
    cv.errors[j, i] = mean((Hitters$Salary[folds == j] - pred)^2)
  }
}

This is where I get the error:

Error in mean((Hitters$Salary[folds == j] - pred)^2) : 
  dims [product 18] do not match the length of object [22]
In addition: Warning message:
In Hitters$Salary[folds == j] - pred :
  longer object length is not a multiple of shorter object length

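My question is: why am I getting this error, and how can I fix it? The code is taken directly from the website and I have not changed it. Clearly I am missing something about the lengths of the objects. Thanks.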

Answer:

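Salary in Hitters contains missing values. model.matrix() inside predict.regsubsets drops the rows where Salary is NA, so pred ends up shorter than Hitters$Salary[folds == j]. First remove the rows with missing values using na.omit(Hitters), and then run your loop. Note that folds should be created after this step, so that its length matches the number of rows that remain.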
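
The order of operations suggested above can be sketched on toy data. This is only an illustration under stated assumptions, not the ISLR lab code: the data frame toy, its columns x1, x2, x3, y, and the use of lm() on the first i predictors as a stand-in for regsubsets() are all hypothetical, chosen so that the example runs with base R alone. It shows that model.matrix() silently drops incomplete rows, and that calling na.omit() before creating folds keeps every length consistent. It also pre-allocates cv.errors, which the loop needs before it can assign into it.

```r
set.seed(1)
n <- 60

# Hypothetical toy data standing in for Hitters: the response y has NAs
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(n)
toy$y[sample(n, 8)] <- NA

# model.matrix() builds a model frame with the default na.action (na.omit),
# so rows with a missing response disappear without any warning
rows.before <- nrow(toy)                          # 60
rows.in.matrix <- nrow(model.matrix(y ~ ., toy))  # 52

# 1. Drop incomplete rows first
toy <- na.omit(toy)

# 2. Only then assign folds, so that length(folds) == nrow(toy)
k <- 5
folds <- sample(1:k, nrow(toy), replace = TRUE)

# 3. Pre-allocate the matrix of cross-validation errors
predictors <- c("x1", "x2", "x3")
cv.errors <- matrix(NA, k, length(predictors),
                    dimnames = list(NULL, paste(seq_along(predictors))))

for (j in 1:k) {
  for (i in seq_along(predictors)) {
    # Stand-in for regsubsets(): a linear model on the first i predictors
    form <- reformulate(predictors[1:i], response = "y")
    fit <- lm(form, data = toy[folds != j, ])
    pred <- predict(fit, newdata = toy[folds == j, ])
    cv.errors[j, i] <- mean((toy$y[folds == j] - pred)^2)
  }
}

# Average test error per model size across the k folds
mean.cv.errors <- colMeans(cv.errors)
```

With the Hitters data the same idea would presumably amount to running Hitters = na.omit(Hitters) before the line that samples folds, and creating cv.errors as a k by 19 matrix before the loop.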

Answer:

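If you want to "fix" this, you need to pull the attribute (the row names stored in dimnames) out of the pred object and then select the matching values from the Hitters object based on its rownames():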

> str(Hitters$Salary)
 num [1:322] NA 475 480 500 91.5 750 70 100 75 1100 ...
> str(pred)
 num [1:18, 1] 988 359 370 808 383 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:18] "-Andre Thornton" "-Bob Dernier" "-Chris Brown" "-Chet Lemon" ...
  ..$ : NULL
> names(Hitters)
 [1] "AtBat"     "Hits"      "HmRun"     "Runs"      "RBI"       "Walks"     "Years"     "CAtBat"   
 [9] "CHits"     "CHmRun"    "CRuns"     "CRBI"      "CWalks"    "League"    "Division"  "PutOuts"  
[17] "Assists"   "Errors"    "Salary"    "NewLeague"
> rownames(Hitters)
  [1] "-Andy Allanson"     "-Alan Ashby"        "-Alvin Davis"       "-Andre Dawson"     
  [5] "-Andres Galarraga"  "-Alfredo Griffin"   "-Al Newman"         "-Argenis Salazar"  
  [9] "-Andres Thomas"     "-Andre Thornton"    "-Alan Trammell"     "-Alex Trevino"     
 [13] "-Andy VanSlyke"     "-Alan Wiggins"      "-Bill Almon"        "-Billy Beane"
#omitted the rest of the 322-item column     
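
The row-name matching described in this answer can be sketched as follows. It is a minimal illustration on hypothetical toy data (the data frame toy, the row names player1 to player20, and the lm() fit are assumptions made so the example runs with base R only), not the Hitters analysis itself. The point is that the prediction matrix keeps the row names of the rows that survived NA removal, so those names can be used to pick the matching observed values.

```r
set.seed(1)

# Hypothetical toy data with named rows, standing in for Hitters
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(20)
rownames(toy) <- paste0("player", 1:20)
toy$y[c(3, 8, 15)] <- NA

# lm() and model.matrix() both drop the three incomplete rows
fit <- lm(y ~ ., data = toy)
mat <- model.matrix(y ~ ., toy)

# pred is a one-column matrix whose row names identify the rows that were kept
pred <- mat %*% coef(fit)

# Select the observed values by row name so both sides have the same length
observed <- toy[rownames(pred), "y"]
mse <- mean((observed - pred)^2)
```

Compared with calling na.omit() up front, this leaves the original data frame untouched, at the cost of having to remember the alignment step wherever predictions are compared with observations.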
