
Format of newx in lasso regression giving an error in R
Asked by a Stack Overflow user on 2019-04-02 20:57:43

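I am trying to implement lasso linear regression. I trained my model, but when I try to make predictions on unknown data, I get the following error: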

 Error in cbind2(1, newx) %*% nbeta : 
     invalid class 'NA' to dup_mMatrix_as_dgeMatrix

My data is summarized below.

I want to predict the unknown percent_gc. I first trained my model on the data where percent_gc is known:

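 library(tibble)   # tibble()
 library(dplyr)    # %>% and select()
 library(glmnet)   # glmnet() and cv.glmnet()

 set.seed(1)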

 ###training data
 data.all <- tibble(description = c('Xylanimonas cellulosilytica XIL07, DSM 15894','Teredinibacter turnerae T7901',
                            'Desulfotignum phosphitoxidans FiPS-3, DSM 13687','Brucella melitensis bv. 1 16M'),
            phylum = c('Actinobacteria','Proteobacteria','Proteobacteria','Bacteroidetes'),
            genus = c('Acaryochloris','Acetohalobium','Acidimicrobium','Acidithiobacillus'),
            Latitude = c('63.93','69.372','3.493.11','44.393.704'),
            Longitude = c('-22.1','88.235','134.082.527','-0.130781'),
            genome_size = c(8361599,2469596,2158157,3207552),
            percent_gc = c(34,24,55,44),
            percent_psuedo = c(0.0032987747,0.0291222313,0.0353728489,0.0590663703),
            percent_signalpeptide = c(0.02987198,0.040607055,0.048757170,0.061606859))

  ###data for prediction
  data.prediction <- tibble(description = c('Liberibacter crescens BT-1','Saprospira grandis Lewin',
                            'Sinorhizobium meliloti AK83','Bifidobacterium asteroides ATCC 25910'),
            phylum = c('Actinobacteria','Proteobacteria','Proteobacteria','Bacteroidetes'),
            genus = c('Acaryochloris','Acetohalobium','Acidimicrobium','Acidithiobacillus'),
            Latitude = c('39.53','69.372','5.493.12','44.393.704'),
            Longitude = c('20.1','-88.235','134.082.527','-0.130781'),
            genome_size = c(474832,2469837,2158157,3207552),
            percent_gc = c(NA,NA,NA,NA),
            percent_psuedo = c(0.0074639239,0.0291222313,0.0353728489,0.0590663703),
            percent_signalpeptide = c(0.02987198,0.040607055,0.048757170,0.061606859))

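# Build the design matrix and the response from the training data
x <- model.matrix(percent_gc ~ ., data.all)
y <- data.all$percent_gc

# Cross-validate to choose the lasso penalty (alpha = 1 selects the lasso)
cv.out <- cv.glmnet(x, y, alpha = 1, family = "gaussian")
best.lambda <- cv.out$lambda.min

fit <- glmnet(x, y, alpha = 1)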

Then I want to make predictions for the unknown percent_gc:

newX = matrix(data = data.prediction %>% select(-percent_gc)) 
data.prediction$percent_gc <- 
 predict(object = fit ,type="response", s=best.lambda, newx=newX)

This produces the error I mentioned above.

I can't figure out what format newX should be in to get rid of this error. Any insights would be appreciated.
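A quick way to inspect the format in question, as a sketch in base R only: matrix() applied to a data frame yields a list-matrix with one cell per column, while as.matrix() on numeric columns yields a numeric matrix. The data frame df below is a hypothetical stand-in that holds two numeric columns from the prediction data; the full table also contains character columns, which as.matrix() would not turn into numbers.

```r
# Hypothetical stand-in: two numeric columns taken from data.prediction
df <- data.frame(
  genome_size    = c(474832, 2469837, 2158157, 3207552),
  percent_psuedo = c(0.0074639239, 0.0291222313, 0.0353728489, 0.0590663703)
)

# matrix() treats the data frame as a list of columns,
# so each cell holds a whole column rather than a number
bad <- matrix(data = df)

# as.matrix() converts the numeric columns into a numeric matrix
good <- as.matrix(df)

print(typeof(bad))   # storage type of the list-matrix
print(dim(bad))      # one row per column of df
print(typeof(good))  # storage type of the numeric matrix
print(dim(good))     # rows and columns of df
```

Comparing the two results shows which object has the shape that a matrix product with the fitted coefficients can use.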

1 Answer

Stack Overflow user (accepted answer), answered 2019-04-02 22:18:23

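I don't really know how to construct the appropriate matrix, but the glmnetUtils package provides functions that fit a formula directly to a data frame and predict from it. With it, I get predicted values: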

library(glmnetUtils)
fit <- glmnet(percent_gc~.,data.all,alpha=1)
cv.out <- cv.glmnet (percent_gc~.,data.all, alpha = 1,family  = "gaussian")
best.lambda= cv.out$lambda.min

predict(object = fit,data.prediction,s=best.lambda)
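For readers who prefer to stay with plain glmnet, the matrix can also be built by hand. The following is a minimal sketch, not the answerer's method: it uses simulated data, the names train, new_data and predictors are hypothetical, and it assumes only numeric predictors are used. Categorical columns would need matching factor levels in both data sets, and model.matrix(percent_gc ~ ., ...) on rows whose response is NA drops those rows under the default na.action, so the response is left out of the prediction matrix here.

```r
library(glmnet)

set.seed(1)

# Simulated training data (hypothetical values, numeric predictors only)
n <- 60
train <- data.frame(
  genome_size           = runif(n, 5e5, 9e6),
  percent_psuedo        = runif(n, 0, 0.06),
  percent_signalpeptide = runif(n, 0.02, 0.07)
)
train$percent_gc <- 30 + 3e-6 * train$genome_size + rnorm(n)

# New rows whose response is unknown
new_data <- data.frame(
  genome_size           = c(474832, 2469837, 2158157),
  percent_psuedo        = c(0.0075, 0.0291, 0.0354),
  percent_signalpeptide = c(0.0299, 0.0406, 0.0488),
  percent_gc            = NA_real_
)

# Use the same predictor columns, in the same order, for both matrices
predictors <- c("genome_size", "percent_psuedo", "percent_signalpeptide")
x    <- as.matrix(train[, predictors])
y    <- train$percent_gc
newx <- as.matrix(new_data[, predictors])

# Cross-validated lasso fit (alpha = 1)
cv.out <- cv.glmnet(x, y, alpha = 1)

# Predict at the lambda that minimises the cross-validation error
pred <- predict(cv.out, newx = newx, s = "lambda.min")
new_data$percent_gc <- as.numeric(pred)
print(new_data)
```

The key point of the sketch is that newx is a numeric matrix with the same columns as x, which is what predict() expects.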
Original content provided by Stack Overflow. Source: https://stackoverflow.com/questions/55475399
