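I realize this question has been asked before, but after reading all the answers, each one is specific to a particular problem, and I can't find one that fits my situation.
I entered the following in R. It works in the first example but not in the second, and I don't understand why.
Setting up the data for the glm: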
setwd("P:/STAT319")
ucb2<-read.table('Berkeley.PoissonTwo.txt',header=TRUE)
attach(ucb2)
ucb2 looks like this:
Count Admit Department Gender
313 FALSE A Female
512 TRUE A Female
19 FALSE A Male
89 TRUE A Male
207 FALSE B Female
353 TRUE B Female
8 FALSE B Male
17 TRUE B Male
205 FALSE C Female
120 TRUE C Female
391 FALSE C Male
202 TRUE C Male
279 FALSE D Female
138 TRUE D Female
244 FALSE D Male
131 TRUE D Male
138 FALSE E Female
53 TRUE E Female
299 FALSE E Male
94 TRUE E Male
351 FALSE F Female
22 TRUE F Female
317 FALSE F Male
24 TRUE F Male
Using factor variables, with TRUE and FALSE for Admit and NotAdmit:
Admit<-c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
fAdmit<-factor(Admit)
rAdmit<-factor(Admit,labels=c("FALSE","TRUE"))
glm2<-glm(Count~Admit+Department+Gender,family=poisson)
glm2
Paving the way for leave-one-out cross-validation:
library(car)
vif(glm2)
# GVIF Df GVIF^(1/(2*Df))
# Admit 1 1 1
# Department 1 5 1
# Gender 1 1 1
step(glm2)
# Start: AIC=2272.73
# Count ~ Admit + Department + Gender
#
# Df Deviance AIC
# <none> 2097.7 2272.7
# - Department 5 2257.2 2422.2
# - Gender 1 2260.6 2433.6
# - Admit 1 2327.7 2500.8
#
# Call: glm(formula = Count ~ Admit + Department + Gender, family = poisson)
#
# Coefficients:
# (Intercept) Admit DepartmentB DepartmentC
# 5.82785 -0.45674 -0.46679 -0.01621
# DepartmentD DepartmentE DepartmentF GenderMale
# -0.16384 -0.46850 -0.26752 -0.38287
# Degrees of Freedom: 23 Total (i.e. Null); 16 Residual
# Null Deviance: 2650
# Residual Deviance: 2098 AIC: 2273
library(ipred)
errorest(Count~Admit+Department+Gender,data=ucb2,model=glm,est.para=control.errorest(k=24))
# Call:
# errorest.data.frame(formula = Count ~ Admit + Department + Gender,
# data = ucb2, model = glm, est.para = control.errorest(k = 24))
#
# 24-fold cross-validation estimator of root mean squared error
#
# Root mean squared error: 180.5741
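So the first application works with the data as shown above. Now, to carry out the same study, we have to rearrange the data and perform a logistic regression: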
ucb1<-read.table('Monday.Late.txt',header=TRUE)
attach(ucb1)
# The following object is masked _by_ .GlobalEnv:
#
# Admit
# The following objects are masked from ucb2:
#
# Admit, Department, Gender
y<-cbind(ucb1[,1],ucb1[,2])
glm1<-glm(y~Gender+Department,family=binomial)
The data here are as follows:
Admit NotAdmit Gender Department
512 313 female a
353 207 female b
120 205 female c
138 279 female d
53 138 female e
22 351 female f
89 19 male a
17 8 male b
202 391 male c
131 244 male d
94 299 male e
24 317 male f
Setting up this new data for leave-one-out:
vif(glm1)
# GVIF Df GVIF^(1/(2*Df))
# Gender 1.384903 1 1.176819
# Department 1.384903 5 1.033099
step(glm1)
# Start: AIC=103.14
# y ~ Gender + Department
# Df Deviance AIC
# - Gender 1 21.74 102.68
# <none> 20.20 103.14
# - Department 5 783.61 856.55
#
# Step: AIC=102.68
# y ~ Department
#
# Df Deviance AIC
# <none> 21.74 102.68
# - Department 5 877.06 948.00
#
# Call: glm(formula = y ~ Department, family = binomial)
#
# Coefficients:
# (Intercept) Departmentb Departmentc Departmentd
# 0.59346 -0.05059 -1.20915 -1.25833
# Departmente Departmentf
# -1.68296 -3.26911
#
# Degrees of Freedom: 11 Total (i.e. Null); 6 Residual
# Null Deviance: 877.1
# Residual Deviance: 21.74 AIC: 102.7
So far so good, but now the problem appears:
errorest(y~Gender+Department,data=ucb1,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long
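So why does this happen? I tried other values of k. I'm not sure what the value of k should be; I assume it is the number of rows.
Then I tried the same data, arranged differently: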
ucb1a<-read.table('Berkeley.Rearranged.txt',header=TRUE)
attach(ucb1a)
ucb1a
This is a rearrangement of the previous data:
Admitted Not_Admit Depart Genders
1 512 313 A Female
2 89 19 A Male
3 353 207 B Female
4 17 8 B Male
5 120 205 C Female
6 202 391 C Male
7 138 279 D Female
8 131 244 D Male
9 53 138 E Female
10 94 299 E Male
11 22 351 F Female
12 24 317 F Male
Then:
y<-cbind(ucb1[,1],ucb1[,2])
glm1a<-glm(y~Genders+Depart,family=binomial)
vif(glm1a)
# GVIF Df GVIF^(1/(2*Df))
# Gender 1.384903 1 1.176819
# Department 1.384903 5 1.033099
step(glm1a)
# Start: AIC=103.14
# y ~ Gender + Department
#
# Df Deviance AIC
# - Gender 1 21.74 102.68
# <none> 20.20 103.14
# - Department 5 783.61 856.55
#
# Step: AIC=102.68
# y ~ Department
#
# Df Deviance AIC
# <none> 21.74 102.68
# - Department 5 877.06 948.00
#
# Call: glm(formula = y ~ Department, family = binomial)
#
# Coefficients:
# (Intercept) Departmentb Departmentc Departmentd
# 0.59346 -0.05059 -1.20915 -1.25833
# Departmente Departmentf
# -1.68296 -3.26911
#
# Degrees of Freedom: 11 Total (i.e. Null); 6 Residual
# Null Deviance: 877.1
# Residual Deviance: 21.74 AIC: 102.7
Again, so far so good, but then the same thing happens:
errorest(y~Gender+Department,data=ucb1a,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long
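Believe me, I again tried other values of k, and I don't understand why it fails. So if anyone has any ideas about this specific case of "(subscript) logical subscript too long", please reply.
Answered 2017-08-24 15:23:10
This problem occurs when your objects have different sizes. I think your problem comes from attach(), but I'm not sure. Try your code without it, or try with() instead. As nicola pointed out, you should first check why you need attach() at all before using it. Also, I'm not sure what you are trying to achieve with it.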
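As a minimal sketch of the "try it without attach()" suggestion, assuming the data frame below (typed in from the second listing in the question rather than read from Monday.Late.txt) matches the asker's file: the two-column response is built inside the formula, so every variable is looked up in `data` and no global `y` of a different length can get involved. This only illustrates the model fit; it does not claim to resolve the errorest() call.

```r
# Data typed in from the question's second listing (assumed to match Monday.Late.txt)
ucb1 <- data.frame(
  Admit      = c(512, 353, 120, 138, 53, 22, 89, 17, 202, 131, 94, 24),
  NotAdmit   = c(313, 207, 205, 279, 138, 351, 19, 8, 391, 244, 299, 317),
  Gender     = factor(rep(c("female", "male"), each = 6)),
  Department = factor(rep(c("a", "b", "c", "d", "e", "f"), times = 2))
)

# No attach(): the response is built in the formula and all names come from `data`
glm1 <- glm(cbind(Admit, NotAdmit) ~ Gender + Department,
            family = binomial, data = ucb1)

# The same fit using with(), as the answer suggests
glm1_with <- with(ucb1,
                  glm(cbind(Admit, NotAdmit) ~ Gender + Department,
                      family = binomial))

print(coef(glm1))
```

Both calls fit the same model; the `data =` form also keeps the data frame attached to the fitted object's call, which helper functions that refit on subsets generally rely on.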
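You can see the following in the help page for that function:
attach has the side effect of altering the search path and this can easily lead to the wrong object of a particular name being found. People do often forget to detach databases.
In interactive use, with is usually preferable to the use of attach/detach, unless what is a save()-produced file, in which case attach() is a (safe) wrapper for load().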
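The masking described in that help text can be reproduced in a few lines. This is an illustrative sketch with a made-up two-row data frame (`adm` is a hypothetical name, not one of the asker's objects): a global `Admit` vector, like the one created in the question, wins over the attached column of the same name, so the two have different lengths.

```r
# Hypothetical small data frame with a column named Admit
adm <- data.frame(Admit = c(512, 353), NotAdmit = c(313, 207))

# A global vector with the same name, as in the question's Admit <- c(0, 1, ...)
Admit <- c(0, 1, 0, 1)

attach(adm)                        # R reports that Admit is masked by .GlobalEnv
n_attached <- length(Admit)        # finds the global vector, not the column
detach(adm)

n_with <- with(adm, length(Admit)) # with() looks in the data frame first

print(c(attached = n_attached, with = n_with))
```

Here the name `Admit` resolves to a length-4 vector after attach() but to the length-2 column inside with(), which is the kind of size mismatch the answer points to.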
https://stackoverflow.com/questions/29861434