Q: Logical subscript too long

Stack Overflow user

Asked 2015-04-25 05:36:00

I realize this question has been asked before, but every answer I found was specific to a particular problem, and none of them addresses my situation.

I entered the following in R. It works in the first example but not in the second, and I don't understand why.

Setting up the data for the glm:

setwd("P:/STAT319")
ucb2<-read.table('Berkeley.PoissonTwo.txt',header=TRUE)
attach(ucb2)

ucb2 looks like this:

Count   Admit Department    Gender     
313 FALSE     A     Female     
512 TRUE      A     Female     
19  FALSE     A     Male       
89  TRUE      A     Male       
207 FALSE     B     Female     
353 TRUE      B     Female     
8   FALSE     B     Male       
17  TRUE      B     Male       
205 FALSE     C     Female     
120 TRUE      C     Female     
391 FALSE     C     Male       
202 TRUE      C     Male       
279 FALSE     D     Female     
138 TRUE      D     Female     
244 FALSE     D     Male       
131 TRUE      D     Male       
138 FALSE     E     Female     
53  TRUE      E     Female     
299 FALSE     E     Male       
94  TRUE      E     Male       
351 FALSE     F     Female     
22  TRUE      F     Female     
317 FALSE     F     Male       
24  TRUE      F     Male

Using factor variables, with TRUE and FALSE standing for Admit and NotAdmit:

Admit<-c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
fAdmit<-factor(Admit)
rAdmit<-factor(Admit,labels=c("FALSE","TRUE"))
glm2<-glm(Count~Admit+Department+Gender,family=poisson)
glm2
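
The calls above depend on `attach()` and on a global `Admit` vector. As a minimal sketch only, the same Poisson model can be fitted with the data passed explicitly through `data=`. The data frame name `ucb2_inline` is hypothetical, and its values are typed in from the table above rather than read from `Berkeley.PoissonTwo.txt`. Because `Admit` is logical here, the coefficient is labelled `AdmitTRUE` rather than `Admit`.

```r
# Hypothetical inline copy of the 24-row table shown above (long format).
ucb2_inline <- data.frame(
  Count = c(313, 512, 19, 89, 207, 353, 8, 17, 205, 120, 391, 202,
            279, 138, 244, 131, 138, 53, 299, 94, 351, 22, 317, 24),
  Admit      = rep(c(FALSE, TRUE), times = 12),
  Department = factor(rep(LETTERS[1:6], each = 4)),
  Gender     = factor(rep(rep(c("Female", "Male"), each = 2), times = 6))
)

# Poisson log-linear model with main effects only; every variable is
# looked up in ucb2_inline, so nothing depends on attach() or globals.
fit_pois <- glm(Count ~ Admit + Department + Gender,
                family = poisson, data = ucb2_inline)
coef(fit_pois)
```

Passing `data=` keeps each model tied to one data frame, which matters once several attached data sets share column names.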

Paving the way for leave-one-out cross-validation:

library(car)
vif(glm2)
# GVIF Df GVIF^(1/(2*Df))
# Admit         1  1               1
# Department    1  5               1
# Gender        1  1               1
step(glm2)
# Start:  AIC=2272.73
# Count ~ Admit + Department + Gender
# 
# Df Deviance    AIC
# <none>            2097.7 2272.7
# - Department  5   2257.2 2422.2
# - Gender      1   2260.6 2433.6
# - Admit       1   2327.7 2500.8
# 
# Call:  glm(formula = Count ~ Admit + Department + Gender, family = poisson)
# 
# Coefficients:
#   (Intercept)        Admit  DepartmentB  DepartmentC  
# 5.82785     -0.45674     -0.46679     -0.01621  
# DepartmentD  DepartmentE  DepartmentF   GenderMale  
# -0.16384     -0.46850     -0.26752     -0.38287  

# Degrees of Freedom: 23 Total (i.e. Null);  16 Residual
# Null Deviance:        2650 
# Residual Deviance: 2098   AIC: 2273

library(ipred)
errorest(Count~Admit+Department+Gender,data=ucb2,model=glm,est.para=control.errorest(k=24))

# Call:
#   errorest.data.frame(formula = Count ~ Admit + Department + Gender, 
#                       data = ucb2, model = glm, est.para = control.errorest(k = 24))
# 
# 24-fold cross-validation estimator of root mean squared error
# 
# Root mean squared error:  180.5741

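So the first application works with the data laid out as shown above. Now, to carry out the same study, we have to rearrange the data and fit a logistic regression: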

ucb1<-read.table('Monday.Late.txt',header=TRUE)
attach(ucb1)
# The following object is masked _by_ .GlobalEnv:
#   
#   Admit

# The following objects are masked from ucb2:
#   
#   Admit, Department, Gender

y<-cbind(ucb1[,1],ucb1[,2])
glm1<-glm(y~Gender+Department,family=binomial)

The data here look like this:

Admit   NotAdmit    Gender  Department     
512 313 female  a      
353 207 female  b      
120 205 female  c      
138 279 female  d      
53  138 female  e      
22  351 female  f      
89  19  male    a      
17  8   male    b      
202 391 male    c      
131 244 male    d      
94  299 male    e      
24  317 male    f
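
The rearrangement from one row per count to one row per Gender/Department cell can be sketched in base R as follows. This is only an illustration, assuming a hypothetical data frame `long` that holds the first two departments of the earlier table. The original files are not reproduced here, and the lower-case labels used in `Monday.Late.txt` are not recreated.

```r
# Hypothetical long-format excerpt (departments A and B only).
long <- data.frame(
  Count      = c(313, 512, 19, 89, 207, 353, 8, 17),
  Admit      = rep(c(FALSE, TRUE), times = 4),
  Department = rep(c("A", "B"), each = 4),
  Gender     = rep(rep(c("Female", "Male"), each = 2), times = 2),
  stringsAsFactors = FALSE
)

# Split into admitted and rejected counts, then join on the cell keys.
admitted <- long[long$Admit,  c("Gender", "Department", "Count")]
rejected <- long[!long$Admit, c("Gender", "Department", "Count")]
names(admitted)[3] <- "Admit"
names(rejected)[3] <- "NotAdmit"

wide <- merge(admitted, rejected, by = c("Gender", "Department"))
wide <- wide[, c("Admit", "NotAdmit", "Gender", "Department")]
wide
```

Each row of `wide` then carries both counts for one cell, which is the two-column layout a binomial `glm` accepts as its response.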

Setting this new data up for leave-one-out:

vif(glm1)
# GVIF Df GVIF^(1/(2*Df))
# Gender     1.384903  1        1.176819
# Department 1.384903  5        1.033099
step(glm1)
# Start:  AIC=103.14
# y ~ Gender + Department

# Df Deviance    AIC
# - Gender      1    21.74 102.68
# <none>             20.20 103.14
# - Department  5   783.61 856.55
# 
# Step:  AIC=102.68
# y ~ Department
# 
# Df Deviance    AIC
# <none>             21.74 102.68
# - Department  5   877.06 948.00
# 
# Call:  glm(formula = y ~ Department, family = binomial)
# 
# Coefficients:
#   (Intercept)  Departmentb  Departmentc  Departmentd  
# 0.59346     -0.05059     -1.20915     -1.25833  
# Departmente  Departmentf  
# -1.68296     -3.26911  
# 
# Degrees of Freedom: 11 Total (i.e. Null);  6 Residual
# Null Deviance:        877.1 
# Residual Deviance: 21.74  AIC: 102.7

So far so good, but now the problem appears:

errorest(y~Gender+Department,data=ucb1,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long

So why does this happen? I tried other values of k, and I'm not sure what k should be. I assume it is the number of rows.
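
On the meaning of k: in k-fold cross-validation k is the number of folds, so setting it to the number of rows gives leave-one-out. The sketch below writes that loop out by hand in base R for the 12-row layout. It is an illustration under stated assumptions, not a reproduction of what `errorest` does internally, and it does not diagnose the error above. The names `ucb`, `loo_pred` and `loo_rmse` are hypothetical, the values are typed in from the table shown earlier, and the error is measured on the admitted-proportion scale, so it is not comparable with the 180.5741 reported for the count model.

```r
# Hypothetical inline copy of the 12-row table (one row per cell).
ucb <- data.frame(
  Admit      = c(512, 353, 120, 138, 53, 22, 89, 17, 202, 131, 94, 24),
  NotAdmit   = c(313, 207, 205, 279, 138, 351, 19, 8, 391, 244, 299, 317),
  Gender     = factor(rep(c("female", "male"), each = 6)),
  Department = factor(rep(letters[1:6], times = 2))
)

n <- nrow(ucb)          # leave-one-out: one fold per row, so k = n = 12
loo_pred <- numeric(n)

for (i in seq_len(n)) {
  # Fit on all rows except i; the two-column response is built inside the
  # formula so that it is subset together with the rest of the data frame.
  fit <- glm(cbind(Admit, NotAdmit) ~ Gender + Department,
             family = binomial, data = ucb[-i, ])
  # Predicted admission probability for the held-out row.
  loo_pred[i] <- predict(fit, newdata = ucb[i, ], type = "response")
}

observed <- ucb$Admit / (ucb$Admit + ucb$NotAdmit)
loo_rmse <- sqrt(mean((observed - loo_pred)^2))
loo_rmse
```

Every Gender and Department level still appears in each training set after one row is dropped, so all 12 fits can predict their held-out row.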

Then I tried the same data arranged differently:

ucb1a<-read.table('Berkeley.Rearranged.txt',header=TRUE)
attach(ucb1a)
ucb1a

This is a rearrangement of the earlier data:

Admitted Not_Admit Depart Genders
1       512       313      A  Female
2        89        19      A    Male
3       353       207      B  Female
4        17         8      B    Male
5       120       205      C  Female
6       202       391      C    Male
7       138       279      D  Female
8       131       244      D    Male
9        53       138      E  Female
10       94       299      E    Male
11       22       351      F  Female
12       24       317      F    Male

Then:

y<-cbind(ucb1[,1],ucb1[,2])
glm1a<-glm(y~Genders+Depart,family=binomial)
vif(glm1a)
# GVIF Df GVIF^(1/(2*Df))
# Gender     1.384903  1        1.176819
# Department 1.384903  5        1.033099

step(glm1a)
# Start:  AIC=103.14
# y ~ Gender + Department
# 
# Df Deviance    AIC
# - Gender      1    21.74 102.68
# <none>             20.20 103.14
# - Department  5   783.61 856.55
# 
# Step:  AIC=102.68
# y ~ Department
# 
# Df Deviance    AIC
# <none>             21.74 102.68
# - Department  5   877.06 948.00
# 
# Call:  glm(formula = y ~ Department, family = binomial)
# 
# Coefficients:
#   (Intercept)  Departmentb  Departmentc  Departmentd  
# 0.59346     -0.05059     -1.20915     -1.25833  
# Departmente  Departmentf  
# -1.68296     -3.26911  
# 
# Degrees of Freedom: 11 Total (i.e. Null);  6 Residual
# Null Deviance:        877.1 
# Residual Deviance: 21.74  AIC: 102.7

Again, so far so good, but then the same thing happens:

errorest(y~Gender+Department,data=ucb1a,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long