Regression p-value and t.test p-value for a data frame

Stack Overflow user
Asked on 2018-04-02 19:33:43
Answers: 1, Views: 74, Followers: 0, Votes: 1

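I am trying to write a function that takes a data frame. The data frame's df$x column consists of two factor levels, and df$y is a continuous random variable. This is what I have so far: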

compare_tests = function(df) {
    p5Model = lm(y ~ x, df)
    p5_Regression_P_Value = anova(p5Model)$'Pr(>F)'[1]
    p5_xFactorLevels = factor(df$x)
    p5_T_Test = t.test(p5_xFactorLevels[1], p5_xFactorLevels[2])
    p5_T_Test_P_Value = p5_T_Test$p.value
    p5Vector = c(regression = p5_Regression_P_Value , t.test = p5_T_Test_P_Value)
    return(p5Vector)
}

The regression p-value works, but the t.test p-value for the factor does not.

For example, sim2 is:

# A tibble: 40 x 2
   x           y
   <chr>   <dbl>
 1 a       1.94 
 2 a       1.18 
 3 a       1.24 
 4 a       2.62 
 5 a       1.11 
 6 a       0.866
 7 a      -0.910
 8 a       0.721
 9 a       0.687
10 a       2.07 
11 b       8.07 
12 b       7.36 
13 b       7.95 
14 b       7.75 
15 b       8.44 
16 b      10.8  
17 b       8.05 
18 b       8.58 
19 b       8.12 
20 b       6.09 
21 c       6.86 
22 c       5.76 
23 c       5.79 
24 c       6.02 
25 c       6.03 
26 c       6.55 
27 c       3.73 
28 c       8.68 
29 c       5.64 
30 c       6.21 
31 d       3.07 
32 d       1.33 
33 d       3.11 
34 d       1.75 
35 d       0.822
36 d       1.02 
37 d       3.07   
38 d       2.13 
39 d       2.49 
40 d       0.301

For those who would rather see dput(sim2):

structure(list(x = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "c", 
"c", "c", "c", "c", "c", "c", "c", "c", "c", "d", "d", "d", "d", 
"d", "d", "d", "d", "d", "d"), y = c(1.93536318980109, 1.17648861056246, 
1.2436854647462, 2.6235488834436, 1.11203808286976, 0.866002986937445, 
-0.910087467722212, 0.720762758415155, 0.68655402174211, 2.06730787876151, 
8.07003485029664, 7.36087667611434, 7.95003510095185, 7.74851655674979, 
8.44479711579273, 10.7554175753369, 8.04653138044419, 8.57770906930663, 
8.11819487440968, 6.0882795089718, 6.86208648183857, 5.75676326036652, 
5.79391280521842, 6.01917759220915, 6.02956075431977, 6.54982754180169, 
3.72588514310706, 8.68255718355635, 5.63877874450629, 6.21335574971003, 
3.07434588225969, 1.33491175145449, 3.11395241896922, 1.75410358832085, 
0.822436691056719, 1.02414938384014, 3.06505732002715, 2.13167063477289, 
2.48862880920098, 0.300549432154306)), .Names = c("x", "y"), class = 
c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -40L))

My function call:

 compare_tests(sim2 %>% filter(x %in% c('a', 'd')))

should return

regression     t.test 
 0.1051552  0.1052173

1 Answer

Stack Overflow user

Accepted answer

Posted on 2018-04-02 20:04:15

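The problem in your function is in how it computes the t.test value.

p5_xFactorLevels = factor(df$x) converts the column to a factor (fine, but not required). Then p5_T_Test = t.test(p5_xFactorLevels[1], p5_xFactorLevels[2]) incorrectly runs the t-test on the first two elements of the x column, not on the y values of the two groups.

The test you want is the y column against the x column: p5_T_Test = t.test(df$y ~ df$x)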

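library(dplyr)  # provides %>% and filter(), used in the call at the end

compare_tests = function(df) {
  # p-value of the F-test for x in the regression of y on x
  p5Model = lm(y ~ x, df)
  p5_Regression_P_Value = anova(p5Model)$'Pr(>F)'[1]
  # Corrected line: t-test of y grouped by the two levels of x
  p5_T_Test = t.test(df$y ~ df$x)

  p5_T_Test_P_Value = p5_T_Test$p.value
  p5Vector = c(regression = p5_Regression_P_Value , t.test = p5_T_Test_P_Value)
  return(p5Vector)
}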

sim2<-structure(list(x = c("a", "a", "a", "a", "a", "a", "a", "a", 
                           "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "c", 
                           "c", "c", "c", "c", "c", "c", "c", "c", "c", "d", "d", "d", "d", 
                           "d", "d", "d", "d", "d", "d"), y = c(1.93536318980109, 1.17648861056246, 
                                                                1.2436854647462, 2.6235488834436, 1.11203808286976, 0.866002986937445, 
                                                                -0.910087467722212, 0.720762758415155, 0.68655402174211, 2.06730787876151, 
                                                                8.07003485029664, 7.36087667611434, 7.95003510095185, 7.74851655674979, 
                                                                8.44479711579273, 10.7554175753369, 8.04653138044419, 8.57770906930663, 
                                                                8.11819487440968, 6.0882795089718, 6.86208648183857, 5.75676326036652, 
                                                                5.79391280521842, 6.01917759220915, 6.02956075431977, 6.54982754180169, 
                                                                3.72588514310706, 8.68255718355635, 5.63877874450629, 6.21335574971003, 
                                                                3.07434588225969, 1.33491175145449, 3.11395241896922, 1.75410358832085, 
                                                                0.822436691056719, 1.02414938384014, 3.06505732002715, 2.13167063477289, 
                                                                2.48862880920098, 0.300549432154306)), .Names = c("x", "y"), class = 
                  c("tbl_df", 
                    "tbl", "data.frame"), row.names = c(NA, -40L))
compare_tests(sim2 %>% filter(x %in% c('a', 'd')))
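
The expected output in the question shows two p-values that are close but not identical. One plausible reason is that t.test() runs Welch's unequal-variance test by default, while the regression F-test assumes a common variance. The sketch below uses hypothetical simulated data (not sim2) and base R only. It illustrates that the formula call matches the two-vector call on the y values of each group, and that setting var.equal = TRUE should reproduce the regression p-value for a two-level factor.

```r
# Hypothetical two-group data for illustration (not the sim2 data above).
set.seed(1)
df <- data.frame(
  x = rep(c("a", "d"), each = 10),
  y = c(rnorm(10, mean = 1), rnorm(10, mean = 2))
)

# Formula interface: y split by the two levels of x (Welch test by default).
p_formula <- t.test(y ~ x, data = df)$p.value

# Equivalent two-vector call: pass the y values of each group,
# not elements of the x column.
p_vectors <- t.test(df$y[df$x == "a"], df$y[df$x == "d"])$p.value

# Pooled-variance t-test, which assumes equal variances as lm() does.
p_pooled <- t.test(y ~ x, data = df, var.equal = TRUE)$p.value

# Regression F-test p-value, extracted the same way as in the answer.
p_regression <- anova(lm(y ~ x, data = df))$"Pr(>F)"[1]

print(c(formula = p_formula, vectors = p_vectors,
        pooled = p_pooled, regression = p_regression))
```

If an exact match with the regression is wanted inside compare_tests, passing var.equal = TRUE to t.test is one option; whether the equal-variance assumption is appropriate depends on the data.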
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/49617623
