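I'm trying to compute the median (though this could be swapped for a similar measure) of several columns by group, where the rows used for each column are a subset defined by other columns. This is a direct follow-up to my previous post. I tried to merge the aggregate approach for computing medians into the Map(function(x,y) dosomething, x, y) solution that @Frank kindly provided, but it didn't work. Let me illustrate: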
Computing the medians of A and B by the groups GRP1 and GRP2:
df <- data.frame(GRP1 = c("A","A","A","A","A","A","B","B","B","B","B","B"), GRP2 = c("A","A","A","B","B","B","A","A","A","B","B","B"), A = c(0,4,6,7,0,1,9,0,0,8,3,4), B = c(6,0,4,8,6,7,0,9,9,7,3,0))
med <- aggregate(.~GRP1+GRP2,df,FUN=median)
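Easy enough. Now add columns that define which rows are used for the median: rows where the mask column is NA should be dropped. Column a defines which rows are used for the median of column A, and column b does the same for column B: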
a <- c(1,4,7,3,NA,3,7,NA,NA,4,8,1)
b <- c(5,NA,7,9,5,6,NA,8,1,7,2,9)
df1 <- cbind(df,a,b)
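As a concrete check of the masking rule, take the single group GRP1 = "A", GRP2 = "A": all three rows have a non-missing a, so A is summarised over 0, 4, 6; the second row has b = NA, so B is summarised over 6 and 4 only. A minimal base-R sketch of that one group (the data frame is rebuilt with rep() so the block runs on its own):

```r
# Same data as df1 above, written compactly
df1 <- data.frame(
  GRP1 = rep(c("A", "B"), each = 6),
  GRP2 = rep(rep(c("A", "B"), each = 3), 2),
  A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
  B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
  a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
  b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9)
)

# Rows belonging to the group GRP1 == "A", GRP2 == "A"
g <- df1[df1$GRP1 == "A" & df1$GRP2 == "A", ]

# A uses the rows where a is not NA (all three here: 0, 4, 6)
med_A <- median(g$A[!is.na(g$a)])
# B uses the rows where b is not NA (row 2 is dropped, leaving 6, 4)
med_B <- median(g$B[!is.na(g$b)])

print(c(med_A, med_B))  # 4 5
```

This reproduces the first row of the desired result below (A A 4 5).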
As mentioned above, I tried combining Map and aggregate, but it doesn't work. I assume Map doesn't know how to handle GRP1 and GRP2:
med1 <- Map(function(x,y) aggregate(.~GRP1+GRP2,df1[!is.na(y)],FUN=median), x=df1[,3:4], y=df1[, 5:6])
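A likely reason the attempt above fails: inside the function, df1[!is.na(y)] has no comma, so it indexes columns rather than rows, and x is never used, so no call actually aggregates the column paired with its mask. Below is a minimal base-R sketch that keeps the Map + aggregate idea. The vectors vals and masks are hypothetical names introduced here to pair each value column with its mask column; each Map call keeps the non-NA rows for one pair and aggregates that one column, and Reduce(merge) joins the per-column results.

```r
df1 <- data.frame(
  GRP1 = rep(c("A", "B"), each = 6),
  GRP2 = rep(rep(c("A", "B"), each = 3), 2),
  A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
  B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
  a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
  b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9)
)

vals  <- c("A", "B")  # value columns (hypothetical helper vector)
masks <- c("a", "b")  # matching mask columns, in the same order

# One aggregate() per (value, mask) pair, using only rows with a non-NA mask
per_col <- Map(function(v, m) {
  d <- df1[!is.na(df1[[m]]), c("GRP1", "GRP2", v)]
  aggregate(d[v], by = d[c("GRP1", "GRP2")], FUN = median)
}, vals, masks)

# Join the per-column results on the grouping columns
med1 <- Reduce(function(l, r) merge(l, r, by = c("GRP1", "GRP2")), per_col)
# Restore aggregate()'s row order (GRP1 varying fastest)
med1 <- med1[order(med1$GRP2, med1$GRP1), ]
rownames(med1) <- NULL
print(med1)
```

Merging rather than cbind()-ing keeps the rows aligned even if some group ends up with no usable rows for one column.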
This is the result I'm looking for:
GRP1 GRP2 A B
1 A A 4 5
2 B A 9 9
3 A B 4 7
4 B B 4 3
Any help would be greatly appreciated!
Posted on 2018-08-29 00:24:58
Using data.table:
library(data.table)
setDT(df1)
df1[, .(A = median(A[!is.na(a)]), B = median(B[!is.na(b)])), by = .(GRP1, GRP2)]
GRP1 GRP2 A B
1: A A 4 5
2: A B 4 7
3: B A 9 9
4: B B 4 3
The same logic in dplyr:
library(dplyr)
df1 %>%
  group_by(GRP1, GRP2) %>%
  summarise(A = median(A[!is.na(a)]), B = median(B[!is.na(b)]))
The original df1:
df1 <- data.frame(
GRP1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
GRP2 = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9)
)
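The same per-group expression, median(A[!is.na(a)]), can also be evaluated without data.table or dplyr. A minimal base-R sketch, assuming split() over the two grouping columns is acceptable as the grouping step: each group is summarised into a one-row data frame and the rows are stacked.

```r
df1 <- data.frame(
  GRP1 = rep(c("A", "B"), each = 6),
  GRP2 = rep(rep(c("A", "B"), each = 3), 2),
  A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
  B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
  a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
  b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9)
)

# Split into one data frame per (GRP1, GRP2) combination (GRP1 varies fastest)
groups <- split(df1, df1[c("GRP1", "GRP2")], drop = TRUE)

# Summarise each group: subset each value column by its own mask column
rows <- lapply(groups, function(g) {
  data.frame(GRP1 = g$GRP1[1], GRP2 = g$GRP2[1],
             A = median(g$A[!is.na(g$a)]),
             B = median(g$B[!is.na(g$b)]))
})

res <- do.call(rbind, rows)
rownames(res) <- NULL
print(res)
```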
Posted on 2018-08-29 00:36:52
With dplyr:
library(dplyr)
df1 %>%
  # Set to NA the values whose mask column (a or b) is NA, so they are excluded
  mutate(A = ifelse(is.na(a), NA, A),
         B = ifelse(is.na(b), NA, B)) %>%
  group_by(GRP1, GRP2) %>%
  summarise(A = median(A, na.rm = TRUE),
            B = median(B, na.rm = TRUE))
# A tibble: 4 x 4
# Groups: GRP1 [?]
GRP1 GRP2 A B
<fct> <fct> <dbl> <dbl>
1 A A 4 5
2 A B 4 7
3 B A 9 9
4 B B 4 3
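The same "set to NA, then ignore NAs" idea also works with the aggregate() call from the question. A minimal base-R sketch; the detail that matters is na.action = na.pass, because aggregate()'s formula method defaults to na.omit, which would drop an entire row whenever either A or B is NA, and na.rm = TRUE is then passed through to median().

```r
df1 <- data.frame(
  GRP1 = rep(c("A", "B"), each = 6),
  GRP2 = rep(rep(c("A", "B"), each = 3), 2),
  A = c(0, 4, 6, 7, 0, 1, 9, 0, 0, 8, 3, 4),
  B = c(6, 0, 4, 8, 6, 7, 0, 9, 9, 7, 3, 0),
  a = c(1, 4, 7, 3, NA, 3, 7, NA, NA, 4, 8, 1),
  b = c(5, NA, 7, 9, 5, 6, NA, 8, 1, 7, 2, 9)
)

masked <- df1
# Blank out the values whose mask column is NA (same as the mutate() step)
masked$A[is.na(masked$a)] <- NA
masked$B[is.na(masked$b)] <- NA

# Keep the NA rows (na.pass) and let median() skip them (na.rm = TRUE)
med2 <- aggregate(cbind(A, B) ~ GRP1 + GRP2, data = masked,
                  FUN = median, na.rm = TRUE, na.action = na.pass)
print(med2)
```

Rows come out in aggregate()'s usual order (GRP1 varying fastest), matching the layout of the desired result in the question.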
https://stackoverflow.com/questions/52072656