I have a data frame with a grouping variable ("Gene") and a value variable ("Value"):
Gene Value
A 12
A 10
B 3
B 5
B 6
C 1
D 3
D 4
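For each level of my grouping variable, I want to extract the maximum value. The result should therefore be a data frame with one row per level of the grouping variable: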
Gene Value
A 12
B 6
C 1
D 4
Could aggregate do the trick?
Posted on 2014-08-14 17:58:01
There are many ways to do this in R. Here are some of them:
df <- read.table(header = TRUE, text = 'Gene Value
A 12
A 10
B 3
B 5
B 6
C 1
D 3
D 4')
# aggregate
aggregate(df$Value, by = list(df$Gene), max)
aggregate(Value ~ Gene, data = df, max)
# tapply
tapply(df$Value, df$Gene, max)
# split + lapply
lapply(split(df, df$Gene), function(y) max(y$Value))
# plyr
require(plyr)
ddply(df, .(Gene), summarise, Value = max(Value))
# dplyr
require(dplyr)
df %>% group_by(Gene) %>% summarise(Value = max(Value))
# data.table
require(data.table)
dt <- data.table(df)
dt[ , max(Value), by = Gene]
# doBy
require(doBy)
summaryBy(Value~Gene, data = df, FUN = max)
# sqldf
require(sqldf)
sqldf("select Gene, max(Value) as Value from df group by Gene", drv = 'SQLite')
# ave
df[as.logical(ave(df$Value, df$Gene, FUN = function(x) x == max(x))),]
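As a quick check, the formula interface of aggregate should reproduce the table asked for in the question. This is only a minimal sketch: it rebuilds df so that it runs on its own, and the name res is introduced here purely for illustration.

```r
# Rebuild the example data so the snippet is self-contained.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = 'Gene Value
A 12
A 10
B 3
B 5
B 6
C 1
D 3
D 4')

# One row per Gene, holding the maximum Value of that group.
res <- aggregate(Value ~ Gene, data = df, FUN = max)

# Compare against the result table given in the question.
stopifnot(as.character(res$Gene) == c("A", "B", "C", "D"))
stopifnot(res$Value == c(12, 6, 1, 4))
print(res)
```

Unlike the first aggregate call above, the formula version keeps the original column names (Gene and Value), which is why it is used for the comparison.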
Posted on 2017-07-24 14:25:18
df$Gene <- as.factor(df$Gene)
do.call(rbind, lapply(split(df,df$Gene), function(x) {return(x[which.max(x$Value),])}))
Using base R only.
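One thing to keep in mind with this approach: which.max returns only the index of the first maximum, so a group with tied maxima still yields a single row, whereas a filter built on ave keeps every tied row. A small sketch of the difference, using a made-up data frame (ties, and its Sample column, are hypothetical and not part of the question):

```r
# Hypothetical data: group A has two rows sharing the maximum Value.
ties <- data.frame(Gene   = c("A", "A", "B"),
                   Value  = c(5, 5, 2),
                   Sample = c("s1", "s2", "s3"),
                   stringsAsFactors = FALSE)

# split + which.max: exactly one row per group (the first maximum wins).
one_per_group <- do.call(rbind, lapply(split(ties, ties$Gene),
                                       function(x) x[which.max(x$Value), ]))

# ave-based filter: every row that equals its group maximum is kept.
all_ties <- ties[ties$Value == ave(ties$Value, ties$Gene, FUN = max), ]

nrow(one_per_group)  # 2
nrow(all_ties)       # 3
```

Which behaviour is preferable depends on whether one representative row per group or all maximal rows are wanted.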
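Posted on 2017-06-22 06:26:48
Use sqldf and standard SQL to get the maximum value grouped by another variable (the data frame is assumed to be named df1 here):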
https://cran.r-project.org/web/packages/sqldf/sqldf.pdf
library(sqldf)
sqldf("select max(Value),Gene from df1 group by Gene")
Or use the excellent Hmisc package to apply a function (max) by group: https://www.rdocumentation.org/packages/Hmisc/versions/4.0-3/topics/summarize
library(Hmisc)
summarize(df1$Value,df1$Gene,max)
https://stackoverflow.com/questions/25314336