Hello, I am trying to figure out how to merge the rows of a data frame (summing the column values) by prefix matching:
Example data frame:
set.seed(42) ## for sake of reproducibility
df <- data.frame(col1=c(sprintf("gene%s", 1:3), sprintf("protein%s", 1:5), sprintf("lipid%s", 1:3)),
counts=runif(11, min=10, max=70))
df
# col1 counts
# 1 gene1 64.88836
# 2 gene2 66.22452
# 3 gene3 27.16837
# 4 protein1 59.82686
# 5 protein2 48.50473
# 6 protein3 41.14576
# 7 protein4 54.19530
# 8 protein5 18.08000
# 9 lipid1 49.41954
# 10 lipid2 52.30389
# 11 lipid3 37.46451
So I want all rows starting with "gene" to be merged into one row, and the same for the protein and lipid rows.
Desired output:
col1 counts
gene 158.2813
lipid 139.1879
protein 221.7526
Answered 2022-05-30 04:05:07
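library(dplyr)    # group_by(), summarise(), %>%
library(stringr)  # str_remove()
df %>%
  group_by(col1 = str_remove(col1, "\\d+")) %>%  # strip the numeric suffix
  summarise(counts = sum(counts))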
Answered 2022-05-30 04:46:15
Strip the digits with gsub, then use aggregate with a formula.
aggregate(counts ~ gsub('\\d+', '', col1), df, sum)
# gsub("\\\\d+", "", col1) counts
# 1 gene 158.2813
# 2 lipid 139.1879
# 3 protein 221.7526
Or with list notation.
with(df, aggregate(list(counts=counts), list(col1=gsub('\\d+', '', col1)), sum))
# col1 counts
# 1 gene 158.2813
# 2 lipid 139.1879
# 3 protein 221.7526
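If a named vector is enough rather than a data frame, a minimal sketch with `tapply()` could look like the following. It is not part of the original answer. It also anchors the pattern with `$` so that only trailing digits are stripped, which assumes the number is always a suffix, as it is in the example data.

```r
# Same data as in the question (values copied from the "Data" section)
df <- data.frame(
  col1 = c("gene1", "gene2", "gene3", "protein1", "protein2", "protein3",
           "protein4", "protein5", "lipid1", "lipid2", "lipid3"),
  counts = c(64.8883626097813, 66.2245247978717, 27.1683720871806,
             59.8268575640395, 48.5047311335802, 41.145756947808,
             54.195298878476, 18.0799958342686, 49.4195374241099,
             52.303887042217, 37.4645065749064)
)

# Assumption: the number is always a suffix, so strip only trailing digits
prefix <- sub("\\d+$", "", df$col1)

# Sum counts within each prefix; the result is named by prefix
sums <- tapply(df$counts, prefix, sum)
round(sums, 4)
#     gene    lipid  protein
# 158.2813 139.1879 221.7526
```

The anchored `sub()` would leave digits inside a name untouched, whereas `gsub('\\d+', '', ...)` removes every run of digits.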
Sidenote on string generation: you can also use paste0 to append the numbers as a suffix.
paste0("gene", 1:3)
# [1] "gene1" "gene2" "gene3"
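As a quick check of this sidenote, the sketch below compares the `sprintf` call from the question with the `paste0` call; both should build the same character vector, since each converts the integers to strings before joining them to the prefix.

```r
a <- sprintf("gene%s", 1:3)  # form used in the question
b <- paste0("gene", 1:3)     # form suggested in the sidenote
identical(a, b)
# [1] TRUE
```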
Data:
df <- structure(list(col1 = c("gene1", "gene2", "gene3", "protein1",
"protein2", "protein3", "protein4", "protein5", "lipid1", "lipid2",
"lipid3"), counts = c(64.8883626097813, 66.2245247978717, 27.1683720871806,
59.8268575640395, 48.5047311335802, 41.145756947808, 54.195298878476,
18.0799958342686, 49.4195374241099, 52.303887042217, 37.4645065749064
)), class = "data.frame", row.names = c(NA, -11L))
https://stackoverflow.com/questions/72429006