
How to sort by duplicated values in R?

Stack Overflow user
Asked on 2019-05-22 22:40:31

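I have a dataset that I am trying to sort by the duplicated IDs in one column (the rssnp1 column), but all I can find online is the duplicated function, which is used to remove duplicates.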

My data looks like this:

Chr  Start   End     rssnp1        Type    gene
1   1244733 1244734 rs2286773   LD_SNP  ACE
1   1257536 1257436 rs301159    LD_SNP  CPEB4
1   1252336 1252336 rs2286773   Sentinel    CPEB4
1   1252343 1252343 rs301159    LD_SNP  CPEB4
1   1254841 1254841 rs301159    LD_SNP  CPEB4
1   1256703 1267404 rs301159    LD_SNP  CPEB4
1   1269246 1269246 rs301159    LD_SNP  CPEB4
1   1370168 1370168 rs301159    LD_SNP  GLUPA1
1   1371824 1371824 rs301159    LD_SNP  GLUPA1
1   1372591 1372591 rs301159    LD_SNP  GLUPA1

The output I am aiming for is:

Chr  Start   End     rssnp1        Type    gene
1   1244733 1244734 rs2286773   LD_SNP  ACE
1   1252336 1252336 rs2286773   Sentinel    CPEB4
1   1257536 1257436 rs301159    LD_SNP  CPEB4
1   1252343 1252343 rs301159    LD_SNP  CPEB4
1   1254841 1254841 rs301159    LD_SNP  CPEB4
1   1256703 1267404 rs301159    LD_SNP  CPEB4
1   1269246 1269246 rs301159    LD_SNP  CPEB4
1   1370168 1370168 rs301159    LD_SNP  GLUPA1
1   1371824 1371824 rs301159    LD_SNP  GLUPA1
1   1372591 1372591 rs301159    LD_SNP  GLUPA1

To reproduce the data, use:

structure(list(Chr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Start = c(1244733, 
1257536, 1252336, 1252343, 1254841, 1256703, 1269246, 1370168, 
1371824, 1372591), End = c(1244734, 1257436, 1252336, 1252343, 
1254841, 1267404, 1269246, 1370168, 1371824, 1372591), rssnp1 = c("rs2286773", 
"rs301159", "rs2286773", "rs301159", "rs301159", "rs301159", 
"rs301159", "rs301159", "rs301159", "rs301159"), Type = c("LD_SNP", 
"LD_SNP", "Sentinel", "LD_SNP", "LD_SNP", "LD_SNP", "LD_SNP", 
"LD_SNP", "LD_SNP", "LD_SNP"), gene = c("ACE", "CPEB4", "CPEB4", 
"CPEB4", "CPEB4", "CPEB4", "CPEB4", "GLUPA1", "GLUPA1", "GLUPA1"
)), .Names = c("Chr", "Start", "End", "rssnp1", "Type", "gene"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

I have tried this:

target_order <- c("a", "b", "c")
df[order(match(df$rssnp1)), target_order]

I do this with every unique value in target_order instead of c("a","b","c"), so I end up with something like c("rs2286773","rs301159"...). I have hundreds to thousands of IDs. But this gives an error:

Error in `[.data.frame`(df, order(match(df$rssnp1)), target_order) : 
  undefined columns selected

Is there another way to do this?

Edit: target_order needs to go in a different part of the code: df[order(match(df$rssnp1, target_order)), ]
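
To make the corrected call concrete, here is a minimal sketch of what it does. The data frame df_demo is a hypothetical stand-in that only mirrors the rssnp1 pattern from the question; it is not the real data.

```r
# Hypothetical stand-in data; only the rssnp1 pattern mirrors the question.
df_demo <- data.frame(
  Start  = c(10, 20, 30, 40),
  rssnp1 = c("rs2286773", "rs301159", "rs2286773", "rs301159"),
  stringsAsFactors = FALSE
)
target_order <- c("rs2286773", "rs301159")

# match() gives each row the position of its ID within target_order.
pos <- match(df_demo$rssnp1, target_order)

# order() sorts the rows by that position. target_order belongs inside
# match(); the second index of `[` is left empty so all columns are kept.
sorted_demo <- df_demo[order(pos), ]
print(sorted_demo)
```

Passing target_order as the second index of `[` asks for columns with those names, which is what triggers the "undefined columns selected" error.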

However, this is still a tedious way for me to get this working. Is there a more efficient way to sort by duplicates?


1 Answer

Stack Overflow user

Answered on 2019-05-23 05:34:22

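From my understanding of your description, you want the result to follow a specific sequence, target_order, that is computed elsewhere. This should be achievable with a merge operation.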

Suppose you have the following order.

target_order<-c("rs301159", "rs2286773") 

dt <- structure(list(Chr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Start = c(1244733, 
    1257536, 1252336, 1252343, 1254841, 1256703, 1269246, 1370168, 
    1371824, 1372591), End = c(1244734, 1257436, 1252336, 1252343, 
    1254841, 1267404, 1269246, 1370168, 1371824, 1372591), rssnp1 = c("rs2286773", 
    "rs301159", "rs2286773", "rs301159", "rs301159", "rs301159", 
    "rs301159", "rs301159", "rs301159", "rs301159"), Type = c("LD_SNP", 
    "LD_SNP", "Sentinel", "LD_SNP", "LD_SNP", "LD_SNP", "LD_SNP", 
    "LD_SNP", "LD_SNP", "LD_SNP"), gene = c("ACE", "CPEB4", "CPEB4", 
    "CPEB4", "CPEB4", "CPEB4", "CPEB4", "GLUPA1", "GLUPA1", "GLUPA1"
    )), .Names = c("Chr", "Start", "End", "rssnp1", "Type", "gene"
    ), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
    ))

The code below should produce the result you want.

library(data.table)

setDT(dt)

# Setting sort=FALSE to persist the order in target_order
merge(as.data.table(target_order), dt, by.y="rssnp1", by.x="target_order", sort=FALSE)

 #     target_order Chr   Start     End     Type   gene
 #  1:     rs301159   1 1257536 1257436   LD_SNP  CPEB4
 #  2:     rs301159   1 1252343 1252343   LD_SNP  CPEB4
 #  3:     rs301159   1 1254841 1254841   LD_SNP  CPEB4
 #  4:     rs301159   1 1256703 1267404   LD_SNP  CPEB4
 #  5:     rs301159   1 1269246 1269246   LD_SNP  CPEB4
 #  6:     rs301159   1 1370168 1370168   LD_SNP GLUPA1
 #  7:     rs301159   1 1371824 1371824   LD_SNP GLUPA1
 #  8:     rs301159   1 1372591 1372591   LD_SNP GLUPA1
 #  9:    rs2286773   1 1244733 1244734   LD_SNP    ACE
 # 10:    rs2286773   1 1252336 1252336 Sentinel  CPEB4
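
If target_order is not available up front, a base R sketch can derive it from the order in which each ID first appears, so nothing has to be typed by hand. This is an alternative to the merge above, not a replacement for it. The data frame df_small is a hypothetical five-row subset of the question's data, and it assumes that first-appearance order is the grouping order you want.

```r
# Hypothetical subset of the question's data (first five rows, two columns).
df_small <- data.frame(
  Start  = c(1244733, 1257536, 1252336, 1252343, 1254841),
  rssnp1 = c("rs2286773", "rs301159", "rs2286773", "rs301159", "rs301159"),
  stringsAsFactors = FALSE
)

# unique() lists the IDs in order of first appearance; match() turns each
# row's ID into its index in that list; order() then groups equal IDs
# together while leaving tied rows in their original relative order.
first_seen <- unique(df_small$rssnp1)
grouped <- df_small[order(match(df_small$rssnp1, first_seen)), ]
print(grouped)
```

On these five rows the rs2286773 rows come first, followed by the rs301159 rows, each group in its original row order, which matches the first five rows of the output the question asks for.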
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/56259382
