Q: Splitting a data frame string column into multiple columns when there is no fixed pattern

Stack Overflow user

Asked 2020-10-26 20:50:52

2 answers, 70 views, 0 followers, 1 vote

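I have a data.frame with a column ("Extra") that contains many pieces of information separated by ";". I only want to keep the part that starts at the first occurrence of "MES".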

> [1] 
IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269;MES-SWA_acceptor_diff=-4.016;MES-SWA_acceptor_ref=-5.005;MES-SWA_acceptor_ref_comp=-5.285;MES-SWA_donor_alt=-6.610;MES-SWA_donor_diff=0.781;MES-SWA_donor_ref=-1.165;MES-SWA_donor_ref_comp=-5.829

> [2] 
IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965;MES-SWA_acceptor_diff=0.290;MES-SWA_acceptor_ref=1.255;MES-SWA_acceptor_ref_comp=1.255;MES-SWA_donor_alt=-9.796;MES-SWA_donor_diff=-1.219;MES-SWA_donor_ref=-10.341;MES-SWA_donor_ref_comp=-11.015

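Splitting the information into multiple columns on ";" is easy with the separate() function. However, not all rows contain exactly the same fields (for example, DISTANCE is present in the first example but not in the second), so the values shift and no longer line up with their intended columns (see image). I suppose this is why I get these warning messages: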

> df <- separate(tabla2, col = "Extra", c("IMPACT=MODIFIER", "DISTANCE", "STRAND", "MES-SWA_acceptor_alt", "MES-SWA_acceptor_diff", "MES-SWA_acceptor_ref", "MES-SWA_acceptor_ref_comp", "MES-SWA_donor_alt", "MES-SWA_donor_diff", "MES-SWA_donor_ref", "MES-SWA_donor_ref_comp"), sep = ";")

>Warning messages:
1: Expected 11 pieces. Additional pieces discarded in 23177 rows [2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, ...]. 
2: Expected 11 pieces. Missing pieces filled with `NA` in 74 rows [1055, 1061, 1062, 1072, 1100, 1101, 1102, 1103, 1104, 1105, 1308, 1319, 1320, 1321, 2684, 2713, 2714, 10494, 10495, 10496, ...].
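
The misalignment described above can be reproduced in base R. The following is a minimal sketch, assuming two shortened example strings (`extra` is a hypothetical name, not the asker's object): splitting on ";" is purely positional, so the same position can hold different keys in different rows.

```r
# Shortened versions of the two example rows (assumed sample data).
extra <- c(
  "IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269",
  "IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965"
)

# Split every row on ";" and count the resulting fields.
pieces <- strsplit(extra, ";", fixed = TRUE)
n_fields <- lengths(pieces)
n_fields   # 4 3

# The second field of each row: a different key in each row.
second <- vapply(pieces, `[`, character(1), 2)
second     # "DISTANCE=3802" "STRAND=1"
```

Because the field count differs per row, any approach that assigns columns by position will mix keys; matching on the key name avoids this.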

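So I would be happy if I could simply drop everything that comes before the information I want to keep. However, all the functions I have found (substring, substr, separate, nchar, ...) are of no use in my case, because they need a start argument, and that position is not the same in every row of my data.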
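
One way around the varying start position is to compute it per row. This is a hedged sketch rather than a definitive solution: it assumes shortened sample strings in a hypothetical vector `extra`, and uses base R's `regexpr()` to locate the first "MES" in each string before calling `substring()`.

```r
# Shortened versions of the two example rows (assumed sample data).
extra <- c(
  "IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269;MES-SWA_acceptor_diff=-4.016",
  "IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965;MES-SWA_acceptor_diff=0.290"
)

# Position of the first "MES" in each string (-1 when there is no match).
# as.integer() drops the extra attributes that regexpr() attaches.
start <- as.integer(regexpr("MES", extra, fixed = TRUE))

# Keep everything from that position to the end of the string.
mes_part <- substring(extra, start)

# Rows without any "MES" get NA instead of the full original string.
mes_part[start < 0] <- NA_character_
mes_part
```

The result still contains ";"-separated KEY=VALUE fields, so it can be split further by key name rather than by position.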

I think the closest I have come to a solution is combining functions such as unlist(strsplit()):

> tabla3 <- tabla2 %>% select(Extra, var_id)
> tabla4 <- unlist(strsplit(tabla2$Extra, "MES-SWA_acceptor_alt="))
> tabla5 <- bind_cols(tabla3, tabla4) --> Error: Argument 2 must have names
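
A likely reason this attempt fails is that `unlist(strsplit(...))` flattens the result into a single unnamed vector with two pieces per row, so it no longer has one element per row. The sketch below, which assumes shortened sample data and hypothetical `var_id` values, keeps the list structure and takes the piece after the marker from each row.

```r
# Assumed sample data; the var_id values are hypothetical.
tabla2 <- data.frame(
  var_id = c("v1", "v2"),
  Extra = c(
    "IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269;MES-SWA_acceptor_diff=-4.016",
    "IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965;MES-SWA_acceptor_diff=0.290"
  ),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one element per row.
pieces <- strsplit(tabla2$Extra, "MES-SWA_acceptor_alt=", fixed = TRUE)

# Flattening doubles the length: 4 values for 2 rows.
n_flat <- length(unlist(pieces))

# Take the text after the marker for each row (NA if the marker is absent).
after_marker <- vapply(
  pieces,
  function(p) if (length(p) >= 2) p[2] else NA_character_,
  character(1)
)

# A vector with one element per row can be added as a column directly.
tabla2$after_marker <- after_marker
```

Note that splitting on the full key name removes the key itself, so the new column starts with the value of `MES-SWA_acceptor_alt`.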

Can anyone help me with this? That would be great!

This is my first post, so I hope everything is clear :)

string

split

multiple-columns

2 Answers

Stack Overflow user

Accepted answer

Posted 2020-10-26 21:57:03

If I understand your desired output correctly, the following code should work:

# Given data example
tabla2 <- data.frame(Extra = c(
  "IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269;MES-SWA_acceptor_diff=-4.016;MES-SWA_acceptor_ref=-5.005;MES-SWA_acceptor_ref_comp=-5.285;MES-SWA_donor_alt=-6.610;MES-SWA_donor_diff=0.781;MES-SWA_donor_ref=-1.165;MES-SWA_donor_ref_comp=-5.829",
  "IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965;MES-SWA_acceptor_diff=0.290;MES-SWA_acceptor_ref=1.255;MES-SWA_acceptor_ref_comp=1.255;MES-SWA_donor_alt=-9.796;MES-SWA_donor_diff=-1.219;MES-SWA_donor_ref=-10.341;MES-SWA_donor_ref_comp=-11.015"
  )
)
# Start from an empty data frame
temp_df <- data.frame()
# Split each row into its "KEY=VALUE" fields on ";"
temp_list <- strsplit(tabla2$Extra, split = ";")
# Fill the data frame row by row, using each key as the column name
for (i in seq_along(temp_list)) {
  # Split each field into key (element 1) and value (element 2)
  temp_list_2 <- strsplit(temp_list[[i]], split = "=")
  for (j in seq_along(temp_list_2)) {
    temp_df[i, temp_list_2[[j]][1]] <- temp_list_2[[j]][2]
  }
}
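
Since the question asks only for the "MES" fields, the same key-based idea can be narrowed down. The following is a sketch of a variation, not the answerer's code: it assumes shortened sample strings, parses each row into a named vector, and fills a numeric matrix with one column per MES key. The names `extra`, `rows`, `mes_keys` and `mes` are all hypothetical.

```r
# Shortened versions of the two example rows (assumed sample data).
extra <- c(
  "IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269;MES-SWA_donor_diff=0.781",
  "IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965;MES-SWA_donor_diff=-1.219"
)

# Parse each row into a named character vector: names are keys, elements are values.
rows <- lapply(strsplit(extra, ";", fixed = TRUE), function(fields) {
  kv <- strsplit(fields, "=", fixed = TRUE)
  setNames(vapply(kv, `[`, character(1), 2), vapply(kv, `[`, character(1), 1))
})

# Collect every key seen in any row and keep those starting with "MES".
all_keys <- unique(unlist(lapply(rows, names)))
mes_keys <- grep("^MES", all_keys, value = TRUE)

# One numeric column per MES key; a key missing from a row stays NA.
mes <- matrix(NA_real_, nrow = length(rows), ncol = length(mes_keys),
              dimnames = list(NULL, mes_keys))
for (i in seq_along(rows)) {
  present <- intersect(names(rows[[i]]), mes_keys)
  mes[i, present] <- as.numeric(rows[[i]][present])
}
mes
```

Filling by column name means a row that lacks DISTANCE, or any other field, cannot shift the MES values into the wrong column.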

Votes: 0

Stack Overflow user

Posted 2020-10-26 22:05:47

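Using data.table: split on ";" into new columns, reshape from wide to long, split on "=" into new columns, and finally reshape from long back to wide. This gives correctly aligned column names even when values are missing; for example, see DISTANCE, which is NA in the second row:

library(data.table)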
d <- data.table(Extra =  c("IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269;MES-SWA_acceptor_diff=-4.016;MES-SWA_acceptor_ref=-5.005;MES-SWA_acceptor_ref_comp=-5.285;MES-SWA_donor_alt=-6.610;MES-SWA_donor_diff=0.781;MES-SWA_donor_ref=-1.165;MES-SWA_donor_ref_comp=-5.829",
                           "IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965;MES-SWA_acceptor_diff=0.290;MES-SWA_acceptor_ref=1.255;MES-SWA_acceptor_ref_comp=1.255;MES-SWA_donor_alt=-9.796;MES-SWA_donor_diff=-1.219;MES-SWA_donor_ref=-10.341;MES-SWA_donor_ref_comp=-11.015"))

d[, tstrsplit(Extra, ";")
  ][, id := .I
    ][, melt(.SD, id.vars = "id")
      ][, c("c1", "c2") := tstrsplit(value, "=", type.convert = TRUE)
        ][ , dcast(.SD, id ~ c1, value.var = "c2")]

#    id   NA DISTANCE   IMPACT MES-SWA_acceptor_alt MES-SWA_acceptor_diff
# 1:  1 <NA>     3802 MODIFIER               -1.269                -4.016
# 2:  2 <NA>     <NA> MODIFIER                0.965                 0.290
#    MES-SWA_acceptor_ref MES-SWA_acceptor_ref_comp MES-SWA_donor_alt
# 1:               -5.005                    -5.285            -6.610
# 2:                1.255                     1.255            -9.796
#    MES-SWA_donor_diff MES-SWA_donor_ref MES-SWA_donor_ref_comp STRAND
# 1:              0.781            -1.165                 -5.829      1
# 2:             -1.219           -10.341                -11.015      1
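
The output above also contains a column named NA, which appears because the shorter row is padded with NA when it is split into columns. A possible variation, sketched here under the assumption of shortened sample data and hypothetical column names (`field`, `key`, `value`), splits directly into long form so that no padding is needed.

```r
library(data.table)

# Shortened versions of the two example rows (assumed sample data).
d <- data.table(Extra = c(
  "IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269",
  "IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965"
))
d[, id := .I]

# One row per KEY=VALUE field, grouped by the original row id.
long <- d[, .(field = unlist(strsplit(Extra, ";", fixed = TRUE))), by = id]

# Split each field into its key and its value.
long[, c("key", "value") := tstrsplit(field, "=", fixed = TRUE)]

# Long to wide: one column per key, NA where a row lacks that key.
wide <- dcast(long, id ~ key, value.var = "value")
wide
```

All values are still character here because the IMPACT field is text; the MES columns can be converted with `as.numeric()` afterwards if needed.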

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/64544788
