I have a data frame with several columns, like this:
Frequency Alels
0.5 C
0.6 C,G
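0.02 A,T,TTT
I want to split the values in the second column into separate rows, with Frequency set to 0 in the newly created rows.
I tried separate() from the tidyr package, but I cannot change the Frequency column in the new rows; I get the following result: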
Frequency Alels
0.5 C
0.6 C
0.6 G
0.02 A
0.02 T
0.02 TTT
But I would like the output to look like this:
Frequency Alels
0.5 C
0.6 C
0 G
0.02 A
0 T
0 TTT
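For reference, the behaviour described in the question can be reproduced with a sketch like the one below. It assumes the call was actually tidyr's separate_rows() (separate() splits a column into several columns, not rows) and that tidyr is installed; the variable name `out` is only illustrative.

```r
library(tidyr)  # assumption: tidyr is installed

# Data from the question
df <- read.table(text = "Frequency Alels
0.5 C
0.6 C,G
0.02 A,T,TTT",
                 header = TRUE, stringsAsFactors = FALSE)

# separate_rows() gives each allele its own row but copies the other
# columns unchanged, so Frequency is repeated instead of being set to 0
out <- separate_rows(df, Alels, sep = ",")
out
```

This is why an extra step is needed to zero out Frequency for every allele after the first in each original row.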
Posted on 2018-05-02 22:11:31
This should work:
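# Example data (column names as spelled in this answer)
d <- read.table(text = "Frecuency Alels
0.5 C
0.6 C,G",
                header = TRUE, stringsAsFactors = FALSE)
# Number of alleles in each row
counts <- sapply(strsplit(d$Alels, split = ","), length)
# Keep the original frequency for the first allele, pad with zeros for the rest
data.frame("Frecuency" = unlist(lapply(seq_along(d$Frecuency),
                                       function(x) c(d$Frecuency[x],
                                                     rep(0, counts[x] - 1)))),
           "Alels" = unlist(strsplit(d$Alels, split = ",")))
Posted on 2018-05-02 22:35:38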
It's not pretty, but I think it works.
# Requires magrittr, dplyr, tidyr
library(magrittr)
library(dplyr)
library(tidyr)
# Create data frame
df <- data.frame(frequency = c(0.5, 0.6),
                 alels = c("C", "C, G, T"),
                 stringsAsFactors = FALSE)
# Duplicate the alels column, then split alels into one row per value
# (the regex also drops the space after each comma)
df %<>%
  mutate(alels_check = alels) %>%
  separate_rows(alels, sep = ",\\s*", convert = TRUE)
# Flag rows that repeat an earlier (frequency, alels_check) pair
# and set their frequency to zero
df$frequency[duplicated(df[c("frequency", "alels_check")])] <- 0
# Remove the duplicated alels column
df %<>% select(-alels_check)
Original:
# frequency alels
# 1 0.5 C
# 2 0.6 C, G, T
Result:
# frequency alels
# 1 0.5 C
# 2 0.6 C
# 3 0.0 G
# 4 0.0 T
With your data:
# frequency alels
# 1 0.50 C
# 2 0.60 C, G
# 3 0.02 A, T, TTT
# frequency alels
# 1 0.50 C
# 2 0.60 C
# 3 0.00 G
# 4 0.02 A
# 5 0.00 T
# 6 0.00 TTT
Posted on 2018-05-02 22:45:10
The data from your example:
df <- read.table(text = "Frequency Alels
0.5 C
0.6 C,G
0.02 A,T,TTT",
                 header = TRUE, stringsAsFactors = FALSE)
Here is another solution for you to consider:
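library(dplyr)
# Build one small data frame per original row, then stack them
df <- lapply(seq_len(nrow(df)),
             function(row_num) {
               # Split this row's alleles on commas
               s <- strsplit(df$Alels[row_num], ",") %>% unlist
               # First allele keeps the frequency; the rest get 0
               data.frame(Frequency = c(df$Frequency[row_num], rep(0, length(s) - 1)),
                          Alels = s)
             }) %>% do.call(rbind, .)
df
You could also use rbindlist() from the data.table package instead of do.call(rbind, .).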
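As a rough sketch of that rbindlist() variant, assuming the data.table package is installed: the names `pieces` and `result` are illustrative and not from the original answer, and the per-row logic mirrors the lapply() approach described in this answer.

```r
library(data.table)  # assumption: data.table is installed

# Data from the question
df <- read.table(text = "Frequency Alels
0.5 C
0.6 C,G
0.02 A,T,TTT",
                 header = TRUE, stringsAsFactors = FALSE)

# One small data frame per original row: the first allele keeps the
# frequency, any further alleles get 0
pieces <- lapply(seq_len(nrow(df)), function(i) {
  s <- strsplit(df$Alels[i], ",")[[1]]
  data.frame(Frequency = c(df$Frequency[i], rep(0, length(s) - 1)),
             Alels = s,
             stringsAsFactors = FALSE)
})

# rbindlist() stacks the list of data frames; note that it returns a
# data.table rather than a plain data.frame
result <- rbindlist(pieces)
result
```

If a plain data frame is needed afterwards, as.data.frame(result) converts it back.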
https://stackoverflow.com/questions/50136311