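I have pairs of words that are transcribed in ARPABET. I am trying to combine these words so that, assuming strict ordering, every possible sequence of segments is generated. An example is shown below:
word1  transcription1  word2  transcription2
dog    D AA G          cat    K AE T
Combining transcription1 and transcription2 would produce something like the following, iterating through the segments. For the purposes of this toy example, I have not included instances where the combination contains no segments from the second word (e.g., dog + cat = dog), although they may be in the logical space.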
D K AE T
D AE T
D T
D AA K AE T
D AA AE T
D AA T
D AA G K AE T
D AA G AE T
D AA G T
D AA G
K D AA G
K AA G
K G
K AE D AA G
K AE AA G
K AE G
K AE T D AA G
K AE T AA G
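K AE T G
The end goal is to run some quantitative analysis on each of these outputs, so saving them to one large data frame would be ideal, although it may become unwieldy given the amount of data I am working with (about 900 word pairs, 3-7 segments each). Any help with this would be greatly appreciated.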
Answered 2020-03-05 03:32:24
Here is a simple function that does this:
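library(dplyr)  # only needed for the %>% pipe
segment_sequences <- function(x, y) {
  # Split each transcription into its segments
  x <- strsplit(x, " ") %>% unlist
  y <- strsplit(y, " ") %>% unlist
  z <- c(x, y)
  # For each length j, take every order-preserving choice of j segments from z.
  # lapply (not sapply) keeps the result a list even when z has one segment.
  lapply(seq_along(z), function(j) {
    combos <- combn(seq_along(z), j, simplify = FALSE)
    sapply(combos, function(cb) paste0(z[cb], collapse = " "))
  }) %>% do.call(c, .)
}
segment_sequences("D AA G", "K AE T")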
[1] "D" "AA" "G" "K" "AE" "T" "D AA" "D G" "D K" "D AE" "D T" "AA G" "AA K" "AA AE" "AA T" "G K" "G AE"
[18] "G T" "K AE" "K T" "AE T" "D AA G" "D AA K" "D AA AE" "D AA T" "D G K" "D G AE" "D G T" "D K AE" "D K T" "D AE T" "AA G K" "AA G AE" "AA G T"
[35] "AA K AE" "AA K T" "AA AE T" "G K AE" "G K T" "G AE T" "K AE T" "D AA G K" "D AA G AE" "D AA G T" "D AA K AE" "D AA K T" "D AA AE T" "D G K AE" "D G K T" "D G AE T" "D K AE T"
[52] "AA G K AE" "AA G K T" "AA G AE T" "AA K AE T" "G K AE T" "D AA G K AE" "D AA G K T" "D AA G AE T" "D AA K AE T" "D G K AE T" "AA G K AE T" "D AA G K AE T"
Answered 2020-03-05 04:39:15
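My hand-rolled function, which uses only base R functions.
fun <- function(x, y) {
  x <- strsplit(x, " ")[[1]]
  y <- strsplit(y, " ")[[1]]
  # Each segment is either kept or dropped (NA); expand.grid enumerates all
  # keep/drop combinations, and each row is pasted back without the NAs.
  apply(do.call(expand.grid, lapply(c(x, y), c, NA)),
        1, function(row) paste(row[!is.na(row)], collapse = " "))
}
fun("D AA G", "K AE T")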
# [1] "D AA G K AE T" "AA G K AE T" "D G K AE T" "G K AE T"
# [5] "D AA K AE T" "AA K AE T" "D K AE T" "K AE T"
# [9] "D AA G AE T" "AA G AE T" "D G AE T" "G AE T"
# [13] "D AA AE T" "AA AE T" "D AE T" "AE T"
# [17] "D AA G K T" "AA G K T" "D G K T" "G K T"
# [21] "D AA K T" "AA K T" "D K T" "K T"
# [25] "D AA G T" "AA G T" "D G T" "G T"
# [29] "D AA T" "AA T" "D T" "T"
# [33] "D AA G K AE" "AA G K AE" "D G K AE" "G K AE"
# [37] "D AA K AE" "AA K AE" "D K AE" "K AE"
# [41] "D AA G AE" "AA G AE" "D G AE" "G AE"
# [45] "D AA AE" "AA AE" "D AE" "AE"
# [49] "D AA G K" "AA G K" "D G K" "G K"
# [53] "D AA K" "AA K" "D K" "K"
# [57] "D AA G" "AA G" "D G" "G"
# [61] "D AA" "AA" "D" ""
Source: https://stackoverflow.com/questions/60533170
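Both functions above enumerate order-preserving subsequences of the concatenated transcription, so they include strings such as "D G K" that are not in the question's listing, and they never place segments of the second word first (e.g. "K AA G"). If the intended reading is "a non-empty prefix of one word followed by a non-empty suffix of the other, in both orders", a minimal base R sketch could look like the following. The helper name `prefix_suffix`, the `word_pairs` input, the `order` column, and the long data frame layout are assumptions for illustration and do not come from the question or the answers. The bare first transcription (D AA G in the listing) is not generated here and would need to be added separately if wanted.

```r
# Hypothetical helper (not from the answers above): every non-empty prefix of
# `first` followed by every non-empty suffix of `second`, order preserved.
prefix_suffix <- function(first, second) {
  a <- strsplit(first, " ", fixed = TRUE)[[1]]
  b <- strsplit(second, " ", fixed = TRUE)[[1]]
  out <- character(0)
  for (i in seq_along(a)) {      # prefix a[1..i]
    for (j in seq_along(b)) {    # suffix b[j..length(b)]
      out <- c(out, paste(c(a[seq_len(i)], b[j:length(b)]), collapse = " "))
    }
  }
  out
}

# Toy input in the layout shown in the question (assumed column names).
word_pairs <- data.frame(
  word1 = "dog", transcription1 = "D AA G",
  word2 = "cat", transcription2 = "K AE T",
  stringsAsFactors = FALSE
)

# One long data frame: one row per generated sequence, tagged with its pair
# and with which word supplied the prefix.
rows <- lapply(seq_len(nrow(word_pairs)), function(k) {
  fwd <- prefix_suffix(word_pairs$transcription1[k], word_pairs$transcription2[k])
  bwd <- prefix_suffix(word_pairs$transcription2[k], word_pairs$transcription1[k])
  n <- length(fwd) + length(bwd)
  data.frame(
    word1 = rep(word_pairs$word1[k], n),
    word2 = rep(word_pairs$word2[k], n),
    order = rep(c("word1 first", "word2 first"), c(length(fwd), length(bwd))),
    sequence = c(fwd, bwd),
    stringsAsFactors = FALSE
  )
})
result <- do.call(rbind, rows)
```

For the toy pair this gives 18 rows. Under this reading a pair with m and n segments yields 2 * m * n rows, so 900 pairs of at most 7 segments each stay below 900 * 98 = 88,200 rows, which should still fit comfortably in a single data frame.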