In my script I end up with long lists of repeated strings. I would like to collapse each run of consecutive repeats into the string followed by (#x), where # is the number of times the string is repeated.
summarise(Path = paste0(Channel, collapse = " > "))
The snippet above is part of my user path statement. For each record it gives output like this:
Direct > Direct > Direct > Endpage > Direct > Endpage > Direct > Direct > Direct > Endpage > Endpage > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Endpage > Direct
I would like the output to look like this, so that I no longer have to do it by hand in Excel:
Direct (3x) > Endpage > Direct > Endpage > Direct (3x) > Endpage (2x) > Direct (11x) > Endpage > Direct
Posted on 2019-05-09 02:20:40
txt <- "Direct > Direct > Direct > Endpage > Direct > Endpage > Direct > Direct > Direct > Endpage > Endpage > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Direct > Endpage > Direct"
# Split the path on " > " and run-length encode the pieces
tt <- rle(strsplit(txt, " > ")[[1]])
res <- ""
for (i in seq_along(tt$values)) {
  # Prefix every element but the first with " > "; append "(Xn)" only for runs longer than 1
  res <- paste0(res, ifelse(i > 1, " > ", ""), tt$values[i], ifelse(tt$lengths[i] > 1, paste0("(X", tt$lengths[i], ")"), ""))
}
Result:
res
[1] "Direct(X3) > Endpage > Direct > Endpage > Direct(X3) > Endpage(X2) > Direct(X11) > Endpage > Direct"
Or, if you prefer it as a function:
shorten <- function(txt) {
  # Split the path on " > " and run-length encode the pieces
  tt <- rle(strsplit(txt, " > ")[[1]])
  res <- ""
  for (i in seq_along(tt$values)) {
    res <- paste0(res, ifelse(i > 1, " > ", ""), tt$values[i], ifelse(tt$lengths[i] > 1, paste0("(X", tt$lengths[i], ")"), ""))
  }
  res
}
shorten(txt)
[1] "Direct(X3) > Endpage > Direct > Endpage > Direct(X3) > Endpage(X2) > Direct(X11) > Endpage > Direct"
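Note that the output above renders counts as `(X3)` with no leading space, whereas the question asked for ` (3x)`. A minimal sketch of a variant that produces the requested format is shown below. It relies on `paste0()` being vectorised, so no loop is needed. The name `shorten_runs` and its `sep` argument are made up for this illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): collapse consecutive
# repeats in a separator-delimited path, using the " (Nx)" suffix from the question.
shorten_runs <- function(txt, sep = " > ") {
  # fixed = TRUE treats sep as a literal string rather than a regular expression
  runs <- rle(strsplit(txt, sep, fixed = TRUE)[[1]])
  # Append " (Nx)" only where a value occurs more than once in a row
  suffix <- ifelse(runs$lengths > 1, paste0(" (", runs$lengths, "x)"), "")
  paste0(runs$values, suffix, collapse = sep)
}

example <- "Direct > Direct > Direct > Endpage > Direct > Endpage > Endpage"
shorten_runs(example)
# [1] "Direct (3x) > Endpage > Direct > Endpage (2x)"
```

Since `rle()` works on any atomic vector, the same body could presumably be applied to the `Channel` vector directly inside `summarise()` instead of splitting an already pasted string, but that is untested here.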
If you want to apply it to a column of strings, try:
lapply(data$column, shorten)
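`lapply()` returns a list, which is awkward to store back in a data frame. As a sketch, `vapply()` with a `character(1)` template returns a plain character vector instead. The data frame below is invented for illustration, and `shorten()` is restated in a compact, loop-free form (same `(Xn)` output format) only so the block runs on its own.

```r
# Compact restatement of shorten() so this block is self-contained;
# it produces the same "(Xn)" format as the loop version.
shorten <- function(txt) {
  tt <- rle(strsplit(txt, " > ")[[1]])
  paste0(tt$values, ifelse(tt$lengths > 1, paste0("(X", tt$lengths, ")"), ""), collapse = " > ")
}

# Hypothetical example data; `data` and `column` are placeholder names
data <- data.frame(
  column = c("Direct > Direct > Endpage", "Endpage > Endpage > Endpage", "Direct"),
  stringsAsFactors = FALSE
)

# vapply() checks that every result is a length-1 character value
data$short <- vapply(data$column, shorten, character(1), USE.NAMES = FALSE)
data$short
# [1] "Direct(X2) > Endpage" "Endpage(X3)"          "Direct"
```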
PS - just for fun, I came up with a simple alternative:
shorten2<-function(txt) gsub(" 1 "," ",paste(apply(sapply(rle(strsplit(txt, " > ")[[1]]), paste),1,function(x) paste(x, collapse=" ")),collapse=" > "))
It turns out, though, that this solution, which avoids the for loop but introduces two *apply calls, is actually slightly slower on a 500-row column:
Unit: milliseconds
    expr      min       lq     mean   median       uq      max neval
 forloop 64.85126 66.08620 76.05589 68.54179 69.89208 191.8934   100
    gsub 71.98645 73.45945 81.75625 75.83651 77.32290 186.3958   100
https://stackoverflow.com/questions/56046436