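Consider the following vector:
x <- c("GDP_UK", "GDP_US", "GDP_UK_diff2_L2",
"INC","GDP_UK_L2", "GDP_US_level", "INC_UK", "INC_L1", "INC_diff1")
As you can see, this is a character vector.
What I want to do is find the elements that contain "_diff(number)", "_L(number)", or "_level" and strip that part of the string.
The result I'm after is the following vector:
c("GDP_UK", "GDP_US", "GDP_UK", "INC", "GDP_UK", "GDP_US", "INC_UK", "INC", "INC")
As you can see, every _diff, _L, and _level suffix has been removed to recover the original string.
I don't know how to do this. I tried writing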
x[grepl(paste(c("diff", "level", "_L"), collapse = "|"), x)]
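which only returns the elements containing diff, level, or _L, but I don't know how to trim them. I tried substring, but I'm not sure how to specify which characters to remove. Any idea how to do this?
**Edit**
We can use the following code: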
x <- gsub(pattern = "_L", replacement = "", x)
x <- gsub(pattern = "_diff", replacement = "", x)
x <- gsub(pattern = "_level", replacement = "", x)
However, that leaves the trailing digits at the end of the strings:
"GDP_UK" "GDP_US" "GDP_UK22" "INC" "GDP_UK2" "GDP_US" "INC_UK" "INC2" "INC1"
Answered on 2021-03-26 01:19:32
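What you are looking for is a regular expression such as "_L\\d*" (and likewise for the other suffixes). It matches an underscore, an L, and zero or more digits, so the trailing number is removed together with the suffix.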
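To see what that pattern picks up on its own, here is a minimal sketch using base R's regmatches() and gregexpr(). The variable name `matches` is only illustrative. Note that inside an R string the backslash is doubled, so "\\d" reaches the regex engine as \d.

```r
x <- c("GDP_UK", "GDP_US", "GDP_UK_diff2_L2",
       "INC", "GDP_UK_L2", "GDP_US_level", "INC_UK", "INC_L1", "INC_diff1")

# Extract every substring matched by the pattern, element by element.
# The result is a list with one character vector per element of x.
matches <- regmatches(x, gregexpr("_L\\d*", x))

matches[[3]]  # matches found in "GDP_UK_diff2_L2"
matches[[1]]  # "GDP_UK" has no match, so this is an empty character vector
```

Matching is case sensitive, so "_L\\d*" does not touch the lowercase "_level" suffix.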
In full:
x <- c("GDP_UK", "GDP_US", "GDP_UK_diff2_L2",
"INC","GDP_UK_L2", "GDP_US_level", "INC_UK", "INC_L1", "INC_diff1")
# apply the three replacements in sequence, carrying the result forward in y
y <- gsub("_L\\d*", "", x)
y <- gsub("_diff\\d*", "", y)
y <- gsub("_level\\d*", "", y)
# or as a single pipeline with stringr:
library(stringr)
x %>%
str_replace_all("_L\\d*", "") %>%
str_replace_all("_diff\\d*", "") %>%
str_replace_all("_level\\d*", "")
#> [1] "GDP_UK" "GDP_US" "GDP_UK" "INC" "GDP_UK" "GDP_US" "INC_UK" "INC"
#> [9] "INC"
## or even in one go:
gsub("_(L|diff|level)\\d*", "", x)
#> [1] "GDP_UK" "GDP_US" "GDP_UK" "INC" "GDP_UK" "GDP_US" "INC_UK" "INC"
#> [9] "INC"
https://stackoverflow.com/questions/66804307
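One caveat with the patterns above: they are not anchored, so "_L\\d*" would also match the "_L" in a name that merely contains it. If that matters for your data, a stricter variant could look like the sketch below. The pattern `suffix` and the example name "GDP_LATAM" are hypothetical, and the sketch assumes the suffixes always sit at the end of the string and that _L and _diff are always followed by at least one digit.

```r
x <- c("GDP_UK", "GDP_US", "GDP_UK_diff2_L2",
       "INC", "GDP_UK_L2", "GDP_US_level", "INC_UK", "INC_L1", "INC_diff1")

# Hypothetical stricter pattern: one or more suffix groups, anchored at the
# end of the string; _L and _diff must be followed by at least one digit.
suffix <- "(_(L\\d+|diff\\d+|level))+$"
cleaned <- sub(suffix, "", x)
cleaned

# A made-up name that contains "_L" but carries no suffix:
loose  <- gsub("_(L|diff|level)\\d*", "", "GDP_LATAM")  # strips the "_L"
strict <- sub(suffix, "", "GDP_LATAM")                  # leaves the name alone
```

For the vector in the question both approaches give the same result; they differ only on names like the made-up one above.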