I have a vector of strings:
keywords <- c("kw 1", "kw2", "kw3", "kw4", "kw5", "kw6", "kw7", "kw8",
"kw 9 kw", "kw10", "kw11", "kw12", "kw13", "kw14", "kw15")
and a data frame with an empty Keyword column:
df <- data.frame("Description" = c("blabla kw10", "blabla kw15","blabla kw 1",
"blabla kw13", "blabla kw7", "kw2 bla", "kw8 blabla","bla kw11 bla",
"blabla kw10","blakw 9 kw", "blablakw4", "blakw 1bla"),
"Keyword" = NA)
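I need a way to find which string in the keywords vector partially matches each value of the Description variable, and to return that matching string from the keywords vector as the value of the Keyword column in the df data frame.
I need this result: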
df <- data.frame("Description" = c("blabla kw10", "blabla kw15","blabla kw 1",
"blabla kw13", "blabla kw7", "kw2 bla", "kw8 blabla","bla kw11 bla",
"blabla kw10","blakw 9 kw", "blablakw4", "blakw 1bla"),
"Keyword" = c("kw10", "kw15", "kw 1", "kw13", "kw7", "kw2", "kw8", "kw11", "kw10", "kw 9 kw", "kw4", "kw 1"))
Can you suggest a solution for this?
Edit:
A reproducible example with the keywords2 vector and the df2 data frame:
keywords2 <- c("cartucho", "MOLDE", "FILTRO", "BOMBA", "MOTOR")
df2 <- data.frame("Description" = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS",
" CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR", "MOLDES PARA QUESO",
"BOMBA PERISTALTICA", "MOLDE CON TAPA Y DESUERADOR",
"APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
"Keyword" = NA)
Expected result:
df2 <- data.frame("Description" = c("CULATA PARA MOTOR", "BOMBA CENTRIFUGA PARA LIQUIDOS",
" CARTUCHO FILTRANTE", "APARATO FILTRO MONITOR", "MOLDES PARA QUESO",
"BOMBA PERISTALTICA", "MOLDE CON TAPA Y DESUERADOR",
"APARATO FILTRO DE MEMBRANA", "BOMBA DE VACIO"),
"Keyword" = c("MOTOR", "BOMBA", "cartucho", "FILTRO", "MOLDE", "BOMBA", "MOLDE", "FILTRO", "BOMBA"))
Answered on 2018-06-05 02:15:40
We can use str_extract, with the keywords collapsed into a single alternation pattern:
library(stringr)
df$Keyword <- str_extract(df$Description, paste(keywords, collapse='|'))
df$Keyword
#[1] "kw10" "kw15" "kw 1" "kw13" "kw7" "kw2" "kw8"
#[8] "kw11" "kw10" "kw 9 kw" "kw4" "kw 1"
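To make the pattern construction above more concrete, here is a minimal base R sketch of the same idea. It is not part of the answer: it assumes regexpr() and regmatches() can stand in for str_extract, and the names descriptions and result, as well as the "no match here" row, are hypothetical and used only for illustration.

```r
keywords <- c("kw 1", "kw2", "kw3", "kw4", "kw5", "kw6", "kw7", "kw8",
              "kw 9 kw", "kw10", "kw11", "kw12", "kw13", "kw14", "kw15")

# Hypothetical sample: three rows from the question plus one without a keyword
descriptions <- c("blabla kw10", "blakw 9 kw", "blablakw4", "no match here")

# Collapse the keywords into one alternation pattern: "kw 1|kw2|...|kw15"
pattern <- paste(keywords, collapse = "|")

# regexpr() locates the first match in each string and returns -1 if none is found
m <- regexpr(pattern, descriptions)

# Start from NA everywhere, then fill in the matched substrings
result <- rep(NA_character_, length(descriptions))
result[m != -1] <- regmatches(descriptions, m)
result
```

Rows without any keyword stay NA, which mirrors what str_extract returns for non-matching strings. Note that the keywords are treated as regular expressions here, so a keyword containing characters such as "." or "(" would need escaping first.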
Update
With the new dataset and keywords, convert 'keywords2' to upper case, then paste the elements together to form the pattern for str_extract:
str_extract(df2$Description, paste(toupper(keywords2), collapse="|"))
#[1] "MOTOR" "BOMBA" "CARTUCHO" "FILTRO" "MOLDE" "BOMBA" "MOLDE"
#[8] "FILTRO" "BOMBA"
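The output above contains "CARTUCHO" in upper case, whereas the expected result in the question lists the keyword as originally spelled ("cartucho"). If the original spelling matters, one possible approach is to match case-insensitively and then look the match up in keywords2. The following is a base R sketch under that assumption, not the answer's own code; descriptions, found, and result are hypothetical names, and the last row is added only to show the no-match case.

```r
keywords2 <- c("cartucho", "MOLDE", "FILTRO", "BOMBA", "MOTOR")

# Hypothetical sample: three rows from df2 plus one without a keyword
descriptions <- c("CULATA PARA MOTOR", " CARTUCHO FILTRANTE",
                  "MOLDES PARA QUESO", "SIN PALABRA CLAVE")

# Match the keywords regardless of case
pattern <- paste(keywords2, collapse = "|")
m <- regexpr(pattern, descriptions, ignore.case = TRUE)

# Extract the matched text as it appears in the description (NA if no match)
found <- rep(NA_character_, length(descriptions))
found[m != -1] <- regmatches(descriptions, m)

# Map each match back to the keyword as spelled in keywords2
result <- keywords2[match(toupper(found), toupper(keywords2))]
result
```

The final lookup compares upper-cased copies of both vectors, so it assumes no two keywords differ only by case.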
https://stackoverflow.com/questions/50686493