Q: Classification based on a list of words in R

Stack Overflow user

Asked 2019-02-07 06:54:04

Answers: 1 · Views: 437 · Followers: 0 · Votes: 0

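I have a data set of article titles and abstracts that I would like to classify based on matching words.

"This is an example of text that I want to classify based on the words that are matched from a list. This would be about 2 - 3 sentences long. word4, word5, text, text, text"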

Topic 1     Topic 2     Topic (X)
word1       word4       word(a)
word2       word5       word(b)
word3       word6       word(c)

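Since the text above matches words in Topic 2, I want to add a new column containing that label. It would be even better if this could be done with tidyverse packages.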

text-classification

stringr

1 Answer

Stack Overflow user

Answered 2019-02-08 03:20:27

Given the sentence as a string and the topics in a data frame, you can do something like this:

input <- "This is an example of text that I want to classify based on the words that are matched from a list. This would be about 2 - 3 sentences long. word4, word5, text, text, text"
df <- data.frame(Topic1 = c("word1", "word2", "word3"), Topic2 = c("word4", "word5", "word6"))

## Split on spaces and punctuation (only "," and ".")
input <- unlist(strsplit(input, " |,|\\."))

## Names of the topics that share at least one word with the input
newcol <- paste(names(df)[apply(df, 2, function(x) any(input %in% x))], collapse = ", ")

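Since I'm not sure which data frame you want to add the column to, I've stored the result in a vector, newcol.

If you have a data frame of long sentences, you can use a similar approach.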
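To make it easier to check what the matching step returns, here is a minimal self-contained sketch of the same idea wrapped in a function. The helper name `match_topics` and the `topics` data frame are hypothetical names chosen for illustration. The split pattern `[ ,.]+` is a small variation on the one above that avoids empty tokens.

```r
# Topic word lists from the question (Topic1 and Topic2 only).
topics <- data.frame(
  Topic1 = c("word1", "word2", "word3"),
  Topic2 = c("word4", "word5", "word6"),
  stringsAsFactors = FALSE
)

# match_topics() is a hypothetical helper, not part of any package.
match_topics <- function(text, topics) {
  # Split on runs of spaces, commas and periods, then drop empty tokens.
  tokens <- unlist(strsplit(text, "[ ,.]+"))
  tokens <- tokens[nzchar(tokens)]
  # A topic matches if any of its words appears among the tokens.
  hit <- vapply(topics, function(words) any(words %in% tokens), logical(1))
  # Join the names of the matching topics; no match gives an empty string.
  paste(names(topics)[hit], collapse = ", ")
}

label <- match_topics("This would be about 2 - 3 sentences long. word4, word5, text", topics)
print(label)
```

Matching is exact and case-sensitive here; lowercasing both sides with `tolower()` first would be one way to relax that, if the real titles need it.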

inputdf <- data.frame(title = c("This is an example of text that I want to classify based on the words that are matched from a list. This would be about 2 - 3 sentences long. word4, word5, text, text, text", "word2", "word3, word4"))
input <- strsplit(as.character(inputdf$title), " |,|\\.")

## For each title, collect the names of the topics that share a word with it
inputdf$newcolmn <- unlist(lapply(input, function(x) paste(names(df)[apply(df, 2, function(y) sum(x %in% y) > 0)], collapse = ", ")))
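Since the question asks for a tidyverse approach, the same matching could be sketched with dplyr, tidyr and stringr roughly as follows. This is not the answerer's method, only one possible translation of it; the names `articles`, `topic_words`, `labels`, `result` and the `id` column are assumptions made for the example, and it assumes tidyr 1.0.0 or later for `pivot_longer()`.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Topic word lists from the question, one column per topic.
topics <- tibble(
  Topic1 = c("word1", "word2", "word3"),
  Topic2 = c("word4", "word5", "word6")
)

# Long format: one row per (topic, word) pair.
topic_words <- topics %>%
  pivot_longer(everything(), names_to = "topic", values_to = "word")

# Hypothetical input; `id` is added so labels can be joined back per row.
articles <- tibble(
  id = 1:3,
  title = c(
    "This would be about 2 - 3 sentences long. word4, word5, text",
    "word2",
    "word3, word4"
  )
)

labels <- articles %>%
  mutate(word = str_split(title, "[ ,.]+")) %>%   # list column of tokens
  unnest(word) %>%                                # one row per token
  inner_join(topic_words, by = "word") %>%        # keep tokens found in a topic
  distinct(id, topic) %>%
  arrange(id, topic) %>%
  group_by(id) %>%
  summarise(topic = str_c(topic, collapse = ", "), .groups = "drop")

# Titles without any matching word end up with NA in `topic`.
result <- left_join(articles, labels, by = "id")
print(result)
```

Reshaping the topics into long form turns the lookup into an ordinary join, which tends to scale better than looping over topic columns when there are many topics.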

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54563814
