
Question: Stop words are not being removed in Python

Stack Overflow user

Asked on 2020-05-25 02:45:07

Answers: 3 · Views: 163 · Followers: 0 · Votes: 0

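I am trying to remove stop words from a list of tokens that I have. However, the words do not seem to be removed. What could be the problem? Thanks.

What I tried: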

from nltk.corpus import stopwords
from stop_words import get_stop_words

Trans = []
with open('data.txt', 'r') as myfile:
    file = myfile.read()
    # rewind to the first character before iterating over the lines
    myfile.seek(0)
    for row in myfile:
        split = row.split()
        Trans.append(split)
    myfile.close()


stop_words = list(get_stop_words('en'))
nltk_words = list(stopwords.words('english'))
stop_words.extend(nltk_words)

output = [w for w in Trans if not w in stop_words]


Input:

[['Apparent', 'magnitude', 'is', 'a', 'measure', 'of', 'the',
  'brightness', 'of', 'a', 'star', 'or', 'other']]

Output:

It returns the same words as the input.

python

nlp

stop-words

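3 Answers

Stack Overflow user

Accepted answer

Posted on 2020-05-25 02:50:30

I think Trans.append(split) should be Trans.extend(split), because split is a list (the result of row.split()). With append, each row's token list becomes a single element of Trans, so the comparison against stop_words is made on whole lists rather than on individual words.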
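The difference can be seen in a small sketch. The rows and the short stop-word list below are made up for illustration and stand in for the file contents and the combined stop_words/NLTK list from the question.

```python
# Tokens from two lines of text, as row.split() would produce them
rows = [["Apparent", "magnitude", "is"], ["a", "measure"]]

nested = []
flat = []
for split in rows:
    nested.append(split)  # adds the whole list as one element
    flat.extend(split)    # adds each token individually

# Hypothetical mini stop-word list, for illustration only
stop_words = ["is", "a", "of", "the"]

# Each w is a list here, and a list is never equal to a stop word,
# so nothing is filtered out
nested_output = [w for w in nested if w not in stop_words]

# Each w is a string here, so the membership test works as intended
flat_output = [w for w in flat if w not in stop_words]

print(nested_output)
print(flat_output)
```

With append, the filter compares whole lists against the stop words and keeps everything, which matches the behaviour described in the question.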

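Votes: 1

Stack Overflow user

Posted on 2020-05-25 03:07:58

Because the input is a list of lists, you need to iterate over the outer list and then over the elements of each inner list. The following gives the correct output: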

output = [j for w in Trans for j in w if j not in stop_words]
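As a rough sketch of what the one-liner does, the snippet below spells it out as explicit loops and adds a variant that keeps one filtered list per row. The stop-word set is a small hypothetical stand-in for the real list, and per_row is a name introduced here, not something from the thread.

```python
# Hypothetical mini stop-word set; the question builds the real one
# from the stop_words package and NLTK
stop_words = {"is", "a", "of", "the", "or", "other"}

Trans = [["Apparent", "magnitude", "is", "a", "measure"],
         ["of", "the", "brightness", "of", "a", "star", "or", "other"]]

# Flattened result, same form as the comprehension in the answer
flat = [j for w in Trans for j in w if j not in stop_words]

# Equivalent explicit loops: outer list first, then each inner token
flat_loop = []
for w in Trans:
    for j in w:
        if j not in stop_words:
            flat_loop.append(j)

# Variant that preserves the row structure instead of flattening
per_row = [[j for j in w if j not in stop_words] for w in Trans]

print(flat)
print(per_row)
```

Using a set for the stop words makes each membership test constant time on average, which helps when the token list is long.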

Votes: 1

Stack Overflow user

Posted on 2020-05-25 03:14:16

For better readability, create a function. For example:

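import string
from nltk.corpus import stopwords

def drop_stopwords(row):
    stop_words = set(stopwords.words('english'))  # NLTK names the list 'english', not 'en'
    punctuation = set(string.punctuation)
    return [word for word in row if word not in stop_words and word not in punctuation]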

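Also, a with open() block does not need an explicit close(). Build a list of strings (sentences) and apply the function to each one. For example, assuming Trans is a pandas Series of sentences, split each sentence into tokens first so that the function sees words rather than characters:

Trans = Trans.map(str).str.split().apply(drop_stopwords)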

This is applied to every sentence. You can add further functions for lemmatization and so on. There is a very clear example (code) here: https://github.com/SamLevinSE/job_recommender_with_NLP/blob/master/job_recommender_data_mining_JOBS.ipynb
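The same idea can be sketched without pandas, applying the function to each sentence in a plain list. The stop-word set here is a hypothetical stand-in for stopwords.words('english'), and lower-casing each token before the lookup is an added assumption, not part of the answer.

```python
import string

# Hypothetical mini stop-word set standing in for stopwords.words('english')
STOP_WORDS = {"is", "a", "of", "the", "or", "other"}
PUNCTUATION = set(string.punctuation)

def drop_stopwords(tokens, stop_words=STOP_WORDS):
    """Return tokens that are neither stop words nor single punctuation marks."""
    # Lower-casing is an assumption added here so that "The" matches "the"
    return [t for t in tokens
            if t.lower() not in stop_words and t not in PUNCTUATION]

sentences = [
    "Apparent magnitude is a measure of the brightness of a star , or other object ."
]

# Split each sentence into tokens, then filter it
cleaned = [drop_stopwords(s.split()) for s in sentences]
print(cleaned)
```

Building the sets once at module level avoids rebuilding them on every call, which matters when the function runs over many sentences.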

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/61990749
