print(NGramLM.ngram_counts)
So, NGramLM.ngram_counts gives me this:
Counter({('Natural-language', ('<s>', '<s>')): 1, ('processing', ('<s>', 'Natural-language')): 1, ('processing', ('Natural-language', 'processing')): 1, ('is', ('processing', 'processing')): 1, ('an', ('processing', 'is')): 1, ('area', ('is', 'an')): 1, ('is', ('an', 'area')): 1, ('an', ('area', 'is')): 1, ('of', ('is', 'an')): 1, ('Natural-language', ('an', 'of')): 1, ('processing', ('of', 'Natural-language')): 1, ('(NLP)', ('Natural-language', 'processing')): 1, ('</s>', ('processing', '(NLP)')): 1, ('</s>', ('(NLP)', '</s>')): 1})
I need to extract the inner tuple from each key and put them all into a list.
When I do this:
context_list = ([x[1] for x in NGramLM.ngram_counts])
print(context_list)
I get:
[('<s>', '<s>'), ('<s>', 'Natural-language'), ('Natural-language', 'processing'), ('processing', '(NLP)'), ('(NLP)', 'is'), ('is', 'an'), ('an', 'area'), ('area', 'is'), ('is', 'an'), ('an', 'of'), ('of', 'Natural-language'), ('processing', '(NLP)'), ('(NLP)', '</s>')]
But ('Natural-language', 'processing') appears only once, and it should appear twice in context_list. I don't know why this happens!
Expected output (check the third tuple from the end):
[('<s>', '<s>'), ('<s>', 'Natural-language'), ('Natural-language', 'processing'), ('processing', '(NLP)'), ('(NLP)', 'is'), ('is', 'an'), ('an', 'area'), ('area', 'is'), ('is', 'an'), ('an', 'of'), ('of', 'Natural-language'), ('Natural-language', 'processing'), ('processing', '(NLP)'), ('(NLP)', '</s>')]
Posted on 2019-09-06 18:52:00
You can use the elements() method of the Counter object to get the list you want; it repeats each item as many times as its count:
context_list = [x for _, x in NGramLM.ngram_counts.elements()]
https://stackoverflow.com/questions/57827066
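A minimal sketch of the difference between iterating a Counter and calling elements(). The NGramLM class is not shown in the question, so the small Counter below is hypothetical data in the same (word, context) -> count shape, with one key given a count of 2 to make the effect visible:

```python
from collections import Counter

# Hypothetical counts shaped like NGramLM.ngram_counts:
# each key is a (word, context) pair, each value is how often it was seen.
ngram_counts = Counter({
    ('processing', ('<s>', 'Natural-language')): 1,
    ('(NLP)', ('Natural-language', 'processing')): 2,  # seen twice
    ('</s>', ('processing', '(NLP)')): 1,
})

# Iterating a Counter (like any dict) visits each distinct key exactly once,
# so the count of 2 above has no effect on this list.
once_each = [context for _, context in ngram_counts]

# elements() yields each key as many times as its count,
# in the order the keys were first inserted.
by_count = [context for _, context in ngram_counts.elements()]

print(once_each)
print(by_count)
```

Note that elements() only changes the result for keys whose count is greater than 1, and it skips keys whose count is zero or negative.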