Q: Printing unigram counts with Python

Stack Overflow user

Asked 2019-02-11 10:07:39

Answers: 4 | Views: 1.2K | Followers: 0 | Votes: 1

I have a text file named corpus.txt that contains the following 4 lines of text:

 peter piper picked a peck of pickled peppers 
 a peck of pickled peppers peter piper picked 
 if peter piper picked a peck of pickled peppers 
 where s the peck of pickled peppers peter piper picked

I want the program to print each word and the number of times it occurs, for example:

4 peter
4 piper

and so on.

This is the code I wrote:

f = open("corpus.txt","r")
w, h = 100, 100;
k=1
a=0
uwordcount=[]
for i in range(100):
       uwordcount.append(0)
uword = [[0 for x in range(w)] for y in range(h)]
l = [[0 for x in range(w)] for y in range(h)] 
l[1] = f.readline()
l[2] = f.readline()
l[3] = f.readline()
l[4] = f.readline()
lwords = [[0 for x in range(w)] for y in range(h)] 
lwords[1]=l[1].split()
lwords[2]=l[2].split()
lwords[3]=l[3].split()
lwords[4]=l[4].split()
for i in [1,2,3,4]:
    for j in range(len(lwords[i])):
        uword[k]=lwords[i][j]
        uwordcount[k]=0
        for x in [1,2,3,4]:
            for y in range(len(lwords[i])):
                if uword[k] == lwords[x][y]:
                    uwordcount[k]=uwordcount[k]+1
        for z in range(k):
            if uword[k]==uword[z]:
                a=1

        if a==0:
            print(uwordcount[k],' ',uword[k])
            k=k+1

I get this error:

Traceback (most recent call last):
  File "F:\New folder\1.py", line 25, in <module>
    if uword[k] == lwords[x][y]:
IndexError: list index out of range

Can anyone tell me what is wrong here?

python

4 Answers

Stack Overflow user

Accepted answer

Answered 2019-02-11 10:21:28

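You have far too many different lists here. Also, don't rely on all those magic numbers for the number of lines, the maximum number of words/entries per list, and so on. Instead of one list for the words of each line, use a single list for all the words. And instead of a second list for the counts, just use a dictionary that holds the unique words and their counts: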

with open("corpus.txt") as f:
    counts = {}
    for line in f:
        for word in line.split():
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1

After that, counts looks like this:

{'peter': 4, 'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}

To retrieve the words and their counts, you can then use a loop:

for word in counts:
    print(word, counts[word])
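The question asks for output in the form "4 peter". Here is a minimal sketch that formats the same counts dictionary that way. It assumes you want the most frequent words first. That sort order is my assumption and not part of the original answer. The helper name format_counts is hypothetical.

```python
# Sketch: print "count word" lines from a plain dict, most frequent first.
# Assumption: words with equal counts keep the order in which they were first seen.

def format_counts(counts):
    """Return one "count word" line per entry, sorted by descending count."""
    # sorted() is stable, even with reverse=True, so ties stay in insertion order
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return "\n".join(f"{count} {word}" for word, count in ordered)

counts = {'peter': 4, 'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4,
          'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}
print(format_counts(counts))
```

Sorting the (word, count) pairs by count keeps the dictionary itself unchanged, so you can still look up individual words afterwards.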

Of course, you could do the same thing in fewer lines with collections.Counter, but I think doing it by hand will help you learn more about Python.

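Honestly, I don't understand what any of the code below for i in [1,2,3,4]: is supposed to do. It looks like you want to build some kind of co-occurrence matrix for the words? In that case I would also suggest a (nested) dictionary, which makes the counts easier to store and retrieve: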

with open("corpus.txt") as f:
    matrix = {}
    for line in f:
        words = line.split()  # split each line only once
        for word1 in words:
            if word1 not in matrix:
                matrix[word1] = {}
            for word2 in words:
                if word2 != word1:
                    if word2 not in matrix[word1]:
                        matrix[word1][word2] = 1
                    else:
                        matrix[word1][word2] += 1

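The code is almost the same as before, but an additional nested loop iterates over the other words in the same line. For example, the entry for "peter" (matrix['peter']) would be:

{'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}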
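For comparison, here is an alternative sketch that builds the same nested counts with collections.defaultdict and collections.Counter from the standard library. This is my substitution, not the answerer's code. The sample lines are hard-coded so the snippet runs without corpus.txt. The function name cooccurrence is hypothetical.

```python
from collections import Counter, defaultdict

lines = [
    "peter piper picked a peck of pickled peppers",
    "a peck of pickled peppers peter piper picked",
    "if peter piper picked a peck of pickled peppers",
    "where s the peck of pickled peppers peter piper picked",
]

def cooccurrence(lines):
    """For each word, count how often every other word appears on the same line."""
    matrix = defaultdict(Counter)  # a missing word gets an empty Counter on first access
    for line in lines:
        words = line.split()
        for word1 in words:
            for word2 in words:
                if word2 != word1:
                    matrix[word1][word2] += 1
    return matrix

matrix = cooccurrence(lines)
print(dict(matrix["peter"]))
```

There is one small difference from the loop above. A word that never shares a line with another word gets no entry here, whereas the plain-dict version creates an empty dict for it.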

Votes: 2

Stack Overflow user

Answered 2019-02-11 10:14:29

"IndexError: list index out of range" means that your index tries to access something outside the list. You need to debug your code to find where this happens.
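Here is a minimal sketch of that debugging step. It reflects my reading of the question's code and is not part of this answer. At line 25, the innermost loop runs y over range(len(lwords[i])) but indexes lwords[x], so it overflows as soon as line i has more words than line x. The snippet replays those loops with 0-based indices on the four corpus lines and reports the first out-of-range access. The function first_bad_access and its fixed flag are hypothetical.

```python
lines = [
    "peter piper picked a peck of pickled peppers",            # 8 words
    "a peck of pickled peppers peter piper picked",            # 8 words
    "if peter piper picked a peck of pickled peppers",         # 9 words
    "where s the peck of pickled peppers peter piper picked",  # 10 words
]
lwords = [line.split() for line in lines]

def first_bad_access(lwords, fixed=False):
    """Replay the question's i/x/y loops (0-based) and return the first
    (i, x, y) where lwords[x][y] would raise IndexError, or None."""
    for i in range(len(lwords)):
        for x in range(len(lwords)):
            # Question's code uses len(lwords[i]); the fix is len(lwords[x])
            limit = len(lwords[x]) if fixed else len(lwords[i])
            for y in range(limit):
                if y >= len(lwords[x]):
                    return (i, x, y)
    return None

print(first_bad_access(lwords))              # (2, 0, 8)
print(first_bad_access(lwords, fixed=True))  # None
```

In the question's own numbering, that is line 3 (i = 3) checked against line 1 (x = 1). lwords[1] has only eight words, so lwords[1][8] fails. Changing the inner loop to range(len(lwords[x])) removes the IndexError. However, the rest of the logic still needs work: for example, the flag a is never reset to 0.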

You can simplify this task with collections.Counter:

# with open('corpus.txt', 'r') as r: text = r.read()

text = """peter piper picked a peck of pickled peppers 
 a peck of pickled peppers peter piper picked 
 if peter piper picked a peck of pickled peppers 
 where s the peck of pickled peppers peter piper picked """

from collections import Counter

# split the text in lines, then each line into words and count those:
c = Counter( (x for y in text.strip().split("\n") for x in y.split()) )

# format the output
print(*(f"{cnt} {wrd}" for wrd,cnt in c.most_common()), sep="\n")

Output:

4 peter
4 piper
4 picked
4 peck
4 of
4 pickled
4 peppers
3 a
1 if
1 where
1 s
1 the
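str.split() with no arguments already splits on any run of whitespace, including newlines. So the step that first splits the text into lines is not strictly needed. Here is a minimal sketch that counts the words of the whole text at once. The file name corpus.txt comes from the question. The helper names count_unigrams and count_file are hypothetical, and the utf-8 encoding is an assumption.

```python
from collections import Counter

def count_unigrams(text):
    """Count whitespace-separated words; split() with no argument also splits on newlines."""
    return Counter(text.split())

def count_file(path="corpus.txt"):
    """Hypothetical helper: read the whole file and count its words."""
    with open(path, encoding="utf-8") as f:
        return count_unigrams(f.read())

sample = """peter piper picked a peck of pickled peppers
a peck of pickled peppers peter piper picked
if peter piper picked a peck of pickled peppers
where s the peck of pickled peppers peter piper picked"""

c = count_unigrams(sample)
for word, cnt in c.most_common():
    print(cnt, word)
```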

Stack Overflow user

Answered 2019-02-11 10:25:13

To be honest, I didn't follow your code, because it has extra loops and (I guess) unnecessary logic. So I did it my own way.

import pprint

with open('corpus.txt', 'r') as cr:
    dic = {}  # Empty dictionary
    lines = cr.readlines()

    for line in lines:
        for word in line.split():
            if word in dic:   # If the key already exists in dic, add 1 to its value
                dic[word] += 1
            else:
                dic[word] = 1   # If the key is not present in dic, create it with value 1

pprint.pprint(dic)  # pprint (a standard-library module) pretty-prints the dictionary

If you are really in a hurry, use collections.Counter.

Votes: 0

Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/54628090
