I have a text file named corpus.txt that contains the following 4 lines of text:
peter piper picked a peck of pickled peppers
a peck of pickled peppers peter piper picked
if peter piper picked a peck of pickled peppers
where s the peck of pickled peppers peter piper picked
I want the program to output each word together with the number of times it occurs, for example:
4 peter
4 piper
and so on.
This is the code I wrote:
f = open("corpus.txt","r")
w, h = 100, 100;
k=1
a=0
uwordcount=[]
for i in range(100):
    uwordcount.append(0)
uword = [[0 for x in range(w)] for y in range(h)]
l = [[0 for x in range(w)] for y in range(h)]
l[1] = f.readline()
l[2] = f.readline()
l[3] = f.readline()
l[4] = f.readline()
lwords = [[0 for x in range(w)] for y in range(h)]
lwords[1]=l[1].split()
lwords[2]=l[2].split()
lwords[3]=l[3].split()
lwords[4]=l[4].split()
for i in [1,2,3,4]:
    for j in range(len(lwords[i])):
        uword[k]=lwords[i][j]
        uwordcount[k]=0
        for x in [1,2,3,4]:
            for y in range(len(lwords[i])):
                if uword[k] == lwords[x][y]:
                    uwordcount[k]=uwordcount[k]+1
        for z in range(k):
            if uword[k]==uword[z]:
                a=1
        if a==0:
            print(uwordcount[k],' ',uword[k])
            k=k+1
I get this error:
Traceback (most recent call last):
  File "F:\\新文件夹\1.py", line 25, in <module>
    if uword[k] == lwords[x][y]:
IndexError: list index out of range
Can anyone tell me what is wrong here?
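Posted on 2019-02-11 10:21:28
You have far too many different lists here. Also, do not rely on all those magic numbers for the number of lines, the maximum number of words or entries per list, and so on. Instead of keeping a separate list of words for each line, handle all the words in one place. And instead of a second list for the counts, just use a dictionary that holds both the unique words and their counts: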
with open("corpus.txt") as f:
    counts = {}
    for line in f:
        for word in line.split():
            if word not in counts:
                counts[word] = 1
            else:
                counts[word] += 1
Afterwards, counts looks like this: {'peter': 4, 'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}
To retrieve the words and their counts, you can again use a loop:
for word in counts:
    print(word, counts[word])
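To get the "count word" layout asked for in the question, the dictionary can be sorted by count before printing. The following is only a minimal sketch: the list `lines` and the helper `count_words` are illustrative names that do not appear in the original post, and the four lines are held in memory so the example runs without corpus.txt.

```python
# The four lines of corpus.txt, held in memory for a self-contained example
lines = [
    "peter piper picked a peck of pickled peppers",
    "a peck of pickled peppers peter piper picked",
    "if peter piper picked a peck of pickled peppers",
    "where s the peck of pickled peppers peter piper picked",
]

def count_words(lines):
    """Return a dict mapping each word to its number of occurrences."""
    counts = {}
    for line in lines:
        for word in line.split():
            # dict.get returns 0 for a word that has not been seen yet
            counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_words(lines)

# Sort by count, highest first; sorted() is stable, so ties keep first-seen order
for word, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(count, word)
```

To read from the file instead, `count_words` could be called with the open file object, since iterating over a file also yields one line at a time.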
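Of course, you could do the same in fewer lines of code with collections.Counter, but I think doing it by hand will help you learn more about Python.
To be honest, I don't understand what any of the code below for i in [1,2,3,4]: is supposed to do. It looks as if you want to build some kind of co-occurrence matrix for the words? In that case, too, I would suggest a (nested) dictionary, which makes it much easier to store and retrieve the entries.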
with open("corpus.txt") as f:
    matrix = {}
    for line in f:
        for word1 in line.split():
            if word1 not in matrix:
                matrix[word1] = {}
            for word2 in line.split():
                if word2 != word1:
                    if word2 not in matrix[word1]:
                        matrix[word1][word2] = 1
                    else:
                        matrix[word1][word2] += 1
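The code is almost the same as before, but with another nested loop that runs over the other words on the same line. For example, the entry for "peter" would be {'piper': 4, 'picked': 4, 'a': 3, 'peck': 4, 'of': 4, 'pickled': 4, 'peppers': 4, 'if': 1, 'where': 1, 's': 1, 'the': 1}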
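The nested-dictionary idea can also be written with collections.defaultdict, which removes the explicit "is this key present yet" checks. This is a sketch under assumptions rather than the answerer's code: the function name `cooccurrence` and the in-memory list `lines` are illustrative, and the corpus is inlined so the example runs on its own.

```python
from collections import defaultdict

# The four lines of corpus.txt, held in memory for a self-contained example
lines = [
    "peter piper picked a peck of pickled peppers",
    "a peck of pickled peppers peter piper picked",
    "if peter piper picked a peck of pickled peppers",
    "where s the peck of pickled peppers peter piper picked",
]

def cooccurrence(lines):
    """Count, for every word, how often each other word shares a line with it."""
    # Outer key: word1; inner key: word2; value: co-occurrence count
    matrix = defaultdict(lambda: defaultdict(int))
    for line in lines:
        words = line.split()  # split once per line instead of once per loop
        for word1 in words:
            for word2 in words:
                if word2 != word1:
                    matrix[word1][word2] += 1
    return matrix

matrix = cooccurrence(lines)

# Convert the inner defaultdict to a plain dict for readable printing
print(dict(matrix["peter"]))
```

One thing to keep in mind with this variant: looking up a missing key in a defaultdict creates it, so use `in` or `.get()` when only checking whether a word is present.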
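Posted on 2019-02-11 10:14:29
IndexError: list index out of range means that your index is trying to access something beyond the end of the list. You need to debug your code to find where that happens.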
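In the question's code, a likely culprit is the inner loop for y in range(len(lwords[i])), whose range is taken from line i while the index is applied to lwords[x]. The four lines have 8, 8, 9 and 10 words, so an index that is valid for a longer line overruns a shorter one. The sketch below reproduces that situation in isolation; the names `short` and `long_` are illustrative and not part of the original code.

```python
short = "peter piper picked a peck of pickled peppers".split()     # 8 words
long_ = "if peter piper picked a peck of pickled peppers".split()  # 9 words

error = None
try:
    # The range comes from the longer list, but the index is used on the shorter one
    for y in range(len(long_)):
        _ = short[y]
except IndexError as exc:
    error = exc
    print("IndexError:", exc)

# Taking the range from the list that is actually indexed stays within bounds
for y in range(len(short)):
    _ = short[y]
```

Iterating directly over the words (for word in short:) avoids index arithmetic altogether, which is what the dictionary-based answers do.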
Using collections.Counter simplifies this task:
# with open('corpus.txt', 'r') as r: text = r.read()
text = """peter piper picked a peck of pickled peppers
a peck of pickled peppers peter piper picked
if peter piper picked a peck of pickled peppers
where s the peck of pickled peppers peter piper picked """
from collections import Counter
# split the text in lines, then each line into words and count those:
c = Counter( (x for y in text.strip().split("\n") for x in y.split()) )
# format the output
print(*(f"{cnt} {wrd}" for wrd,cnt in c.most_common()), sep="\n")
Output:
4 peter
4 piper
4 picked
4 peck
4 of
4 pickled
4 peppers
3 a
1 if
1 where
1 s
1 the
Posted on 2019-02-11 10:25:13
To be honest, I could not follow your code, because it has more loops and logic than it needs (I guess). So I did it my own way.
import pprint
with open('corpus.txt', 'r') as cr:
    dic = {}  # Empty dictionary
    for line in cr:
        for word in line.split():  # Count words, not whole lines
            if word in dic:  # If the word is already a key in dic, add 1 to its count
                dic[word] += 1
            else:
                dic[word] = 1  # If the word is not in dic yet, start its count at 1
pprint.pprint(dic)  # pprint (standard library) pretty-prints the dictionary
If you are really in a hurry, use collections.Counter instead.
https://stackoverflow.com/questions/54628090