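So I have an array of words, stored as key-value pairs. Now I am trying to count the frequency of words in an array of strings, tokens. I have tried the following, but it fails to find the index of x, because x is only a string. I do not have the corresponding value, if any, of x in the tokens array. Is there any way to access it directly rather than adding one more loop to find it first?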
for x in tokens:
    if ((x in [c.keys()[0] for c in words])):
        words[words.index(x)].values()[0]+=1
    else:
        words.append({x:1})
Posted on 2015-09-20 02:36:16
To count the frequency of words in an array of strings, you can use Counter from collections:
In [89]: from collections import Counter
In [90]: s=r'So I have an array of words, stored as key value pairs. Now I am trying to count the frequency of words in an array of strings, tokens. I have tried the following but this doesnt find the index of x as it is only a string. I do not have the corresponding value, if any, of x in tokens array. Is there any way to directly access it rather than adding one more loop to find it first?'
In [91]: tokens=s.split()
In [92]: c=Counter(tokens)
In [93]: print(c)
Counter({'of': 5, 'I': 4, 'the': 4, 'it': 3, 'have': 3, 'to': 3, 'an': 2, 'as': 2, 'in': 2, 'array': 2, 'find': 2, 'x': 2, 'value,': 1, 'words': 1, 'do': 1, 'there': 1, 'is': 1, 'am': 1, 'frequency': 1, 'if': 1, 'string.': 1, 'index': 1, 'one': 1, 'directly': 1, 'tokens.': 1, 'any': 1, 'access': 1, 'only': 1, 'array.': 1, 'way': 1, 'doesnt': 1, 'Now': 1, 'words,': 1, 'more': 1, 'a': 1, 'corresponding': 1, 'tried': 1, 'than': 1, 'adding': 1, 'strings,': 1, 'but': 1, 'tokens': 1, 'So': 1, 'key': 1, 'first?': 1, 'not': 1, 'trying': 1, 'pairs.': 1, 'count': 1, 'this': 1, 'Is': 1, 'value': 1, 'rather': 1, 'any,': 1, 'stored': 1, 'following': 1, 'loop': 1})
In [94]: c['of']
Out[94]: 5
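Note that split() leaves punctuation attached to tokens, which is why 'words,' and 'words' are counted separately in the output above. The following is a minimal sketch, assuming you want case-sensitive counts with surrounding punctuation stripped. The helper name count_words and the sample sentence are hypothetical and not part of the original answer.

```python
from collections import Counter
import string


def count_words(text):
    """Count whitespace-separated words, ignoring surrounding punctuation."""
    # strip(string.punctuation) removes punctuation only at the ends of a token
    stripped = (token.strip(string.punctuation) for token in text.split())
    # Skip tokens that consisted of nothing but punctuation
    return Counter(word for word in stripped if word)


counts = count_words("an array of words, stored as words. Is it words?")
print(counts["words"])        # 3
print(counts.most_common(1))  # [('words', 3)]
```

Whether to also lowercase the tokens depends on whether 'Is' and 'is' should count as the same word.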
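Edit:
If there is an outer loop and tokens changes on every iteration, counting the words manually, as @Alexander suggests, is a good approach. In addition, Counter supports the + operator, which makes accumulating counts easier: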
In [30]: (c+c)['of']
Out[30]: 10
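As a rough sketch of how that accumulation might look inside an outer loop: the batches list below is a hypothetical stand-in for whatever produces a new tokens list on each iteration.

```python
from collections import Counter

# Hypothetical batches standing in for the tokens list of each outer-loop iteration
batches = [
    "I wish I went".split(),
    "I went home".split(),
]

total = Counter()
for tokens in batches:
    # update() adds the counts in place; total += Counter(tokens) gives the same result
    total.update(tokens)

print(total["I"])     # 3
print(total["went"])  # 2
```

Using update() avoids building a temporary Counter for every batch, though for small inputs the difference is unlikely to matter.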
Posted on 2015-09-20 02:42:20
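You definitely want to use Counter, as suggested by @zhangzaochen.
However, here is a more efficient way of writing your code, storing the counts in a single dictionary keyed by word instead of a list of one-item dictionaries: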
words = {}
for x in tokens:
    if x in words:
        words[x] += 1
    else:
        words[x] = 1
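The if/else branch can also be folded away. A small sketch of two common variants, assuming the same kind of tokens list (the sample sentence and the name words_dd are only illustrations):

```python
from collections import defaultdict

tokens = "I wish I went".split()

# Variant 1: dict.get supplies 0 for a word that has not been seen yet
words = {}
for x in tokens:
    words[x] = words.get(x, 0) + 1

# Variant 2: defaultdict(int) creates missing keys with the value 0 on first access
words_dd = defaultdict(int)
for x in tokens:
    words_dd[x] += 1

print(words)  # {'I': 2, 'wish': 1, 'went': 1}
```

Both variants do one dictionary lookup per token, so they avoid the list scan that words.index would need.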
You could also use a list comprehension:
tokens = "I wish I went".split()
words = {}
_ = [words.update({word: 1 if word not in words else words[word] + 1})
     for word in tokens]
>>> words
{'I': 2, 'went': 1, 'wish': 1}
https://stackoverflow.com/questions/32678322