I am currently stuck on this problem and need help. The answer I am trying to get (expected output) is:
{'and': 1.91629073, 'document': 1.22314355, 'first': 1.51082562, 'is': 1., 'one': 1.91629073, 'second': 1.91629073, 'the': 1., 'third': 1.91629073, 'this': 1.}
But what I actually get is:
{'and': 2.791759469228055, 'document': 3.0794415416798357, 'first': 2.9459101490553135, 'is': 3.1972245773362196, 'one': 2.791759469228055, 'second': 2.791759469228055, 'the': 3.1972245773362196, 'third': 2.791759469228055, 'this': 3.1972245773362196}
The main concern is the line below. The code and formula look correct to me, but for some reason it seems to pick up a few extra values in len(corpus).
vocab1[word2] = 1+(math.log(1+len(corpus)/1+count))
The actual code starts here:
corpus = [
    'this is the first document',
    'this document is the second document',
    'and this is the third one',
    'is this the first document',]
import math
unique_words = set() # at first we will initialize an empty set
lenofcorpus= len(corpus)
# print(lenofcorpus)
vocab1 = dict()
# vocab = dict()
# check if its list type or not
if isinstance(corpus, (list,)):
    for row in corpus: # for each review in the dataset
        for word in row.split(" "): # for each word in the review. #split method converts a string into list of words
            if len(word) < 2:
                continue
            unique_words.add(word)
    unique_words = sorted(list(unique_words))
    # print(unique_words)
    for idx, word2 in enumerate(unique_words) :
        count = 0
        for sentence in corpus :
            if word2 in sentence :
                count+=1
            # print(word2, count)
        vocab1[word2] = count
        # print(lenofcorpus)
        vocab1[word2] = 1+(math.log(1+len(corpus)/1+count))#its taking log of 12/2 instead it should take 5/2, its taking 7 extra or six
print(vocab1)
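I would like to know how to get the desired answer. Second, what is the thought process for arriving at it, and what did I do wrong? An explanation would be really helpful. I know I have also done something wrong with the dictionary loop and the assignment. FYI: len(corpus) = 4  # the length of the whole corpus, which has 4 sentences.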
Posted on 2020-08-24 14:38:52
You are missing parentheses. The result you say you want corresponds to:
vocab1[word2] = 1+(math.log((1+len(corpus))/(1+count)))
Or, spelled out:
numerator = (1+len(corpus))
denominator = (1+count)
result = 1+math.log(numerator/denominator)
What you originally wrote is equivalent to:
vocab1[word2] = 1+math.log(1+(len(corpus)/1)+count)
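To make that equivalence concrete, here is a small sketch that plugs in the numbers for the word 'and' from the question (4 sentences, 1 of them containing the word). The variable names `n_docs`, `as_written`, `explicit`, and `intended` are made up for this illustration and do not appear in the original code.

```python
import math

n_docs = 4   # len(corpus) in the question
count = 1    # number of sentences that contain 'and'

# The expression exactly as written in the question.
as_written = 1 + math.log(1 + n_docs / 1 + count)

# The same expression with Python's implicit grouping spelled out:
# division binds tighter than addition, so only n_docs is divided by 1.
explicit = 1 + math.log(1 + (n_docs / 1) + count)

# The intended expression: the whole numerator over the whole denominator.
intended = 1 + math.log((1 + n_docs) / (1 + count))

print(as_written == explicit)  # both take the log of 6.0
print(as_written)              # about 2.7918, the value the question reports
print(intended)                # about 1.9163, the value the question expects
```

So the logarithm is being taken of 1 + 4 + 1 = 6 rather than of 5 / 2 = 2.5, which accounts for the inflated values.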
Writing x/y+z when you mean x/(y+z) is a very common mistake, and so is writing x+y/z when you mean (x+y)/z. You managed to make both at once.
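Putting the corrected formula back into the whole computation, a minimal sketch might look like the following. The function name `smoothed_idf` is hypothetical. It also makes one assumption that differs from the question's code: it counts a word only when it appears as a whole token in a sentence, whereas `word2 in sentence` in the question is a substring test (so 'is' would also match inside 'this'). For this particular corpus both approaches give the same counts.

```python
import math

corpus = [
    'this is the first document',
    'this document is the second document',
    'and this is the third one',
    'is this the first document',
]

def smoothed_idf(corpus):
    """Return {word: 1 + ln((1 + N) / (1 + df))} for words of length >= 2."""
    n_docs = len(corpus)
    # Assumption: whole-token matching, so each sentence becomes a set of words.
    tokenized = [set(sentence.split()) for sentence in corpus]
    vocabulary = sorted({w for words in tokenized for w in words if len(w) >= 2})
    idf = {}
    for word in vocabulary:
        # df = number of sentences that contain the word
        df = sum(1 for words in tokenized if word in words)
        # Parentheses around both numerator and denominator are the fix.
        idf[word] = 1 + math.log((1 + n_docs) / (1 + df))
    return idf

idf = smoothed_idf(corpus)
for word, value in idf.items():
    print(word, round(value, 8))
```

With the parentheses in place, the values agree with the expected output listed in the question, for example 1.91629073 for 'and' and 1.0 for 'is'.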
https://stackoverflow.com/questions/63562995