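Before the final submission, I need to make a few corrections to my project. I need to count the positive, neutral, and negative words in my code. I did something similar earlier, when I was trying to find word frequencies in the text, and the output for the text was fine.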
def gen_freq(text):
    word_list = []  # stores the list of words
    for words in text.split():  # loop over all the reviews and extract words into word_list
        word_list.extend(words)
    word_freq = pd.Series(word_list).value_counts()  # create word frequencies using word_list
    word_freq[:20]
    # print top 20 words
    print(word_freq)
    return word_freq[:20]

gen_freq(dataset.text.str)
I also tried to do the same thing to generate word frequencies for the positive words:
def positive_freq(text):
    positive_list = []  # stores the list of words
    for words in text.split():  # loop over all the reviews and extract words into positive_list
        positive_list.extend(words)
    word_freq = pd.Series(positive_list).value_counts()  # create word frequencies using positive_list
    word_freq[:20]
    # print top 20 words
    print(word_freq)
    return word_freq[:20]

positive_freq(dataset.text.str)
I loaded the data with the following code:
import json
import pandas as pd

with open('reviews.json') as project_file:
    data = json.load(project_file)
dataset = pd.json_normalize(data)
print(dataset.head())
The output for the positive frequencies is as follows:
and 136
a 127
the 114
iPad 102
I 69
...
"fully 1
didn't. 1
would 1
instructions...but 1
these 1
This should not be the case, because the adjectives identified as positive are the following:
Positive:
polarity adjectives
1 0.209881 right
1 0.209881 mad
1 0.209881 full
1 0.209881 full
1 0.209881 iPad
1 0.209881 iPad
1 0.209881 bad
1 0.209881 different
1 0.209881 wonderful
1 0.209881 much
1 0.209881 affordable
2 0.633333 stop
2 0.633333 great
2 0.633333 awesome
3 0.437143 awesome
4 0.398333 max
4 0.398333 high
4 0.398333 high
4 0.398333 Gorgeous
5 0.466667 decent
5 0.466667 easy
6 0.265146 it’s
6 0.265146 bright
6 0.265146 wonderful
6 0.265146 amazing
6 0.265146 full
6 0.265146 few
6 0.265146 such
6 0.265146 facial
6 0.265146 Big
6 0.265146 much
8 0.161979 old
8 0.161979 little
8 0.161979 Easy
8 0.161979 daily
8 0.161979 that’s
8 0.161979 late
9 0.084762 few
9 0.084762 huge
9 0.084762 storage.If
9 0.084762 few
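Also, while generating the frequencies, I want to plot a bar chart showing the frequency of each word. For example, if the frequency of "right" is 1 and the frequency of "awesome" is 2, that should show on the chart. The same goes for the neutral and negative words. Please help.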
Posted on 2022-03-22 06:26:27
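Your problem is that you expect the machine to know which words are positive, negative, or neutral. How would it know the positive words from .split() alone? You first need to provide a predefined list of positive, negative, and neutral words; then, after splitting, you should check whether each token is present in those lists. You can get such lists from sentiment dictionaries such as SentiWordNet, SentiWords, and many others, or from existing Python packages. Example: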
from textblob import TextBlob

sent = 'a very simple and good sample'
pos_word_list = []
neg_word_list = []
neu_word_list = []
for word in sent.split():
    testimonial = TextBlob(word)
    if testimonial.sentiment.polarity >= 0.5:
        pos_word_list.append(word)
    elif testimonial.sentiment.polarity <= -0.5:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)
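The other approach mentioned above, checking each token against a predefined word list and then counting, can be sketched roughly as follows. This is only an illustration under stated assumptions: POSITIVE_WORDS and NEGATIVE_WORDS are tiny made-up lists standing in for a real sentiment lexicon, and normalize, sentiment_freq, and plot_freq are hypothetical helper names, not part of any library. The bar chart part assumes matplotlib is installed.

```python
from collections import Counter
import string

# Hypothetical mini-lexicons for illustration only; a real project would load
# a published sentiment lexicon instead of these hand-picked words.
POSITIVE_WORDS = {"good", "great", "awesome", "wonderful", "amazing", "easy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "slow", "broken"}


def normalize(token):
    """Lowercase a token and strip surrounding ASCII punctuation."""
    return token.strip(string.punctuation).lower()


def sentiment_freq(reviews):
    """Count positive, negative, and neutral words over an iterable of strings."""
    counts = {"positive": Counter(), "negative": Counter(), "neutral": Counter()}
    for review in reviews:
        for token in review.split():
            word = normalize(token)
            if not word:
                continue  # the token was pure punctuation
            if word in POSITIVE_WORDS:
                counts["positive"][word] += 1
            elif word in NEGATIVE_WORDS:
                counts["negative"][word] += 1
            else:
                counts["neutral"][word] += 1
    return counts


def plot_freq(counter, title, top_n=20):
    """Draw a bar chart of the top_n most frequent words (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so counting works without it

    if not counter:
        return  # nothing to plot
    words, freqs = zip(*counter.most_common(top_n))
    plt.bar(words, freqs)
    plt.title(title)
    plt.xlabel("word")
    plt.ylabel("frequency")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.show()


# Made-up sample reviews, used only to exercise the counting function.
sample_reviews = ["Great screen, awesome sound.", "Awesome battery but bad charger!"]
freqs = sentiment_freq(sample_reviews)
print(freqs["positive"].most_common())
```

With the DataFrame from the question, something like sentiment_freq(dataset["text"]) should work, assuming the column holds plain strings with no missing values, and plot_freq(freqs["positive"], "Positive words") would then draw one chart per category. Stripping punctuation and lowercasing before the lookup keeps tokens such as "Easy" and "easy" from being counted separately.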
https://stackoverflow.com/questions/71566693