Frequency of positive, neutral, and negative words

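Stack Overflow user

Asked on 2022-03-22 03:48:04

1 answer, 442 views, 0 followers, score -1

Before my final submission, I need to make some corrections to my project. I need to count the positive, neutral, and negative words in my code. I did something similar earlier when I computed word frequencies over the text, and that output was fine.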

def gen_freq(text):
    word_list=[] #stores the list of words
        
    for words in text.split(): #Loop over all the reviews and extract words into word_list
        word_list.extend(words)

    word_freq=pd.Series(word_list).value_counts() #Create word frequencies using word_list

    word_freq[:20]

     #Print top 20 word
    print(word_freq)
    return word_freq[:20]
      
gen_freq(dataset.text.str)

I also tried the same approach to get the frequency of the positive words:

def positive_freq(text):
    positive_list=[] #stores the list of words
        
    for words in text.split(): #Loop over all the reviews and extract words into word_list
        positive_list.extend(words)

    word_freq=pd.Series(positive_list).value_counts() #Create word frequencies using word_list

    word_freq[:20]

     #Print top 20 word
    print(word_freq)
    return word_freq[:20]
      
positive_freq(dataset.text.str)

I loaded the data with the following code:

with open('reviews.json') as project_file:    
    data = json.load(project_file)
dataset=pd.json_normalize(data) 
print(dataset.head())

The output of the positive frequency function is:

and                   136
a                     127
the                   114
iPad                  102
I                      69
                     ...
"fully                  1
didn't.                 1
would                   1
instructions...but      1
these                   1

This should not be the case, because the adjectives identified as positive are the following:

Positive:
   polarity  adjectives
1  0.209881       right
1  0.209881         mad
1  0.209881        full
1  0.209881        full
1  0.209881        iPad
1  0.209881        iPad
1  0.209881         bad
1  0.209881   different
1  0.209881   wonderful
1  0.209881        much
1  0.209881  affordable
2  0.633333        stop
2  0.633333       great
2  0.633333     awesome
3  0.437143     awesome
4  0.398333         max
4  0.398333        high
4  0.398333        high
4  0.398333    Gorgeous
5  0.466667      decent
5  0.466667        easy
6  0.265146      itâ€™s
6  0.265146      bright
6  0.265146   wonderful
6  0.265146     amazing
6  0.265146        full
6  0.265146         few
6  0.265146        such
6  0.265146      facial
6  0.265146         Big
6  0.265146        much
8  0.161979         old
8  0.161979      little
8  0.161979        Easy
8  0.161979       daily
8  0.161979    thatâ€™s
8  0.161979        late
9  0.084762         few
9  0.084762        huge
9  0.084762  storage.If
9  0.084762         few

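Also, once the frequencies are generated, I want to plot a bar chart with the frequency of each word. For example, if the frequency of "right" is 1 and the frequency of "awesome" is 2, the chart should show that. The same goes for the neutral and negative words. Please help.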

python

pandas

nlp

1 Answer

Stack Overflow user

Accepted answer

Answered on 2022-03-22 06:26:27

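Your problem is that you expect the machine to know which words are positive, negative, or neutral. How would it know a positive word from .split() alone? You first need to provide a predefined list of positive, negative, and neutral words; then, after splitting, check whether each token appears in that list. You can get such a list from sentiment lexicons such as SentiWordNet, SentiWords, and many others, or from existing Python packages. Example: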

from textblob import TextBlob

sent = 'a very simple and good sample'
pos_word_list = []
neg_word_list = []
neu_word_list = []

for word in sent.split():
    testimonial = TextBlob(word)
    if testimonial.sentiment.polarity >= 0.5:
        pos_word_list.append(word)
    elif testimonial.sentiment.polarity <= -0.5:
        neg_word_list.append(word)
    else:
        neu_word_list.append(word)

Output:

Score: 1
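The answer's idea (check each token against a predefined word list, then count) can be extended to the per-word frequencies and bar chart asked for in the question. The following is a minimal sketch, not the answerer's method: `POSITIVE_WORDS` and `NEGATIVE_WORDS` are tiny hypothetical lexicons made up for illustration, and `sentiment_word_freq` and `plot_freq` are hypothetical helper names. In practice you would load a full sentiment lexicon, or swap the membership test for the TextBlob polarity check shown above.

```python
from collections import Counter
import string

# Hypothetical, tiny lexicons for illustration only; a real project would
# load a full sentiment lexicon instead.
POSITIVE_WORDS = {"good", "great", "awesome", "wonderful", "amazing", "easy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "slow", "broken"}


def sentiment_word_freq(reviews):
    """Count tokens per sentiment bucket across an iterable of review strings."""
    counts = {"positive": Counter(), "negative": Counter(), "neutral": Counter()}
    for review in reviews:
        for token in review.split():
            # Normalize so "Great!" and "great" are counted together.
            word = token.strip(string.punctuation).lower()
            if not word:
                continue
            if word in POSITIVE_WORDS:
                counts["positive"][word] += 1
            elif word in NEGATIVE_WORDS:
                counts["negative"][word] += 1
            else:
                counts["neutral"][word] += 1
    return counts


def plot_freq(counter, title, top_n=20):
    """Draw a bar chart of the top_n most common words in a Counter."""
    # Imported here so the counting code works without matplotlib installed.
    import matplotlib.pyplot as plt

    items = counter.most_common(top_n)
    if not items:
        return
    words, freqs = zip(*items)
    plt.bar(words, freqs)
    plt.title(title)
    plt.xlabel("word")
    plt.ylabel("frequency")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.show()


# Made-up sample reviews, not taken from the asker's reviews.json.
sample_reviews = [
    "Great screen, awesome battery. Awesome!",
    "The speaker is bad and the charger is slow.",
]
freq = sentiment_word_freq(sample_reviews)
print(freq["positive"].most_common())
print(freq["negative"].most_common())
```

With the asker's DataFrame, something like `freq = sentiment_word_freq(dataset["text"])` followed by `plot_freq(freq["positive"], "Positive words")` should work, assuming the `text` column holds plain review strings. Note that with a lexicon-only approach, every word missing from both lists lands in the neutral bucket, so stop words such as "the" and "and" will dominate it unless you filter them out.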

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/71566693
