Q: Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512) with Hugging Face sentiment classifier

Stack Overflow user

Asked 2021-04-05 22:33:11

Answers: 1, Views: 3.7K, Followers: 0, Votes: 4

I am trying to get the sentiment of reviews with a pretrained Hugging Face sentiment-analysis model. It fails with the error: Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512).

My code is attached below, please have a look:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import transformers
import pandas as pd

model = AutoModelForSequenceClassification.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')
token = AutoTokenizer.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')

classifier = pipeline(task='sentiment-analysis', model=model, tokenizer=token)

data = pd.read_csv('/content/drive/MyDrive/DisneylandReviews.csv', encoding='latin-1')

data.head()

The output is:

    Review
0   If you've ever been to Disneyland anywhere you...
1   Its been a while since d last time we visit HK...
2   Thanks God it wasn t too hot or too humid wh...
3   HK Disneyland is a great compact park. Unfortu...
4   the location is not in the city, took around 1...

Next I run:

value = classifier("My name is mark")
value

The output is:

[{'label': 'POSITIVE', 'score': 0.9953688383102417}]

Followed by this code:

basic_sentiment = [i['label'] for i in value if 'label' in i]
basic_sentiment

The output is:

['POSITIVE']

Appending all rows to an empty list:

text = []

for index, row in data.iterrows():
    text.append(row['Review'])

I am trying to get the sentiment for every row:

sent = []

for i in range(len(data)):
    sentiment = classifier(data.iloc[i,0])
    sent.append(sentiment)

The error is:

Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512). Running this sequence through the model will result in indexing errors
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-19-4bb136563e7c> in <module>()
      2 
      3 for i in range(len(data)):
----> 4     sentiment = classifier(data.iloc[i,0])
      5     sent.append(sentiment)

11 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1914         # remove once script supports set_grad_enabled
   1915         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1917 
   1918 

IndexError: index out of range in self

Tags: deep-learning, nlp, sentiment-analysis, huggingface-transformers, huggingface-tokenizers

1 Answer

Stack Overflow user

Answered 2021-04-06 00:46:53

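Some of the sentences in the Review column of the data frame are too long. When they are converted to tokens and passed to the model, they exceed the model's 512-token sequence length limit: the model used for the sentiment-analysis task was trained with embeddings for 512 token positions.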

To fix this, you can filter out the long sentences and keep only the shorter ones (token length < 512).
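A minimal sketch of the filtering approach, assuming the tokenizer object exposes the usual `encode` method (as `AutoTokenizer` instances do). The helper name `keep_short_reviews` and the `max_tokens` parameter are made up for illustration. The count includes the special tokens that `encode` adds by default, so the sketch keeps anything of at most 512 tokens; tighten the comparison if you want to follow the strict "< 512" rule.

```python
def keep_short_reviews(texts, tokenizer, max_tokens=512):
    """Return only the texts whose token count fits within max_tokens.

    `tokenizer` is assumed to provide encode(text) -> list of token ids.
    """
    kept = []
    for text in texts:
        # encode() includes special tokens by default, so this length
        # reflects what the model would actually receive.
        n_tokens = len(tokenizer.encode(text))
        if n_tokens <= max_tokens:
            kept.append(text)
    return kept


# Usage with the objects from the question (not run here):
# short_reviews = keep_short_reviews(data['Review'].tolist(), token)
# sent = [classifier(review) for review in short_reviews]
```

Encoding an over-long review this way may still log the same length warning, but nothing is sent to the model, so no IndexError is raised. The trade-off is that the long reviews get no sentiment at all.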

Alternatively, you can truncate the sentences by passing truncation=True:

sentiment = classifier(data.iloc[i,0], truncation=True)
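Applied to the whole loop from the question, the truncation approach might look like the sketch below. The helper name `classify_with_truncation` is hypothetical; it assumes `classifier` returns a one-element list such as `[{'label': 'POSITIVE', 'score': ...}]` for a single string, as shown in the question's output.

```python
def classify_with_truncation(classifier, texts):
    """Return one sentiment label per text, truncating over-long inputs."""
    labels = []
    for text in texts:
        # truncation=True asks the tokenizer to cut the input down to the
        # model's maximum length; the tail of a long review is ignored.
        result = classifier(text, truncation=True)
        labels.append(result[0]['label'])
    return labels


# Usage with the objects from the question (not run here):
# data['sentiment'] = classify_with_truncation(classifier, data['Review'])
```

Unlike filtering, every review gets a label, but for long reviews the label is based only on the first part of the text.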

Votes: 5

Page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/66954682
