Loss is "nan" when fine-tuning a HuggingFace NLI model (RoBERTa/BART)

Stack Overflow user

Asked 2020-12-16 22:41:32

Answers: 1 | Views: 1.2K | Followers: 0 | Votes: 0

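I am using HuggingFace's Transformers library and am trying to fine-tune a pre-trained NLI model (ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli) on a dataset of about 276,000 hypothesis-premise pairs. I am following the fine-tuning instructions in the docs. My impression is that the fine-tuning itself works (it trains and saves checkpoints), but trainer.train() and trainer.evaluate() return "nan" for the loss.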

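What I have tried:

I tried both ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli and facebook/bart-large-mnli to make sure the problem is not tied to one specific model, but I get the same issue with both.
I tried following the advice in a related GitHub issue, but adding num_labels=3 to the config does not solve the problem. (I think my problem is different, because in my case the model has already been fine-tuned on NLI.)
I tried many different ways of changing my input data, because I suspected something might be wrong with it, but that did not solve the problem either.

Possible source of the problem: I checked the model's prediction output during training, and strangely the predicted value is "0" (entailment) in 100% of cases (see the printed output in the code below). This is clearly an error. I think the cause is that the logits the model returns during training appear to be torch.tensor([[np.nan, np.nan, np.nan]]), and applying .argmax(-1) to that gives torch.tensor(0). The biggest mystery to me is why the logits become "nan", because the model does not do this when I feed it the same input data outside the Trainer. Does anyone know where this problem comes from? My code is included below.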
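As an illustration of the argmax behavior described above (this is not part of the asker's code), here is a minimal sketch using NumPy, which is the array type that `pred.predictions` has inside `compute_metrics`. The logit values are made up for the example. It shows that all-NaN rows collapse to class 0, and that a simple `np.isnan` check can flag the problem before any metrics are computed.

```python
import numpy as np

# Hypothetical logits for three examples: two rows are all-NaN, one is finite.
logits = np.array([
    [np.nan, np.nan, np.nan],
    [0.1, 0.2, 2.5],
    [np.nan, np.nan, np.nan],
])

# argmax treats NaN as the maximum and returns the first NaN index,
# so an all-NaN row is always predicted as class 0.
preds = logits.argmax(-1)

# Flag every row that contains at least one NaN.
nan_rows = np.isnan(logits).any(axis=-1)

print("predictions:", preds)
print("rows containing NaN:", nan_rows)
if nan_rows.any():
    print(f"{nan_rows.sum()} of {len(logits)} rows have NaN logits")
```

A check like this at the top of a metrics function makes a constant prediction of "0" easier to recognize as a numerical problem rather than a modeling one.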

Thanks in advance for any advice!

Here is my code:

### load model & tokenize
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

max_length = 256
hg_model_hub_name = "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"
# also tried: hg_model_hub_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(hg_model_hub_name)
model = AutoModelForSequenceClassification.from_pretrained(hg_model_hub_name)
model.config

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Device: {device}")
if device == "cuda":
  model = model.half()
model.to(device)
model.train();

#... some data preprocessing

encodings_train = tokenizer(premise_train, hypothesis_train, return_tensors="pt", max_length=max_length,
                            return_token_type_ids=True, truncation=False, padding=True)
encodings_val = tokenizer(premise_val, hypothesis_val, return_tensors="pt", max_length=max_length,
                          return_token_type_ids=True, truncation=False, padding=True)
encodings_test = tokenizer(premise_test, hypothesis_test, return_tensors="pt", max_length=max_length,
                           return_token_type_ids=True, truncation=False, padding=True)


### create pytorch dataset object
class XDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels
    def __getitem__(self, idx):
        item = {key: torch.as_tensor(val[idx]) for key, val in self.encodings.items()}
        #item = {key: torch.as_tensor(val[idx]).to(device) for key, val in self.encodings.items()}
        item['labels'] = torch.as_tensor(self.labels[idx])
        #item['labels'] = self.labels[idx]
        return item
    def __len__(self):
        return len(self.labels)

dataset_train = XDataset(encodings_train, label_train)
dataset_val = XDataset(encodings_val, label_val)
dataset_test = XDataset(encodings_test, label_test)

# compute metrics with trainer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
def compute_metrics(pred):
    labels = pred.label_ids
    print(labels)
    preds = pred.predictions.argmax(-1)
    print(preds)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary', pos_label=0)
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }


## training
from transformers import Trainer, TrainingArguments

# https://huggingface.co/transformers/main_classes/trainer.html#transformers.TrainingArguments
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=1,              # total number of training epochs
    per_device_train_batch_size=8,  # batch size per device during training
    per_device_eval_batch_size=8,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=100,
)

trainer = Trainer(
    model=model,                         # the instantiated  Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=dataset_train,         # training dataset
    eval_dataset=dataset_val             # evaluation dataset
)

trainer.train()
# output: TrainOutput(global_step=181, training_loss=nan)
trainer.evaluate()
# output: 
[2 2 2 0 0 2 2 2 0 2 0 0 2 2 2 2 0 2 0 2 2 2 2 0 2 0 2 0 0 2 0 0 2 0 0 0 2
 0 2 0 0 0 0 0 2 0 0 2 2 2 0 2 2 2 2 2 0 0 0 0 2 0 0 0 2 2 0 0 0 2 0 0 0 2
 2 0 2 0 0 2 2 2 0 2 2 0 0 0 0 0 0 0 2 0 0 0 0 2 0 2 2 0 2 0 0 2 2 2 2 2 2
 2 0 0 0 0 2 0 0 2 0 0 0 0 2 2 2 0 0 0 0 0 2 0 0 2 0 2 0 2 0 2 0 0 2 2 0 0
 2 2 2 2 2 2 0 0 2 2 2 2 0 2 0 0 2 2 2 0 0 2 0 2 0 2 0 0 0 0 0 0 2 0 0 2 2
 0 2 2 2 0 2 2 0 2 2 2 2 2 2 0 0 2 0 0 2 2 0 0 0 2 0 2 2 2 0 0 0 0 0 0 0 0
 2 0 2 2 2 0 2 0 0 2 0 2 2 0 0 0 0 2 2 2 0 0 0 2 2 2 2 0 2 0 2 2 2]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

{'epoch': 1.0,
 'eval_accuracy': 0.5137254901960784,
 'eval_f1': 0.6787564766839378,
 'eval_loss': nan,
 'eval_precision': 0.5137254901960784,
 'eval_recall': 1.0}

Edit: I also opened a GitHub issue with a more detailed description: https://github.com/huggingface/transformers/issues/9160

python

nlp

pytorch

huggingface-transformers

huggingface-tokenizers

1 Answer

Stack Overflow user

Answered 2020-12-17 13:47:43

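I got a good answer from the HuggingFace team on GitHub. The problem was model.half(), which converts the model to half precision (fp16). That has the advantage of increasing speed and reducing memory usage, but it also changes the model in a way that produces the error. Removing model.half() solved the problem for me. For details, see https://github.com/huggingface/transformers/issues/9160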
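The answer does not spell out why half precision leads to "nan". One plausible mechanism, stated here as an assumption rather than as the diagnosis from the GitHub issue, is the narrow range of fp16: values above 65504 overflow to infinity, and later arithmetic on infinities yields NaN. The sketch below shows that effect with NumPy on made-up activation values; it does not reproduce the asker's model.

```python
import numpy as np

# Largest finite value representable in fp16.
fp16_max = float(np.finfo(np.float16).max)
print("fp16 max:", fp16_max)

# Hypothetical fp32 activations; the second one exceeds the fp16 range.
activations = np.array([1000.0, 70000.0], dtype=np.float32)

with np.errstate(over="ignore", invalid="ignore"):
    # Casting to fp16 overflows the out-of-range value to inf.
    half = activations.astype(np.float16)
    # Subtracting inf from inf (as happens in centering or normalization steps) gives NaN.
    diff = half - half

print("as fp16:", half)
print("half - half:", diff)
```

If reduced precision is still wanted, `TrainingArguments` has an `fp16=True` option for mixed-precision training on a GPU, which is different from casting the whole model with `model.half()`; whether it avoids the issue in this exact setup is not verified here.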

Votes: 1

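Original page content provided by Stack Overflow. The Chinese translation of that page was supported by Tencent Cloud's Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/65332165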
