I am fine-tuning a BERT model using the Hugging Face transformers, Keras, and TensorFlow libraries.
Since yesterday, I have been getting this error when running my code in Google Colab. The strange thing is that the code used to run without any problem, then suddenly started throwing this error. Even more suspicious: the same code runs without any issue in my Apple M1 TensorFlow setup. Again, I have not changed my code at all, yet it no longer runs in Google Colab, even though it used to run there without any problem.
Both environments have TensorFlow 2.6.0 installed.
To reproduce the error, I created the code below. I hope you can shed some light on this.
!pip install transformers
!pip install datasets
import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer
from datasets import Dataset
# dummy sentences
sentences = ['the house is blue and big', 'this is fun stuff','what a horrible thing to say']
# create a pandas DataFrame and convert it to a Hugging Face dataset
df = pd.DataFrame({'Text': sentences})
dataset = Dataset.from_pandas(df)
# download the BERT tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
# tokenize each sentence in dataset
dataset_tok = dataset.map(lambda x: tokenizer(x['Text'], truncation=True, padding=True, max_length=10), batched=True)
# remove original text column and set format
dataset_tok = dataset_tok.remove_columns(['Text']).with_format('tensorflow')
# extract features
features = {x: dataset_tok[x].to_tensor() for x in tokenizer.model_input_names}
Posted on 2021-10-26 05:37:05
After removing to_tensor(), the given code works, as @Harold G. suggested.
!pip install transformers
!pip install datasets
import pandas as pd
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer
from datasets import Dataset
# dummy sentences
sentences = ['the house is blue and big', 'this is fun stuff','what a horrible thing to say']
# create a pandas DataFrame and convert it to a Hugging Face dataset
df = pd.DataFrame({'Text': sentences})
dataset = Dataset.from_pandas(df)
# download the BERT tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
# tokenize each sentence in dataset
dataset_tok = dataset.map(lambda x: tokenizer(x['Text'], truncation=True, padding=True, max_length=10), batched=True)
# remove original text column and set format
dataset_tok = dataset_tok.remove_columns(['Text']).with_format('tensorflow')
# extract features
features = {x: dataset_tok[x] for x in tokenizer.model_input_names}
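The fix makes sense if the Colab environment picked up a newer `datasets` release whose `with_format('tensorflow')` started returning regular dense tensors instead of ragged ones. Only `tf.RaggedTensor` has a `.to_tensor()` method; a plain `tf.Tensor` does not, so calling it raises an `AttributeError`. A minimal sketch of that difference (this is an illustration of the likely cause, not taken from the original post):

```python
import tensorflow as tf

# A ragged tensor (rows of different lengths) supports .to_tensor(),
# which pads it into a regular dense tensor.
ragged = tf.ragged.constant([[1, 2, 3], [4, 5]])
dense_from_ragged = ragged.to_tensor()  # shape (2, 3), padded with zeros

# A regular dense tensor has no .to_tensor() method, so calling it
# raises AttributeError — which would explain why the original code
# broke once the library began returning dense tensors directly.
dense = tf.constant([[1, 2, 3], [4, 5, 6]])
print(hasattr(dense, 'to_tensor'))
```

If the returned tensors are already dense, they can be passed straight into a Keras model input dict, so dropping the `.to_tensor()` call loses nothing.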
https://stackoverflow.com/questions/69577998