The spaCy 3.1 example code for Transformers in the online documentation appears to be wrong

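When you integrate Transformers with spaCy 3.1, you may run into problems, mainly because of API changes and version-compatibility issues between the packages. Below is an updated example that shows how to use Transformers in spaCy 3.1.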

Installing dependencies

First, make sure the required dependencies are installed:

pip install spacy
pip install transformers
pip install spacy-transformers
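
After installing, it can help to confirm which versions actually ended up in the environment, since mismatched versions are a common cause of import errors. The following is a minimal sketch that uses only the standard library; the helper name installed_versions is made up for this illustration and is not part of spaCy.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["spacy", "transformers", "spacy-transformers"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

If any of the three packages is reported as not installed, the pipeline code further down will fail at import time or when the transformer component is added.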

Example code

The following example shows how to use Transformers in spaCy 3.1:

import spacy
from spacy.tokens import DocBin
from spacy.training import Example
import spacy_transformers  # provides the "transformer" pipeline component

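# Name of the pretrained Hugging Face model to load
transformer_model = "bert-base-uncased"

# Create a blank English pipeline
nlp = spacy.blank("en")

# Add the transformer component; the model name goes under the nested "model" settings
nlp.add_pipe("transformer", config={"model": {"name": transformer_model}})

# Add NER with a listener as its tok2vec layer so that it uses the transformer's output
# (a plain nlp.add_pipe("ner") would build its own embedding layer and ignore the transformer)
ner_config = {
    "model": {
        "tok2vec": {
            "@architectures": "spacy-transformers.TransformerListener.v1",
            "grad_factor": 1.0,
            "pooling": {"@layers": "reduce_mean.v1"},
        }
    }
}
nlp.add_pipe("ner", config=ner_config)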

# Prepare the training data: (text, annotations) pairs with character offsets
train_data = [
    ("Apple is looking at buying U.K. startup for $1 billion", {"entities": [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]}),
    ("San Francisco considers banning sidewalk delivery robots", {"entities": [(0, 13, "GPE")]}),
]

# Create a DocBin to store the annotated training docs
db = DocBin()
for text, annotations in train_data:
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in annotations["entities"]:
        span = doc.char_span(start, end, label=label)
        if span is None:
            print(f"Skipping entity: {text[start:end]}")
        else:
            ents.append(span)
    doc.ents = ents
    db.add(doc)

# Save the training data to disk
db.to_disk("./train.spacy")

# Load the training data
train_docs = DocBin().from_disk("./train.spacy").get_docs(nlp.vocab)

# Build training examples: pair an unannotated copy of each text with its gold-standard Doc
train_examples = []
for gold_doc in train_docs:
    example = Example(nlp.make_doc(gold_doc.text), gold_doc)
    train_examples.append(example)

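# Initialize the pipeline and start training
# (nlp.initialize replaces nlp.begin_training, which is deprecated in spaCy v3)
optimizer = nlp.initialize(lambda: train_examples)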
for i in range(10):
    losses = {}
    nlp.update(train_examples, sgd=optimizer, losses=losses)
    print(f"Losses at iteration {i}: {losses}")

# Save the model
nlp.to_disk("./model")

# Load the model and test it
nlp = spacy.load("./model")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)

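Explanation

Install the dependencies: make sure spacy, transformers, and spacy-transformers are installed.
Create the spaCy language object: use spacy.blank to create a new, empty pipeline.
Add the Transformers component: use nlp.add_pipe to add the transformer component to the pipeline.
Prepare the training data: build the annotated docs and store them in a DocBin object.
Load the training data: read the docs back from disk and create training examples from them.
Train the model: call nlp.update to train the pipeline.
Save and load the model: write the trained pipeline to disk, then reload it for testing.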
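
The training step above passes every example to nlp.update in a single batch, which is fine for two sentences. With more data, the examples are usually shuffled and split into small batches on every pass. spaCy ships spacy.util.minibatch for the batching part; the sketch below uses plain Python only to illustrate the idea, with placeholder strings standing in for Example objects. The helper name minibatches and the batch size are assumptions made for this illustration.

```python
import random


def minibatches(items, size, seed=None):
    """Yield shuffled batches of at most `size` items (one pass over the data)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    for start in range(0, len(shuffled), size):
        yield shuffled[start:start + size]


if __name__ == "__main__":
    # Placeholder strings stand in for spaCy Example objects.
    examples = [f"example-{i}" for i in range(5)]
    for epoch in range(2):
        for batch in minibatches(examples, size=2, seed=epoch):
            # In the real loop this would be:
            # nlp.update(batch, sgd=optimizer, losses=losses)
            print(epoch, batch)
```

Small batches matter more with a transformer in the pipeline, because each update has to run the full transformer over every text in the batch.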

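Notes

Version compatibility: make sure the installed versions of spacy, transformers, and spacy-transformers are compatible with each other.
Data format: make sure the training data is formatted correctly, especially the start and end character offsets of each entity.
Error handling: doc.char_span returns None when the offsets do not line up with token boundaries, so handle that case explicitly when building entities.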
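
Some offset mistakes can be caught before any spaCy object is created. The sketch below is a plain-Python check for the (start, end, label) tuples used in the training data; the function name check_entities is hypothetical. It only looks for out-of-range offsets, stray whitespace at the span edges, and overlapping spans. It does not know about tokenization, so the None check on doc.char_span is still needed.

```python
def check_entities(text, entities):
    """Return a list of problems found in (start, end, label) character spans."""
    problems = []
    previous_end = 0
    for start, end, label in sorted(entities):
        # Offsets must describe a non-empty slice inside the text.
        if not (0 <= start < end <= len(text)):
            problems.append(f"{label} ({start}, {end}): out of range")
            continue
        snippet = text[start:end]
        # A span that starts or ends on whitespace cannot match token boundaries.
        if snippet != snippet.strip():
            problems.append(f"{label} {snippet!r}: leading or trailing whitespace")
        # Entities assigned to doc.ents must not overlap.
        if start < previous_end:
            problems.append(f"{label} {snippet!r}: overlaps the previous entity")
        previous_end = max(previous_end, end)
    return problems


if __name__ == "__main__":
    text = "Apple is looking at buying U.K. startup for $1 billion"
    print(check_entities(text, [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]))
    print(check_entities(text, [(0, 6, "ORG")]))
```

The first call checks the annotations from the example above; the second uses an end offset that is one character too long, so the span includes the space after "Apple" and is reported.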
