
transformers for Beginners

Author: 菩提树下的杨过
Published 2023-08-21 08:13:17

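transformers is a library released by Hugging Face, and it is very pleasant to use. Many problems that look intimidating to outsiders can be solved with it in a few lines. Let's start with a small example: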

1. Sentiment analysis

from transformers import pipeline
classifier = pipeline('sentiment-analysis')
classifier('you are beautiful')

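These three short lines are enough to tell whether the sentence "you are beautiful" is positive (a compliment) or negative (an insult). If all goes well, you will see output like this:

[{'label': 'POSITIVE', 'score': 0.9998794794082642}]

This says the sentence is positive. The score can be read as the model's confidence: 0.9998, i.e. 99.98%. Also note that the first time you use the sentiment-analysis pipeline, the model it depends on is downloaded from Hugging Face.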
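To make the score a little more concrete: for a two-label classifier like this one, the pipeline typically turns the model's raw outputs (logits) into probabilities with a softmax and reports the largest one. The sketch below shows only that step in plain Python. The logit values are made up for illustration and are not taken from the real model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the labels NEGATIVE and POSITIVE (illustrative only).
labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.2, 4.8]

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
result = {"label": labels[best], "score": probs[best]}
print(result)
```

A large gap between the two logits gives a score close to 1, which is why an unambiguous sentence like the one above comes back with such high confidence.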

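The first step is always the hardest. If even this first example fails with an error, the most likely cause is that your transformers version is too old. You can run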

import transformers
transformers.__version__

to check the current version. If it is 2.1.1, that is too old; open another terminal and run:

pip install --upgrade transformers -i https://pypi.tuna.tsinghua.edu.cn/simple

to upgrade it to the latest version.
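The version check can also be done in code. Below is a minimal sketch that reads the installed version with the standard library and compares it against a threshold. The `MINIMUM` value is a hypothetical cut-off chosen for this example, not an official requirement, and the helper names are made up here.

```python
from importlib import metadata

def version_tuple(version):
    """Turn '4.30.2' into (4, 30, 2); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def is_older(installed, minimum):
    """True if the installed version sorts before the minimum version."""
    return version_tuple(installed) < version_tuple(minimum)

# Hypothetical threshold for this sketch, not an official requirement.
MINIMUM = "4.0.0"

try:
    installed = metadata.version("transformers")
except metadata.PackageNotFoundError:
    installed = None  # transformers is not installed in this environment

if installed is None:
    print("transformers is not installed")
elif is_older(installed, MINIMUM):
    print(f"transformers {installed} is older than {MINIMUM}; consider upgrading")
else:
    print(f"transformers {installed} looks recent enough")
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "10.0.0" would sort before "2.1.1".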

import transformers  # needed for transformers.__version__ below
from transformers import pipeline

print(transformers.__version__)
classifier = pipeline('sentiment-analysis')
classifier('you are beautiful')

This time it works, but a warning is printed:

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

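This means that no specific model was given, so sentiment analysis fell back to the default model, https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english. The recommendation is to name a specific model explicitly: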

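import transformers  # needed for transformers.__version__ below
from transformers import pipeline

print(transformers.__version__)
# Name the model explicitly instead of relying on the task default
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
classifier = pipeline('sentiment-analysis', model=model_id)
classifier('you are beautiful')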

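The warning is gone. The default model does not handle Chinese well, so you can search Hugging Face for "sentiment chinese". Many models come up; here we pick the one ranked first by downloads, copy its name, and try it: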

from transformers import pipeline
model_id="hw2942/bert-base-chinese-finetuning-financial-news-sentiment-v2"
classifier = pipeline('sentiment-analysis',model=model_id)
classifier(['这是什么鬼天气!','你可真棒!','看你那脸,拉得跟驴似的!','今天手气真差,又他妈输了!'])

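The model is downloaded on first use, and then the results are printed. Overall they are fairly reasonable, but not all of them: "这是什么鬼天气!" ("What lousy weather!") and "看你那脸,拉得跟驴似的!" ("Look at that long face of yours, like a donkey's!") are clearly negative, yet both are labeled neutral. So how good the results are depends mainly on the quality of the model itself. Still, for Chinese text this does somewhat better than the default English model used earlier.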
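When a pipeline is called with a list, it returns one result dict per input, in the same order, so it can help to line the inputs up with their labels and flag the ones the model is unsure about. A minimal sketch follows; the `results` list is made-up placeholder data (including the label names) rather than real output from the model, and the 0.6 threshold is arbitrary.

```python
def pair_with_labels(texts, results, threshold=0.6):
    """Return (text, label, score, confident) rows, one per input."""
    rows = []
    for text, res in zip(texts, results):
        rows.append((text, res["label"], res["score"], res["score"] >= threshold))
    return rows

texts = ["这是什么鬼天气!", "你可真棒!"]
# Placeholder results in the pipeline's usual shape (not real model output).
results = [
    {"label": "Neutral", "score": 0.55},
    {"label": "Positive", "score": 0.97},
]

for text, label, score, confident in pair_with_labels(texts, results):
    flag = "" if confident else "  <- low confidence, check by hand"
    print(f"{text}\t{label}\t{score:.2f}{flag}")
```

Reviewing the low-confidence rows by hand is a cheap way to judge whether a candidate model is good enough for your own data.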

2. Zero-shot classification

from transformers import pipeline
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

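What it does: given a piece of text and several candidate labels, it scores how well each label matches the text. In the example above, the closest match is education.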
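The zero-shot pipeline returns a dict with the keys `sequence`, `labels`, and `scores`, where each score belongs to the label at the same position. A small sketch of picking the best label from such a dict is shown below; the scores are made-up placeholders rather than real model output, and `top_label` is a helper name invented for this example.

```python
def top_label(result):
    """Pick the best (label, score) pair from a zero-shot result dict."""
    # Pair labels with scores and take the highest, without relying on order.
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])

# Placeholder result in the pipeline's usual shape (scores are made up).
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.85, 0.11, 0.04],
}

label, score = top_label(result)
print(label, score)
```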

3. Text generation

from transformers import pipeline
generator = pipeline("text-generation",model="distilgpt2")
generator("once upon a time", max_length=30,num_return_sequences=2)

Put simply, you give it an opening and it makes up the rest.
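The idea of "continuing from an opening" can be sketched as a loop that repeatedly picks the next token given what has been produced so far. The example below uses a hand-written lookup table as a stand-in for the model. It is a toy, not how distilgpt2 works internally: a real model predicts a probability distribution over its whole vocabulary at every step. As with `max_length` in the call above, the limit here counts the prompt tokens too.

```python
# Toy next-token table, made up for illustration only.
NEXT_TOKEN = {
    "once": "upon",
    "upon": "a",
    "a": "time",
    "time": "there",
    "there": "was",
    "was": "a",
}

def continue_text(prompt, max_length):
    """Greedily append tokens until max_length tokens or no known next token."""
    tokens = prompt.split()
    while len(tokens) < max_length:
        next_token = NEXT_TOKEN.get(tokens[-1])
        if next_token is None:
            break  # the toy table has nothing to say after this token
        tokens.append(next_token)
    return " ".join(tokens)

print(continue_text("once upon a time", max_length=8))
```

Because this toy always picks the same next token, it quickly starts repeating itself; real generators sample from the predicted distribution, which is also why `num_return_sequences=2` can give two different continuations.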

4. Fill-mask

from transformers import pipeline
unmasker = pipeline("fill-mask",model="distilroberta-base")
unmasker("I love sweet foods,such as <mask>.", top_k=2)

The <mask> part is filled in automatically by the model.
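One detail worth knowing is that the placeholder is model-specific: RoBERTa-style models such as distilroberta-base use `<mask>`, while BERT-style models use `[MASK]`. The sketch below builds the prompt from a template so that the same sentence can be reused across models. `MASK_TOKENS` and `build_prompt` are helper names invented for this example.

```python
# Mask tokens used by two well-known model families.
MASK_TOKENS = {
    "distilroberta-base": "<mask>",  # RoBERTa-style models
    "bert-base-uncased": "[MASK]",   # BERT-style models
}

def build_prompt(template, model_name):
    """Replace the {mask} placeholder with the model's own mask token."""
    return template.format(mask=MASK_TOKENS[model_name])

template = "I love sweet foods, such as {mask}."
print(build_prompt(template, "distilroberta-base"))
print(build_prompt(template, "bert-base-uncased"))
```

With a loaded pipeline you do not need such a table: the token is available as `unmasker.tokenizer.mask_token`.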

5. Reading comprehension (answer extraction)

from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Is it raining today?",
    context="In the evening, a large cloud drifted in the distance, and soon it began to rain"
)

Roughly speaking, you give it a passage and then ask a question, and it picks out of the passage the part that answers the question.
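"Extraction" here is meant literally: the question-answering pipeline returns a dict with `score`, `start`, `end`, and `answer`, where `answer` is the slice of the context between the two character offsets. The sketch below illustrates that relationship in plain Python. The answer span is chosen by hand and the score is a placeholder; neither is real model output.

```python
context = ("In the evening, a large cloud drifted in the distance, "
           "and soon it began to rain")

# Illustrative answer span (chosen by hand, not real model output).
answer = "it began to rain"
start = context.find(answer)
end = start + len(answer)

# Same shape as a question-answering result; the score is a placeholder.
result = {"score": 0.5, "start": start, "end": end, "answer": answer}

# The answer is a slice of the context, not newly written text.
assert context[result["start"]:result["end"]] == result["answer"]
print(result["start"], result["end"], repr(result["answer"]))
```

This also explains a limitation: for a yes/no question like the one above, the pipeline cannot reply "yes"; it can only point at the words in the passage that support an answer.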

6. Translation

Chinese to English

from transformers import pipeline
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
translator("今天是周四,我要吃肯德基。")

English to Chinese

from transformers import pipeline
translator= pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
translator("It's Thursday. I'm gonna eat Kentucky Fried Chicken.")

7. Summarization

from transformers import pipeline
summarizer = pipeline("summarization",model="sshleifer/distilbart-cnn-12-6")
summarizer("""Speaking a language is a skill, like driving a car, playing a musical instrument or learning to swim. 
To be a good driver, you need to practise driving. You can read a book about car mechanics. You can study the rules of the road. 
But nothing is as good for your driving as spending time behind the wheel of a car, actually driving.
It's the same with speaking English. No matter how much you study grammar and vocabulary, if you don't practise spoken communication, it's very difficult to get good at it. 
So maybe you talk to yourself in English as you go about your day. Or maybe you look for opportunities to chat in English with people you meet. 
But however you do it, the most powerful way to improve your English speaking skills is to use them. """,max_length=100)

Originally published 2023-08-20 on the author's personal site/blog.
