📂 /path/to/project
┣━━ 📂 actions
┃ ┣━━ 🐍 init.py
┃ ┗━━ 🐍 actions.py
┣━━ 📂 data
┃ ┣━━ 📄 nlu.yml
┃ ┣━━ 📄 rules.yml
┃ ┗━━ 📄 stories.yml
┣━━ 📂 models
┣━━ 📂 tests
┃ ┗━━ 📄 test_stories.yml
┣━━ 📄 config.yml
┣━━ 📄 credentials.yml
┣━━ 📄 domain.yml
┗━━ 📄 endpoints.yml
This module serves intent recognition, entity extraction, and related tasks: it configures each intent together with example user texts that trigger it. Key concepts:
Query: the user's utterance to the chatbot.
Action: the chatbot's response to the user's query.
Intent: the purpose behind the user's input, e.g. user: "你好"; intent: 打招呼 (greet).
Entity: useful information extracted from the user's input.
responses provide the bot's predefined replies for each situation; they require no code execution and return no events. They correspond to the responses referenced in actions and are named in the form utter_<intent>:
responses:
  utter_greet:
  - text: "今天天气怎么样"                    # text reply
    image: "https://i.imgur.com/nGF1K8f.jpg"  # image attachment
Once a form is activated, the bot loops over the required slots and requests each value in turn. To tell the user which slot is currently being requested, add an utter_ask_<slot_name> response; the bot utters it whenever it needs that slot filled. Below is an example response for filling the name_spelled_correctly slot:
responses:
  utter_ask_name_spelled_correctly:
  - buttons:
    - payload: /affirm
      title: Yes
    - payload: /deny
      title: No
    text: Is {first_name} spelled correctly?
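The {first_name} placeholder in the response text is filled from the slot of the same name when the bot utters it. A minimal stdlib sketch of that interpolation (the slot value "Alice" is made up for illustration):

```python
# Sketch of Rasa-style response templating: "{slot_name}" in the response
# text is replaced by the slot's current value.
def render_response(template: str, slots: dict) -> str:
    # str.format substitutes each {name} with the matching slot value
    return template.format(**slots)

rendered = render_response(
    "Is {first_name} spelled correctly?", {"first_name": "Alice"}
)
print(rendered)  # Is Alice spelled correctly?
```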
stories.yml provides example conversations between the user and the bot. They are used to train the Core (dialogue management) model so that it can generalize to unseen conversation paths.
stories:
- story: happy path
  steps:
  - intent: greet
  - action: utter_greet
  - intent: mood_great
  - action: utter_happy
rasa init         # create a new project with bundled sample data
rasa train        # train a model
rasa test         # test a trained model (latest by default)
rasa interactive  # talk to the bot interactively, creating new training data
rasa shell        # load a model (latest by default) and chat with the bot in the terminal
rasa run          # start a server (NLU and DM) with the trained model
rasa run actions  # start the action server (rasa SDK)
nlu:
- regex: car_type
  examples: |
    - ^[a-zA-Z][0-9]$
- intent: 手机产品介绍
  examples: |
    - 这款手机[续航](property)怎么样呀?
Output: a JSON object containing the entity value ("value"), entity type ("entity"), confidence ("confidence"), the extractor used ("extractor"), and more:
{
  "entities": [{
    "value": "续航",
    "start": 20,
    "end": 33,
    "confidence": 0.812631,
    "entity": "property",
    "extractor": "DIETClassifier"
  }]
}
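A short stdlib sketch of consuming such a parse result, grouping extracted entity values by entity type (the payload mirrors the JSON shape shown above; the offsets are illustrative):

```python
import json

# Parse an NLU result like the example above and group entity values by type.
payload = """
{"entities": [{"value": "续航", "start": 20, "end": 33,
               "confidence": 0.812631, "entity": "property",
               "extractor": "DIETClassifier"}]}
"""

result = json.loads(payload)
by_type = {}
for ent in result["entities"]:
    by_type.setdefault(ent["entity"], []).append(ent["value"])
print(by_type)  # {'property': ['续航']}
```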
Entity roles: if two entities are of the same type but play different roles in the text, you can distinguish them by defining entity roles.
nlu:
- intent: 手机产品介绍
  examples: |
    - 我想要对比一下[Mi11]{"entity": "type", "role": "xiaomi"}和[iPhone11]{"entity": "type", "role": "apple"}
Entity groups: if you do not want to name a specific role for each entity of the same type, but just want to split them into different groups, use:
nlu:
- intent: 手机产品介绍
  examples: |
    - 我想要对比一下[Mi10]{"entity": "type", "group": "1"},[Mi11]{"entity": "type", "group": "1"}和[iPhone11]{"entity": "type", "group": "2"}
Synonyms map different surface forms of an entity value to one canonical value:
nlu:
- intent: 手机产品介绍
  examples: |
    - 我想看看[Mi11](type)
- synonym: Mi11
  examples: |
    - 小米11
    - xiaomi11
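After entity extraction, the EntitySynonymMapper component normalizes matched variants to the synonym's canonical value. A toy stdlib version of that normalization (the mapping table is illustrative):

```python
# Toy synonym normalization: map surface variants of an extracted entity
# value onto one canonical value, as EntitySynonymMapper does.
SYNONYMS = {"小米11": "Mi11", "xiaomi11": "Mi11"}

def normalize(value: str) -> str:
    # values without a synonym entry pass through unchanged
    return SYNONYMS.get(value, value)

print(normalize("xiaomi11"))  # Mi11
print(normalize("iPhone11"))  # iPhone11
```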
Example NLU pipeline definition in config.yml:
language: en
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
Each component consumes the output of earlier components and produces output of its own; a component's output is available to every component after it in the pipeline. Some components only produce intermediate information consumed by other pipeline components, while others produce output attributes that are returned once the pipeline finishes.
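That data flow can be sketched as components reading from and writing to a shared message object (the component names here are toy stand-ins, not real Rasa classes):

```python
# Sketch of the pipeline contract described above: every component reads
# what earlier components attached to a shared message dict and adds its own
# output for components further down the pipeline.
def tokenizer(msg):
    msg["tokens"] = msg["text"].split()

def featurizer(msg):
    # uses the tokenizer's output
    msg["features"] = [len(t) for t in msg["tokens"]]

def classifier(msg):
    # uses earlier outputs; a trivial "rule" stands in for a real model
    msg["intent"] = "greet" if "hello" in msg["tokens"] else "other"

message = {"text": "hello there"}
for component in (tokenizer, featurizer, classifier):
    component(message)
print(message["intent"])  # greet
```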
pip install rasa_chinese
pip install rasa_chinese_service
These packages include a tokenizer component based on HuggingFace's transformers; configure it in config.yml as follows:
- name: "rasa_chinese.nlu.tokenizers.lm_tokenizer.LanguageModelTokenizer"
  tokenizer_url: "http://127.0.0.1:8000/"
rasa_chinese_service must be running as the tokenization server.
JiebaTokenizer:
  "我想要了解小鹏汽车" -> '我', '想要', '了解', '小鹏', '汽车'
WhitespaceTokenizer:
  "I would like to know about Xiaopeng car." -> 'I', 'would', 'like', 'to', 'know', 'about', 'Xiaopeng', 'car'
- name: LanguageModelFeaturizer
  model_name: bert
  model_weights: /path/to/offline_model
To use a custom component, reference its class path in the pipeline:
pipeline:
- name: "rasa.nlu.components.MyComponent"
Template for a custom component (cf. rasa/nlu/components.py):
import typing
from typing import Any, Optional, Text, Dict, List, Type

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.shared.nlu.training_data.training_data import TrainingData
from rasa.shared.nlu.training_data.message import Message

if typing.TYPE_CHECKING:
    from rasa.nlu.model import Metadata


class MyComponent(Component):
    """A new component."""

    # Which components are required by this component.
    # Listed components should appear before this component in the pipeline.
    @classmethod
    def required_components(cls) -> List[Type[Component]]:
        """Specify which components need to be present in the pipeline."""
        return []

    # Defines the default configuration parameters of the component.
    # These values can be overwritten in the model's pipeline configuration.
    # The component should choose sensible defaults and should be able to
    # produce reasonable results with them.
    defaults = {}

    # Defines which language(s) this component can handle.
    # This attribute is used by the instance method `can_handle_language`.
    # The default value None means it can handle all languages.
    # This is an important feature for backwards compatibility of components.
    supported_language_list = None

    # Defines which language(s) this component can NOT handle
    # (same conventions as supported_language_list).
    not_supported_language_list = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)

    def train(
        self,
        training_data: TrainingData,
        config: Optional[RasaNLUModelConfig] = None,
        **kwargs: Any,
    ) -> None:
        """Train this component.

        This is the component's chance to train itself on the provided
        training data. The component can rely on any context attribute
        created by a call to :meth:`components.Component.pipeline_init`
        of ANY component, and on any context attribute created by a call
        to :meth:`components.Component.train` of components earlier in
        the pipeline."""
        pass

    def process(self, message: Message, **kwargs: Any) -> None:
        """Process an incoming message.

        This is the component's chance to process an incoming message.
        The component can rely on any context attribute created by a call
        to :meth:`components.Component.pipeline_init` of ANY component,
        and on any context attribute created by a call to
        :meth:`components.Component.process` of components earlier in
        the pipeline."""
        pass

    def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
        """Persist this component to disk for future loading."""
        pass

    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Text,
        model_metadata: Optional["Metadata"] = None,
        cached_component: Optional["Component"] = None,
        **kwargs: Any,
    ) -> "Component":
        """Load this component from file."""
        if cached_component:
            return cached_component
        else:
            return cls(meta)
Custom tokenizer:
from typing import Any, Dict, List, Text
from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.shared.nlu.training_data.message import Message


class MyTokenizer(Tokenizer):
    language_list = ["zh"]

    def __init__(self, component_config: Dict[Text, Any] = None) -> None:
        """Construct a new tokenizer."""
        super().__init__(component_config)

    @classmethod
    def required_packages(cls) -> List[Text]:
        return []

    def tokenize(self, message: Message, attribute: Text) -> List[Token]:
        """Split the attribute's text into a list of Tokens."""
        text = message.get(attribute)
        tokens, offset = [], 0
        for word in text.split():  # replace with real segmentation (e.g. jieba)
            start = text.index(word, offset)
            tokens.append(Token(word, start))
            offset = start + len(word)
        return tokens
The dialogue management (DM) module decides which actions to execute during a conversation, based on the user intent and slots output by NLU, combined with the conversation history provided by the tracker; its components are therefore called dialogue policies. Multiple policies can be enabled at once; the rasa agent coordinates them and executes the action predicted with the highest confidence.
Policy priority: when different policies predict with the same confidence, the result is ranked by priority:
- RulePolicy: 6
- MemoizationPolicy / AugmentedMemoizationPolicy: 3
- UnexpecTEDIntentPolicy: 2
- TEDPolicy: 1
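A toy sketch of that tie-break, assuming the priority table above and two hypothetical predictions with equal confidence:

```python
# Toy tie-break: pick the prediction with the highest confidence, and on
# ties the highest policy priority. Priorities follow the table above;
# the predictions are made up for illustration.
PRIORITY = {"RulePolicy": 6, "MemoizationPolicy": 3,
            "UnexpecTEDIntentPolicy": 2, "TEDPolicy": 1}

predictions = [
    ("TEDPolicy", "utter_greet", 0.9),
    ("RulePolicy", "action_default_fallback", 0.9),
]
policy, action, conf = max(predictions, key=lambda p: (p[2], PRIORITY[p[0]]))
print(policy, action)  # RulePolicy action_default_fallback
```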
policies:
- name: "RulePolicy"
  core_fallback_threshold: 0.3  # below this confidence, fall back
  core_fallback_action_name: action_default_fallback  # action to run when no next action can be predicted
  enable_fallback_prediction: true
  restrict_rules: true
  check_for_contradictions: true  # before training, check that slots and active loops are consistent across rules
policies:
- name: "MemoizationPolicy"
  max_history: 7

policies:  # custom policy
- name: "path.to.your.policy.class"
  arg1: "……"
A priority parameter must be specified in the configuration; compare TEDPolicy's constructor:
class TEDPolicy(Policy):
    def __init__(
        self,
        featurizer: Optional[TrackerFeaturizer] = None,
        priority: int = DEFAULT_POLICY_PRIORITY,
        max_history: Optional[int] = None,
        model: Optional[RasaModel] = None,
        fake_features: Optional[Dict[Text, List["Features"]]] = None,
    ) -> None:
Main methods:
- train(): train the policy component (optional);
- predict_action_probabilities(): predict the next action to execute;
- persist(): save the policy model to disk;
- load(): load a saved policy model.
max_history: how many turns of conversation history to consider when predicting the next action. A large value increases training time; instead, context can be stored in slots, which stay available throughout the conversation.
Data augmentation: randomly glue stories from stories.yml together to create longer stories.
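A minimal sketch of that augmentation, gluing randomly chosen stories (written here as flat step sequences; the story contents are illustrative):

```python
import random

# Toy data augmentation: concatenate randomly sampled stories to create
# longer training dialogues, as described above.
stories = [
    ["greet", "utter_greet"],
    ["mood_great", "utter_happy"],
    ["goodbye", "utter_bye"],
]

def augment(stories, n_long, glue=2, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    # each long story is `glue` distinct stories joined end to end
    return [sum(rng.sample(stories, glue), []) for _ in range(n_long)]

for long_story in augment(stories, n_long=2):
    print(long_story)
```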
Featurizers:
- State featurizers: convert the user's historical state into feature vectors for the policies. Each story corresponds to a tracker, which creates a state for every event in the history; SingleStateFeaturizer featurizes a single tracker state.
- Tracker featurizers: when predicting an action, some history should be included in addition to the current state. FullDialogueTrackerFeaturizer feeds the entire dialogue into the network, iterating over the tracker states and calling SingleStateFeaturizer for each one. MaxHistoryTrackerFeaturizer is similar, but its max_history parameter caps how many turns of state history are tracked. IntentMaxHistoryTrackerFeaturizer is used by UnexpecTEDIntentPolicy; its target label is the intent the user expressed in the context of the dialogue tracker.
responses:
  utter_ask_type:
  - text: 请问您是咨询哪款机型?
  utter_greet:
  - text: 你好, {name}. 最近怎样?
  utter_cheer_up:
  - text: 这儿有一些有趣的照片。
    image: "https://i.imgur.com/nGF1K8f.jpg"
forms:
  type_form:  # form that asks for the model/type
    required_slots:
      car_type:
      - type: from_entity
        entity: car_type
Slot mappings specify which slots the form requires and how to fill them:
- from_entity: fill the slot from an extracted entity.
- from_text: fill the slot with the user's message text.
- from_intent: fill the slot with a given value when the user's intent matches.
- from_trigger_intent: fill the slot based on the intent that activated the form.
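A toy stdlib sketch of the from_entity mapping, with a hypothetical required_slots table mirroring the form above:

```python
# Toy from_entity slot filling: fill a slot whenever an extracted entity
# of the matching type is present. Slot and entity names are illustrative.
REQUIRED_SLOTS = {"car_type": {"type": "from_entity", "entity": "car_type"}}

def fill_slots(entities, slots=None):
    slots = dict(slots or {})
    for name, mapping in REQUIRED_SLOTS.items():
        if mapping["type"] == "from_entity":
            for ent in entities:
                if ent["entity"] == mapping["entity"]:
                    slots[name] = ent["value"]
    return slots

print(fill_slots([{"entity": "car_type", "value": "SUV"}]))
# {'car_type': 'SUV'}
```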
Activating a form: add a story or rule that tells the bot when to run the form:
rules:
- rule: Activate form
  steps:
  - intent: request_car_type
  - action: type_form       # start running the form
  - active_loop: type_form  # mark the form as active
Note: an active form automatically utters the utter_ask_<slot_name> responses to request slot values.
Deactivating a form: the form deactivates automatically once all slots are filled. Interrupting or stopping early: when the user does not cooperate, use custom rules/stories that contain the interrupting intent.
rules:
- rule: Submit and deactivate the form
  condition:
  - active_loop: type_form   # the form is currently active
  steps:
  - action: type_form
  - active_loop: null        # deactivate the form
  - slot_was_set:
    - requested_slot: null   # stop requesting slots
  - action: utter_submit     # submit the form

- rule: User switches topic mid-form
  condition:
  - active_loop: type_form
  steps:
  - intent: chitchat         # the user starts chitchat mid-form
  - action: utter_chitchat   # respond to the chitchat
  - action: type_form        # return to the form
  - active_loop: type_form
class Action:
    def name(self) -> Text:
        raise NotImplementedError("An action must implement a name")

    async def run(
        self,
        dispatcher: "CollectingDispatcher",
        tracker: Tracker,
        domain: "DomainDict",
    ) -> List[Dict[Text, Any]]:
        raise NotImplementedError("An action must implement its run method")
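A custom action implementing this interface might look like the sketch below; a minimal dispatcher stub stands in for rasa_sdk's CollectingDispatcher so the example is self-contained, and the action and message are hypothetical. In a real project the class would subclass rasa_sdk.Action and live in actions/actions.py.

```python
import asyncio
from typing import Any, Dict, List, Text

class CollectingDispatcher:
    """Stand-in for the rasa_sdk dispatcher: collects bot messages."""
    def __init__(self) -> None:
        self.messages: List[Dict[Text, Any]] = []

    def utter_message(self, text: Text) -> None:
        self.messages.append({"text": text})

class ActionHelloWorld:
    def name(self) -> Text:
        return "action_hello_world"

    async def run(
        self, dispatcher: CollectingDispatcher, tracker: Any, domain: Dict
    ) -> List[Dict[Text, Any]]:
        dispatcher.utter_message(text="Hello from a custom action!")
        return []  # no events to apply to the tracker

dispatcher = CollectingDispatcher()
events = asyncio.run(ActionHelloWorld().run(dispatcher, tracker=None, domain={}))
print(dispatcher.messages)  # [{'text': 'Hello from a custom action!'}]
```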
Training data: stories and rules.
rules:
- rule: Ask the user to rephrase when confidence is below the threshold
  steps:
  - intent: nlu_fallback
  - action: utter_please_rephrase

rules:
- rule: Greet only after the user has provided their name
  condition:
  - slot_was_set:
    - user_provided_name: true
  steps:
  - intent: greet
  - action: utter_greet
Example rasa interactive session:
? Next user input: hello
? Is the NLU classification for 'hello' with intent 'hello' correct? Yes
------
Chat History
# Bot You
────────────────────────────────────────────
1 action_listen
────────────────────────────────────────────
2 hello
intent: hello 1.00
------
? The bot wants to run 'utter_greet', correct? (Y/n)