Evaluation Dimensions

Last updated: 2025-11-05 15:17:32

Suggested Score

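The suggested score (SuggestedScore) ranges over [0, 100] and is computed as SuggestedScore = PronAccuracy × PronCompletion × (2 − PronCompletion). If this strategy does not suit your use case, use the detailed per-word scores in the Words array to define your own scoring logic.
How to obtain the suggested score in each evaluation mode:
Sentence mode, paragraph mode, sentence multi-branch mode: PronAccuracy × PronCompletion × (2 − PronCompletion).
Word evaluation mode, word correction mode, pinyin mode: use PronAccuracy directly.
Free-talk evaluation, scenario evaluation mode: the score may need to be computed further from the returned result. If you have no such requirement, free-talk mode can use PronAccuracy. In scenario evaluation mode, use PronAccuracy when the relevance (PronCompletion) is 1; when the relevance (PronCompletion) is -1, the score is 0.
Real-time word evaluation: use the PronAccuracy of the intermediate result (SentenceInfo).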
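
The per-mode rules above can be sketched as a small helper. This is only an illustrative sketch: the mode names used as strings here (for example "sentence" or "scenario") are hypothetical labels chosen for readability, not values defined by the service, and the behaviour for scenario mode when PronCompletion is neither 1 nor -1 is not specified in this document, so the sketch returns None in that case.

```python
from typing import Optional

# Hypothetical mode labels (assumption): the service itself does not define these strings.
FORMULA_MODES = {"sentence", "paragraph", "sentence_multi_branch"}
# For "word_realtime", pass the PronAccuracy taken from the intermediate SentenceInfo result.
ACCURACY_MODES = {"word", "word_correction", "pinyin", "free_talk", "word_realtime"}


def formula_score(pron_accuracy: float, pron_completion: float) -> float:
    """SuggestedScore = PronAccuracy * PronCompletion * (2 - PronCompletion)."""
    return pron_accuracy * pron_completion * (2 - pron_completion)


def suggested_score(mode: str, pron_accuracy: float,
                    pron_completion: float) -> Optional[float]:
    """Pick a score according to the per-mode rules listed above."""
    if mode in FORMULA_MODES:
        return formula_score(pron_accuracy, pron_completion)
    if mode in ACCURACY_MODES:
        return pron_accuracy
    if mode == "scenario":
        # In scenario mode PronCompletion carries the relevance: 1 or -1.
        if pron_completion == 1:
            return pron_accuracy
        if pron_completion == -1:
            return 0.0
        return None  # Not covered by the documented rules.
    raise ValueError(f"unknown mode label: {mode!r}")


if __name__ == "__main__":
    # Values taken from the response example on this page.
    print(suggested_score("sentence", 61.44247817993164, 0.7142857313156128))
```

With the PronAccuracy and PronCompletion values from the response example on this page, the formula gives approximately 56.4268, which agrees with the returned SuggestedScore up to floating-point rounding.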

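Accuracy

Pronunciation accuracy (PronAccuracy) ranges over [0, 100]. It is scored against the standard pronunciation and against recordings of the same sounds in the speech corpus, taking into account indicators such as how vowel letters and letter combinations are pronounced in different syllables, the pronunciation of consonants and syllabic consonants, word stress, and sentence stress.
The overall accuracy is the average over all matched words.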

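Fluency

Pronunciation fluency (PronFluency) ranges over [0, 1.0]. The base score reflects how smoothly the speech flows. It also takes into account connected-speech phenomena (linking, loss of plosion, weak forms, assimilation) and intonation and rhythm (sense groups and pauses, intonation, rhythm).
The overall fluency is the average over all matched words.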

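Completeness

Pronunciation completeness (PronCompletion) ranges over [0, 1.0]. It is obtained by comparing the recognized text with the uploaded ref_text, and is the ratio of the matched words to the length of the ref_text text.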

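Match Status

MatchTag indicates how a word in the audio matches a word in the current text.
0: matched word; 1: inserted word; 2: missing word; 3: misread word; 4: word not in the lexicon.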
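
When post-processing results, it can help to give the five MatchTag values names. The sketch below is one possible way to do that; the enum member names and the helper `count_match_tags` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from enum import IntEnum
from typing import Dict, Iterable


class MatchTag(IntEnum):
    """Names are illustrative; the numeric values are the ones listed above."""
    MATCHED = 0
    INSERTED = 1
    MISSING = 2
    MISREAD = 3
    NOT_IN_LEXICON = 4


def count_match_tags(words: Iterable[Dict]) -> Counter:
    """Count how many entries of a Words array carry each MatchTag."""
    return Counter(MatchTag(word["MatchTag"]) for word in words)


if __name__ == "__main__":
    # MatchTag values of the nine Words entries in the response example on this page.
    sample = [{"MatchTag": t} for t in (1, 0, 4, 0, 0, 0, 0, 2, 2)]
    for tag, count in sorted(count_match_tags(sample).items()):
        print(tag.name, count)
```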

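Current Word and Reference Word

During evaluation, Smart Oral Evaluation (智聆口语评测) preprocesses the text; see the Evaluation Text Introduction document (评估文本介绍).
After preprocessing, evaluation is performed on the current word (Word); the original text is available in the reference word (ReferenceWord).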
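
In the response example on this page, ReferenceWord values look like `1st_0` or `wakey_2`: the original text followed by an underscore and its position. A minimal sketch for splitting such a value is shown below. It assumes that this `text_index` pattern holds in general, which this document only shows by example, and the function name `split_reference_word` is hypothetical.

```python
from typing import Optional, Tuple


def split_reference_word(reference_word: str) -> Tuple[str, Optional[int]]:
    """Split 'text_index' into (text, index); return (value, None) if no index is present."""
    text, sep, index = reference_word.rpartition("_")
    if sep and index.isdigit():
        return text, int(index)
    # For example "*", which the sample response uses for an inserted word.
    return reference_word, None


if __name__ == "__main__":
    for value in ("1st_0", "wakey_2", "*"):
        print(value, split_reference_word(value))
```

Splitting at the last underscore keeps any underscore that belongs to the original text itself.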

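Evaluation Example

Request Parameters

# The parameters below are the query string of the websocket connection URL, expanded one per line, e.g. soe.cloud.tencent.com/soe/api/1306***?eval_mode=0&voice_format=1&...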
eval_mode=1
keyword=
rec_mode=0
ref_text=1st,wakey i go to school by bus
score_coeff=1.000000
sentence_info_enabled=0
server_engine_type=16k_en
text_mode=0
voice_format=1
voice_id=6bd18918-8d15-4272-b693-6c588f058a33
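
The parameters listed above can be turned back into a query string with the Python standard library. The sketch below only encodes these ten parameters; anything the service additionally requires in the URL (the part elided as `...` in the example URL) is not shown on this page and is not covered here. Percent-encoding spaces as `%20` is an assumption, and `build_query` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, quote, urlencode

# Parameter values copied from the example above.
EXAMPLE_PARAMS = {
    "eval_mode": "1",
    "keyword": "",
    "rec_mode": "0",
    "ref_text": "1st,wakey i go to school by bus",
    "score_coeff": "1.000000",
    "sentence_info_enabled": "0",
    "server_engine_type": "16k_en",
    "text_mode": "0",
    "voice_format": "1",
    "voice_id": "6bd18918-8d15-4272-b693-6c588f058a33",
}


def build_query(params: dict) -> str:
    """Percent-encode the parameters into a URL query string."""
    # quote_via=quote encodes spaces as %20 instead of '+' (assumption about what is wanted).
    return urlencode(params, quote_via=quote)


if __name__ == "__main__":
    query = build_query(EXAMPLE_PARAMS)
    print(query)
    # Decoding the query string should give back the original values.
    print(parse_qs(query, keep_blank_values=True)["ref_text"])
```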

Response

{ "code": 0, "message": "6bd18918-8d15-4272-b693-6c588f058a33_12", "voice_id": "6bd18918-8d15-4272-b693-6c588f058a33", "result": { "SuggestedScore": 56.42676544189453, "PronAccuracy": 61.44247817993164, "PronFluency": 0.976434051990509, "PronCompletion": 0.7142857313156128, "Words": [ { "MemBeginTime": 140, "MemEndTime": 550, "PronAccuracy": -1, "PronFluency": 0, "ReferenceWord": "*", "Word": "*", "MatchTag": 1, "KeywordTag": 0, "PhoneInfos": [ ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 550, "MemEndTime": 910, "PronAccuracy": 99.0815658569336, "PronFluency": 0.9839948415756226, "ReferenceWord": "1st_0", "Word": "first", "MatchTag": 0, "KeywordTag": 0, "PhoneInfos": [ { "MemBeginTime": 550, "MemEndTime": 650, "PronAccuracy": 98.95162963867188, "DetectedStress": false, "Phone": "f", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 650, "MemEndTime": 770, "PronAccuracy": 99.2125244140625, "DetectedStress": false, "Phone": "er", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 770, "MemEndTime": 840, "PronAccuracy": 99.16388702392578, "DetectedStress": false, "Phone": "s", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 840, "MemEndTime": 910, "PronAccuracy": 98.99822235107422, "DetectedStress": false, "Phone": "t", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 } ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 910, "MemEndTime": 1210, "PronAccuracy": 42.349395751953125, "PronFluency": 0.9971372485160828, "ReferenceWord": "wakey_2", "Word": "wakey", "MatchTag": 4, "KeywordTag": 0, "PhoneInfos": [ { "MemBeginTime": 910, "MemEndTime": 970, "PronAccuracy": 98.7578353881836, "DetectedStress": false, "Phone": "w", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 970, 
"MemEndTime": 1050, "PronAccuracy": 98.17191314697266, "DetectedStress": false, "Phone": "ey", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1050, "MemEndTime": 1130, "PronAccuracy": 14.024602890014648, "DetectedStress": false, "Phone": "k", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1130, "MemEndTime": 1180, "PronAccuracy": 0.4065273702144623, "DetectedStress": false, "Phone": "w", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1180, "MemEndTime": 1210, "PronAccuracy": 0.38612863421440125, "DetectedStress": false, "Phone": "ay", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 } ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 1210, "MemEndTime": 1310, "PronAccuracy": 2.167337656021118, "PronFluency": 0.9951702952384949, "ReferenceWord": "i_3", "Word": "i", "MatchTag": 0, "KeywordTag": 0, "PhoneInfos": [ { "MemBeginTime": 1210, "MemEndTime": 1310, "PronAccuracy": 2.167337656021118, "DetectedStress": false, "Phone": "ay", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 } ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 1310, "MemEndTime": 1490, "PronAccuracy": 0.26200056076049805, "PronFluency": 0.9832581281661987, "ReferenceWord": "go_4", "Word": "go", "MatchTag": 0, "KeywordTag": 0, "PhoneInfos": [ { "MemBeginTime": 1310, "MemEndTime": 1390, "PronAccuracy": 0.5108872056007385, "DetectedStress": false, "Phone": "g", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1390, "MemEndTime": 1490, "PronAccuracy": 0.013113964349031448, "DetectedStress": false, "Phone": "ow", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 } ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 1490, 
"MemEndTime": 1650, "PronAccuracy": 49.854270935058594, "PronFluency": 0.9747055768966675, "ReferenceWord": "to_5", "Word": "to", "MatchTag": 0, "KeywordTag": 0, "PhoneInfos": [ { "MemBeginTime": 1490, "MemEndTime": 1560, "PronAccuracy": 0.8929398655891418, "DetectedStress": false, "Phone": "t", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1560, "MemEndTime": 1650, "PronAccuracy": 98.81560516357422, "DetectedStress": false, "Phone": "ah", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 } ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 1650, "MemEndTime": 2250, "PronAccuracy": 98.87284851074219, "PronFluency": 0.924338161945343, "ReferenceWord": "school_6", "Word": "school", "MatchTag": 0, "KeywordTag": 0, "PhoneInfos": [ { "MemBeginTime": 1650, "MemEndTime": 1750, "PronAccuracy": 98.44365692138672, "DetectedStress": false, "Phone": "s", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1750, "MemEndTime": 1830, "PronAccuracy": 99.21931457519531, "DetectedStress": false, "Phone": "k", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1830, "MemEndTime": 1990, "PronAccuracy": 99.09530639648438, "DetectedStress": false, "Phone": "uw", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 }, { "MemBeginTime": 1990, "MemEndTime": 2250, "PronAccuracy": 98.73311614990234, "DetectedStress": false, "Phone": "l", "ReferencePhone": "", "ReferenceLetter": "", "Stress": false, "MatchTag": 0 } ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 2250, "MemEndTime": 2250, "PronAccuracy": -1, "PronFluency": 0, "ReferenceWord": "by_7", "Word": "by", "MatchTag": 2, "KeywordTag": 0, "PhoneInfos": [ ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } }, { "MemBeginTime": 2250, "MemEndTime": 2250, 
"PronAccuracy": -1, "PronFluency": 0, "ReferenceWord": "bus_8", "Word": "bus", "MatchTag": 2, "KeywordTag": 0, "PhoneInfos": [ ], "Tone": { "Valid": false, "RefTone": -1, "HypothesisTone": -1 } } ], "SentenceId": -1, "RefTextId": -1, "KeyWordHits": null, "UnKeyWordHits": null }, "final": 1 }

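Response Analysis

The audio says: hello first wakey i want to school.
hello: this word does not appear in the evaluation text, so it is not evaluated. MatchTag is 1 and Word is *.
first: corresponds to 1st in the original text. MatchTag is 0. The current word (Word) is first, and the reference word (ReferenceWord) is 1st_0, which means that 1st has index 0.
wakey: MatchTag is 4. The word is uncommon, so it has not been added to the lexicon yet. The service normally still attempts to evaluate its pronunciation; if the result is unsatisfactory, you can use phoneme annotation (音素标注). Here Word is wakey and ReferenceWord is wakey_2. Punctuation is normally not evaluated but still occupies a position, so the index of wakey is 2, not 1.
i want to school: this differs from the evaluation text i go to school, yet MatchTag is still 0. When the scores are low, such words can be treated as misread.
by bus: MatchTag is 2. These words were not read, so they are reported as missing.
The overall accuracy is the average accuracy over only those current words whose MatchTag is 0.
Completeness likewise counts only the current words whose MatchTag is 0. The evaluation text contains 8 words, but wakey is not in the dictionary (MatchTag 4), so only 7 words are counted. Of these, 5 have MatchTag 0, so the final completeness is 5/7.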
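
The completeness calculation described above can be reproduced from the Words array. The sketch below follows the explanation on this page: inserted words (MatchTag 1) and words not in the lexicon (MatchTag 4) are left out of the denominator. Keeping misread words (MatchTag 3) in the denominator is an assumption, since the example contains none, and `completion_from_words` is a hypothetical helper name.

```python
from typing import Dict, List

# Word and MatchTag of each entry in the response example, other fields omitted.
SAMPLE_WORDS: List[Dict] = [
    {"Word": "*", "MatchTag": 1},
    {"Word": "first", "MatchTag": 0},
    {"Word": "wakey", "MatchTag": 4},
    {"Word": "i", "MatchTag": 0},
    {"Word": "go", "MatchTag": 0},
    {"Word": "to", "MatchTag": 0},
    {"Word": "school", "MatchTag": 0},
    {"Word": "by", "MatchTag": 2},
    {"Word": "bus", "MatchTag": 2},
]


def completion_from_words(words: List[Dict]) -> float:
    """Matched words divided by the reference words that are counted."""
    matched = sum(1 for w in words if w["MatchTag"] == 0)
    # Skip inserted words (1) and words not in the lexicon (4).
    counted = sum(1 for w in words if w["MatchTag"] not in (1, 4))
    return matched / counted if counted else 0.0


if __name__ == "__main__":
    print(completion_from_words(SAMPLE_WORDS))  # 5 matched out of 7 counted
```

For the example response this gives 5/7, which agrees with the returned PronCompletion of 0.7142857313156128 up to floating-point rounding.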