Sharing | Testing speech recognition in OpenCV 4.5.4 (with detailed steps)

Color Space

Published 2021-12-20 16:17:03

Included in the column: OpenCV与AI深度学习 (OpenCV and AI Deep Learning)

Overview

This article shares how to use and verify the speech recognition sample in OpenCV 4.5.4, along with a few things to watch out for.

Background

OpenCV 4.5.4 added speech recognition support to its DNN module. This article uses the Python sample to verify and demonstrate it.

Steps

Location of the Python-OpenCV sample code: OpenCV4.5.4_Release\opencv\sources\samples\dnn\speech_recognition.py

The steps are as follows:

[1] Download the speech recognition model:

https://drive.google.com/drive/folders/1wLtxyao4ItAg8tt4Sb63zt6qXzhcQoR6

From that folder, download jasper_reshape.onnx, rename it to jasper.onnx, and put it in the same directory as the .py file.
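The download-and-rename step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name install_model and the choice to copy (rather than move) the downloaded file are my own, and the paths are simply wherever you saved the download and wherever speech_recognition.py lives.

```python
from pathlib import Path
import shutil


def install_model(downloaded: Path, sample_dir: Path, name: str = "jasper.onnx") -> Path:
    """Copy the downloaded jasper_reshape.onnx next to speech_recognition.py
    under the file name the article says the sample expects (jasper.onnx).

    install_model is a hypothetical helper, not part of the OpenCV sample."""
    downloaded = Path(downloaded)
    if not downloaded.is_file():
        raise FileNotFoundError(f"model not found: {downloaded}")
    target = Path(sample_dir) / name
    # copy2 leaves the original download in place and preserves its timestamps
    shutil.copy2(downloaded, target)
    return target
```

For example, install_model(Path("~/Downloads/jasper_reshape.onnx").expanduser(), Path("samples/dnn")) would leave a jasper.onnx beside the sample script.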

[2] Download the test audio:

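From the same Google Drive folder, download audio6.flac and audio10.flac. Initial testing showed that the program does not support MP3 audio; convert it to FLAC or WAV first. Other formats have not been tried yet.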
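If your audio is MP3, one way to convert it is with the ffmpeg command-line tool; that tool is an assumption here, since the article does not say which converter was used. The sketch below builds (and optionally runs) an ffmpeg command. The -ac 1 -ar 16000 flags (downmix to mono, resample to 16 kHz) are also an assumption, based on Jasper models commonly being trained on 16 kHz speech; drop them if you only want a container change.

```python
from pathlib import Path
from typing import List
import shutil
import subprocess

# Target formats the article found to work with the sample
SUPPORTED = {".flac", ".wav"}


def ffmpeg_convert_cmd(src: str, dst_suffix: str = ".wav") -> List[str]:
    """Build (but do not run) an ffmpeg command converting `src` to WAV or FLAC."""
    if dst_suffix not in SUPPORTED:
        raise ValueError(f"unsupported target format: {dst_suffix}")
    dst = Path(src).with_suffix(dst_suffix)
    # -y: overwrite output; -ac 1: mono; -ar 16000: 16 kHz (assumed model rate)
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]


def convert(src: str, dst_suffix: str = ".wav") -> Path:
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = ffmpeg_convert_cmd(src, dst_suffix)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Keeping command construction separate from execution makes it easy to print and inspect the command before running it.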

[3] Install the soundfile package:

Just run pip install soundfile.
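To confirm that soundfile is usable from the same Python interpreter that will run the sample, a quick check like the following may help. soundfile_ready is a hypothetical helper, not part of the OpenCV sample; the OSError branch covers the case where the package is installed but its native libsndfile library cannot be loaded.

```python
def soundfile_ready() -> bool:
    """Return True if `import soundfile` succeeds in this interpreter."""
    try:
        import soundfile  # noqa: F401  (only checking that the import works)
    except ImportError:
        print("soundfile is not installed; run: pip install soundfile")
        return False
    except OSError:
        # soundfile wraps libsndfile; this is raised if the library cannot be loaded
        print("soundfile is installed but libsndfile could not be loaded")
        return False
    return True
```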

[4] Run it from the command line (cmd):

python speech_recognition.py --input_audio=./audio/audio6.flac
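When testing several files, the same command can be driven from Python. This sketch assumes it is run from the directory containing speech_recognition.py, with the audio files in ./audio; it uses only the --input_audio argument shown above, one file per invocation. build_command and run_all are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(audio: Path, script: str = "speech_recognition.py") -> List[str]:
    """Mirror the article's command line for a single audio file."""
    # sys.executable ensures the same interpreter (and installed packages) is used
    return [sys.executable, script, f"--input_audio={audio}"]


def run_all(audio_dir: str, pattern: str = "*.flac") -> None:
    """Run the sample once for every file in audio_dir matching pattern."""
    for audio in sorted(Path(audio_dir).glob(pattern)):
        print(f"== {audio.name}")
        # check=True raises CalledProcessError if the sample exits with an error
        subprocess.run(build_command(audio), check=True)
```

For example, run_all("./audio") would process audio10.flac and audio6.flac in turn (sorted by file name).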

Recognition result for audio6.flac:

Predicting...
Audio file 1/1
['an american instead of going in a leisure hour to dance merrily at some place of public resort as the fellows of his calling continued to do throughout the greater part of europe shuts himself up at home to drink']

Recognition result for audio10.flac:

Predicting...
Audio file 1/1
['she opened the door softly there sat missus wilson in the old rocking chair with one sick death like boy lying on her knee crying without let or pause but softly gently as fearing to disturb the troubled gasping child while behind her old alice let her fast dropping tears fall down on the dead body of the other twin which she was laying out on a board placed on a sort of sofa settee in the corner of the room']

Both results above are quite good. Note that this model does not support Chinese, so let's try two other English audio clips:

First clip: https://www.tingclass.net/show-5406-3632-1.html

python speech_recognition.py --input_audio=./audio/CET4.wav

Recognition result:

Predicting...
Audio file 1/1
['o hom m bell amo hn haha am o waa iha  me howa e al ru e  hi hera morbo ao ha yur you move fore hung mo by wholl hab your hu mo ah  miseur luuel u lonlur wole olla iwer home all  bou o how bu olur aa men he ul um aha ol a oh a he notn ol all hole ar rule sa mer peaile hall her orha ah be a hen hom all murn a bown lok ano gerl orhehan or holy mule i ea the lol and theyn whole mon wingle all form ']

Hmm, the output differs greatly from what is actually said, and many of the "words" in it are not real words.
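The judgements in this article are made by eye. A common way to put a number on them is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain dynamic-programming implementation; it only lowercases and splits on whitespace (no punctuation handling), and you would have to supply the reference transcript yourself, since the article does not include one.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev is the previous DP row: prev[j] = distance between the reference
    # prefix processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0 means a perfect match; higher is worse, and it can exceed 1.0 when the output contains many inserted words.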

Another clip: https://m.kekenet.com/Article/201504/369129.shtml

python speech_recognition.py --input_audio=./audio/english.wav

Recognition result:

Predicting...
Audio file 1/1
[" shakish am am shut shash an shi hang ca iunkun usha y oru u warm room  wo o emon o  chjonnoe e  ah wo an o a hush e i've o ask rule ur o sqawe grewh ula u ho a o ah"]

The result for this clip is still very poor.

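My preliminary conclusion is that the audio the model was trained on differs considerably from the audio we tested, so getting good results on this kind of audio would require training your own model. The sample script speech_recognition.py also includes download links for the pretrained models, which interested readers can try. I will share any further developments.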
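Before retraining, one simpler mismatch may be worth ruling out. This is my assumption, not something the article tested: Jasper models are generally trained on 16 kHz mono speech (LibriSpeech), whereas audio converted from web MP3s is often 44.1 kHz stereo, and I have not verified whether speech_recognition.py resamples its input. The sketch below reads a PCM WAV header with Python's standard wave module and flags a mismatch; for FLAC files, soundfile.info() reports the same fields (samplerate, channels).

```python
import wave
from typing import Dict, List

EXPECTED_RATE = 16000   # assumption: the Jasper model expects 16 kHz audio
EXPECTED_CHANNELS = 1   # assumption: the model expects mono input


def describe_wav(path: str) -> Dict[str, float]:
    """Read basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "sample_rate": rate,
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / rate,
        }


def format_warnings(info: Dict[str, float]) -> List[str]:
    """List the ways a file deviates from the assumed model input format."""
    warnings = []
    if info["sample_rate"] != EXPECTED_RATE:
        warnings.append(f"sample rate is {info['sample_rate']} Hz, not {EXPECTED_RATE} Hz")
    if info["channels"] != EXPECTED_CHANNELS:
        warnings.append(f"{info['channels']} channels, not mono")
    return warnings
```

If format_warnings(describe_wav("./audio/CET4.wav")) is non-empty, resampling to 16 kHz mono and re-running would be a cheap experiment before any retraining.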

This article is part of the Tencent Cloud self-media syndication program and was shared from a WeChat official account.

Originally published: 2021-12-16. For infringement concerns, contact cloudcommunity@tencent.com to request removal.

opencv