Sensory Releases Embedded THF & TNL V6.18.1

User 6026865

Published 2022-04-02 17:11:54

Sensory recently released V6.18.1, the latest version of its embedded speech recognition engine, TrulyHandsFree (THF), and of its embedded Large Vocabulary Continuous Speech Recognition (LVCSR) engine, TrulyNatural (TNL).

Compared with the previous version, this release includes the following updates -

Continuous Adaptation for fixed wakeword models
Improved EFT models that identify the enrollment phrase with a fixed wakeword spotter
iOS PhraseSpot sample app supports Bluetooth headsets
Support for creating ARM based deeply embedded apps with no OS
Multi-threaded models
Custom memory allocation, performance stats, Panic function
“Universal” UDT models that support most languages
Trigger-to-command mode supports optional “extra” commands without the need to repeat the wakeword (e.g. trigger + command + command …)
Very large multithreaded US English 845 MB LVCSR Broad model
LVCSR Background models for rejecting out-of-vocabulary speech
Tool to query snsr model settings

Compared with the previous version, this release brings substantial functional and performance improvements -

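Improved continuous adaptation for fixed wake words such as Alexa and OK Google
Better support for Bluetooth products on iOS
Customizable memory allocation
User-defined wake words (UDT - User Defined Trigger) support most languages
A single wake-up can be followed by multiple voice commands within a time window
Better support for US English, with a larger LVCSR (Large Vocabulary Continuous Speech Recognition) model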

About TrulyHandsFree -

High Accuracy, Low Power, Customizable Voice Control for Devices & Applications

Fixed wake word (FW), a speaker-independent wake word that responds to a predefined phrase (e.g. “Hey Siri”, “Alexa”, “OK Google”, “Cortana”) spoken by any speaker in the wake word’s native language. Sensory trains the wake word to work best in the real-world use case and demographic required by the customer. Fixed wake words have the advantage of providing a ready-to-use, out-of-the-box experience.

Enrolled wake word (EW), a wake word that is predetermined and adapts to a user’s voice. The adaptation requires a few recordings, collected during an enrollment phase, of the user saying the wake word. After adaptation, the EW responds better to the enrolled user’s voice than to other users. Enrolled wake words have the advantage of lower false reject and false accept errors than fixed wake words (FW).

User-defined wake word (UDW), a language-independent wake word or phrase specified by the user speaking the intended phrase. UDW enrollment requires a few recordings of the desired wake word and results in a user-specific recognizer.

Command Sets, speech recognition for product control. THF supports small to medium-sized command sets in a listening window that can begin immediately after a FW/EW/UDW. Commands can be short sentences or a single word.
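The listening window described above can be pictured as a small state machine: a wake word opens the window, and commands are accepted only while it is open. The following Python sketch is purely illustrative and is not the THF SDK API; the class name `CommandWindow`, the 5-second window length, and the choice to extend the window after each accepted command are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CommandWindow:
    """Hypothetical wake-word-to-command state machine (not the THF API)."""
    wake_word: str
    commands: set
    window_s: float = 5.0                      # assumed listening-window length
    _open_until: float = field(default=-1.0, init=False)

    def on_phrase(self, phrase, t):
        """Feed one recognized phrase and its timestamp in seconds.

        Returns the accepted command, or None if the phrase is ignored.
        """
        if phrase == self.wake_word:
            self._open_until = t + self.window_s   # open (or re-open) the window
            return None
        if phrase in self.commands and t <= self._open_until:
            self._open_until = t + self.window_s   # assumption: a command extends the window
            return phrase
        return None                                # outside the window, or unknown phrase


cw = CommandWindow("hello device", {"lights on", "volume up"})
events = [("lights on", 0.0), ("hello device", 1.0), ("lights on", 2.0),
          ("volume up", 6.0), ("lights on", 20.0)]
accepted = [cw.on_phrase(phrase, t) for phrase, t in events]
```

In this toy run the first “lights on” is ignored because no wake word has been heard yet, the next two commands are accepted inside the window, and the last one arrives after the window has closed.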

Speaker Verification (SV) and Speaker ID (SID) offer a secured wake word or phrase that authenticates a user’s identity based on a spoken, predefined password or phrase. Unlike other voice security solutions, SV and SID attempt to detect differences in the way a word is spoken, making them more sensitive to an individual’s voice. SV and SID adaptation requires a few recordings, collected during an enrollment phase, of the user saying the target phrase or password.

Voice Activity Detector (VAD), looks for the beginning and end of speech, typically after a wakeword, and captures it. The audio is automatically passed to an output stream, which can be a WAV file or a memory buffer passed to the cloud for Speech-To-Text (STT) processing.
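As a rough illustration of what “looking for the beginning and end of speech” can mean, the sketch below applies a simple energy threshold to per-frame energies. It is a generic textbook-style approach, not Sensory’s VAD algorithm; the function name `detect_speech`, the threshold value, and the `hangover` parameter (how many quiet frames end a segment) are assumptions chosen for the example.

```python
def detect_speech(frame_energies, threshold=0.1, hangover=3):
    """Return (start, end) frame indices of the first speech segment, or None.

    `start` is the first frame whose energy exceeds `threshold`. The segment
    ends once `hangover` consecutive frames are at or below the threshold.
    `end` is exclusive and excludes the trailing quiet frames.
    """
    start = None
    quiet = 0                       # consecutive quiet frames since last speech
    for i, energy in enumerate(frame_energies):
        if energy > threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover:
                return start, i - hangover + 1
    if start is None:
        return None                 # no speech found
    return start, len(frame_energies) - quiet


energies = [0.0, 0.0, 0.5, 0.6, 0.05, 0.7, 0.0, 0.0, 0.0, 0.0]
segment = detect_speech(energies)
```

The single low-energy frame in the middle does not end the segment because it is shorter than the hangover; only the run of quiet frames at the end does. The frames inside the returned range are what would then be written to a WAV file or buffer.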

False Accept (FA) Filtering, an advanced machine-learning algorithm for reducing False Accept (FA) errors. FA Filtering can reduce the False Accept Rate (FAR) by 50-90%.

Low-Power Sound Detection (LPSD), available in our DSP version, saves power by only processing audio that has enough energy to be considered relevant in a quiet environment.

Model Combining, models can be concurrently combined at design time or runtime for multiple wakeword recognition. Concurrently combined models work in parallel. Models can also be sequentially combined for a “wakeword-to-command” mode.

Model Debugging, any recognition model can be combined with a “debug model” which automatically creates a log file with timestamps and captured audio of all recognition events.

Code Space Model Linking, allows fixed models to be stored in code space. This frees up more data memory on RAM-limited systems for recognition and other tasks.

Little-Big Models, combine the speed of a small (<100 KB) model with the accuracy of a large (1+ MB) one on systems that are CPU-cycle constrained but need the most accurate models. In Little-Big, the recognizer continuously listens for the wakeword using the little model. When it detects a match, the wakeword audio is rechecked using the big, more accurate model. Only results that pass the big model are reported. The downside of Little-Big is added latency, since the wakeword must pass two checks.
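The two-stage check described above can be sketched as a simple cascade. This is an illustration of the general idea only, not Sensory’s implementation; the function name `little_big_detect`, the scoring callables, and both threshold values are hypothetical.

```python
def little_big_detect(audio_window, little_score, big_score,
                      little_threshold=0.5, big_threshold=0.8):
    """Two-stage wake word check: cheap model first, costly model on candidates.

    `little_score` and `big_score` are callables returning a confidence in
    [0, 1] for the given audio window (stand-ins for real models).
    """
    if little_score(audio_window) < little_threshold:
        return False                      # common case: the big model never runs
    # Candidate found: recheck the same audio with the more accurate model.
    return big_score(audio_window) >= big_threshold


# Stub "models" that read precomputed scores, and a counter for big-model calls.
calls = {"big": 0}

def little(window):
    return window["little"]

def big(window):
    calls["big"] += 1
    return window["big"]

windows = [
    {"little": 0.1, "big": 0.9},   # rejected by the little model
    {"little": 0.7, "big": 0.4},   # passes little, rejected by big
    {"little": 0.9, "big": 0.95},  # passes both checks
]
results = [little_big_detect(w, little, big) for w in windows]
```

The counter shows where the CPU saving comes from: the big model runs only on windows the little model flags, at the cost of a second pass (and therefore extra latency) on those windows.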

End-Point Detection (EPD), after a successful recognition result, the recognizer returns the matching word/phrase and the timestamps of its beginning and end points in the audio stream. The timestamps are relative to the start of the recognizer.
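Because the end points are expressed relative to the start of the recognizer, an application that keeps its own copy of the audio can use them to cut out the recognized phrase. The sketch below assumes the timestamps are in milliseconds and that the application has buffered every sample since the recognizer started; the function name `slice_by_endpoints` and its parameters are hypothetical and not part of the THF SDK.

```python
def slice_by_endpoints(samples, sample_rate, begin_ms, end_ms):
    """Return the samples between two end points.

    `begin_ms` and `end_ms` are offsets (assumed to be milliseconds) from the
    first sample in `samples`, i.e. from the start of the recognizer.
    """
    first = begin_ms * sample_rate // 1000    # ms offset -> sample index
    last = end_ms * sample_rate // 1000
    return samples[first:last]


# One second of dummy 16 kHz audio; the phrase is reported at 250-500 ms.
audio = list(range(16000))
phrase_audio = slice_by_endpoints(audio, 16000, 250, 500)
```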

Broad language support -

This article is part of the Tencent Cloud self-media syndication program and is shared from a WeChat official account.

Originally published: 2022-03-03. To report infringement, contact cloudcommunity@tencent.com for removal.

This article is shared from the SmellLikeAISpirit WeChat official account.
