
EMBEDDED DICTATION, CAN YOU DO IT?

Author: 用户6026865
Published: 2022-05-17 13:20:05

It’s funny how many companies ask for embedded dictation, or claim they can provide it, without qualifying what they really want or offer. Embedded dictation is very easy to do, but one must consider…

Which languages?

Some languages are easier than others. Higher accuracy targets and more demanding conditions can require more training data. Luckily, Sensory has been around long enough to have collected over 150,000 hours of audio data across more than 50 languages and dialects.

What size engine and platform?

Sensory has engines running on everything from the tiniest platforms, with <50KB of memory, to large solutions requiring powerful DSPs and inference engines. We can run a speech-to-text algorithm in as little as 3MB of memory, and this algorithm will have extremely high task completion rates and low word error rates (<5%) for in-domain usage, but it won’t perform as well out of domain. To get reasonable cross-domain performance, engines need to grow to as much as 20MB and need reasonably powerful processing, or at least specialized inference functions.
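To put the 3MB and 20MB figures in perspective, here is a rough back-of-the-envelope sketch in Python. It is not a description of Sensory's engines: the function name `max_parameters`, the assumption that 1MB means 2^20 bytes, the 8-bit and 16-bit weight widths, and the `overhead_fraction` reserved for buffers and code are all hypothetical choices made for illustration.

```python
MB = 1024 * 1024  # assumption: 1MB is taken as 2**20 bytes


def max_parameters(memory_bytes: int, bits_per_weight: int,
                   overhead_fraction: float = 0.0) -> int:
    """Upper bound on how many model weights fit in a memory budget.

    overhead_fraction is a hypothetical share of the budget set aside for
    activations, audio buffers, and code, so it is not available for weights.
    """
    if bits_per_weight <= 0:
        raise ValueError("bits_per_weight must be positive")
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    usable_bits = int(memory_bytes * (1.0 - overhead_fraction)) * 8
    return usable_bits // bits_per_weight


if __name__ == "__main__":
    for label, budget in (("3MB", 3 * MB), ("20MB", 20 * MB)):
        for bits in (8, 16):
            count = max_parameters(budget, bits, overhead_fraction=0.25)
            print(f"{label} budget, {bits}-bit weights: at most {count:,} weights")
```

Under these assumptions the larger budget holds roughly seven times as many weights as the smaller one, which gives a feel for the extra modeling capacity that cross-domain coverage may call for.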

What domain coverage?

Dictation isn’t specific to domains, or is it? Sensory’s top-of-the-line engine can get under 5% word error rate on certain TED Talks, but apply a different test set or a different domain and the accuracy can get worse. The more we understand the domain and the testing methodology, the better we can do.

What about accuracy?

Accuracy is typically measured in word error rate (WER) and task completion rate (TCR). If it’s not straight dictation being performed, then TCR is usually most important: even if a word is recognized incorrectly, it doesn’t really matter as long as the right function is performed. Sensory likes TCR to ALWAYS exceed 95%; if it gets much below that, the product starts to feel unusable. The nice thing is that WER can climb as high as 10 or 15% and a good TCR can still be achieved.
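For readers who want to see how the two metrics differ, here is a minimal Python sketch. WER is computed in the standard way, as word-level edit distance divided by the number of reference words. The TCR function is a simplified assumption: it treats each utterance as mapping to one action label and counts exact matches. The function names and the example action labels are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def task_completion_rate(expected_actions, performed_actions) -> float:
    """Fraction of utterances whose resulting action matches the intended one."""
    if len(expected_actions) != len(performed_actions):
        raise ValueError("both lists must have the same length")
    if not expected_actions:
        raise ValueError("at least one utterance is required")
    hits = sum(e == p for e, p in zip(expected_actions, performed_actions))
    return hits / len(expected_actions)


if __name__ == "__main__":
    # Two of five words are wrong, yet the intended action is still clear.
    wer = word_error_rate("turn on the kitchen lights", "turn on a kitchen light")
    tcr = task_completion_rate(["lights_on"], ["lights_on"])
    print(f"WER = {wer:.0%}, TCR = {tcr:.0%}")
```

In the example the transcript has two substitutions in five words, while the action derived from it is still the intended one, which is why a command-style product can tolerate a WER that would be unacceptable for straight dictation.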

What noise or signal-to-noise ratio?

Noise and distance make accurate recognition much harder. It is important to implement noise management strategies that fit the usage model. Sensory’s noise corpus includes about 15,000 hours of data, and we have a variety of noise and acoustic simulation tools. Typically, multi-mic beamforming helps, but watch out for noise suppression algorithms and nonlinear echo cancellation schemes that were developed around the psychoacoustics of human perception and not around deep-learned speech recognizers! Sensory partners with companies like Alango, Andrea Electronics Corporation, Bolom, DSP Concepts, Meeami Technologies, MightyWorks, Phillips, and Yobe to manage noise for a wide range of environments and usages.
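As an illustration of what a signal-to-noise ratio means in practice, the Python sketch below mixes a noise recording into clean speech at a chosen SNR, a common way to simulate noisy test conditions. It is a generic textbook approach and not Sensory's simulation tooling. The function names are hypothetical, and the sine wave and random samples merely stand in for real speech and noise recordings.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so speech-to-noise power matches target_snr_db, then add it.

    Returns (mixed, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # Solve p_speech / (gain**2 * p_noise) == 10 ** (target_snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real audio: a 440 Hz tone at 16 kHz and uniform random noise.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(16000)]
    for target in (20.0, 10.0, 0.0):
        _, scaled = mix_at_snr(speech, noise, target)
        print(f"target {target:>4.1f} dB, measured {snr_db(speech, scaled):.2f} dB")
```

Running a recognizer on the same utterances mixed at several SNR levels is one simple way to see how quickly accuracy degrades as conditions get noisier.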

Originally published 2022-05-16 on the SmellLikeAISpirit WeChat official account and shared through the Tencent Cloud self-media syndication program.
