
Google speech recognition library in Python has very slow speech_to_text() at times

Stack Overflow user
Asked on 2022-03-24 21:15:30
1 answer · 557 views · 0 followers · 0 votes

As the title says, I'm trying to use the speech_recognition library to continuously listen for voice input for an AI named Sapphire. For about a minute after restarting the code it works fine, but after running for more than a minute, speech_to_text() takes forever.

Any help would be appreciated; I'm looking for some kind of solution to this problem. Maybe I just don't understand the function well enough, or perhaps there is a way to stop speech_to_text() after a certain amount of time.

Besides the voice version, I'm also running a text/email version of the bot using threads, but this speech_to_text() problem appeared before threading was involved.

Thanks for your help!

Here is the output:

Me  -->  Sapphire what time is it
speech_to_text() Time =  5.611827599990647
Sapphire -->  16:46.
Listening...
Me  -->  ERROR
speech_to_text() Time =  3.4650153999973554
Listening...
Me  -->  ERROR
speech_to_text() Time =  6.241592899998068
Listening...
Me  -->  ERROR
speech_to_text() Time =  12.198483600004693
Listening...
Me  -->  ERROR
speech_to_text() Time =  3.7981161000061547
Listening...
Me  -->  shoe stamps
speech_to_text() Time =  51.52946890000021
Listening...
Me  -->  ERROR
speech_to_text() Time =  6.57019980000041
Listening...
Me  -->  ERROR
speech_to_text() Time =  46.647391800011974
Listening...

Here is the code I use to run the Sapphire AI:

# imports implied by the rest of the snippet
import os
import threading
import timeit

import numpy as np
import speech_recognition as sr
import transformers
import vlc
from gtts import gTTS


class ChatBot():
    def __init__(self, name):
        print("----- Starting up", name, "-----")
        self.name = name

    def speech_to_text(self):
        recognizer = sr.Recognizer()
        # with sr.Microphone(device_index=3) as mic:
        with sr.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic)
            print("Listening...")
            audio = recognizer.listen(mic)
            self.text = "ERROR"
        try:
            self.text = recognizer.recognize_google(audio)
            print("Me  --> ", self.text)
        except (sr.UnknownValueError, sr.RequestError):
            # recognition failed or the API request errored
            print("Me  -->  ERROR")

    @staticmethod
    def text_to_speech(text):
        if text == "":
            print("ERROR")
        else:
            print((ai.name+" --> "), text)
            speaker = gTTS(text=text, lang="en", slow=False)
            speaker.save("res.mp3")

            vlc_instance = vlc.Instance("--no-video")
            player = vlc_instance.media_player_new()

            media = vlc_instance.media_new("res.mp3")

            player.set_media(media)
            player.play()


    def wake_up(self, text):
        return self.name.lower() in text.lower()


def parse_input(txt):
    ## action time
    if "time" in txt and "is" in txt and "it" in txt:
        res = action_time()
    elif ai.name.lower() in txt:
        res = np.random.choice(
            ["That's me!, Sapphire!", "Hello I am Sapphire the AI", "Yes I am Sapphire!", "My name is Sapphire, okay?!", "I am Sapphire and I am alive!",
             "It's-a Me!, Sapphire!"])
    ## respond politely
    elif any(i in txt for i in ["thank", "thanks"]):
        res = np.random.choice(
            ["you're welcome!", "anytime!", "no problem!", "cool!", "I'm here if you need me!",
             "mention not."])
    elif any(i in txt for i in ["exit", "close"]):
        res = np.random.choice(
            ["Tata!", "Have a good day!", "Bye!", "Goodbye!", "Hope to meet soon!", "peace out!"])
        ex = False
    ## conversation
    else:
        if txt == "ERROR":
            # res="Sorry, come again?"
            res = ""
        else:
            starttime1 = timeit.default_timer()
            chat = nlp(transformers.Conversation(txt), pad_token_id=50256)
            endtime1 = timeit.default_timer()
            print("Transformer Time = ", (endtime1 - starttime1))
            res = str(chat)
            res = res[res.find("bot >> ") + 6:].strip()
    return res

def sapphire_audio():
    ex = True
    start = 0
    while ex:
        starttime1 = timeit.default_timer()
        ai.speech_to_text()
        endtime1 = timeit.default_timer()
        print("speech_to_text() Time = ", (endtime1 - starttime1))
        ## wake up
        if ai.wake_up(ai.text) is True:
            #remove Sapphire from phrase
            ai.text = ai.text.lower().replace(ai.name.lower(), "", 1)
            if start == 0:
                res = "Hello I am Sapphire the AI, what can I do for you?"
                start = 1
            else:
                res = parse_input(ai.text)
            ai.text_to_speech(res)

if __name__ == "__main__":

    os.environ["TOKENIZERS_PARALLELISM"] = "true"

    # sapphire_email()
    threading.Thread(target=sapphire_email).start()
    threading.Thread(target=sapphire_audio).start()

1 Answer

Stack Overflow user

Answered on 2022-03-24 22:05:04

First, try to measure which method is taking so long to execute. Is it the listen() method or recognize_google()?
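
To split the timing, a small wrapper can time each call separately (a minimal sketch; `timed` is a hypothetical helper, not part of speech_recognition):

```python
import timeit

def timed(label, fn, *args, **kwargs):
    # Call fn, print how long it took, and pass its result through.
    start = timeit.default_timer()
    result = fn(*args, **kwargs)
    print(label, "Time = ", timeit.default_timer() - start)
    return result
```

Inside speech_to_text() this would look like `audio = timed("listen()", recognizer.listen, mic)` followed by `timed("recognize_google()", recognizer.recognize_google, audio)`, so the log shows which of the two dominates the total.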

Try calling recognizer.adjust_for_ambient_noise(mic) once at startup rather than every time the speech_to_text() function runs, and see what happens after that.
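
A sketch of that restructuring, with the recognizer and a microphone factory injected so the idea can be shown without audio hardware (in the real code these would be `sr.Recognizer()` and `sr.Microphone`):

```python
class CalibratedListener:
    def __init__(self, recognizer, open_mic):
        self.recognizer = recognizer
        self.open_mic = open_mic  # e.g. sr.Microphone
        # Calibrate once at startup instead of on every call; each
        # adjust_for_ambient_noise() samples the mic (about 1 s by
        # default) and may move the energy threshold mid-run.
        with self.open_mic() as mic:
            self.recognizer.adjust_for_ambient_noise(mic)

    def speech_to_text(self):
        with self.open_mic() as mic:
            print("Listening...")
            return self.recognizer.listen(mic)
```

With this layout, repeated speech_to_text() calls reuse the one startup calibration instead of re-sampling the room every time.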

The recognizer.listen(mic) function waits for the audio from the microphone to drop below the energy threshold that recognizer.adjust_for_ambient_noise(mic) has set.
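
The effect of that threshold can be illustrated with a toy endpointing loop (a simulation of the idea, not the library's actual implementation): listening only stops once enough consecutive frames fall below the threshold, so a threshold set below the real ambient level never terminates.

```python
def frames_until_silence(energies, threshold, min_silent_frames=3):
    # Return how many frames pass before `min_silent_frames` consecutive
    # frames fall below `threshold` (i.e. when "listening" would stop),
    # or None if that never happens.
    silent = 0
    for i, energy in enumerate(energies):
        if energy < threshold:
            silent += 1
            if silent >= min_silent_frames:
                return i + 1
        else:
            silent = 0
    return None
```

With ambient noise around 300, a threshold of 400 stops shortly after speech ends, while a threshold of 100 (set too low) makes the loop wait forever.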

My assumption is that sometimes the threshold gets set too low, and it takes a very long time for the ambient noise to fall to that level. (Check your microphone in Audacity? Listen to it and analyze whether the ambient noise changes over time.)
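
This also touches the asker's wish to cut speech_to_text() off after some time: speech_recognition's listen() accepts `timeout` (max seconds to wait for a phrase to start, raising sr.WaitTimeoutError if exceeded) and `phrase_time_limit` (max seconds to record once a phrase starts). A hedged sketch, written against an injected recognizer so it also runs without a microphone:

```python
def listen_bounded(recognizer, source, timeout=5, phrase_time_limit=15,
                   timeout_error=Exception):
    # Give up if no phrase starts within `timeout` seconds, and stop
    # recording after `phrase_time_limit` seconds once one does.
    # With the real library, pass timeout_error=sr.WaitTimeoutError.
    try:
        return recognizer.listen(source, timeout=timeout,
                                 phrase_time_limit=phrase_time_limit)
    except timeout_error:
        return None
```

A `None` result can then be treated the same way the existing code treats the "ERROR" text, so one stuck listen() no longer blocks the whole loop.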

Also, you are sending the audio to Google's servers using the public API key. This is just a guess, but sending long stretches of audio over a home internet connection with a low upload speed may add some extra latency. And perhaps Google, seeing how many requests you send on the public API key, is not prioritizing your requests, which could cause further delay.

But these are only guesses. Try the things I suggested at the start, and we'll figure it out.

Votes: 0
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link: https://stackoverflow.com/questions/71609501