
Google speech recognition library in Python has very slow speech_to_text() at times

Stack Overflow user
Asked on 2022-03-24 21:15:30
1 answer · 557 views · 0 followers · 0 votes

As the title says, I'm trying to use the speech_recognition library to continuously listen for voice input for an AI named Sapphire. For about a minute after restarting the code it works fine, but after running for more than a minute, speech_to_text() takes forever.

Any help would be appreciated; I'm looking for some kind of solution to this problem. Maybe I just don't understand the function well enough, or perhaps there is a way to stop speech_to_text() after a certain amount of time.

Besides the voice version, I'm also running a text/email version of the bot using threads, but this speech_to_text() problem appeared before threading was involved.

Thanks for your help!

Here is the output:

Me  -->  Sapphire what time is it
speech_to_text() Time =  5.611827599990647
Sapphire -->  16:46.
Listening...
Me  -->  ERROR
speech_to_text() Time =  3.4650153999973554
Listening...
Me  -->  ERROR
speech_to_text() Time =  6.241592899998068
Listening...
Me  -->  ERROR
speech_to_text() Time =  12.198483600004693
Listening...
Me  -->  ERROR
speech_to_text() Time =  3.7981161000061547
Listening...
Me  -->  shoe stamps
speech_to_text() Time =  51.52946890000021
Listening...
Me  -->  ERROR
speech_to_text() Time =  6.57019980000041
Listening...
Me  -->  ERROR
speech_to_text() Time =  46.647391800011974
Listening...

Here is the code I use to run the Sapphire AI:

# imports implied by the rest of the snippet
import os
import threading
import timeit

import numpy as np
import speech_recognition as sr
import transformers
import vlc
from gtts import gTTS


class ChatBot():
    def __init__(self, name):
        print("----- Starting up", name, "-----")
        self.name = name

    def speech_to_text(self):
        recognizer = sr.Recognizer()
        # with sr.Microphone(device_index=3) as mic:
        with sr.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic)
            print("Listening...")
            audio = recognizer.listen(mic)
            self.text = "ERROR"
        try:
            self.text = recognizer.recognize_google(audio)
            print("Me  --> ", self.text)
        except (sr.UnknownValueError, sr.RequestError):
            # recognition failed or the API request errored
            print("Me  -->  ERROR")

    @staticmethod
    def text_to_speech(text):
        if text == "":
            print("ERROR")
        else:
            print((ai.name+" --> "), text)
            speaker = gTTS(text=text, lang="en", slow=False)
            speaker.save("res.mp3")

            vlc_instance = vlc.Instance("--no-video")
            player = vlc_instance.media_player_new()

            media = vlc_instance.media_new("res.mp3")

            player.set_media(media)
            player.play()


    def wake_up(self, text):
        return self.name.lower() in text.lower()


def parse_input(txt):
    ## action time
    if "time" in txt and "is" in txt and "it" in txt:
        res = action_time()
    elif ai.name.lower() in txt:
        res = np.random.choice(
            ["That's me!, Sapphire!", "Hello I am Sapphire the AI", "Yes I am Sapphire!", "My name is Sapphire, okay?!", "I am Sapphire and I am alive!",
             "It's-a Me!, Sapphire!"])
    ## respond politely
    elif any(i in txt for i in ["thank", "thanks"]):
        res = np.random.choice(
            ["you're welcome!", "anytime!", "no problem!", "cool!", "I'm here if you need me!",
             "mention not."])
    elif any(i in txt for i in ["exit", "close"]):
        res = np.random.choice(
            ["Tata!", "Have a good day!", "Bye!", "Goodbye!", "Hope to meet soon!", "peace out!"])
        ex = False
    ## conversation
    else:
        if txt == "ERROR":
            # res="Sorry, come again?"
            res = ""
        else:
            starttime1 = timeit.default_timer()
            chat = nlp(transformers.Conversation(txt), pad_token_id=50256)
            endtime1 = timeit.default_timer()
            print("Transformer Time = ", (endtime1 - starttime1))
            res = str(chat)
            res = res[res.find("bot >> ") + 6:].strip()
    return res

def sapphire_audio():
    ex = True
    start = 0
    while ex:
        starttime1 = timeit.default_timer()
        ai.speech_to_text()
        endtime1 = timeit.default_timer()
        print("speech_to_text() Time = ", (endtime1 - starttime1))
        ## wake up
        if ai.wake_up(ai.text) is True:
            #remove Sapphire from phrase
            ai.text = ai.text.lower().replace(ai.name.lower(), "", 1)
            if start == 0:
                res = "Hello I am Sapphire the AI, what can I do for you?"
                start = 1
            else:
                res = parse_input(ai.text)
            ai.text_to_speech(res)

if __name__ == "__main__":

    os.environ["TOKENIZERS_PARALLELISM"] = "true"

    # sapphire_email()
    threading.Thread(target=sapphire_email).start()
    threading.Thread(target=sapphire_audio).start()

1 Answer

Stack Overflow user

Answered on 2022-03-24 22:05:04

First, try to measure which method is taking so long to execute. Is it the listen() method or recognize_google()?
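
To split the timing, a small wrapper can time each call separately (a minimal sketch; `timed` is a hypothetical helper, not part of speech_recognition):

```python
import timeit

def timed(label, fn, *args, **kwargs):
    # Call fn, print how long it took, and pass its result through.
    start = timeit.default_timer()
    result = fn(*args, **kwargs)
    print(label, "Time = ", timeit.default_timer() - start)
    return result
```

Inside speech_to_text() this would look like `audio = timed("listen()", recognizer.listen, mic)` followed by `timed("recognize_google()", recognizer.recognize_google, audio)`, so the log shows which of the two dominates the total.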

Try calling recognizer.adjust_for_ambient_noise(mic) once at startup rather than every time the speech_to_text() function runs, and see what happens after that.
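
A sketch of that restructuring, with the recognizer and a microphone factory injected so the idea can be shown without audio hardware (in the real code these would be `sr.Recognizer()` and `sr.Microphone`):

```python
class CalibratedListener:
    def __init__(self, recognizer, open_mic):
        self.recognizer = recognizer
        self.open_mic = open_mic  # e.g. sr.Microphone
        # Calibrate once at startup instead of on every call; each
        # adjust_for_ambient_noise() samples the mic (about 1 s by
        # default) and may move the energy threshold mid-run.
        with self.open_mic() as mic:
            self.recognizer.adjust_for_ambient_noise(mic)

    def speech_to_text(self):
        with self.open_mic() as mic:
            print("Listening...")
            return self.recognizer.listen(mic)
```

With this layout, repeated speech_to_text() calls reuse the one startup calibration instead of re-sampling the room every time.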

The recognizer.listen(mic) function waits for the audio from the microphone to drop below the energy threshold that recognizer.adjust_for_ambient_noise(mic) has set.
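
The effect of that threshold can be illustrated with a toy endpointing loop (a simulation of the idea, not the library's actual implementation): listening only stops once enough consecutive frames fall below the threshold, so a threshold set below the real ambient level never terminates.

```python
def frames_until_silence(energies, threshold, min_silent_frames=3):
    # Return how many frames pass before `min_silent_frames` consecutive
    # frames fall below `threshold` (i.e. when "listening" would stop),
    # or None if that never happens.
    silent = 0
    for i, energy in enumerate(energies):
        if energy < threshold:
            silent += 1
            if silent >= min_silent_frames:
                return i + 1
        else:
            silent = 0
    return None
```

With ambient noise around 300, a threshold of 400 stops shortly after speech ends, while a threshold of 100 (set too low) makes the loop wait forever.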

My assumption is that sometimes the threshold gets set too low, and it takes a very long time for the ambient noise to fall to that level. (Check your microphone in Audacity? Listen to it and analyze whether the ambient noise changes over time.)
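
This also touches the asker's wish to cut speech_to_text() off after some time: speech_recognition's listen() accepts `timeout` (max seconds to wait for a phrase to start, raising sr.WaitTimeoutError if exceeded) and `phrase_time_limit` (max seconds to record once a phrase starts). A hedged sketch, written against an injected recognizer so it also runs without a microphone:

```python
def listen_bounded(recognizer, source, timeout=5, phrase_time_limit=15,
                   timeout_error=Exception):
    # Give up if no phrase starts within `timeout` seconds, and stop
    # recording after `phrase_time_limit` seconds once one does.
    # With the real library, pass timeout_error=sr.WaitTimeoutError.
    try:
        return recognizer.listen(source, timeout=timeout,
                                 phrase_time_limit=phrase_time_limit)
    except timeout_error:
        return None
```

A `None` result can then be treated the same way the existing code treats the "ERROR" text, so one stuck listen() no longer blocks the whole loop.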

Also, you are sending the audio to Google's servers using the public API key. This is just a guess, but sending long stretches of audio over a home internet connection with a low upload speed may add some extra latency. And perhaps Google, seeing how many requests you send on the public API key, is not prioritizing your requests, which could cause further delay.

But these are only guesses. Try the things I suggested at the start, and we'll figure it out.

Votes: 0
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link: https://stackoverflow.com/questions/71609501