Q: MS Azure Speech-to-Text in Python: start_continuous_recognition does not stop at the end of the stream

Stack Overflow user

Asked 2022-11-13 23:16:34

Answers 1 · Views 11 · Followers 0 · Votes 0

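I am using the MS Azure Speech-to-Text service with Python.

My data input is a byte string containing only a few seconds of audio. I expect the cloud service to stop processing audio once the stream ends and to return the recognized text. Instead, it takes about 5 minutes before the recognized event is triggered.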

            speech_config = speechsdk.SpeechConfig(subscription=API_KEY,
                                                   region="westeurope",
                                                   speech_recognition_language='de-DE')
            stream = PushAudioInputStream(stream_format=
                                          AudioStreamFormat(samples_per_second=sample_rate, bits_per_sample=SAMPLE_WIDTH * 8,
                                                            compressed_stream_format=speechsdk.AudioStreamContainerFormat.FLAC))
            audio_input = speechsdk.AudioConfig(stream=stream)
            stream.write(data)
            speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)
            speech_recognizer.start_continuous_recognition()
            done = False

            def stop_recognition(evt):
                logger.debug("Stopped MS Azure recognition: %s", evt)
                nonlocal done
                done = True

            def recognized(evt):
                logger.info("Recognized MS Azure transcript: %s", evt)
                nonlocal text
                text += " " + evt.result.text

            speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
            speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
            speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
            speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
            speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))

            speech_recognizer.recognized.connect(recognized)
            speech_recognizer.session_stopped.connect(stop_recognition)
            speech_recognizer.canceled.connect(stop_recognition)
            while not done:
                time.sleep(.5)
            speech_recognizer.stop_continuous_recognition()

Instead, I see a delay of about 5 minutes:

2022-11-13 23:58:19,504 - speech_processing.speech_recognition.speech_recognition - DEBUG - Sending 192000 bytes (6 sec) for recognition
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=50e5c478cdc34e0a8ced3867be493bc3, text="telefon", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=d1448833ac8f40ef9c1ebc4cae488bcd, text="telefonspeicher", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=cdf9f074c13b4a2c94960ec147db765c, text="telefon speichere", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=548133156bb44dc8ae08fd0848fa8ec5, text="telefon speichere als", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=c03970619f1e42278b2a2ef19ee4f1fe, text="telefon speichere als bärbel", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=ff5f6a18d1e4409cab2661582cb8a693, text="telefon speichere als bärbel 0", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=a3cb8f82c62b4235abc2fea2696342f8, text="telefon speichere als bärbel 03", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=cafc805031654aa4865a4fe1b742d1cd, text="telefon speichere als bärbel 038", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=69782c9485244e3191846b924adb3807, text="telefon speichere als bärbel 0385", reason=ResultReason.RecognizingSpeech))
RECOGNIZED: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=9d92890d52d84b7f926a6977d6324ca1, text="Telefon speichere als Bärbel 0385.", reason=ResultReason.RecognizedSpeech))
2022-11-14 00:03:26,487 - speech_processing.speech_recognition.speech_recognition - INFO - Recognized MS Azure transcript: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=9d92890d52d84b7f926a6977d6324ca1, text="Telefon speichere als Bärbel 0385.", reason=ResultReason.RecognizedSpeech))
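
The size of the delay can be read off the two logger timestamps above. A small sketch of that arithmetic, using only the values shown in the log (the variable names are illustrative):

```python
from datetime import datetime

# Timestamp layout used by the logger lines in the output above.
FMT = "%Y-%m-%d %H:%M:%S,%f"

# "Sending 192000 bytes (6 sec) for recognition"
sent = datetime.strptime("2022-11-13 23:58:19,504", FMT)
# "Recognized MS Azure transcript: ..."
recognized = datetime.strptime("2022-11-14 00:03:26,487", FMT)

delay_seconds = (recognized - sent).total_seconds()
print(f"{delay_seconds:.3f} s")  # prints "306.983 s"
```

That is roughly 5 minutes and 7 seconds for 6 seconds of audio.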

speech-to-text

azure-cognitive-services

1 Answer

Stack Overflow user

Posted 2022-11-13 23:42:31

I found my mistake:

The stream must be closed:

stream.write(data)
stream.close()
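
For context, here is a minimal sketch of how the whole flow might look with the close call in place. It is not the asker's exact code: the function name `transcribe_flac_bytes` and the `timeout` parameter are hypothetical, the handlers are connected before recognition starts so that no event can be missed, and the stream format is built from the FLAC container format alone, on the assumption that the container carries its own sample rate. The SDK import is done inside the function so the snippet loads even where `azure-cognitiveservices-speech` is not installed.

```python
import threading


def transcribe_flac_bytes(data, api_key, region="westeurope",
                          language="de-DE", timeout=30.0):
    """Push one finite FLAC buffer to Azure and return the recognized text.

    Hypothetical helper for illustration only.
    """
    # Lazy import: requires the azure-cognitiveservices-speech package.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=api_key,
        region=region,
        speech_recognition_language=language)
    # Assumption: only the container format is given for compressed input.
    stream_format = speechsdk.audio.AudioStreamFormat(
        compressed_stream_format=speechsdk.AudioStreamContainerFormat.FLAC)
    stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=speechsdk.audio.AudioConfig(stream=stream))

    done = threading.Event()   # set when the session stops or is canceled
    parts = []                 # final (not intermediate) transcripts

    recognizer.recognized.connect(lambda evt: parts.append(evt.result.text))
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    stream.write(data)
    stream.close()             # marks end of stream for the recognizer
    done.wait(timeout)         # give up waiting after `timeout` seconds
    recognizer.stop_continuous_recognition()
    return " ".join(parts)
```

A call would look like `text = transcribe_flac_bytes(data, API_KEY)`. Using a `threading.Event` instead of a polling loop with `time.sleep` is a design choice of this sketch, not something the answer requires.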

Votes 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/74425524
