I am using the MS Azure Speech-to-Text service with Python.
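My `data` input is a byte string containing only a few seconds of audio. I expected the cloud service to stop processing the audio once the stream ended and return the recognized text. Instead, it takes about 5 minutes for the `recognized` event to fire.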
import logging
import time

import azure.cognitiveservices.speech as speechsdk

logger = logging.getLogger(__name__)

# API_KEY and SAMPLE_WIDTH are constants defined elsewhere in the module.
# The enclosing function (its name is illustrative) is required because the
# callbacks below rebind `done` and `text` via `nonlocal`.
def transcribe(data: bytes, sample_rate: int) -> str:
    text = ""
    done = False

    speech_config = speechsdk.SpeechConfig(subscription=API_KEY,
                                           region="westeurope",
                                           speech_recognition_language='de-DE')
    stream = speechsdk.audio.PushAudioInputStream(
        stream_format=speechsdk.audio.AudioStreamFormat(
            samples_per_second=sample_rate,
            bits_per_sample=SAMPLE_WIDTH * 8,
            compressed_stream_format=speechsdk.AudioStreamContainerFormat.FLAC))
    audio_input = speechsdk.audio.AudioConfig(stream=stream)
    stream.write(data)  # push the whole clip in a single write
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                                   audio_config=audio_input)

    def stop_recognition(evt):
        nonlocal done
        logger.debug("Stopped MS Azure recognition: %s", evt)
        done = True

    def recognized(evt):
        nonlocal text
        logger.info("Recognized MS Azure transcript: %s", evt)
        text += " " + evt.result.text

    # Debug output for every event type
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # Collect final results and stop once the session ends or is canceled
    speech_recognizer.recognized.connect(recognized)
    speech_recognizer.session_stopped.connect(stop_recognition)
    speech_recognizer.canceled.connect(stop_recognition)

    # Start only after all callbacks are connected so no early events are missed
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    speech_recognizer.stop_continuous_recognition()
    return text.strip()
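The polling loop at the end (`while not done: time.sleep(.5)`) works, but a `threading.Event` is a more idiomatic way to block until a callback fires, and it can also put an upper bound on the wait. The sketch below is a minimal illustration under stated assumptions: `Signal` and `FakeRecognizer` are hypothetical stand-ins so the example runs without Azure; with the real SDK the same lambda would be passed to `speech_recognizer.session_stopped.connect(...)` (and `canceled.connect(...)`). Note that a timeout only caps the wait; it does not by itself remove the 5-minute delay.

```python
import threading


class Signal:
    """Minimal event signal with a .connect() method (hypothetical, mimics the SDK's style)."""

    def __init__(self):
        self._handlers = []

    def connect(self, handler):
        self._handlers.append(handler)

    def fire(self, evt):
        for handler in self._handlers:
            handler(evt)


class FakeRecognizer:
    """Hypothetical stand-in for speechsdk.SpeechRecognizer (illustration only)."""

    def __init__(self, stop_after: float):
        self.session_stopped = Signal()
        self._stop_after = stop_after

    def start_continuous_recognition(self):
        # Simulate the service ending the session from a background thread
        timer = threading.Timer(self._stop_after, self.session_stopped.fire,
                                args=("SESSION STOPPED",))
        timer.daemon = True
        timer.start()


def wait_for_session(recognizer, timeout: float) -> bool:
    done = threading.Event()
    # Event.set() is thread-safe, so it may be called from a callback thread
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.start_continuous_recognition()
    # True as soon as the callback fires, False if the timeout expires first
    return done.wait(timeout)


if __name__ == "__main__":
    print(wait_for_session(FakeRecognizer(stop_after=0.1), timeout=5.0))  # True
```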
Instead, I see a delay of about 5 minutes:
2022-11-13 23:58:19,504 - speech_processing.speech_recognition.speech_recognition - DEBUG - Sending 192000 bytes (6 sec) for recognition
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=50e5c478cdc34e0a8ced3867be493bc3, text="telefon", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=d1448833ac8f40ef9c1ebc4cae488bcd, text="telefonspeicher", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=cdf9f074c13b4a2c94960ec147db765c, text="telefon speichere", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=548133156bb44dc8ae08fd0848fa8ec5, text="telefon speichere als", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=c03970619f1e42278b2a2ef19ee4f1fe, text="telefon speichere als bärbel", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=ff5f6a18d1e4409cab2661582cb8a693, text="telefon speichere als bärbel 0", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=a3cb8f82c62b4235abc2fea2696342f8, text="telefon speichere als bärbel 03", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=cafc805031654aa4865a4fe1b742d1cd, text="telefon speichere als bärbel 038", reason=ResultReason.RecognizingSpeech))
RECOGNIZING: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=69782c9485244e3191846b924adb3807, text="telefon speichere als bärbel 0385", reason=ResultReason.RecognizingSpeech))
RECOGNIZED: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=9d92890d52d84b7f926a6977d6324ca1, text="Telefon speichere als Bärbel 0385.", reason=ResultReason.RecognizedSpeech))
2022-11-14 00:03:26,487 - speech_processing.speech_recognition.speech_recognition - INFO - Recognized MS Azure transcript: SpeechRecognitionEventArgs(session_id=2e4c92f4fed6498f8f5260199bdcc5d7, result=SpeechRecognitionResult(result_id=9d92890d52d84b7f926a6977d6324ca1, text="Telefon speichere als Bärbel 0385.", reason=ResultReason.RecognizedSpeech))
Posted on 2022-11-13 23:42:31
I found my mistake: the stream has to be closed after writing the data:
stream.write(data)
stream.close()  # signal end-of-stream so the recognizer stops waiting for more audio
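Why `close()` matters can be illustrated with a toy model. The sketch below uses a hypothetical `ToyPushStream`, not the Azure SDK's implementation: a reader consuming pushed chunks cannot tell "the speaker paused" from "the audio is over" unless the writer signals end-of-stream, so without `close()` it can only stop after an idle timeout. That is presumably what happened with the recognizer here; the roughly 5-minute figure comes from the service's own behavior, which this sketch does not model.

```python
import queue


class ToyPushStream:
    """Hypothetical model of a push audio stream (not the Azure SDK's code)."""

    _EOF = object()  # sentinel marking end-of-stream

    def __init__(self):
        self._chunks = queue.Queue()

    def write(self, data: bytes) -> None:
        if data:  # ignore empty writes so they are never mistaken for EOF
            self._chunks.put(data)

    def close(self) -> None:
        # Tells the reader that no more audio will arrive
        self._chunks.put(self._EOF)

    def read(self, timeout: float) -> bytes:
        """Next chunk, b"" at end-of-stream; raises queue.Empty on timeout."""
        item = self._chunks.get(timeout=timeout)
        return b"" if item is self._EOF else item


def consume(stream: ToyPushStream, idle_timeout: float) -> tuple:
    """Read until end-of-stream; return (audio, reason reading stopped)."""
    received = bytearray()
    while True:
        try:
            chunk = stream.read(timeout=idle_timeout)
        except queue.Empty:
            # Without close() the reader can only give up after a timeout
            return bytes(received), "timeout"
        if not chunk:
            return bytes(received), "end-of-stream"
        received += chunk


if __name__ == "__main__":
    closed = ToyPushStream()
    closed.write(b"\x00" * 4)
    closed.close()
    print(consume(closed, idle_timeout=1.0))    # (b'\x00\x00\x00\x00', 'end-of-stream')

    unclosed = ToyPushStream()
    unclosed.write(b"\x00" * 4)
    print(consume(unclosed, idle_timeout=0.1))  # (b'\x00\x00\x00\x00', 'timeout')
```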
https://stackoverflow.com/questions/74425524