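I'm developing a Telegram bot in which the user sends a voice message, and the bot transcribes it and sends back what was said as text. For this I'm using the python-telegram-bot library and the speech_recognition library with the Google engine. My problem is that the voice messages users send are .mp3, but to transcribe them I need to convert them to .wav. To do that, I have to download the file sent to the bot. Is there a way to avoid this? I know this isn't an efficient or safe way to do it, because many simultaneously active users will cause race conditions (every request writes to the same file names) and the files take up a lot of space.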
import subprocess

import speech_recognition as sr
from telegram.ext import MessageHandler, Filters


def voice_handler(update, context):
    bot = context.bot
    file = bot.getFile(update.message.voice.file_id)
    file.download('voice.mp3')
    filename = "voice.wav"

    # convert mp3 to wav file
    subprocess.call(['ffmpeg', '-i', 'voice.mp3',
                     'voice.wav', '-y'])

    # initialize the recognizer
    r = sr.Recognizer()

    # open the file
    with sr.AudioFile(filename) as source:
        # listen for the data (load audio to memory)
        audio_data = r.record(source)
        # recognize (convert from speech to text)
        text = r.recognize_google(audio_data, language='ar-AR')


def main() -> None:
    updater.dispatcher.add_handler(MessageHandler(Filters.voice, voice_handler))

Answered 2022-06-25 06:20:45
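As pointed out in the comments, one option is to download the file to memory instead of to disk. If that doesn't work for you, you can give the file a unique id each time, e.g. using the user's user_id or even a uuid. This will prevent files from being overwritten.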
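As a rough sketch of the unique-name idea (my own illustration, not code from this answer): the hypothetical helper `scratch_audio_paths` builds per-request file names from the user id plus a uuid, inside a temporary directory that is deleted when the block exits. That way concurrent requests can't overwrite each other, and the files don't pile up on disk.

```python
import os
import tempfile
import uuid
from contextlib import contextmanager


@contextmanager
def scratch_audio_paths(user_id):
    """Yield (mp3_path, wav_path) unique to this request.

    Hypothetical helper: the name and the file-name layout are assumptions.
    The temporary directory and everything in it is removed on exit.
    """
    with tempfile.TemporaryDirectory(prefix="voice_") as workdir:
        stem = f"{user_id}_{uuid.uuid4().hex}"
        mp3_path = os.path.join(workdir, stem + ".mp3")
        wav_path = os.path.join(workdir, stem + ".wav")
        yield mp3_path, wav_path


if __name__ == "__main__":
    with scratch_audio_paths(12345) as (mp3_path, wav_path):
        # download to mp3_path, convert to wav_path, transcribe ...
        print(mp3_path)
        print(wav_path)
```

In the handler from the question, the download, the ffmpeg call and the recognizer would all run inside the `with` block, using `mp3_path` and `wav_path` instead of the fixed 'voice.mp3' and 'voice.wav'.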
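Answered 2022-06-25 12:55:53
Interestingly, Speech Recognition uses Google Speech-To-Text and has to be given wav, but the documentation for the Google Speech-To-Text API shows that it can also work with mp3 and other formats. See all supported audio encodings.
When I checked the source code of Speech Recognition, I saw that it takes wav but converts it to flac before sending it to Google Speech-To-Text.
You could try using the Speech-To-Text API directly, but this may require registering your own application with Google to get an API key. See more: Speech-to-Text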
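If you do call the API directly, the `encoding` value you send has to match the audio. Here is a minimal sketch of choosing it, assuming the file extension is a trustworthy hint (it often isn't: a .wav file is only LINEAR16 if it really holds 16-bit PCM). The function `guess_encoding` and the mapping table are hypothetical names of mine, not part of any library; the values are names from Google's audio encoding list.

```python
import os

# Assumed mapping from file extension to a Speech-To-Text encoding name.
# Illustration only: the container does not always determine the codec.
EXTENSION_TO_ENCODING = {
    ".wav": "LINEAR16",
    ".flac": "FLAC",
    ".mp3": "MP3",
    ".ogg": "OGG_OPUS",
    ".oga": "OGG_OPUS",
    ".opus": "OGG_OPUS",
}


def guess_encoding(filename, default=None):
    """Return an encoding name for `filename`, or `default` if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_ENCODING.get(extension, default)


if __name__ == "__main__":
    print(guess_encoding("audio2-hello-world-of-python.mp3"))
    print(guess_encoding("audio2-hello-world-of-python.wav"))
```

Returning `default` for unknown extensions lets the caller decide whether to refuse the file or fall back to converting it first.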
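EDIT:
I took the part of the Speech Recognition source code where it uses Google Speech-To-Text, combined it with some code from the Google documentation, and created my own version that can send mp3 directly.
It uses the API key from Speech Recognition - 'AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw'.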
import requests
import base64

#filename = 'test/audio/audio2-hello-world-of-python.wav'
filename = 'test/audio/audio2-hello-world-of-python.mp3'

with open(filename, 'rb') as fh:
    file_data = fh.read()

# --- Google Speech-To-Text ---

data = {
    "audio": {
        # b64encode() returns bytes; decode to str so it can be sent as JSON
        "content": base64.b64encode(file_data).decode('ascii')
    },
    "config": {
        "enableAutomaticPunctuation": True,
        # "encoding": "LINEAR16",  # WAV
        "encoding": "MP3",  # MP3
        "languageCode": "en-US",
        "model": "video",
    }
}

payload = {
    'key': 'AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw',
}

url = 'https://speech.googleapis.com/v1p1beta1/speech:recognize'

response = requests.post(url, params=payload, json=data)
#print(response.status_code)
#print(response.text)

data = response.json()
text = data['results'][0]['alternatives'][0]['transcript']
print(text)

In this code I read the file from disk, but with io.BytesIO you can probably get the data from the bot without writing to disk.
file = bot.getFile(update.message.voice.file_id)

with io.BytesIO() as fh:
    file.download(out=fh)

    #fh.seek(0)  # move to the beginning of file
    #file_data = fh.read()

    file_data = fh.getvalue()

EDIT:
Minimal working bot code - I tested it with an uploaded .mp3 file (not a voice message).
import os
import telegram
from telegram.ext import Updater, MessageHandler, CommandHandler, Filters
import requests
import base64
import io

# --- functions ---

def speech_to_text(file_data, encoding='LINEAR16', lang='en-US'):

    data = {
        "audio": {
            # b64encode() returns bytes; decode to str so it can be sent as JSON
            "content": base64.b64encode(file_data).decode('ascii')
        },
        "config": {
            "enableAutomaticPunctuation": True,
            # "encoding": "LINEAR16",  # WAV
            # "encoding": "MP3",       # MP3
            "encoding": encoding,
            "languageCode": lang,
            "model": "video",
        }
    }

    payload = {
        'key': 'AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw',
    }

    url = 'https://speech.googleapis.com/v1p1beta1/speech:recognize'

    response = requests.post(url, params=payload, json=data)
    #print('response:', response.text)

    try:
        data = response.json()
        return data['results'][0]['alternatives'][0]['transcript']
    except Exception as ex:
        print('Exception:', ex)
        print('response:', response.text)
        #return None

    #return None

# --- init ---

TOKEN = os.getenv('TELEGRAM_TOKEN')

bot = telegram.Bot(TOKEN)
updater = Updater(token=TOKEN, use_context=True)
dispatcher = updater.dispatcher

# --- commands ---

# - upload audio file -

def translate_audio(update, context):
    print('translate_audio')

    # download the uploaded file into memory instead of onto disk
    with io.BytesIO() as fh:
        #context.bot.get_file(update.message.voice.file_id).download(out=fh)
        context.bot.get_file(update.message.audio.file_id).download(out=fh)
        file_data = fh.getvalue()

    text = speech_to_text(file_data, 'MP3')

    if not text:
        text = "I don't understand this file"

    update.message.reply_text(text)

dispatcher.add_handler(MessageHandler(Filters.audio, translate_audio))

# - record voice -

def translate_voice(update, context):
    print('translate_voice')

    # download the voice message into memory instead of onto disk
    with io.BytesIO() as fh:
        context.bot.get_file(update.message.voice.file_id).download(out=fh)
        #context.bot.get_file(update.message.audio.file_id).download(out=fh)
        file_data = fh.getvalue()

    # NOTE: not tested with real voice messages; Telegram voice notes are
    # normally OGG/Opus, so 'MP3' may have to be changed here
    text = speech_to_text(file_data, 'MP3')

    if not text:
        text = "I don't understand this file"

    update.message.reply_text(text)

dispatcher.add_handler(MessageHandler(Filters.voice, translate_voice))

# --- start ---

print('starting ...')
updater.start_polling()
updater.idle()

https://stackoverflow.com/questions/72743713
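One more note on reading the response: speech_to_text above indexes `results[0]` directly and relies on the try/except when nothing was recognized. A slightly more defensive sketch, assuming the response is the usual JSON with a `results` list whose items each carry an `alternatives` list; `extract_transcript` is a hypothetical helper name, and joining the parts with a space is my own choice.

```python
def extract_transcript(response_json):
    """Join the top alternative of every result, or return None.

    Assumes the shape {"results": [{"alternatives": [{"transcript": ...}]}]}.
    A response without "results" (nothing recognized) gives None.
    """
    parts = []
    for result in response_json.get("results", []):
        alternatives = result.get("alternatives") or []
        if not alternatives:
            continue
        transcript = alternatives[0].get("transcript", "").strip()
        if transcript:
            parts.append(transcript)
    return " ".join(parts) or None


if __name__ == "__main__":
    sample = {
        "results": [
            {"alternatives": [{"transcript": "Hello world", "confidence": 0.9}]},
            {"alternatives": [{"transcript": " of Python."}]},
        ]
    }
    print(extract_transcript(sample))
    print(extract_transcript({}))
```

Because it returns None when there is no transcript, it fits the existing `if not text:` check in the handlers unchanged.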