
Q: Speech recognition in a python-telegram-bot bot without downloading the audio file

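Stack Overflow user

Asked 2022-06-24 11:56:06

Answers: 2, Views: 519, Followers: 0, Votes: -2

I am developing a Telegram bot in which the user sends a voice message, and the bot transcribes it and sends back what was said as text. For this I am using the python-telegram-bot library and the speech_recognition library with the Google engine. My problem is that the voice messages sent by users are .mp3, but to transcribe them I need to convert them to .wav, and to do that I have to download the file sent to the bot. Is there a way to avoid this? I know this is neither an efficient nor a safe approach, because many simultaneously active users would cause race conditions and take up a lot of disk space.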

import subprocess

import speech_recognition as sr
from telegram.ext import Filters, MessageHandler

def voice_handler(update, context):
    bot = context.bot
    file = bot.getFile(update.message.voice.file_id)
    file.download('voice.mp3')
    filename = "voice.wav"
    
    # convert mp3 to wav file
    subprocess.call(['ffmpeg', '-i', 'voice.mp3',
                         'voice.wav', '-y'])

    # initialize the recognizer
    r = sr.Recognizer()
    
    # open the file
    with sr.AudioFile(filename) as source:
    
        # listen for the data (load audio to memory)
        audio_data = r.record(source)
        # recognize (convert from speech to text)
        text = r.recognize_google(audio_data, language='ar-AR')
        
        
def main() -> None:
    updater.dispatcher.add_handler(MessageHandler(Filters.voice, voice_handler))

Tags: ffmpeg, speech-recognition, telegram-bot, python-telegram-bot, python

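2 Answers

Stack Overflow user

Answered 2022-06-25 06:20:45

As pointed out in the comments, one option is to download the file to memory instead of to disk. If that doesn't work for you, you can give the file a unique name each time, for example using the user's user_id or even a uuid, which prevents files from being overwritten.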
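
The unique-name suggestion could look like the sketch below. It is only an illustration: the helper name `unique_audio_paths` and the file-name layout are assumptions, not part of python-telegram-bot. It relies on `uuid.uuid4()` so that two updates handled at the same time never share `voice.mp3` / `voice.wav`.

```python
import os
import tempfile
import uuid


def unique_audio_paths(user_id, directory=None):
    """Return (mp3_path, wav_path) that are unique for this call.

    Hypothetical helper: the name and the naming scheme are illustrative only.
    """
    directory = directory or tempfile.gettempdir()
    # uuid4 is random, so even two messages from the same user get distinct names.
    stem = f"voice_{user_id}_{uuid.uuid4().hex}"
    return (
        os.path.join(directory, stem + ".mp3"),
        os.path.join(directory, stem + ".wav"),
    )


mp3_path, wav_path = unique_audio_paths(user_id=12345)
```

The handler would then pass `mp3_path` to the download call and both paths to ffmpeg, and delete the two files (for example in a `try`/`finally`) once the text has been sent.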

Votes: 0

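Stack Overflow user

Answered 2022-06-25 12:55:53

Interestingly, SpeechRecognition uses Google Speech-To-Text and requires a wav file, but the documentation for the Google Speech-To-Text API shows that it can also work with mp3 and other formats. See the list of all supported audio encodings.

When I checked the source code of SpeechRecognition, I saw that it takes wav but converts it to flac before sending it to Google Speech-To-Text.

You could try using the Speech-To-Text API directly, but this may require registering your own application with Google to get an API key. See the Speech-to-Text documentation for more.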
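
Because the API takes an explicit `encoding` value, a bot needs some way to pick it per message. The sketch below is one possible approach, assuming the MIME type is available (Telegram's `Voice` and `Audio` objects carry an optional `mime_type` field). The function name `guess_encoding` and the table are hypothetical, and the table is deliberately incomplete. Some encodings have extra constraints in the Google documentation (for example the sample rate for `OGG_OPUS`), so check those before relying on this.

```python
# Hypothetical lookup table from MIME type to a Speech-To-Text `encoding` value.
# Assumption: "audio/ogg" is treated as Opus, which is the usual case for
# Telegram voice notes but is not guaranteed for arbitrary .ogg files.
MIME_TO_ENCODING = {
    "audio/mpeg": "MP3",
    "audio/ogg": "OGG_OPUS",
    "audio/flac": "FLAC",
    "audio/wav": "LINEAR16",
    "audio/x-wav": "LINEAR16",
}


def guess_encoding(mime_type, default=None):
    """Return the encoding name for a MIME type, or `default` if unknown."""
    if not mime_type:
        return default
    # Drop parameters such as "; codecs=opus" and normalise the case.
    base = mime_type.split(";", 1)[0].strip().lower()
    return MIME_TO_ENCODING.get(base, default)


print(guess_encoding("audio/mpeg"))  # MP3
```

Returning `default` for unknown types lets the caller decide whether to reject the message or fall back to converting the audio.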

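EDIT:

I took the source code where SpeechRecognition uses Google Speech-To-Text, combined it with some code from the Google documentation, and created my own version that can send mp3 directly.

It uses the API key from SpeechRecognition: 'AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw'.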

import requests
import base64

#filename = 'test/audio/audio2-hello-world-of-python.wav'
filename = 'test/audio/audio2-hello-world-of-python.mp3'

with open(filename, 'rb') as fh:
    file_data = fh.read()

# --- Google Speech-To-Text ---

data = {
  "audio": {
    "content": base64.b64encode(file_data).decode("ascii")  # bytes are not JSON serializable
  },
  "config": {
    "enableAutomaticPunctuation": True,
#    "encoding": "LINEAR16",  # WAV
    "encoding": "MP3",        # MP3 
    "languageCode": "en-US",
    "model": "video",
  }
}

payload = {
    'key': 'AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw',
}

url = 'https://speech.googleapis.com/v1p1beta1/speech:recognize'
response = requests.post(url, params=payload, json=data)

#print(response.status_code)
#print(response.text)

data = response.json()
text = data['results'][0]['alternatives'][0]['transcript'] 
print(text)
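
The snippet above indexes `results[0]` directly, which raises an exception if the response has no `results` key, and it ignores any results after the first. A more defensive reading could look like the sketch below. It assumes the response keeps the `results` / `alternatives` / `transcript` shape used above; `extract_transcript` is a hypothetical helper, and the sample dictionaries are hand-written, not real API output.

```python
def extract_transcript(response_json):
    """Join the top alternative of every result; return None if there is none."""
    parts = []
    for result in response_json.get("results", []):
        alternatives = result.get("alternatives") or []
        if alternatives and alternatives[0].get("transcript"):
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts) or None


# Hand-written sample data for illustration only.
sample = {
    "results": [
        {"alternatives": [{"transcript": "hello world"}]},
        {"alternatives": [{"transcript": " of python"}]},
    ]
}
print(extract_transcript(sample))  # hello world of python
print(extract_transcript({}))      # None
```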

In this code I read the file from disk, but with io.BytesIO you should be able to get the data from the bot without writing to disk.

import io

file = bot.getFile(update.message.voice.file_id)
with io.BytesIO() as fh:
    file.download(out=fh)  # write into the in-memory buffer instead of a path
    #fh.seek(0)  # move to the beginning of file
    #file_data = fh.read()
    file_data = fh.getvalue()
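
The two commented-out lines and `getvalue()` are alternatives, and the difference is easy to check with the standard library alone. In this small sketch the `write` call is a stand-in for the download into the buffer, and the byte string is made up.

```python
import io

with io.BytesIO() as fh:
    fh.write(b"fake audio bytes")  # stand-in for downloading into the buffer

    after_write = fh.read()        # position is at the end, so nothing is read
    whole_buffer = fh.getvalue()   # returns everything, ignoring the position

    fh.seek(0)                     # rewind, then read() sees the data
    after_seek = fh.read()

print(after_write)                 # b''
print(whole_buffer == after_seek)  # True
```

Either way, the bytes have to be taken out before the `with` block ends, because the buffer is closed on exit.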

EDIT:

Minimal working bot code. I tested it with an uploaded .mp3 file (not a voice message).

import os
import telegram
from telegram.ext import Updater, MessageHandler, CommandHandler, Filters
import requests
import base64
import io

# --- functions ---

def speech_to_text(file_data, encoding='LINEAR16', lang='en-US'):
    
    data = {
      "audio": {
        "content": base64.b64encode(file_data).decode("ascii")  # bytes are not JSON serializable
      },
      "config": {
        "enableAutomaticPunctuation": True,
    #    "encoding": "LINEAR16",  # WAV
    #    "encoding": "MP3",        # MP3
        "encoding": encoding, 
        "languageCode": lang,
        "model": "video",
      }
    }
    
    payload = {
        'key': 'AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw',
    }

    url = 'https://speech.googleapis.com/v1p1beta1/speech:recognize'
    response = requests.post(url, params=payload, json=data)
    #print('response:', response.text)

    try:
        data = response.json()
        return data['results'][0]['alternatives'][0]['transcript']
    except Exception as ex:
        print('Exception:', ex)
        print('response:', response.text)
        #return None
    
    #return None
        
# --- init ---

TOKEN = os.getenv('TELEGRAM_TOKEN')

bot = telegram.Bot(TOKEN)

updater = Updater(token=TOKEN, use_context=True)
dispatcher = updater.dispatcher

# --- commands ---

# - upload audio file -

def translate_audio(update, context):
    print('translate_audio')

    with io.BytesIO() as fh:
        #context.bot.get_file(update.message.voice.file_id).download(out=fh)
        context.bot.get_file(update.message.audio.file_id).download(out=fh)
        file_data = fh.getvalue()
        
    text = speech_to_text(file_data, 'MP3')
    if not text:
        text = "I don't understand this file"

    update.message.reply_text(text)

dispatcher.add_handler(MessageHandler(Filters.audio, translate_audio))

# - record voice -

def translate_voice(update, context):
    print('translate_voice')

    with io.BytesIO() as fh:
        context.bot.get_file(update.message.voice.file_id).download(out=fh)
        #context.bot.get_file(update.message.audio.file_id).download(out=fh)
        file_data = fh.getvalue()
        
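    # NOTE: Telegram voice notes normally arrive as OGG/Opus rather than MP3,
    # and this bot was only tested with uploaded .mp3 files; 'OGG_OPUS'
    # (with a matching sample rate in the config) may be needed here.
    text = speech_to_text(file_data, 'MP3')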
    if not text:
        text = "I don't understand this file"

    update.message.reply_text(text)

dispatcher.add_handler(MessageHandler(Filters.voice, translate_voice))

# --- start ---

print('starting ...')    
updater.start_polling()
updater.idle()

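Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei engine for the IT domain.

Original link:

https://stackoverflow.com/questions/72743713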
