
How to access the audio stream recorded by the Microsoft Speech SDK

Stack Overflow user
Asked 2020-04-04 18:34:40
Answers: 1 · Views: 682 · Followers: 0 · Votes: 1

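I'm using Microsoft's JavaScript Speech SDK to transcribe a microphone stream. Both the recording and the transcription are done with the Speech SDK, and I haven't found a way to access and save the recorded audio file once the recording is finished.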

Code used to create the recognizer and record:

Code language: javascript
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
// to start the recording
recognizer.startContinuousRecognitionAsync(
    () => {
      portFromCS.postMessage({ type: "started", data: "" });
    },
    err => {
      recognizer.close();
    },
  );
// used after user input to stop the recording
recognizer.stopContinuousRecognitionAsync(
    () => {
      window.console.log("successfully stopped");
      // TODO: somehow need to save the file
    },
    err => {
      window.console.log("error on stop", err);
    },
  );

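The documentation is pretty poor, and I couldn't find a built-in way to access the raw audio through the SDK. Is my only option to record with two audio streams and save the file from the separate recording stream? What are the implications of doing that?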


1 Answer

Stack Overflow user

Answered 2020-04-07 00:04:37

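The SDK does not save the audio, and it has no built-in feature for doing so.

Version 1.11.0 added a new API to the connection object that lets you see the messages sent to the service. You can extract the audio from those messages and assemble the wave file yourself.

Here is a script that does just that: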

Code language: typescript
import * as SpeechSdk from "microsoft-cognitiveservices-speech-sdk";
import * as fs from "fs";

const filename: string = "input.wav";
const outputFileName: string = "out.wav";
const subscriptionKey: string = "<SUBSCRIPTION_KEY>";
const region: string = "<SUBSCRIPTION_REGION>";

const speechConfig: SpeechSdk.SpeechConfig = SpeechSdk.SpeechConfig.fromSubscription(subscriptionKey, region);

// Load the audio from a file. Alternatively, in a browser you could use:
// const audioConfig: SpeechSdk.AudioConfig = SpeechSdk.AudioConfig.fromDefaultMicrophoneInput();
const fileContents: Buffer = fs.readFileSync(filename);
const inputStream: SpeechSdk.PushAudioInputStream = SpeechSdk.AudioInputStream.createPushStream();
const audioConfig: SpeechSdk.AudioConfig = SpeechSdk.AudioConfig.fromStreamInput(inputStream);
inputStream.write(fileContents);
inputStream.close();

const r: SpeechSdk.SpeechRecognizer = new SpeechSdk.SpeechRecognizer(speechConfig, audioConfig);
const con: SpeechSdk.Connection = SpeechSdk.Connection.fromRecognizer(r);

let wavFragmentCount: number = 0;

const wavFragments: { [id: number]: ArrayBuffer; } = {};

con.messageSent = (args: SpeechSdk.ConnectionMessageEventArgs): void => {
    // Only record outbound audio messages that have data in them.
    if (args.message.path === "audio" && args.message.isBinaryMessage && args.message.binaryMessage !== null) {
        wavFragments[wavFragmentCount++] = args.message.binaryMessage;
    }
};

r.recognizeOnceAsync((result: SpeechSdk.SpeechRecognitionResult) => {
    // Find the length of the audio sent.
    let byteCount: number = 0;
    for (let i: number = 0; i < wavFragmentCount; i++) {
        byteCount += wavFragments[i].byteLength;
    }

    // Output array.
    const sentAudio: Uint8Array = new Uint8Array(byteCount);

    byteCount = 0;
    for (let i: number = 0; i < wavFragmentCount; i++) {
        sentAudio.set(new Uint8Array(wavFragments[i]), byteCount);
        byteCount += wavFragments[i].byteLength;
    }

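    // Patch the size fields in the wave header (assumes the canonical 44-byte header):
    const view = new DataView(sentAudio.buffer);
    view.setUint32(4, byteCount - 8, true);   // RIFF chunk size = file size minus 8
    view.setUint32(40, byteCount - 44, true); // data chunk size = bytes after the header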

    // Write the audio back to disk.
    fs.writeFileSync(outputFileName, sentAudio);
    r.close();
});

It loads the audio from a file so that I could test in NodeJS rather than in a browser, but the core part is the same.
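
The "assemble the wave file yourself" step can also be isolated into a small function that does not depend on the SDK, which makes it easier to test. The following is only a sketch: `assembleWav` and `WAV_HEADER_BYTES` are hypothetical names, and it assumes the first captured fragment begins with a canonical 44-byte PCM WAV header whose size fields still need to be filled in.

```typescript
// Size of a canonical PCM WAV header (RIFF header + fmt chunk + data chunk header).
// Assumption: the captured audio starts with a header of exactly this size.
const WAV_HEADER_BYTES = 44;

// Hypothetical helper: concatenates the captured fragments in order and
// patches the two size fields in the WAV header.
function assembleWav(fragments: ArrayBuffer[]): Uint8Array {
    const totalBytes = fragments.reduce((sum, f) => sum + f.byteLength, 0);
    if (totalBytes < WAV_HEADER_BYTES) {
        throw new Error("Not enough data for a WAV header");
    }

    // Copy every fragment into one contiguous buffer.
    const wav = new Uint8Array(totalBytes);
    let offset = 0;
    for (const fragment of fragments) {
        wav.set(new Uint8Array(fragment), offset);
        offset += fragment.byteLength;
    }

    const view = new DataView(wav.buffer);
    // RIFF chunk size: everything after the 8-byte "RIFF" + size prefix.
    view.setUint32(4, totalBytes - 8, true);
    // data chunk size: everything after the header.
    view.setUint32(40, totalBytes - WAV_HEADER_BYTES, true);
    return wav;
}
```

If the fragments are collected into an array in the order they were sent, the result can be written out in NodeJS with `fs.writeFileSync(outputFileName, assembleWav(fragments))`, or wrapped in a `Blob` in a browser.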

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/61026799
