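Implementing a Client for Tencent Cloud Streaming TTS Speech Synthesis

Introduction to Tencent Cloud Streaming TTS

Integration documentation: https://cloud.tencent.com/document/api/441/19499

The API takes its parameters as JSON and does not yet support Cloud API 3.0 authentication. The response is delivered with HTTP chunked transfer encoding, and the audio comes either as Opus-compressed fragments or as a raw PCM stream. Starting from authentication, this article walks through a client implementation for streaming TTS.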

API Authentication

1. Build the JSON request parameters. To make it easy to sort the parameters, store them in a TreeMap.

 mRequestMap.put("Action", "TextToStreamAudio");
 mRequestMap.put("Text", text);
 mRequestMap.put("SessionId", "session-1234");
 mRequestMap.put("AppId", "1255824371");
 mRequestMap.put("Timestamp", "" + System.currentTimeMillis() / 1000L);
 mRequestMap.put("Expired", "" + (System.currentTimeMillis() / 1000L + 600));
 mRequestMap.put("Speed", "0");
 mRequestMap.put("SecretId", SECRET_ID);
 mRequestMap.put("VoiceType", 0 + "");
 mRequestBody = new JSONObject(mRequestMap).toString();

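2. Generate the signature: concatenate the string as the documentation requires, sign it with HMAC-SHA1, and Base64-encode the result. Read the authentication documentation carefully here, because it is easy to get a detail wrong.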

private static String generateSign(TreeMap<String, String> params) {
        // Canonical string: "POST" + domain + "?" + sorted key=value pairs joined by "&"
        StringBuilder builder = new StringBuilder("POST" + DOMAIN_NAME + "?");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            builder.append(entry.getKey()).append("=").append(entry.getValue()).append("&");
        }

        // Remove the trailing "&"
        if (!params.isEmpty()) {
            builder.deleteCharAt(builder.length() - 1);
        }

        String sign = "";
        String source = builder.toString();
        System.out.println(source);
        try {
            // HMAC-SHA1 keyed with the secret key; the charset is fixed so the
            // result does not depend on the platform default
            Mac mac = Mac.getInstance("HmacSHA1");
            SecretKeySpec keySpec = new SecretKeySpec(SECRET_KEY.getBytes(StandardCharsets.UTF_8), "HmacSHA1");
            mac.init(keySpec);
            mac.update(source.getBytes(StandardCharsets.UTF_8));
            // android.util.Base64; NO_WRAP (value 2) suppresses line breaks
            sign = Base64.encodeToString(mac.doFinal(), Base64.NO_WRAP);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            e.printStackTrace();
        }

        System.out.println("Generated signature: " + sign);
        return sign;
    }
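
To try the signing step outside Android, the following is a minimal sketch that uses only the JDK. It assumes the same canonical string layout as generateSign above ("POST" + domain + "?" + sorted key=value pairs joined by "&"). The class name SignSketch, the domain string, and the secret key are placeholders invented for illustration, and java.util.Base64 stands in for android.util.Base64. The official authentication document remains the reference for the exact rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SignSketch {

    /** Joins the sorted parameters into "POST" + domain + "?k1=v1&k2=v2". */
    static String canonicalString(String domain, TreeMap<String, String> params) {
        StringJoiner joiner = new StringJoiner("&", "POST" + domain + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /** HMAC-SHA1 over the canonical string, Base64-encoded without line breaks. */
    static String sign(String source, String secretKey) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
        byte[] digest = mac.doFinal(source.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) throws Exception {
        TreeMap<String, String> params = new TreeMap<>();
        params.put("Text", "hello");                 // inserted out of order on purpose
        params.put("Action", "TextToStreamAudio");
        params.put("SessionId", "session-1234");
        // Placeholder domain and key, not real endpoint values
        String source = canonicalString("example.invalid/stream", params);
        System.out.println(source);                  // keys appear in sorted order
        System.out.println(sign(source, "placeholder-secret-key"));
    }
}
```

Because TreeMap iterates its keys in sorted order, the canonical string is the same no matter in which order the parameters were inserted, which is what makes the signature reproducible on the server side.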

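We now have a complete signature. Next comes the main part of this article: making the network request and parsing the response.

Chunked Transfer Encoding

Tencent Cloud returns the result using HTTP chunked transfer encoding. Unlike an ordinary HTTP response such as a JSON body, the data arrives in multiple pieces: the message body consists of an unspecified number of chunks and ends with a final chunk of size 0.

Each non-empty chunk starts with the number of bytes of data it contains (written in hexadecimal), followed by a CRLF (carriage return and line feed), then the data itself, and finally a CRLF that closes the chunk. In some implementations, the gap between the chunk size and the CRLF is padded with spaces (0x20).

The last chunk is a single line made up of the chunk size (0), optional padding spaces, and a CRLF. It carries no data, but it may be followed by an optional trailer containing header fields.

The message ends with a final CRLF. A complete chunked response looks like this: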

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

3
con

8
sequence

0

For a complete description of chunked encoding, see the Wikipedia article on chunked transfer encoding.
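
To make the framing rules concrete, here is a minimal sketch of a chunked-body decoder in plain Java. It is for illustration only: HttpURLConnection already decodes chunked responses before handing over the input stream, so a client normally does not need this. The class name ChunkedDecoderSketch is invented here, and the sketch assumes the body carries no trailer fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedDecoderSketch {

    /** Reads one line terminated by CRLF and returns it without the terminator. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                if (in.read() != '\n') throw new IOException("expected LF after CR");
                return sb.toString();
            }
            sb.append((char) c);
        }
        throw new EOFException("stream ended inside a line");
    }

    /** Decodes a chunked message body (without trailers) into the raw payload bytes. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in).trim();          // tolerate padding spaces
            int semicolon = sizeLine.indexOf(';');          // drop any chunk extension
            if (semicolon >= 0) sizeLine = sizeLine.substring(0, semicolon).trim();
            int size = Integer.parseInt(sizeLine, 16);      // chunk size is hexadecimal
            if (size == 0) {
                readLine(in);                               // final CRLF after the last chunk
                return out.toByteArray();
            }
            byte[] chunk = new byte[size];
            int read = 0;
            while (read < size) {
                int n = in.read(chunk, read, size - read);
                if (n == -1) throw new EOFException("stream ended inside a chunk");
                read += n;
            }
            out.write(chunk);
            readLine(in);                                   // CRLF that closes the chunk
        }
    }

    public static void main(String[] args) throws IOException {
        String body = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        byte[] payload = decode(new ByteArrayInputStream(body.getBytes(StandardCharsets.US_ASCII)));
        System.out.println(new String(payload, StandardCharsets.US_ASCII)); // prints "Wikipedia"
    }
}
```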

Requesting TTS Data

The code is as follows. It returns the response input stream directly so that the data can be read from it.

private static InputStream obtainResponseStreamWithJava(String postJsonBody, TreeMap<String, String> requestMap) throws IOException {
        // Send the POST request
        URL url = new URL(SERVER_URL);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        String authorization = generateSign(requestMap);
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // required before a request body can be written
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", authorization);
        conn.connect();
        OutputStream out = conn.getOutputStream();
        out.write(postJsonBody.getBytes("UTF-8"));
        out.flush();
        out.close();
        if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) { // TODO: handle error responses
            Log.w(TAG, "HTTP Code: " + conn.getResponseCode());
        }
        // HttpURLConnection decodes the chunked body, so the caller reads plain payload bytes
        return conn.getInputStream();
    }

OPUS

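According to the official documentation, the audio comes in two forms: Opus-compressed and raw PCM. I found that Opus offers a good compression ratio (about 10:1), which saves transfer time and network bandwidth.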
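
As a rough illustration of what that ratio means, the sketch below computes the data rate of uncompressed PCM and divides it by ten. It assumes 16 kHz, 16-bit, mono audio (the sample rate and encoding the player code in this article is configured with) and takes the 10:1 figure at face value; the real Opus bitrate depends on the encoder settings, so treat the second number as an estimate only. The class name BandwidthSketch is invented here.

```java
public class BandwidthSketch {

    /** Bytes per second of uncompressed PCM audio. */
    static int pcmBytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return sampleRateHz * (bitsPerSample / 8) * channels;
    }

    public static void main(String[] args) {
        int pcm = pcmBytesPerSecond(16000, 16, 1);   // 16 kHz, 16-bit, mono
        int opus = pcm / 10;                         // assumes the roughly 10:1 ratio quoted above
        System.out.println("PCM:  " + pcm + " bytes/s");              // 32000 bytes/s
        System.out.println("Opus: " + opus + " bytes/s (estimate)");  // 3200 bytes/s
    }
}
```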

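Opus is an open-source library, but it is written in C, and Android only supports Opus playback from 5.0 onward. To support systems older than 5.0, you therefore need to compile it into a .so library. See the Opus source code.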

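Parsing TTS Data

This part mainly follows the Java sample on the official site: loop over the stream, reading the header, sequence number, length, and audio data of each fragment in the format below, until the end of the data is reached.

[Figure: TTS fragment format]

The sample code is as follows: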

private void processProtocolBufferStream(final InputStream inputStream) throws DeserializationException {
            final long start = System.currentTimeMillis();

            YoutuOpusDecoder decoder = null;

            List<PcmData> pcmCache = new ArrayList<>();
            boolean fillSuccess;
            int pbPkgCount = -1;

            while (!Thread.currentThread().isInterrupted()) {
                pbPkgCount++;
                try {
                    //read head
                    byte[] headBuffer = new byte[4];
                    fillSuccess = fill(inputStream, headBuffer);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s head fail, break;", pbPkgCount));
                    }
                    //read seq
                    byte[] seqBuffer = new byte[4];
                    fillSuccess = fill(inputStream, seqBuffer);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s seq fail, break;", pbPkgCount));
                    }
                    int seq = bytesToInt(seqBuffer);
                    //read pkg size
                    byte[] pbPkgSizeHeader = new byte[4];
                    fillSuccess = fill(inputStream, pbPkgSizeHeader);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s size header fail, break;", pbPkgCount));
                    }
                    int pbPkgSize = bytesToInt(pbPkgSizeHeader);
                    Log.i(TAG, String.format("PB pkg#%s size = %s", pbPkgCount, pbPkgSize));
                    if (pbPkgCount == 0) {
                        sTimeEnd = System.currentTimeMillis();
                        sTimeCost = sTimeEnd - sTimeStart;
                    }
                    if (pbPkgSize <= 0) {
                        throw new ReadBufferException(String.format("PB pkg#%s size %s <= 0, break;", pbPkgCount, pbPkgSize));
                    } else if (pbPkgSize > 5000) {
                        throw new ReadBufferException(String.format("PB pkg#%s size %s > 5000 bytes, too large, break;", pbPkgCount, pbPkgSize));
                    }

                    //read pb pkg
                    byte[] pbPkg = new byte[pbPkgSize];
                    fillSuccess = fill(inputStream, pbPkg);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s fail, break;", pbPkgCount));
                    }

                    //init decoder
                    if (decoder == null) {
                        decoder = new YoutuOpusDecoder();
                        decoder.config();
                    }
                    //decode
                    Log.i("DEBUG-1", "seq:" + seq);
                    Pair<Integer, short[]> pair = decoder.decodeTTSData(seq, pbPkg);
                    short[] pcm = pair.second;

                    Log.d(TAG, (pcm == null ? "fail decode #" : "decode #") + pbPkgCount);

                    //packaging pcm
                    if (pcm == null) {
                        pcm = new short[0];
                    }
                    PcmData pcmData = new PcmData(pcm, seq == -1);

                    //stop check
                    if (Thread.currentThread().isInterrupted()) {
                        Log.w(TAG, "pcm data ready, but thread is interrupted, break;");
                        break;
                    }

                    //init player
                    if (mOpusPlayer == null) {
                        mOpusPlayer = new OpusPlayer();
                        mOpusPlayer.setPcmSampleRate(16000);
                        mOpusPlayer.setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
                            @Override
                            public void uncaughtException(Thread thread, Throwable ex) {
                                if (mTtsExceptionHandler != null) {
                                    mTtsExceptionHandler.onPlayException(thread, ex);
                                }
                            }
                        });
                    }

                    //enqueue
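                    if (pbPkgCount < mCacheCount && seq != -1) { // buffer the first few fragments
                        pcmCache.add(pcmData);
                    } else { // flush the buffer (also when the last fragment arrives early), then enqueue
                        for (PcmData d : pcmCache) {
                            mOpusPlayer.enqueue(d);
                        }
                        pcmCache.clear();
                        mOpusPlayer.enqueue(pcmData);
                    }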

                    //end
                    if (seq == -1) {
                        long ms = System.currentTimeMillis() - start;
                        Log.d(TAG, "finish last pb pkg#" + pbPkgCount + ", total cast time " + ms + " ms");
                        break;
                    }
                } catch (Exception e) {
                    if (mOpusPlayer != null) {
                        mOpusPlayer.forceStop();
                    }
                    if (e instanceof InterruptedIOException) {
                        Log.i(TAG, "Interrupted while reading server response InputStream", e); // normal flow, no need to rethrow
                        break; // the player has been stopped, so stop reading as well
                    } else {
                        throw new DeserializationException(e);
                    }
                }
            }
        }
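
The framing logic in the loop above can be isolated from decoding and playback. The following is a minimal sketch of that idea in plain Java, assuming the layout the loop reads: a 4-byte header, a 4-byte little-endian sequence number, a 4-byte little-endian payload size, then the payload. The class FrameReaderSketch and its Frame type are invented for illustration, and the meaning of the header field is not covered here, so it is simply skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FrameReaderSketch {

    /** One fragment: sequence number plus the raw (still compressed) payload. */
    static final class Frame {
        final int seq;
        final byte[] payload;
        Frame(int seq, byte[] payload) { this.seq = seq; this.payload = payload; }
    }

    static final int MAX_PAYLOAD = 5000; // same sanity limit as the loop above

    /** Reads one fragment, or returns null if the stream ends before a new fragment starts. */
    static Frame readFrame(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int first = data.read();
        if (first == -1) return null;                 // clean end of stream
        byte[] fixed = new byte[12];                  // header + seq + size
        fixed[0] = (byte) first;
        data.readFully(fixed, 1, 11);                 // throws EOFException on a short read
        ByteBuffer buf = ByteBuffer.wrap(fixed).order(ByteOrder.LITTLE_ENDIAN);
        buf.getInt();                                 // header field, skipped in this sketch
        int seq = buf.getInt();
        int size = buf.getInt();
        if (size <= 0 || size > MAX_PAYLOAD) throw new IOException("bad payload size " + size);
        byte[] payload = new byte[size];
        data.readFully(payload);
        return new Frame(seq, payload);
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer raw = ByteBuffer.allocate(2 * 12 + 3 + 2).order(ByteOrder.LITTLE_ENDIAN);
        raw.putInt(0).putInt(0).putInt(3).put(new byte[] {1, 2, 3});   // fragment with seq 0
        raw.putInt(0).putInt(-1).putInt(2).put(new byte[] {4, 5});     // final fragment, seq -1
        InputStream in = new ByteArrayInputStream(raw.array());
        Frame f;
        while ((f = readFrame(in)) != null) {
            System.out.println("seq=" + f.seq + " bytes=" + f.payload.length);
            if (f.seq == -1) break;                   // -1 marks the last fragment in the code above
        }
    }
}
```

Keeping the reader separate like this makes the protocol handling testable with a ByteArrayInputStream, without a network connection or an audio device.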

The fields are read in little-endian byte order. The helper that fills a buffer from the stream is as follows:

    /**
     * Reads from the InputStream into buffer until the buffer is full.
     *
     * @return false if the InputStream does not contain enough data to fill the buffer.
     * @throws IOException if reading from the stream fails
     */
    private static boolean fill(InputStream in, byte[] buffer) throws IOException {
        int length = buffer.length;
        int hasRead = 0;
        while (true) {
            int offset = hasRead;
            int count = length - hasRead;
            int currentRead = in.read(buffer, offset, count);
            if (currentRead >= 0) {
                hasRead += currentRead;
                if (hasRead == length) {
                    return true;
                }
            }
            if (currentRead == -1) {
                return false;
            }
        }
    }
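
The parsing loop calls bytesToInt, which the article does not show. A minimal sketch of what such a helper might look like, assuming the four bytes hold a little-endian signed 32-bit integer as stated above (the class name LittleEndianSketch is invented, and the original helper may differ):

```java
public class LittleEndianSketch {

    /** Interprets the first four bytes as a little-endian signed 32-bit integer. */
    static int bytesToInt(byte[] b) {
        return (b[0] & 0xFF)
                | (b[1] & 0xFF) << 8
                | (b[2] & 0xFF) << 16
                | (b[3] & 0xFF) << 24;
    }

    public static void main(String[] args) {
        // 0x00001388 stored least significant byte first
        System.out.println(bytesToInt(new byte[] {(byte) 0x88, 0x13, 0, 0}));   // 5000
        // All bits set is -1, the value the loop treats as the last fragment's seq
        System.out.println(bytesToInt(new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF})); // -1
    }
}
```

The masks with 0xFF matter because Java bytes are signed; without them a byte such as 0x88 would be sign-extended and corrupt the higher bits of the result.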

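TTS Audio Playback

All parsed TTS data is decoded by YoutuOpusDecoder and then played through the OpusPlayer class. OpusPlayer does two things: it wraps AudioTrack playback of raw PCM audio, and it keeps feeding the decoded audio into the player.

The complete code is as follows: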

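import android.media.AudioFormat;
import android.media.AudioManager;
import android.util.Log;

import java.lang.Thread.UncaughtExceptionHandler;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// PcmPlayer and AudioTrackException are not Android SDK classes; they are not shown here.
public class OpusPlayer {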
    private static final String TAG = "OpusPlayer";

    private BlockingQueue<PcmData> mPcmQueue = new LinkedBlockingQueue<>();
    private volatile Thread mPlayThread;
    private int mPcmSampleRate;
    private UncaughtExceptionHandler mUncaughtExceptionHandler;

    public void setUncaughtExceptionHandler(UncaughtExceptionHandler handler) {
        mUncaughtExceptionHandler = handler;
    }

    public void setPcmSampleRate(int pcmSampleRate) {
        mPcmSampleRate = pcmSampleRate;
    }
    

    public void enqueue(PcmData pcmData) {
        mPcmQueue.add(pcmData);

        if (mPlayThread == null) {
            mPlayThread = new Thread(new Runnable() {

                PcmPlayer mPlayer;

                @Override
                public void run() {
                    Log.d(TAG, getThreadLogPrefix() + "start");
                    int playerPrepareFailCount = 0;
                    int playCount = 0;
                    long start = System.currentTimeMillis();

                    while (!Thread.currentThread().isInterrupted()) {

                        // prepare the player
                        boolean isPlayerReady = preparePlayerIfNeeded();
                        if (!isPlayerReady) {
                            releasePlayer();
                            playerPrepareFailCount++;
                            if (playerPrepareFailCount > 5) {
                                throw new RuntimeException("prepare player fail too many times, abort."); // give up
                            } else {
                                Log.w(TAG, getThreadLogPrefix() + "prepare player fail, retry.");
                                continue; // try again
                            }
                        }

                        // dequeue
                        PcmData pcmData;
                        try {
                            pcmData = mPcmQueue.take();
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                            Log.d(TAG, getThreadLogPrefix() + "force stop");
                            break;
                        }

                        // play
                        if (pcmData != null) {
                            try {
                                short[] pcm = pcmData.getPcm();
                                if (pcm != null) {
                                    mPlayer.play(pcm);
                                    Log.d(TAG, getThreadLogPrefix() + "play #" + playCount);
                                } else {
                                    Log.d(TAG, getThreadLogPrefix() + "play #" + playCount + " fail, pcm == null !!");
                                }
                                if (pcmData.isLastOne()) {
                                    Log.d(TAG, getThreadLogPrefix() + "finish all task, will stop");
                                    break;
                                }
                                playCount++;
                            } catch (AudioTrackException e) {
                                e.printStackTrace();
                                releasePlayer(); // the next iteration will try to re-initialize the player
                            }
                        } else {
                            Log.w(TAG, getThreadLogPrefix() + "mPcmQueue.take() == null, nothing to play");
                        }
                    }

                    releasePlayer();
                    long time = System.currentTimeMillis() - start;
                    Log.d(TAG, getThreadLogPrefix() + "stop, ran " + time + " ms");
                }

                /**
                 * @return true: player is ready
                 */
                boolean preparePlayerIfNeeded() {
                    if (mPlayer == null) {
                        mPlayer = new PcmPlayer();
                        try {
                            mPlayer.prepare(AudioManager.STREAM_MUSIC, mPcmSampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
                        } catch (AudioTrackException e) {
                            e.printStackTrace();
                            releasePlayer();
                        }
                    }
                    return mPlayer != null;
                }

                void releasePlayer() {
                    if (mPlayer != null) {
                        mPlayer.release();
                        mPlayer = null;
                    }
                }

            });
            // Playback takes the longest, so run it at a slightly lower priority than the decoding thread to leave that thread more time
            mPlayThread.setPriority(Thread.NORM_PRIORITY - 1);
            mPlayThread.setName(TAG + ".mPlayThread");
            if (mUncaughtExceptionHandler != null) {
                mPlayThread.setUncaughtExceptionHandler(mUncaughtExceptionHandler);
            }
            mPlayThread.start();
        }
    }

    private static String getThreadLogPrefix() {
        Thread currentThread = Thread.currentThread();
        String s = currentThread.getName() + "#" + currentThread.getId() + ": ";
        return s;
    }
    
    public void forceStop() {
        if (mPlayThread != null && !mPlayThread.isInterrupted()) {
            mPlayThread.interrupt();
            mPlayThread = null;
        }
        mPcmQueue.clear();
    }

    public static class PcmData {
        private final short[] mPcm;
        private final boolean mIsLastOne;

        public PcmData(short[] pcm, boolean isLastOne) {
            mPcm = pcm;
            mIsLastOne = isLastOne;
        }

        short[] getPcm() {
            return mPcm;
        }

        boolean isLastOne() {
            return mIsLastOne;
        }
    }


}
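
The core pattern in OpusPlayer is a producer/consumer handoff: the parsing thread puts decoded audio on a BlockingQueue, and a playback thread drains it until it sees the item flagged as last. The sketch below shows just that pattern in plain Java so it can run without Android. The names QueueHandoffSketch, Item, and startConsumer are invented for illustration, and recording the buffer sizes in a list stands in for writing samples to AudioTrack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueHandoffSketch {

    /** A buffer of samples plus a flag that marks the end of the stream. */
    static final class Item {
        final short[] pcm;
        final boolean last;
        Item(short[] pcm, boolean last) { this.pcm = pcm; this.last = last; }
    }

    /** Starts a consumer thread that drains the queue until it sees the last item. */
    static Thread startConsumer(BlockingQueue<Item> queue, List<Integer> playedSizes) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Item item = queue.take();           // blocks until data arrives
                    playedSizes.add(item.pcm.length);   // stand-in for audio playback
                    if (item.last) return;              // end of stream, stop the thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();     // treated as a forced stop
            }
        }, "consumer");
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Item> queue = new LinkedBlockingQueue<>();
        List<Integer> played = new ArrayList<>();
        Thread consumer = startConsumer(queue, played);
        queue.add(new Item(new short[320], false));
        queue.add(new Item(new short[160], true));
        consumer.join();                                // join makes the list contents visible here
        System.out.println(played);                     // [320, 160]
    }
}
```

Marking the final item explicitly, as PcmData does with isLastOne, lets the consumer finish on its own once all audio has been played, while interruption remains available for a forced stop.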

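Original-content statement: this article is published on the Cloud+ Community with the author's authorization and may not be reproduced without permission.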
