Implementing a Client for Tencent Cloud Streaming TTS Speech Synthesis

Original article by 用户1530353, last modified 2019-08-29 01:21:31. Published in the column 小蝌蚪展览.

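Introduction to Tencent Cloud Streaming TTS

API documentation: https://cloud.tencent.com/document/api/441/19499

The interface takes its request parameters as JSON and does not yet support Cloud API 3.0 authentication. The response is delivered with HTTP chunked transfer encoding, and the audio arrives either as Opus-compressed fragments or as a raw PCM stream. Starting from authentication, this article walks through a client implementation of streaming TTS in detail.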

API Authentication

1. Build the JSON request parameters. To keep the parameters sorted by key, store them in a TreeMap.

 // mRequestMap is a TreeMap<String, String>, so its keys stay sorted for signing
 long now = System.currentTimeMillis() / 1000L; // current time in seconds
 mRequestMap.put("Action", "TextToStreamAudio");
 mRequestMap.put("Text", text);
 mRequestMap.put("SessionId", "session-1234");
 mRequestMap.put("AppId", "1255824371");
 mRequestMap.put("Timestamp", String.valueOf(now));
 mRequestMap.put("Expired", String.valueOf(now + 600)); // expires 600 seconds from now
 mRequestMap.put("Speed", "0");
 mRequestMap.put("SecretId", SECRET_ID);
 mRequestMap.put("VoiceType", "0");
 mRequestBody = new JSONObject(mRequestMap).toString();

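2. Generate the signature. Concatenate the string as the authentication documentation requires, sign it with HMAC-SHA1, and Base64-encode the result. Read the authentication documentation carefully here, because it is easy to get a detail wrong.
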
// Needs: java.nio.charset.StandardCharsets, java.security.InvalidKeyException,
// java.security.NoSuchAlgorithmException, java.util.Map, java.util.TreeMap,
// javax.crypto.Mac, javax.crypto.spec.SecretKeySpec, android.util.Base64
private static String generateSign(TreeMap<String, String> params) {
        // String to sign: "POST" + DOMAIN_NAME + "?" + key=value pairs joined by "&".
        // TreeMap iterates in ascending key order, so the pairs come out sorted.
        StringBuilder builder = new StringBuilder("POST" + DOMAIN_NAME + "?");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            builder.append(entry.getKey()).append('=').append(entry.getValue()).append('&');
        }

        // Drop the trailing '&'
        builder.deleteCharAt(builder.lastIndexOf("&"));
        String source = builder.toString();

        String sign = "";
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(SECRET_KEY.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            // Base64.NO_WRAP (value 2) keeps the output on a single line
            sign = Base64.encodeToString(mac.doFinal(source.getBytes(StandardCharsets.UTF_8)), Base64.NO_WRAP);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            e.printStackTrace();
        }
        return sign;
    }
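
For readers who want to try the signing step outside Android, here is a minimal sketch that uses only the JDK. It mirrors the string layout of generateSign above. The class name SignSketch, the host and path `example.invalid/stream`, the AppId, and the secret key are placeholders made up for illustration, and java.util.Base64 stands in for android.util.Base64. The exact string the service expects should be checked against the authentication documentation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SignSketch {
    /** Builds "POST" + domainName + "?" + k=v pairs joined by "&", in sorted key order. */
    static String buildSource(String domainName, TreeMap<String, String> params) {
        StringBuilder sb = new StringBuilder("POST").append(domainName).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
            first = false;
        }
        return sb.toString();
    }

    /** HMAC-SHA1 over the source string, Base64-encoded without line breaks. */
    static String sign(String source, String secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(source.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> params = new TreeMap<>();
        params.put("Text", "hello");                 // insertion order does not matter
        params.put("Action", "TextToStreamAudio");
        params.put("AppId", "1234567890");           // placeholder value
        String source = buildSource("example.invalid/stream", params);
        System.out.println(source);
        System.out.println(sign(source, "placeholder-secret-key"));
    }
}
```

Because TreeMap iterates its keys in ascending order, the printed source string lists Action, then AppId, then Text, regardless of the order in which they were added.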

At this point we have a complete signature. Next comes the main part of this article: sending the network request and parsing the response.

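Chunked Transfer Encoding

Tencent Cloud returns the result using HTTP chunked transfer encoding. Unlike an ordinary HTTP response such as a JSON body, the data arrives in multiple pieces. The message body consists of an unspecified number of chunks and ends with a final chunk of size 0.

Each non-empty chunk starts with the number of bytes of data it contains, written in hexadecimal, followed by a CRLF (carriage return and line feed), then the data itself, and finally a closing CRLF. In some implementations the gap between the chunk size and the CRLF is padded with spaces (0x20).

The last chunk is a single line made up of the chunk size (0), optional padding spaces, and a CRLF. It carries no data, but optional trailers, including message header fields, may be sent after it.

The message ends with a final CRLF. A complete chunked response looks like this (in the first two chunks the declared size includes the CRLF that ends the text line, so the blank line after each of them is the chunk's closing CRLF):
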
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

3
con
8
sequence
0

For a complete description of the chunked protocol, see the Wikipedia article on chunked transfer encoding (分块传输编码).
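
To make the framing rules concrete, here is a small decoder sketch for a chunked body (the part after the HTTP headers). It is for illustration only: HttpURLConnection removes the chunk framing itself, so client code normally never has to do this by hand. The class name ChunkedDecoderSketch is made up for this example, and trailer fields are simply skipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ChunkedDecoderSketch {
    /** Reads one CRLF-terminated line and returns it without the line ending. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') {
                if (in.read() != '\n') {
                    throw new IOException("CR not followed by LF");
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new EOFException("stream ended inside a line");
    }

    /** Decodes a chunked body into the raw bytes it carries. */
    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int ext = sizeLine.indexOf(';');             // ignore optional chunk extensions
            if (ext >= 0) {
                sizeLine = sizeLine.substring(0, ext);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // size is hexadecimal
            if (size == 0) {
                break;                                   // last chunk
            }
            byte[] data = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(data, off, size - off);
                if (n == -1) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(data, 0, size);
            if (!readLine(in).isEmpty()) {               // closing CRLF of the chunk
                throw new IOException("missing CRLF after chunk data");
            }
        }
        while (!readLine(in).isEmpty()) {
            // skip optional trailer fields until the final empty line
        }
        return body.toByteArray();
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] decode(byte[] raw) {
        try {
            return decode(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String wire = "25\r\nThis is the data in the first chunk\r\n\r\n"
                + "1C\r\nand this is the second one\r\n\r\n"
                + "3\r\ncon\r\n8\r\nsequence\r\n0\r\n\r\n";
        byte[] body = decode(wire.getBytes(StandardCharsets.US_ASCII));
        System.out.print(new String(body, StandardCharsets.US_ASCII));
    }
}
```

The decoded body is the two text lines followed by the word "consequence", which shows that chunk boundaries carry no meaning of their own: "con" and "sequence" arrive in separate chunks but form one word.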

Requesting TTS Data

The code is shown below. We obtain the response input stream directly and read the data from it.

private static InputStream obtainResponseStreamWithJava(String postJsonBody, TreeMap<String, String> requestMap) throws IOException {
        // Send the POST request
        URL url = new URL(SERVER_URL);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // needed in order to write a request body
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", generateSign(requestMap));
        try (OutputStream out = conn.getOutputStream()) {
            out.write(postJsonBody.getBytes("UTF-8"));
        }
        int code = conn.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            Log.w(TAG, "HTTP Code: " + code);
            throw new IOException("unexpected HTTP status " + code);
        }
        // HttpURLConnection removes the chunk framing; the stream yields the audio fragments
        return conn.getInputStream();
    }

OPUS

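According to the official documentation, the data comes in two forms: Opus-compressed audio and a raw PCM stream. As far as the author knows, Opus has a good compression ratio (about 10:1), which saves both transfer time and network bandwidth.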

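Opus is an open-source library, but it is written in C. Android supports Opus playback only from version 5.0 onward, so to support systems older than 5.0 you need to compile the library into a native .so file. See the Opus source code repository.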
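
To put the 10:1 figure in perspective, the following sketch works out the bit rates involved. The 16 kHz, 16-bit, mono format matches the player configuration used later in this article. The 10:1 ratio is taken at face value here; real Opus bit rates depend on the encoder settings. The class name OpusBandwidthSketch is made up for this example.

```java
class OpusBandwidthSketch {
    /** Bit rate of uncompressed PCM in bits per second. */
    static long pcmBitsPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * bitsPerSample * channels;
    }

    /** Bit rate after applying an assumed compression ratio, e.g. 10 for "10:1". */
    static long compressedBitsPerSecond(long pcmBitsPerSecond, int ratio) {
        return pcmBitsPerSecond / ratio;
    }

    public static void main(String[] args) {
        long pcm = pcmBitsPerSecond(16000, 16, 1);     // 16 kHz, 16-bit, mono
        long opus = compressedBitsPerSecond(pcm, 10);  // assumed 10:1 ratio
        System.out.println("PCM bits per second:  " + pcm);
        System.out.println("Opus bits per second: " + opus);
    }
}
```

Under these assumptions the stream drops from 256,000 bit/s (32,000 bytes per second) to about 25,600 bit/s.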

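TTS Data Parsing

This part mainly follows the Java sample on the official site. Read in a loop, following the fragment format below: header, sequence number, length, then audio data, until the end of the data is reached.

[Figure: TTS fragment format]

Sample code:
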
private void processProtocolBufferStream(final InputStream inputStream) throws DeserializationException {
            final long start = System.currentTimeMillis();

            YoutuOpusDecoder decoder = null;

            List<PcmData> pcmCache = new ArrayList<>();
            boolean fillSuccess;
            int pbPkgCount = -1;

            while (!Thread.currentThread().isInterrupted()) {
                pbPkgCount++;
                try {
                    //read head
                    byte[] headBuffer = new byte[4];
                    fillSuccess = fill(inputStream, headBuffer);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s head fail, break;", pbPkgCount));
                    }
                    //read seq
                    byte[] seqBuffer = new byte[4];
                    fillSuccess = fill(inputStream, seqBuffer);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s seq fail, break;", pbPkgCount));
                    }
                    int seq = bytesToInt(seqBuffer);
                    //read pkg size
                    byte[] pbPkgSizeHeader = new byte[4];
                    fillSuccess = fill(inputStream, pbPkgSizeHeader);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s size header fail, break;", pbPkgCount));
                    }
                    int pbPkgSize = bytesToInt(pbPkgSizeHeader);
                    Log.i(TAG, String.format("PB pkg#%s size = %s", pbPkgCount, pbPkgSize));
                    if (pbPkgCount == 0) {
                        sTimeEnd = System.currentTimeMillis();
                        sTimeCost = sTimeEnd - sTimeStart;
                    }
                    if (pbPkgSize <= 0) {
                        throw new ReadBufferException(String.format("PB pkg#%s size %s <= 0, break;", pbPkgCount, pbPkgSize));
                    } else if (pbPkgSize > 5000) {
                        throw new ReadBufferException(String.format("PB pkg#%s size %s > 5000 bytes, too large, break;", pbPkgCount, pbPkgSize));
                    }

                    //read pb pkg
                    byte[] pbPkg = new byte[pbPkgSize];
                    fillSuccess = fill(inputStream, pbPkg);
                    if (!fillSuccess) {
                        throw new ReadBufferException(String.format("read PB pkg#%s fail, break;", pbPkgCount));
                    }

                    //init decoder
                    if (decoder == null) {
                        decoder = new YoutuOpusDecoder();
                        decoder.config();
                    }
                    //decode
                    Log.i("DEBUG-1", "seq:" + seq);
                    Pair<Integer, short[]> pair = decoder.decodeTTSData(seq, pbPkg);
                    short[] pcm = pair.second;

                    Log.d(TAG, (pcm == null ? "fail decode #" : "decode #") + pbPkgCount);

                    //packaging pcm
                    if (pcm == null) {
                        pcm = new short[0];
                    }
                    PcmData pcmData = new PcmData(pcm, seq == -1);

                    //stop check
                    if (Thread.currentThread().isInterrupted()) {
                        Log.w(TAG, "pcm data ready, but thread is interrupted, break;");
                        break;
                    }

                    //init player
                    if (mOpusPlayer == null) {
                        mOpusPlayer = new OpusPlayer();
                        mOpusPlayer.setPcmSampleRate(16000);
                        mOpusPlayer.setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
                            @Override
                            public void uncaughtException(Thread thread, Throwable ex) {
                                if (mTtsExceptionHandler != null) {
                                    mTtsExceptionHandler.onPlayException(thread, ex);
                                }
                            }
                        });
                    }

                    //enqueue
                    if (pbPkgCount < mCacheCount && seq != -1) {// still pre-buffering: hold the first mCacheCount packages
                        pcmCache.add(pcmData);
                    } else {// buffer is full, or this is the last package: flush the cache, then enqueue
                        for (PcmData d : pcmCache) {
                            mOpusPlayer.enqueue(d);
                        }
                        pcmCache.clear();
                        mOpusPlayer.enqueue(pcmData);
                    }

                    //end
                    if (seq == -1) {
                        long ms = System.currentTimeMillis() - start;
                        Log.d(TAG, "finish last pb pkg#" + pbPkgCount + ", total cost time " + ms + " ms");
                        break;
                    }
                } catch (Exception e) {
                    if (mOpusPlayer != null) {
                        mOpusPlayer.forceStop();
                    }
                    if (e instanceof InterruptedIOException) {
                        // Normal flow when the read is interrupted: no need to rethrow, just stop reading
                        Log.i(TAG, "Interrupted while reading server response InputStream", e);
                        break;
                    } else {
                        throw new DeserializationException(e);
                    }
                }
            }
        }

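Each field above is read with the helper below, which fills a buffer completely. The 4-byte integers are then interpreted in little-endian byte order:
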
 /**
     * Reads from the InputStream into buffer until the buffer is full.
     *
     * @return false if the InputStream ends before the buffer is full, true otherwise.
     * @throws IOException if reading from the stream fails
     */
    private static boolean fill(InputStream in, byte[] buffer) throws IOException {
        int hasRead = 0;
        while (hasRead < buffer.length) {
            int currentRead = in.read(buffer, hasRead, buffer.length - hasRead);
            if (currentRead == -1) {
                return false; // end of stream before the buffer was filled
            }
            hasRead += currentRead;
        }
        return true;
    }
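
The parsing loop calls bytesToInt, which this article does not show. Below is a minimal sketch of what it might look like, assuming the little-endian byte order mentioned above, together with a reader for a single fragment (4-byte head, 4-byte sequence number, 4-byte length, then the payload). The class name TtsFrameSketch and its fields are hypothetical, the head bytes are treated as opaque, and the 5000-byte limit is copied from the loop above.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class TtsFrameSketch {
    final byte[] head;     // 4 header bytes, not interpreted here
    final int seq;         // sequence number; -1 marks the last fragment
    final byte[] payload;  // audio data

    TtsFrameSketch(byte[] head, int seq, byte[] payload) {
        this.head = head;
        this.seq = seq;
        this.payload = payload;
    }

    /** Little-endian: the first byte is the least significant one. */
    static int bytesToInt(byte[] b) {
        return (b[0] & 0xFF)
                | ((b[1] & 0xFF) << 8)
                | ((b[2] & 0xFF) << 16)
                | ((b[3] & 0xFF) << 24);
    }

    /** Reads one fragment; readFully throws EOFException if the stream ends early. */
    static TtsFrameSketch read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] head = new byte[4];
        byte[] word = new byte[4];
        data.readFully(head);
        data.readFully(word);
        int seq = bytesToInt(word);
        data.readFully(word);
        int size = bytesToInt(word);
        if (size <= 0 || size > 5000) {
            throw new IOException("unexpected payload size " + size);
        }
        byte[] payload = new byte[size];
        data.readFully(payload);
        return new TtsFrameSketch(head, seq, payload);
    }

    /** Convenience wrapper for in-memory data. */
    static TtsFrameSketch parse(byte[] raw) {
        try {
            return read(new ByteArrayInputStream(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = {
            1, 2, 3, 4,                                          // head (made-up bytes)
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,  // seq = -1
            3, 0, 0, 0,                                          // payload length = 3
            9, 8, 7                                              // payload
        };
        TtsFrameSketch frame = parse(raw);
        System.out.println("seq=" + frame.seq + " payloadBytes=" + frame.payload.length);
    }
}
```

DataInputStream.readFully plays the same role as fill above, except that it signals a short stream with an EOFException instead of a boolean.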

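TTS Audio Playback

The data decoded by YoutuOpusDecoder is played through the OpusPlayer class, which wraps two things: playing raw PCM audio with AudioTrack, and continuously feeding the decoded audio into the player.

The complete code:
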
public class OpusPlayer {
    private static final String TAG = "OpusPlayer";

    private BlockingQueue<PcmData> mPcmQueue = new LinkedBlockingQueue<>();
    private volatile Thread mPlayThread;
    private int mPcmSampleRate;
    private UncaughtExceptionHandler mUncaughtExceptionHandler;

    public void setUncaughtExceptionHandler(UncaughtExceptionHandler handler) {
        mUncaughtExceptionHandler = handler;
    }

    public void setPcmSampleRate(int pcmSampleRate) {
        mPcmSampleRate = pcmSampleRate;
    }
    

    public void enqueue(PcmData pcmData) {
        mPcmQueue.add(pcmData);

        if (mPlayThread == null) {
            mPlayThread = new Thread(new Runnable() {

                PcmPlayer mPlayer;

                @Override
                public void run() {
                    Log.d(TAG, getThreadLogPrefix() + "start");
                    int playerPrepareFailCount = 0;
                    int playCount = 0;
                    long start = System.currentTimeMillis();

                    while (!Thread.currentThread().isInterrupted()) {
                        
                        // Prepare the player
                        boolean isPlayerReady = preparePlayerIfNeeded();
                        if (!isPlayerReady) {
                            releasePlayer();
                            playerPrepareFailCount++;
                            if (playerPrepareFailCount > 5) {
                                throw new RuntimeException("prepare player fail too many times, abort.");// give up
                            } else {
                                Log.w(TAG, getThreadLogPrefix() + "prepare player fail, retry.");
                                continue;// try again
                            }
                        }

                        // Dequeue; take() blocks until data is available
                        PcmData pcmData;
                        try {
                            pcmData = mPcmQueue.take();
                        } catch (InterruptedException e) {
                            e.printStackTrace();
                            Log.d(TAG, getThreadLogPrefix() + "force stop");
                            break;
                        }

                        // Play
                        if (pcmData != null) {
                            try {
                                short[] pcm = pcmData.getPcm();
                                if (pcm != null) {
                                    mPlayer.play(pcm);
                                    Log.d(TAG, getThreadLogPrefix() + "play #" + playCount);
                                } else {
                                    Log.d(TAG, getThreadLogPrefix() + "play #" + playCount + " fail, pcm == null !!");
                                }
                                if (pcmData.isLastOne()) {
                                    Log.d(TAG, getThreadLogPrefix() + "finish all task, will stop");
                                    break;
                                }
                                playCount++;
                            } catch (AudioTrackException e) {
                                e.printStackTrace();
                                releasePlayer();// the next loop iteration will try to re-initialize the player
                            }
                        } else {
                            Log.w(TAG, getThreadLogPrefix() + "mPcmQueue.take() == null, nothing to play");
                        }
                    }

                    releasePlayer();
                    long time = System.currentTimeMillis() - start;
                    Log.d(TAG, getThreadLogPrefix() + "stop, ran " + time + " ms");
                }

                /**
                 * @return true: player is ready
                 */
                boolean preparePlayerIfNeeded() {
                    if (mPlayer == null) {
                        mPlayer = new PcmPlayer();
                        try {
                            mPlayer.prepare(AudioManager.STREAM_MUSIC, mPcmSampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
                        } catch (AudioTrackException e) {
                            e.printStackTrace();
                            releasePlayer();
                        }
                    }
                    return mPlayer != null;
                }

                void releasePlayer() {
                    if (mPlayer != null) {
                        mPlayer.release();
                        mPlayer = null;
                    }
                }

            });
            mPlayThread.setPriority(Thread.NORM_PRIORITY - 1);// playback takes the longest, so run it at a slightly lower priority than the decoding thread to leave that thread more time
            mPlayThread.setName(TAG + ".mPlayThread");
            if (mUncaughtExceptionHandler != null) {
                mPlayThread.setUncaughtExceptionHandler(mUncaughtExceptionHandler);
            }
            mPlayThread.start();
        }
    }

    private static String getThreadLogPrefix() {
        Thread currentThread = Thread.currentThread();
        String s = currentThread.getName() + "#" + currentThread.getId() + ": ";
        return s;
    }
    
    public void forceStop() {
        if (mPlayThread != null && !mPlayThread.isInterrupted()) {
            mPlayThread.interrupt();
            mPlayThread = null;
        }
        mPcmQueue.clear();
    }

    public static class PcmData {
        private final short[] mPcm;
        private final boolean mIsLastOne;

        public PcmData(short[] pcm, boolean isLastOne) {
            mPcm = pcm;
            mIsLastOne = isLastOne;
        }

        short[] getPcm() {
            return mPcm;
        }

        boolean isLastOne() {
            return mIsLastOne;
        }
    }


}
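
Stripped of the Android classes, the hand-off between the parsing thread and the playback thread is a producer/consumer queue with an end marker. The sketch below shows only that pattern in plain Java. The names QueuePlaybackSketch and Item are made up, and "playback" is simulated by counting samples instead of writing to an AudioTrack.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class QueuePlaybackSketch {
    /** One block of PCM samples plus a flag marking the final block. */
    static final class Item {
        final short[] pcm;
        final boolean last;

        Item(short[] pcm, boolean last) {
            this.pcm = pcm;
            this.last = last;
        }
    }

    /** Consumes items until the one flagged as last; returns the number of samples seen. */
    static int drain(BlockingQueue<Item> queue) throws InterruptedException {
        int samples = 0;
        while (true) {
            Item item = queue.take();      // blocks until the producer adds something
            samples += item.pcm.length;    // stand-in for real playback
            if (item.last) {
                return samples;
            }
        }
    }

    /** Runs a consumer thread, feeds it three items, and waits for it to finish. */
    static int runDemo() {
        BlockingQueue<Item> queue = new LinkedBlockingQueue<>();
        AtomicInteger played = new AtomicInteger();
        Thread consumer = new Thread(() -> {
            try {
                played.set(drain(queue));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // treated as a forced stop
            }
        });
        consumer.start();

        queue.add(new Item(new short[320], false));
        queue.add(new Item(new short[320], false));
        queue.add(new Item(new short[0], true));    // empty final item ends the consumer

        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the consumer", e);
        }
        return played.get();
    }

    public static void main(String[] args) {
        System.out.println("samples played: " + runDemo());
    }
}
```

The empty final item plays the same role as the PcmData created with seq == -1 in the parsing loop: it lets the consumer finish cleanly without being interrupted.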

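Original content statement: this article was published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, contact cloudcommunity@tencent.com for removal.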
