Real-Time Speech Synthesis

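1. Integration Preparation
1.1 Obtaining the SDK
Download the real-time TTS Android SDK and the Demo from: SDK Download.
1.2 Integration Requirements
The SDK requires the device to have network access and to run Android API Level 16 or later.
To run the Demo, you must set AppID, SecretID, and SecretKey, which can be obtained from API Key Management.
1.3 Package Size
Size added to the app by the real-time speech synthesis SDK: 18.81 KB; size of the aar file: 46 KB.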
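2. SDK Integration
2.1 Adding the SDK
Copy the downloaded SDK file into the project's libs folder, then add the following to the module's build.gradle file (for example, app/build.gradle):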
dependencies {
    implementation fileTree(include: ['*.jar', '*.aar'], dir: 'libs')
    // Real-time TTS SDK (VERSION is the version number in the downloaded file name)
    implementation files("libs/realtime_tts-release-VERSION.aar")
    // okhttp, used internally by the real-time TTS SDK (required)
    implementation 'com.squareup.okhttp3:okhttp:4.9.3'
    // gson, used internally by the real-time TTS SDK (required)
    implementation 'com.google.code.gson:gson:2.8.9'
}
2.2 Adding Permissions
Declare the permissions required by the SDK in the project's AndroidManifest.xml file, for example:
<uses-permission android:name="android.permission.INTERNET" />
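3. Using the SDK Interfaces
This chapter walks through the steps for calling the SDK interfaces and when to call each one (for complete usage, see the sample code in the Demo delivered with the SDK).
3.1 Building the Request Configuration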
// Requires: import java.util.UUID;
RealTimeSpeechSynthesizerRequest request = new RealTimeSpeechSynthesizerRequest();
/************** For the meaning of each option, see also the official documentation: https://cloud.tencent.com/document/product/1073/94308 **************/
request.setVolume(volume); // Volume, range [-10, 10]. Default 0 (normal volume); larger values are louder.
request.setSpeed(speed); // Speech rate, range [-2, 6]: -2: 0.6x; -1: 0.8x; 0: 1.0x (default); 1: 1.2x; 2: 1.5x; 6: 2.5x
request.setCodec("pcm"); // Returned audio format: pcm: binary PCM audio (default); mp3: binary MP3 audio
request.setSampleRate(SAMPLE_RATE); // Audio sample rate: 24000: 24 kHz (supported by some voices); 16000: 16 kHz (default); 8000: 8 kHz
request.setVoiceType(voiceType); // Voice ID
request.setEnableSubtitle(true); // Whether to enable timestamps; default false
request.setEmotionCategory("neutral"); // Emotion of the synthesized audio; only for multi-emotion voices
request.setEmotionIntensity(100); // Emotion intensity, range [50, 200], default 100; takes effect only when EmotionCategory is not empty
request.setSessionId(UUID.randomUUID().toString()); // sessionId must be globally unique (UUID recommended); provide it to the service team when troubleshooting
request.setText("你好"); // Source text, counted in UTF-8: up to 600 Chinese characters (each full-width punctuation mark counts as one character) or 1800 English letters (each half-width punctuation mark counts as one letter)
request.set(key, value); // Optional: custom extension parameter (String key, Object value)
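The ranges above can be checked before a request is built. A minimal sketch follows, assuming a hypothetical helper class `TtsParamCheck` that is not part of the SDK; it encodes only the ranges and speed levels listed in this document and maps each documented speed level to its playback-rate multiplier. The service remains the authority on which values it actually accepts.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the real-time TTS SDK. It only encodes the
// ranges documented for RealTimeSpeechSynthesizerRequest in this document.
public final class TtsParamCheck {
    // Documented speed levels and their playback-rate multipliers.
    private static final Map<Integer, Double> SPEED_MULTIPLIERS = new HashMap<>();
    static {
        SPEED_MULTIPLIERS.put(-2, 0.6);
        SPEED_MULTIPLIERS.put(-1, 0.8);
        SPEED_MULTIPLIERS.put(0, 1.0);
        SPEED_MULTIPLIERS.put(1, 1.2);
        SPEED_MULTIPLIERS.put(2, 1.5);
        SPEED_MULTIPLIERS.put(6, 2.5);
    }

    private static final Set<Integer> SAMPLE_RATES =
            new HashSet<>(Arrays.asList(8000, 16000, 24000));
    private static final Set<String> CODECS = new HashSet<>(Arrays.asList("pcm", "mp3"));

    private TtsParamCheck() {}

    static boolean volumeOk(float volume) { return volume >= -10f && volume <= 10f; }

    static boolean speedOk(float speed) { return speed >= -2f && speed <= 6f; }

    static boolean sampleRateOk(int sampleRate) { return SAMPLE_RATES.contains(sampleRate); }

    static boolean codecOk(String codec) { return CODECS.contains(codec); }

    static boolean emotionIntensityOk(int intensity) { return intensity >= 50 && intensity <= 200; }

    // Multiplier for a documented speed level, or null for a level this
    // document does not list (for example 3, 4 or 5).
    static Double speedMultiplier(int level) { return SPEED_MULTIPLIERS.get(level); }

    public static void main(String[] args) {
        System.out.println(speedMultiplier(2));  // 1.5
        System.out.println(volumeOk(11f));       // false
        System.out.println(sampleRateOk(24000)); // true
    }
}
```

Keeping the checks in one place lets an app reject an out-of-range setting before any network connection is opened.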
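3.2 Configuring Account Credentials
// For how to obtain the account credentials, see 1.2 Integration Requirements
// token is used for temporary authorization scenarios (see 4.3 Credential)
Credential credential = new Credential(appId, secretId, secretKey, token);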
3.3 Constructing the Synthesizer
// Network connection proxy; a single global instance is sufficient
private static final SpeechClient proxy = new SpeechClient();
...
RealTimeSpeechSynthesizerListener listener = new RealTimeSpeechSynthesizerListener() {
    @Override
    public void onSynthesisStart(SpeechSynthesizerResponse response) {
        // Synthesis started
    }

    @Override
    public void onSynthesisEnd(SpeechSynthesizerResponse response) {
        // Synthesis finished
    }

    @Override
    public void onAudioResult(ByteBuffer buffer) {
        // Synthesized audio data
    }

    @Override
    public void onTextResult(SpeechSynthesizerResponse response) {
        // Text information for the synthesized audio
    }

    @Override
    public void onSynthesisCancel() {
        // Synthesis cancelled
    }

    @Override
    public void onSynthesisFail(SpeechSynthesizerResponse response) {
        // Synthesis failed
    }
};
// Construct the synthesizer (the constructor throws SynthesizerException; see 4.1.1)
RealTimeSpeechSynthesizer synthesizer = new RealTimeSpeechSynthesizer(proxy, credential, request, listener);
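In `onAudioResult` the audio arrives in chunks. One common way to handle them is to copy each chunk into a growing buffer for later playback or saving. The sketch below uses a hypothetical `PcmCollector` class (plain Java, not part of the SDK). It assumes, without this document confirming it, that callbacks may arrive on a background thread (hence the synchronization) and that the default `pcm` output is 16-bit mono when estimating duration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the SDK: accumulates the audio chunks
// received in onAudioResult(ByteBuffer).
public final class PcmCollector {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Copies the chunk's remaining bytes without moving the caller's buffer
    // position: duplicate() shares the content but has its own position.
    public synchronized void append(ByteBuffer chunk) {
        ByteBuffer view = chunk.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        out.write(bytes, 0, bytes.length);
    }

    public synchronized byte[] toByteArray() {
        return out.toByteArray();
    }

    // Assumes 16-bit (2-byte) mono PCM; adjust if the actual format differs.
    public synchronized double durationSeconds(int sampleRate) {
        return out.size() / (2.0 * sampleRate);
    }

    public static void main(String[] args) {
        PcmCollector collector = new PcmCollector();
        collector.append(ByteBuffer.wrap(new byte[16000]));
        collector.append(ByteBuffer.wrap(new byte[16000]));
        // 32000 bytes of 16-bit mono PCM at 16000 Hz is one second of audio.
        System.out.println(collector.durationSeconds(16000)); // 1.0
    }
}
```

Inside the listener above, `onAudioResult` could call `collector.append(buffer)`, and `onSynthesisEnd` could pass `collector.toByteArray()` to whatever component plays or stores the audio.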
3.4 Starting Synthesis
synthesizer.start();
3.5 Cancelling Synthesis
// Closes the WebSocket connection immediately
synthesizer.cancel();
4. Interface Reference
4.1 RealTimeSpeechSynthesizer
4.1.1 Constructor
public RealTimeSpeechSynthesizer(SpeechClient client,
                                Credential credential,
                                RealTimeSpeechSynthesizerRequest request,
                                RealTimeSpeechSynthesizerListener listener) throws SynthesizerException
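Description: constructor of the core class of the real-time TTS SDK. It declares throws SynthesizerException, so catch the exception.
Parameters:
Parameter type	Parameter name	Description
SpeechClient	client	Network connection proxy; a single global instance is sufficient
Credential	credential	Entity class holding authentication information
RealTimeSpeechSynthesizerRequest	request	Speech synthesis configuration
RealTimeSpeechSynthesizerListener	listener	Callbacks for key speech synthesis events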
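4.1.2 Starting the Synthesizer
public void start() throws Exception
Description: starts the synthesizer. It declares throws Exception, so catch exceptions.
4.1.3 Cancelling Synthesis
public void cancel()
Description: cancels synthesis by closing the WebSocket connection directly, without sending an end notification to the server.
4.1.4 SDK Version
public static String version()
Description: returns the SDK version number.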
4.2 RealTimeSpeechSynthesizerRequest
SDK configuration class. For the meaning of each option, see Real-Time Speech Synthesis.
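Type	Name	Description	Default
String	text	Source text to synthesize, counted in UTF-8. Up to 600 Chinese characters (each full-width punctuation mark counts as one character) or up to 1800 English letters (each half-width punctuation mark counts as one letter).	""
Float	volume	Volume, range [-10, 10]. Default 0 (normal volume); larger values are louder.	0
Float	speed	Speech rate, range [-2, 6]: -2: 0.6x; -1: 0.8x; 0: 1.0x (default); 1: 1.2x; 2: 1.5x; 6: 2.5x	0
String	codec	Returned audio format: pcm: binary PCM audio (default); mp3: binary MP3 audio	pcm
Integer	sampleRate	Audio sample rate: 24000: 24 kHz (supported by some voices); 16000: 16 kHz (default); 8000: 8 kHz	16000
Integer	voiceType	Voice ID; for available values, see the Basic Speech Synthesis Voice List.	0
Boolean	enableSubtitle	Whether to enable timestamps; default false.	false
String	emotionCategory	Emotion of the synthesized audio; only for multi-emotion voices. Values: neutral, sad, happy, angry, fear (fearful), news, story (storytelling), radio, poetry, call (customer service), sajiao (coquettish), disgusted, amaze (shocked), peaceful (calm), exciting (excited), aojiao (haughty), jieshuo (commentary)	""
Integer	emotionIntensity	Emotion intensity, range [50, 200], default 100; takes effect only when EmotionCategory is not empty.	100
String	sessionId	Must be globally unique (UUID recommended); provide it to the service team when troubleshooting.	""
String	fastVoiceType	Voice ID for one-sentence voice cloning; required when using a one-sentence voice cloning voice.	""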
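Because `text` is limited per request (600 Chinese characters or 1800 English letters), longer input has to be split before synthesis. Below is a sketch using a hypothetical `TextChunker` class (not an SDK class) that splits text after sentence-ending punctuation so each piece stays within a chosen limit, cutting a single over-long sentence at the limit. It counts every Unicode code point as one character, which approximates, but may not exactly match, the service's counting rules; each chunk could then be synthesized with its own request and sessionId.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SDK: splits long text into pieces that
// stay within a per-request character limit, preferring sentence boundaries.
public final class TextChunker {
    private static final String SENTENCE_ENDS = "。！？；.!?;\n";

    private TextChunker() {}

    public static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be positive");
        List<String> chunks = new ArrayList<>();
        StringBuilder chunk = new StringBuilder();
        for (String sentence : sentences(text)) {
            int sentenceLen = sentence.codePointCount(0, sentence.length());
            int chunkLen = chunk.codePointCount(0, chunk.length());
            // Flush the current chunk if this sentence would overflow it.
            if (chunkLen > 0 && chunkLen + sentenceLen > maxChars) {
                chunks.add(chunk.toString());
                chunk.setLength(0);
            }
            if (sentenceLen > maxChars) {
                // A single sentence longer than the limit is cut at the limit.
                int start = 0;
                while (start < sentence.length()) {
                    int take = Math.min(maxChars, sentence.codePointCount(start, sentence.length()));
                    int end = sentence.offsetByCodePoints(start, take);
                    chunks.add(sentence.substring(start, end));
                    start = end;
                }
            } else {
                chunk.append(sentence);
            }
        }
        if (chunk.length() > 0) chunks.add(chunk.toString());
        return chunks;
    }

    // Splits text after each sentence-ending character, keeping the punctuation.
    private static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (SENTENCE_ENDS.indexOf(cp) >= 0) {
                result.add(text.substring(start, i));
                start = i;
            }
        }
        if (start < text.length()) result.add(text.substring(start));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("你好。今天天气很好。", 5)); // [你好。, 今天天气很, 好。]
    }
}
```

For real input the limit would be 600 for Chinese text; for mixed Chinese and English text the stricter limit is the safer choice.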
4.3 Credential
Entity class holding key information; the values can be obtained from API Key Management.
Type	Name	Description	Default
String	appid	AppID	""
String	secretId	SecretId, applied for in the console.	""
String	secretKey	SecretKey, applied for in the console.	""
String	token	Used for temporary authorization scenarios.	""
4.4 RealTimeSpeechSynthesizerListener
Synthesis listener class.
public abstract class RealTimeSpeechSynthesizerListener {
    // Synthesis started
    public abstract void onSynthesisStart(SpeechSynthesizerResponse response);
    // Synthesis finished
    public abstract void onSynthesisEnd(SpeechSynthesizerResponse response);
    // Synthesized audio data
    public abstract void onAudioResult(ByteBuffer data);
    // Text information for the synthesized audio
    public abstract void onTextResult(SpeechSynthesizerResponse response);
    // Synthesis cancelled
    public abstract void onSynthesisCancel();
    // Synthesis failed
    public abstract void onSynthesisFail(SpeechSynthesizerResponse response);
}
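Since all results arrive through these callbacks, code running on a worker thread sometimes needs to wait until a synthesis has finished. A minimal sketch with a hypothetical `SynthesisWaiter` class (plain Java, not an SDK class), assuming that exactly one of `onSynthesisEnd`, `onSynthesisFail` or `onSynthesisCancel` ends a synthesis, which this document implies but does not state.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the SDK: records how a synthesis ended so a
// worker thread can wait for it. Call markEnd/markFail/markCancel from
// onSynthesisEnd, onSynthesisFail and onSynthesisCancel respectively.
public final class SynthesisWaiter {
    public enum Outcome { END, FAIL, CANCEL }

    private final CountDownLatch done = new CountDownLatch(1);
    private volatile Outcome outcome;

    private void finish(Outcome o) {
        synchronized (this) {
            if (outcome != null) return; // keep only the first terminal event
            outcome = o;
        }
        done.countDown();
    }

    public void markEnd() { finish(Outcome.END); }

    public void markFail() { finish(Outcome.FAIL); }

    public void markCancel() { finish(Outcome.CANCEL); }

    // Blocks until a terminal event arrives; returns null on timeout or
    // interruption. Do not call this on the Android main thread.
    public Outcome await(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit) ? outcome : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return null;
        }
    }

    public static void main(String[] args) {
        SynthesisWaiter waiter = new SynthesisWaiter();
        new Thread(waiter::markEnd).start(); // simulates the SDK's callback thread
        System.out.println(waiter.await(5, TimeUnit.SECONDS)); // END
    }
}
```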
4.5 SpeechSynthesizerResponse
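Synthesis response.
Type	Name	Description
String	sessionId	Generated by the client during the handshake and passed in the request parameters.
Integer	code	Status code: 0 means success; any non-zero value indicates an error.
Integer	end	A value of 1 means all text has been synthesized; after receiving it, the client must close the WebSocket connection.
String	message	Error description: when an error occurs, states its specific cause. This text may change as the service evolves or is improved.
String	requestId	Unique ID of the audio stream, generated automatically by the server during the handshake.
String	messageId	Unique ID of this message.
SpeechSynthesizerResult	result	Text result of speech synthesis.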
4.6 SpeechSynthesizerResult
Synthesis result.
Type	Name	Description
SpeechSynthesizerSubtitle[]	subtitles	Word list information.
4.7 SpeechSynthesizerSubtitle
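Information about a synthesized word.
Type	Name	Description
String	text	Text
Integer	beginTime	Start timestamp of this text in the TTS audio.
Integer	endTime	End timestamp of this text in the TTS audio.
Integer	beginIndex	Start position of this character in the whole sentence.
Integer	endIndex	End position of this character in the whole sentence.
String	phoneme	Phoneme of this character.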
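With `enableSubtitle` set to true, these timestamps make it possible to highlight the word currently being played. A minimal sketch, assuming a hypothetical `SubtitleLookup` class whose `Word` type mirrors only the `text`, `beginTime` and `endTime` fields so the example runs without the SDK. It also assumes the subtitles are sorted by `beginTime` and do not overlap, and that the playback position uses the same time unit as the timestamps (the unit is not stated in this document).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of the SDK. Word mirrors the text/beginTime/endTime
// fields of SpeechSynthesizerSubtitle so the example runs on its own.
public final class SubtitleLookup {
    static final class Word {
        final String text;
        final int beginTime;
        final int endTime;

        Word(String text, int beginTime, int endTime) {
            this.text = text;
            this.beginTime = beginTime;
            this.endTime = endTime;
        }
    }

    private SubtitleLookup() {}

    // Binary search for the word whose [beginTime, endTime) interval contains
    // position; returns its index, or -1 if no word covers that position.
    static int wordAt(List<Word> words, int position) {
        int lo = 0;
        int hi = words.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Word w = words.get(mid);
            if (position < w.beginTime) {
                hi = mid - 1;
            } else if (position >= w.endTime) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Word> words = Arrays.asList(new Word("你", 0, 200), new Word("好", 200, 450));
        System.out.println(wordAt(words, 300)); // 1
    }
}
```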
5. Error Codes
The SDK error codes are listed below (constant name, numeric code, message):
CLIENT_CANNOT_BE_NULL(-500, "client cannot be null")
CREDENTIAL_CANNOT_BE_NULL(-501, "credential cannot be null")
REQUEST_CANNOT_BE_NULL(-502, "request cannot be null")
LISTENER_CANNOT_BE_NULL(-503, "listener cannot be null")
APPID_IS_EMPTY(-504, "appId cannot be empty")
SECRETID_IS_EMPTY(-505, "secretId cannot be empty")
SECRETKEY_IS_EMPTY(-506, "secretKey cannot be empty")
START_SYNTHESIZER_FAIL(-507, "fail to start synthesizer")
SEND_TEXT_FAIL(-508, "fail to send text")
CONNECT_SERVER_FAIL(-509, "fail to connect server")
INCORRECT_STATE(-510, "")  // the error message depends on the situation
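When logging failures, it can help to map a numeric code back to the names above. A sketch of a hypothetical application-side enum `TtsErrorCode` that mirrors the listed codes; how the SDK itself exposes these codes is not described here, so this is a copy for lookup purposes, not the SDK's own type.

```java
// Hypothetical application-side copy of the SDK error codes listed above.
public enum TtsErrorCode {
    CLIENT_CANNOT_BE_NULL(-500, "client cannot be null"),
    CREDENTIAL_CANNOT_BE_NULL(-501, "credential cannot be null"),
    REQUEST_CANNOT_BE_NULL(-502, "request cannot be null"),
    LISTENER_CANNOT_BE_NULL(-503, "listener cannot be null"),
    APPID_IS_EMPTY(-504, "appId cannot be empty"),
    SECRETID_IS_EMPTY(-505, "secretId cannot be empty"),
    SECRETKEY_IS_EMPTY(-506, "secretKey cannot be empty"),
    START_SYNTHESIZER_FAIL(-507, "fail to start synthesizer"),
    SEND_TEXT_FAIL(-508, "fail to send text"),
    CONNECT_SERVER_FAIL(-509, "fail to connect server"),
    INCORRECT_STATE(-510, ""); // the SDK's message for this code depends on the situation

    private final int code;
    private final String message;

    TtsErrorCode(int code, String message) {
        this.code = code;
        this.message = message;
    }

    public int code() { return code; }

    public String message() { return message; }

    // Returns the constant with the given code, or null if the code is not in this list.
    public static TtsErrorCode fromCode(int code) {
        for (TtsErrorCode e : values()) {
            if (e.code == code) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fromCode(-509)); // CONNECT_SERVER_FAIL
    }
}
```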
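6. Code Obfuscation Rules
If code obfuscation is enabled, add the following keep rules to the module's ProGuard/R8 rules file (for example, proguard-rules.pro):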
-keep class com.tencent.cloud.realtime.tts.RealTimeSpeechSynthesizer { *; }
-keep class com.tencent.cloud.realtime.tts.RealTimeSpeechSynthesizerRequest { *; }
-keep class com.tencent.cloud.realtime.tts.RealTimeSpeechSynthesizerListener { *; }
-keep class com.tencent.cloud.realtime.tts.SpeechSynthesizer** { *; }
-keep class com.tencent.cloud.realtime.tts.core.ws.CommonRequest { *; }
-keepclassmembers class * extends com.tencent.cloud.realtime.tts.core.ws.CommonRequest { *; }