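Speech Synthesis API - Streaming (TTS-SSE)

Streaming speech synthesis based on Text-to-Speech (TTS). A single request accepts at most 10000 characters of text. It is suited to low-latency, real-time generation where audio is played while the rest is still being synthesized.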
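
Input longer than the 10000-character limit has to be split on the client before it is sent. The following is a minimal sketch, assuming the limit is counted in Unicode code points (which is how Python's len() counts) and that cutting after sentence-ending punctuation is acceptable. split_text, MAX_CHARS and SENTENCE_END are hypothetical names, not part of the API.

```python
from typing import List

MAX_CHARS = 10000  # per-request limit stated in the description above

# Characters treated as sentence ends (an assumption made for this sketch).
SENTENCE_END = "。!?!?.\n"


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters,
    preferring to cut right after sentence-ending punctuation."""
    if limit < 1:
        raise ValueError("limit must be positive")
    pieces = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Find the last sentence end inside the current window.
            cut = max(text.rfind(ch, start, end) for ch in SENTENCE_END)
            if cut >= start:
                end = cut + 1
        pieces.append(text[start:end])
        start = end
    return pieces
```

Each piece can then be sent as its own request. When no sentence end is found inside a window the sketch falls back to a hard cut, which could land inside a <break> tag, so very long unpunctuated input needs extra care.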

Endpoint overview

  • Endpoint URL: https://api.senseaudio.cn/v1/t2a_v2
  • Method: POST
  • Content-Type: application/json
  • Authentication: Bearer Token

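Request configuration

Request headers

| Parameter | Required | Description | Example |
| --- | --- | --- | --- |
| Authorization | Yes | Authentication token. Format: Bearer SENSEAUDIO_API_KEY | Bearer sk-123456… |
| Content-Type | Yes | Content type. Always application/json | application/json |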
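
The two headers can be assembled once and reused for every request. A minimal sketch, assuming the key is stored in an environment variable named SENSEAUDIO_API_KEY (the name used by the cURL example in this document). build_headers is a hypothetical helper, not part of any SDK.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers described in the table above."""
    # Fall back to the environment so the key stays out of source code.
    key = api_key or os.environ.get("SENSEAUDIO_API_KEY")
    if not key:
        raise RuntimeError("no API key given and SENSEAUDIO_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```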

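Request body

Core parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| model | string | Yes | Model name. | senseaudio-tts-1.5-260319 |
| text | string | Yes | Text to synthesize. Supports Chinese and English, up to 10000 characters. For <break time=500>, see the pause marker notes below. | 你好,<break time=500>世界 |
| stream | boolean | Yes | Streaming output. Always true. | true |
| voice_setting | object | Yes | Voice settings. See the voice_setting table below. | { "voice_id": "…" } |
| audio_setting | object | No | Audio format settings. See the audio_setting table below. | { "sample_rate": 32000 } |
| dictionary | array | No | List of polyphonic-character overrides. See the dictionary table below. The model must be senseaudio-tts-1.5-260319. | [{"original": "好干净","replacement": "[hao4]干净"}] |
| stream_options | object | No | Streaming output options. See the stream_options table below. | {"exclude_aggregated_audio": true} |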
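
The required and optional fields above map directly onto a JSON object. A minimal sketch of assembling one, assuming client-side validation of the text length is wanted (counted here with Python's len(), i.e. in code points). build_payload is a hypothetical helper; the field names come from the table.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    text: str,
    voice_id: str,
    model: str = "senseaudio-tts-1.5-260319",
    audio_setting: Optional[Dict[str, Any]] = None,
    dictionary: Optional[List[Dict[str, str]]] = None,
    stream_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a request body for the streaming endpoint."""
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > 10000:
        raise ValueError("text exceeds the 10000-character limit")
    payload: Dict[str, Any] = {
        "model": model,
        "text": text,
        "stream": True,  # fixed to true for this endpoint
        "voice_setting": {"voice_id": voice_id},
    }
    # Optional objects are only sent when the caller supplies them.
    if audio_setting is not None:
        payload["audio_setting"] = audio_setting
    if dictionary is not None:
        payload["dictionary"] = dictionary
    if stream_options is not None:
        payload["stream_options"] = stream_options
    return payload
```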

<break> pause marker

<break> inserts a pause into the synthesized speech.

xml
<break time=500>
  • time is given in milliseconds (ms)
  • 500 means a pause of 500 ms
  • The minimum is 100 ms; there is no upper limit

Example:

text
你好<break time=500>欢迎使用我们的服务
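
Pause markers can be generated rather than typed by hand, which makes it easy to enforce the 100 ms minimum. A minimal sketch; break_tag and join_with_pauses are hypothetical helper names.

```python
from typing import Iterable

MIN_BREAK_MS = 100  # minimum pause stated above; no upper limit is documented


def break_tag(ms: int) -> str:
    """Return a pause marker such as <break time=500>."""
    if not isinstance(ms, int) or ms < MIN_BREAK_MS:
        raise ValueError(f"pause must be an integer of at least {MIN_BREAK_MS} ms")
    return f"<break time={ms}>"


def join_with_pauses(segments: Iterable[str], ms: int = 500) -> str:
    """Join text segments, inserting the same pause between each pair."""
    return break_tag(ms).join(segments)
```

For instance, join_with_pauses(["你好", "欢迎使用我们的服务"]) reproduces the example text above.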

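voice_setting (voice settings)

| Parameter | Type | Required | Description | Default | Range |
| --- | --- | --- | --- | --- | --- |
| voice_id | string | Yes | Voice ID. See the voice service documentation. | - | - |
| speed | float | No | Speech rate. | 1.0 | [0.5, 2.0] |
| vol | float | No | Volume. | 1.0 | [0.01, 10.0] |
| pitch | int | No | Pitch. | 0 | [-12, 12] |
| latex_read | boolean | No | Reads mathematical formulas aloud. Supports LaTeX, MathML, Unicode math symbols and similar formats. Incurs extra performance overhead. | false | - |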
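
The ranges in this table can be checked before a request is sent, so that an out-of-range value fails locally instead of on the server. A minimal sketch using the documented defaults and ranges; make_voice_setting is a hypothetical helper.

```python
from typing import Any, Dict


def make_voice_setting(
    voice_id: str,
    speed: float = 1.0,
    vol: float = 1.0,
    pitch: int = 0,
    latex_read: bool = False,
) -> Dict[str, Any]:
    """Validate values against the documented ranges and build voice_setting."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within [0.5, 2.0]")
    if not 0.01 <= vol <= 10.0:
        raise ValueError("vol must be within [0.01, 10.0]")
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("pitch must be an integer within [-12, 12]")
    return {
        "voice_id": voice_id,
        "speed": speed,
        "vol": vol,
        "pitch": pitch,
        "latex_read": latex_read,
    }
```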

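audio_setting (audio settings)

| Parameter | Type | Required | Description | Default | Options |
| --- | --- | --- | --- | --- | --- |
| format | string | No | Audio encoding format. | "mp3" | mp3, wav, pcm, flac |
| sample_rate | int | No | Audio sample rate (Hz). | 32000 | 8000, 16000, 22050, 24000, 32000, 44100 |
| bitrate | int | No | Bitrate (MP3 only). | 128000 | 32000, 64000, 128000, 256000 |
| channel | int | No | Number of channels. | 2 | 1 (mono), 2 (stereo) |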
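
Because every audio_setting field takes one of a fixed set of values, a table-driven check is enough. A minimal sketch; AUDIO_OPTIONS and make_audio_setting are hypothetical names, and rejecting bitrate for non-MP3 formats is this sketch's own choice (the table only says bitrate applies to MP3, not how the server treats it otherwise).

```python
from typing import Any, Dict

# Allowed values copied from the table above.
AUDIO_OPTIONS = {
    "format": {"mp3", "wav", "pcm", "flac"},
    "sample_rate": {8000, 16000, 22050, 24000, 32000, 44100},
    "bitrate": {32000, 64000, 128000, 256000},
    "channel": {1, 2},
}


def make_audio_setting(**settings: Any) -> Dict[str, Any]:
    """Validate the given fields and return them as an audio_setting object."""
    for name, value in settings.items():
        allowed = AUDIO_OPTIONS.get(name)
        if allowed is None:
            raise ValueError(f"unknown audio_setting field: {name}")
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    # Assumption: treat bitrate with a non-MP3 format as a caller mistake.
    # An omitted format falls back to the documented default, mp3.
    if "bitrate" in settings and settings.get("format", "mp3") != "mp3":
        raise ValueError("bitrate only applies to the mp3 format")
    return dict(settings)
```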

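dictionary (polyphonic-character correction)

| Parameter | Type | Required | Description | Default | Example |
| --- | --- | --- | --- | --- | --- |
| original | string | Yes | Original text. | None | 好干净 |
| replacement | string | Yes | Replacement text with the pronunciation override. | None | [hao4]干净 |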
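
In the example entry, the polyphonic character 好 is replaced by its pinyin and tone number in square brackets. A minimal sketch of building such entries, assuming (from that single example) that the override always takes the form [pinyin+tone] in place of one character. make_entry is a hypothetical helper.

```python
from typing import Dict


def make_entry(original: str, index: int, pinyin: str) -> Dict[str, str]:
    """Build a dictionary entry that overrides the reading of original[index].

    Assumption: the bracketed pinyin replaces exactly one character,
    as in the documented example 好干净 -> [hao4]干净.
    """
    if not 0 <= index < len(original):
        raise IndexError("index is outside the original text")
    replacement = original[:index] + f"[{pinyin}]" + original[index + 1:]
    return {"original": original, "replacement": replacement}
```

A list of these entries is what the dictionary request parameter expects, for example [make_entry("好干净", 0, "hao4")].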

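stream_options (streaming output options)

| Parameter | Type | Required | Description | Default | Example |
| --- | --- | --- | --- | --- | --- |
| exclude_aggregated_audio | boolean | No | Whether to exclude the aggregated audio. | false | true/false |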

Response structure

The response uses the SSE (Server-Sent Events) format, with Content-Type text/event-stream; charset=utf-8.

Each data chunk starts with data: followed by a JSON object.
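
Reading the stream therefore comes down to stripping the data: prefix from each line and parsing the rest as JSON. A minimal sketch of that step for a single line; parse_sse_line is a hypothetical helper, and ignoring every line that does not start with data: is a simplification that skips other SSE fields.

```python
import json
from typing import Any, Dict, Optional, Union


def parse_sse_line(line: Union[str, bytes]) -> Optional[Dict[str, Any]]:
    """Return the JSON object carried by a `data:` line, or None otherwise."""
    if isinstance(line, bytes):
        line = line.decode("utf-8")  # the response is declared as UTF-8
    line = line.strip()
    if not line.startswith("data:"):
        return None  # blank separator, comment, or another SSE field
    body = line[len("data:"):].strip()
    if not body:
        return None
    return json.loads(body)
```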

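Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| data | object | The synthesized data object. May be null, so check for null before use. |
| data.audio | string | Synthesized audio data, hex-encoded, in the output format specified in the request. |
| data.status | int64 | Current stream status: 1 means synthesis in progress, 2 means synthesis finished. |
| extra_info | object | Additional information about the audio. In streaming mode only the last chunk carries it. |
| extra_info.audio_length | int64 | Audio duration (milliseconds). |
| extra_info.audio_sample_rate | int64 | Audio sample rate. |
| extra_info.audio_size | int64 | Audio file size (bytes). |
| extra_info.bitrate | int64 | Audio bitrate. |
| extra_info.audio_format | string | Format of the generated audio file. Values: mp3, pcm, flac, wav. |
| extra_info.audio_channel | int | Number of channels in the generated audio. 1: mono, 2: stereo. |
| extra_info.word_count | int64 | Word count: the synthesized text counted by grapheme cluster, excluding clusters that consist only of whitespace, punctuation, or control characters. |
| extra_info.usage_characters | int64 | Character count: the synthesized text counted by Unicode code point. |
| trace_id | string | Trace ID. |
| base_resp | object | Status code and details for this request. |
| base_resp.status_code | int64 | Status code (HTTP status code). |
| base_resp.status_msg | string | Status details. |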
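
Two details in this table are easy to miss in client code: data can be null, and data.audio is hex text rather than raw bytes. A minimal sketch of decoding one parsed chunk with both in mind; extract_audio is a hypothetical helper that takes the already-parsed JSON object.

```python
from typing import Any, Dict, Tuple


def extract_audio(event: Dict[str, Any]) -> Tuple[bytes, bool]:
    """Return (audio_bytes, is_final) for one parsed chunk."""
    data = event.get("data") or {}  # covers both a missing and a null "data"
    audio_hex = data.get("audio") or ""
    audio = bytes.fromhex(audio_hex)  # hex string -> raw audio bytes
    is_final = data.get("status") == 2  # 2 means synthesis finished
    return audio, is_final
```

With a chunk whose audio is the hex string 49443304, the decoded bytes are b"ID3\x04", the start of an ID3 tag in an MP3 stream.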

Streaming response example

plaintext
data: {"data":{"audio":"49443304...","status":1},"extra_info":null,"trace_id":"69c20e38c8761996a85d57881fe4d817","base_resp":{"status_code":0,"status_msg":""}}

data: {"data":{"audio":"fffb9864...","status":1},"extra_info":null,"trace_id":"69c20e38c8761996a85d57881fe4d817","base_resp":{"status_code":0,"status_msg":""}}

data: {"data":{"audio":"fffb9864...","status":2},"extra_info":{"audio_length":2306,"audio_sample_rate":32000,"audio_size":36908,"bitrate":128000,"audio_format":"mp3","audio_channel":2,"word_count":24,"usage_characters":30},"trace_id":"69c20e38c8761996a85d57881fe4d817","base_resp":{"status_code":0,"status_msg":"success"}}
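
The extra_info values in the final chunk can be cross-checked against each other: size in bits divided by duration in seconds should land near the reported bitrate. A minimal sketch of that arithmetic using the numbers from the example; effective_bitrate is a hypothetical helper.

```python
from typing import Any, Dict


def effective_bitrate(extra_info: Dict[str, Any]) -> float:
    """Average bits per second implied by audio_size and audio_length."""
    seconds = extra_info["audio_length"] / 1000  # audio_length is in ms
    if seconds <= 0:
        raise ValueError("audio_length must be positive")
    return extra_info["audio_size"] * 8 / seconds  # audio_size is in bytes


example = {"audio_length": 2306, "audio_size": 36908, "bitrate": 128000}
rate = effective_bitrate(example)
# 36908 bytes * 8 / 2.306 s is within 1% of the reported 128000 bit/s.
print(round(rate), abs(rate - example["bitrate"]) / example["bitrate"] < 0.01)
```

A small difference is expected for MP3 output, since the file also contains headers and the duration is rounded to whole milliseconds.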

Code examples

CURL

bash
curl -X POST https://api.senseaudio.cn/v1/t2a_v2 \
  -H "Authorization: Bearer $SENSEAUDIO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "senseaudio-tts-1.5-260319",
    "text": "这是一个流式输出的例子。",
    "stream": true,
    "voice_setting": {
      "voice_id": "male_0004_a",
      "latex_read": false
    },
    "stream_options": {
      "exclude_aggregated_audio": true
    }
  }'

Python

python
import json

import requests

API_URL = "https://api.senseaudio.cn/v1/t2a_v2"
HEADERS = {
    # Replace SENSEAUDIO_API_KEY with your own key
    "Authorization": "Bearer SENSEAUDIO_API_KEY",
    "Content-Type": "application/json",
}


# Streaming synthesis (recommended for long text or real-time scenarios)
def tts_stream():
    payload = {
        "model": "senseaudio-tts-1.5-260319",
        "text": "这是一个流式输出的例子。",
        "stream": True,
        "voice_setting": {
            "voice_id": "male_0004_a",
            "latex_read": False,
        },
        "stream_options": {
            "exclude_aggregated_audio": True,
        },
    }

    with requests.post(API_URL, json=payload, headers=HEADERS, stream=True) as r:
        r.raise_for_status()  # fail early on an HTTP error
        with open("stream_output.mp3", "wb") as f:
            for line in r.iter_lines():
                if not line:
                    continue
                line_str = line.decode("utf-8")
                # Only lines starting with "data: " carry a JSON payload
                if not line_str.startswith("data: "):
                    continue
                resp = json.loads(line_str[6:])
                # "data" may be null, so guard before reading "audio"
                data = resp.get("data") or {}
                if data.get("audio"):
                    f.write(bytes.fromhex(data["audio"]))
    print("Streaming synthesis complete")


if __name__ == "__main__":
    tts_stream()

JavaScript

javascript
const axios = require('axios');
const fs = require('fs');

const API_URL = 'https://api.senseaudio.cn/v1/t2a_v2';
const HEADERS = {
  // Replace SENSEAUDIO_API_KEY with your own key
  'Authorization': 'Bearer SENSEAUDIO_API_KEY',
  'Content-Type': 'application/json'
};

async function ttsStream() {
  try {
    const payload = {
      model: 'senseaudio-tts-1.5-260319',
      text: '这是一个流式输出的例子。',
      stream: true,
      voice_setting: {
        voice_id: 'male_0004_a',
        latex_read: false
      },
      stream_options: {
        exclude_aggregated_audio: true
      }
    };

    const res = await axios.post(API_URL, payload, {
      headers: HEADERS,
      responseType: 'stream'
    });
    const writeStream = fs.createWriteStream('stream_output.mp3');

    // Decode one "data: " line and append its audio to the file
    const handleLine = (line) => {
      if (!line.startsWith('data: ')) return;
      try {
        const json = JSON.parse(line.slice(6));
        if (json.data && json.data.audio) {
          writeStream.write(Buffer.from(json.data.audio, 'hex'));
        }
      } catch (e) {
        console.error('Skipping malformed line:', e.message);
      }
    };

    // A network chunk can end in the middle of a line, so keep the
    // unfinished tail in a buffer until the rest of it arrives.
    let buffer = '';
    res.data.setEncoding('utf8');
    res.data.on('data', (chunk) => {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop();
      lines.forEach(handleLine);
    });

    res.data.on('end', () => {
      handleLine(buffer); // flush a final line that had no trailing newline
      writeStream.end();
      console.log('Streaming synthesis complete');
    });
  } catch (err) {
    console.error('Request failed:', err.message);
  }
}

ttsStream();

Go

go
package main

import (
	"bufio"
	"bytes"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

const (
	APIURL             = "https://api.senseaudio.cn/v1/t2a_v2"
	SENSEAUDIO_API_KEY = "SENSEAUDIO_API_KEY" // replace with your own key
)

type TTSRequest struct {
	Model         string        `json:"model"`
	Text          string        `json:"text"`
	Stream        bool          `json:"stream"`
	VoiceSetting  VoiceSetting  `json:"voice_setting"`
	StreamOptions StreamOptions `json:"stream_options"`
}

type StreamOptions struct {
	ExcludeAggregatedAudio bool `json:"exclude_aggregated_audio"`
}

type VoiceSetting struct {
	VoiceID   string `json:"voice_id"`
	LatexRead bool   `json:"latex_read"`
}

type SSEResponse struct {
	Data struct {
		Audio  string `json:"audio"`
		Status int    `json:"status"`
	} `json:"data"`
	BaseResp struct {
		StatusCode    int    `json:"status_code"`
		StatusMessage string `json:"status_msg"`
	} `json:"base_resp"`
}

func main() {
	payload := TTSRequest{
		Model:  "senseaudio-tts-1.5-260319",
		Text:   "这是一个流式输出的例子。",
		Stream: true,
		VoiceSetting: VoiceSetting{
			VoiceID:   "male_0004_a",
			LatexRead: false,
		},
		StreamOptions: StreamOptions{
			ExcludeAggregatedAudio: true,
		},
	}

	jsonData, err := json.Marshal(payload)
	if err != nil {
		fmt.Println("Failed to encode request:", err)
		return
	}

	req, err := http.NewRequest("POST", APIURL, bytes.NewBuffer(jsonData))
	if err != nil {
		fmt.Println("Failed to build request:", err)
		return
	}
	req.Header.Set("Authorization", "Bearer "+SENSEAUDIO_API_KEY)
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Request failed:", err)
		return
	}
	defer resp.Body.Close()

	file, err := os.Create("stream_output.mp3")
	if err != nil {
		fmt.Println("Failed to create output file:", err)
		return
	}
	defer file.Close()

	scanner := bufio.NewScanner(resp.Body)
	// A hex-encoded audio line may be longer than the scanner's default
	// 64 KB token limit, so allow lines of up to 16 MB.
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		var result SSEResponse
		if err := json.Unmarshal([]byte(line[6:]), &result); err != nil {
			continue // skip lines that are not valid JSON
		}
		if result.Data.Audio != "" {
			audioData, err := hex.DecodeString(result.Data.Audio)
			if err != nil {
				continue // skip chunks that are not valid hex
			}
			file.Write(audioData)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("Failed to read stream:", err)
		return
	}
	fmt.Println("Streaming synthesis complete")
}

Java

java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;

public class SenseAudioTTSStream {
    private static final String API_URL = "https://api.senseaudio.cn/v1/t2a_v2";
    // Replace with your own key
    private static final String SENSEAUDIO_API_KEY = "SENSEAUDIO_API_KEY";

    public static void main(String[] args) {
        try {
            // Build the request body
            JSONObject voiceSetting = new JSONObject();
            voiceSetting.put("voice_id", "male_0004_a");
            voiceSetting.put("latex_read", false);

            JSONObject payload = new JSONObject();
            payload.put("model", "senseaudio-tts-1.5-260319");
            payload.put("text", "这是一个流式输出的例子。");
            payload.put("stream", true);
            payload.put("voice_setting", voiceSetting);
            payload.put("stream_options",
                    new JSONObject().put("exclude_aggregated_audio", true));

            // Send the request
            URL url = new URL(API_URL);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Authorization", "Bearer " + SENSEAUDIO_API_KEY);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);

            try (OutputStream os = conn.getOutputStream()) {
                byte[] input = payload.toString().getBytes(StandardCharsets.UTF_8);
                os.write(input, 0, input.length);
            }

            // Read the SSE streaming response
            try (BufferedReader br = new BufferedReader(
                         new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                 FileOutputStream fos = new FileOutputStream("stream_output.mp3")) {
                String line;
                while ((line = br.readLine()) != null) {
                    if (!line.startsWith("data: ")) {
                        continue;
                    }
                    JSONObject result = new JSONObject(line.substring(6));
                    // "data" may be null; optJSONObject returns null in that case
                    JSONObject data = result.optJSONObject("data");
                    if (data == null) {
                        continue;
                    }
                    String audioHex = data.optString("audio", "");
                    if (audioHex.isEmpty()) {
                        continue;
                    }
                    // Decode the hex string by hand, two characters per byte
                    byte[] audioData = new byte[audioHex.length() / 2];
                    for (int i = 0; i < audioData.length; i++) {
                        int index = i * 2;
                        int val = Integer.parseInt(audioHex.substring(index, index + 2), 16);
                        audioData[i] = (byte) val;
                    }
                    fos.write(audioData);
                }
                System.out.println("Streaming synthesis complete");
            }
        } catch (Exception e) {
            System.out.println("Request failed: " + e.getMessage());
            e.printStackTrace();
        }
    }
}