CosyVoice V3 Flash API: Endpoint, Parameters & Code Examples

tongyi/cosyvoice-v3-flash

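CosyVoice V3 Flash is a new-generation, lightweight, ultra-low-latency speech synthesis (TTS) and voice cloning model built on a large language model, released by Alibaba's Tongyi Lab (FunAudioLLM team). As a key member of the CosyVoice 3.0 family, the "Flash" version focuses on fast response (low latency) and cost-effective commercial deployment.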

Model ID: tongyi/cosyvoice-v3-flash
Model series: TongYi
Updated: 2026-06-27
Capability: Speech synthesis
Price (per 1,000 characters): ¥0.15
Default voice: longanyang
Default audio format: mp3

CosyVoice V3 Flash model overview:

The model's core features and architecture highlights are as follows:

Core features

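Ultra-low latency (bidirectional streaming): supports both streaming text input and streaming audio output (Bi-Streaming), with end-to-end latency as low as roughly 150 ms. This makes it well suited to AI agents that must answer instantly, simultaneous interpretation, and real-time customer service.
Zero-shot voice cloning: no per-speaker fine-tuning is required. Given a reference clip of 3 to 10 seconds (even just an audio URL), the model closely reproduces the speaker's timbre, emotion, and intonation and reads entirely new text in that voice.
Strong multilingual and cross-lingual capability
Core languages: natively supports 9 major languages: Chinese, English, Japanese, Korean, German, Spanish, French, Italian, and Russian.
Dialects: supports more than 18 Chinese regional dialects and accents, including Cantonese, Hokkien (Minnan), Sichuanese, Northeastern Mandarin, Shanghainese, Tianjin dialect, and Shandong dialect.
Cross-lingual cloning: even when the reference audio is in Chinese, the cloned voice can read English or Japanese fluently and naturally while keeping the original timbre.
Rich emotion and fine-grained control (Instruct support): the model follows instructions well. Users can adjust the language, dialect, emotion (such as joy, anger, or sadness), speaking rate, and volume of the generated audio through instructions. It can also insert natural breathing, laughter, or pauses, so the speech sounds more human.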

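Key technical upgrades

CosyVoice V3's leap over earlier versions comes mainly from the following architecture and training improvements:

Multi-task speech tokenizer (multi-dimensional semantic understanding): earlier TTS models learned only pronunciation. CosyVoice V3's discrete speech tokenizer is trained jointly on several tasks, including speech recognition, emotion detection, language identification, and speaker analysis. As a result, the speech tokens it extracts encode not only what was said but also how it was said (emotion and style).
DiffRO (Differentiable Reward Optimization): a reinforcement-learning algorithm that applies rewards (such as character accuracy and emotion match) directly at the speech-token level. This substantially reduces the skipped, wrong, or extra characters and the polyphonic-character mispronunciations that LLM-based TTS models are prone to.
Pronunciation inpainting: for rare characters or domain-specific terms, the input can mix text with pinyin (or English CMU phonemes), letting users manually pin down the exact pronunciation of a given word.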

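Usage modes and variants

In API calls and open-source deployments, three working modes are typically offered:

Speech mode (preset voices): synthesize quickly with the official built-in, high-quality commercial voices.
Clone mode (voice cloning): clone a new voice on the fly from a small audio sample.
Design mode (voice design): create an entirely new virtual voice from scratch by adjusting parameters or writing a text description.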

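API endpoint:

https://wcode.net/api/gpt/v1/audio/speech

This endpoint is compatible with OpenAI's Text-to-Speech API specification, so it can be called directly with the OpenAI SDK. Only the following settings need to change:

Set base_url to https://wcode.net/api/gpt/v1

Set api_key to the API Key obtained from https://platform.wcode.net

For details, see the OpenAI SDK example among the code samples below.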
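The two settings above are all a client needs. As a minimal sketch using only the Python standard library, the function below builds the same POST request as the samples that follow, but reads the key from an environment variable instead of hard-coding it. The variable name `WCODE_API_KEY` and the function name `build_speech_request` are hypothetical choices for this example; the function only constructs the request and does not send it.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://wcode.net/api/gpt/v1"  # base_url from this page


def build_speech_request(text: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request to {BASE_URL}/audio/speech."""
    # WCODE_API_KEY is a hypothetical environment variable name for this sketch.
    key = api_key or os.environ.get("WCODE_API_KEY")
    if not key:
        raise ValueError("API key missing: pass api_key or set WCODE_API_KEY")
    payload = {
        "model": "tongyi/cosyvoice-v3-flash",
        "input": text,
        "voice": "longanyang",
        "response_format": "mp3",
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        # ensure_ascii=False keeps Chinese text readable; the body is sent as UTF-8.
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, `urllib.request.urlopen(build_speech_request(text), timeout=60)` returns a response whose `read()` bytes can be written to `speech.mp3`; `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx, so an error body is not silently saved as audio.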

Request method:

POST

Code examples by language:

# TODO: replace API_KEY in the command below; get an API Key at https://platform.wcode.net
# The response body is binary audio; --output speech.mp3 saves it to a file

curl --request POST 'https://wcode.net/api/gpt/v1/audio/speech' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer API_KEY' \
--output speech.mp3 \
--data '{
    "model": "tongyi/cosyvoice-v3-flash",
    "input": "你好，今天天气怎么样。",
    "voice": "longanyang",
    "response_format": "mp3"
}'

import Foundation

let headers = [
  "Authorization": "Bearer API_KEY",  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
  "content-type": "application/json"
]
let parameters: [String: Any] = [
  "model": "tongyi/cosyvoice-v3-flash",
  "input": "你好，今天天气怎么样。",
  "voice": "longanyang",
  "response_format": "mp3"
]

// JSONSerialization.data(withJSONObject:options:) throws, so it must be called with try.
let postData = try! JSONSerialization.data(withJSONObject: parameters, options: [])

var request = URLRequest(url: URL(string: "https://wcode.net/api/gpt/v1/audio/speech")!,
                         cachePolicy: .useProtocolCachePolicy,
                         timeoutInterval: 60.0)
request.httpMethod = "POST"
request.allHTTPHeaderFields = headers
request.httpBody = postData

// Keep a command-line process alive until the asynchronous request finishes.
let semaphore = DispatchSemaphore(value: 0)

let dataTask = URLSession.shared.dataTask(with: request) { data, response, error in
  defer { semaphore.signal() }
  if let error = error {
    print(error)
  } else if let http = response as? HTTPURLResponse, http.statusCode == 200, let data = data {
    // The response body is binary audio; save it as speech.mp3.
    try? data.write(to: URL(fileURLWithPath: "speech.mp3"))
  } else {
    print("Unexpected response:", response as Any)
  }
}

dataTask.resume()
semaphore.wait()

import 'dart:convert';
import 'dart:io';
import 'package:http/http.dart' as http;

Future<void> main() async {
  var headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer API_KEY'  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
  };
  var request = http.Request('POST', Uri.parse('https://wcode.net/api/gpt/v1/audio/speech'));
  request.body = json.encode({
    "model": "tongyi/cosyvoice-v3-flash",
    "input": "你好，今天天气怎么样。",
    "voice": "longanyang",
    "response_format": "mp3"
  });
  request.headers.addAll(headers);

  http.StreamedResponse response = await request.send();

  if (response.statusCode == 200) {
    var bytes = await response.stream.toBytes();
    File('speech.mp3').writeAsBytesSync(bytes);
  }
  else {
    print(response.reasonPhrase);
  }
}

require 'uri'
require 'net/http'

url = URI("https://wcode.net/api/gpt/v1/audio/speech")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["Authorization"] = 'Bearer API_KEY'  # TODO: replace API_KEY; get an API Key at https://platform.wcode.net
request["content-type"] = 'application/json'
request.body = "{\"model\":\"tongyi/cosyvoice-v3-flash\",\"input\":\"你好，今天天气怎么样。\",\"voice\":\"longanyang\",\"response_format\":\"mp3\"}"

response = http.request(request)
# binwrite keeps the MP3 bytes unchanged (no newline conversion on Windows).
File.binwrite("speech.mp3", response.body) if response.is_a?(Net::HTTPSuccess)

use serde_json::json;
use reqwest;
use std::fs;

#[tokio::main]
pub async fn main() {
  let url = "https://wcode.net/api/gpt/v1/audio/speech";

  let payload = json!({
    "model": "tongyi/cosyvoice-v3-flash",
    "input": "你好，今天天气怎么样。",
    "voice": "longanyang",
    "response_format": "mp3"
  });

  let mut headers = reqwest::header::HeaderMap::new();
  headers.insert("Authorization", "Bearer API_KEY".parse().unwrap());  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
  headers.insert("content-type", "application/json".parse().unwrap());

  let client = reqwest::Client::new();
  let response = client.post(url)
    .headers(headers)
    .json(&payload)  // requires reqwest's "json" feature
    .send()
    .await
    .unwrap();

  // Fail on 4xx/5xx instead of writing an error body to speech.mp3.
  let audio_bytes = response.error_for_status().unwrap().bytes().await.unwrap();
  fs::write("speech.mp3", audio_bytes).unwrap();
}

#include <stdio.h>
#include <curl/curl.h>

/* libcurl calls this for each chunk of the response body; append it to the FILE* set via CURLOPT_WRITEDATA. */
static size_t write_callback(void *contents, size_t size, size_t nmemb, void *userp) {
  return fwrite(contents, size, nmemb, (FILE *)userp);
}

int main(void) {
  CURL *hnd = curl_easy_init();
  if (!hnd) return 1;

  curl_easy_setopt(hnd, CURLOPT_CUSTOMREQUEST, "POST");
  curl_easy_setopt(hnd, CURLOPT_URL, "https://wcode.net/api/gpt/v1/audio/speech");

  struct curl_slist *headers = NULL;
  headers = curl_slist_append(headers, "Authorization: Bearer API_KEY");  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
  headers = curl_slist_append(headers, "content-type: application/json");
  curl_easy_setopt(hnd, CURLOPT_HTTPHEADER, headers);

  curl_easy_setopt(hnd, CURLOPT_POSTFIELDS, "{\"model\":\"tongyi/cosyvoice-v3-flash\",\"input\":\"你好，今天天气怎么样。\",\"voice\":\"longanyang\",\"response_format\":\"mp3\"}");
  /* Treat HTTP status >= 400 as a failure instead of saving the error body as audio. */
  curl_easy_setopt(hnd, CURLOPT_FAILONERROR, 1L);

  FILE *fp = fopen("speech.mp3", "wb");
  if (!fp) {
    curl_slist_free_all(headers);
    curl_easy_cleanup(hnd);
    return 1;
  }
  curl_easy_setopt(hnd, CURLOPT_WRITEFUNCTION, write_callback);
  curl_easy_setopt(hnd, CURLOPT_WRITEDATA, fp);

  CURLcode ret = curl_easy_perform(hnd);
  if (ret != CURLE_OK)
    fprintf(stderr, "request failed: %s\n", curl_easy_strerror(ret));

  fclose(fp);
  curl_easy_cleanup(hnd);
  curl_slist_free_all(headers);
  return ret == CURLE_OK ? 0 : 1;
}

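package main

import (
  "fmt"
  "io"
  "net/http"
  "os"
  "strings"
)

func main() {
  url := "https://wcode.net/api/gpt/v1/audio/speech"

  payload := strings.NewReader("{\"model\":\"tongyi/cosyvoice-v3-flash\",\"input\":\"你好，今天天气怎么样。\",\"voice\":\"longanyang\",\"response_format\":\"mp3\"}")

  req, err := http.NewRequest("POST", url, payload)
  if err != nil {
    fmt.Println(err)
    return
  }

  req.Header.Add("Authorization", "Bearer API_KEY")  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
  req.Header.Add("content-type", "application/json")

  // Check the error before touching res: res is nil when Do fails.
  res, err := http.DefaultClient.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := io.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }

  fmt.Println(res.Status)
  // Only save the body as audio when the request succeeded.
  if res.StatusCode == http.StatusOK {
    if err := os.WriteFile("speech.mp3", body, 0644); err != nil {
      fmt.Println(err)
    }
  } else {
    fmt.Println(string(body))
  }
}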

using System.Net.Http.Headers;


var client = new HttpClient();

var request = new HttpRequestMessage(HttpMethod.Post, "https://wcode.net/api/gpt/v1/audio/speech");

request.Headers.Add("Authorization", "Bearer API_KEY");  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net

request.Content = new StringContent("{\"model\":\"tongyi/cosyvoice-v3-flash\",\"input\":\"你好，今天天气怎么样。\",\"voice\":\"longanyang\",\"response_format\":\"mp3\"}", null, "application/json");

var response = await client.SendAsync(request);

response.EnsureSuccessStatusCode();

await File.WriteAllBytesAsync("speech.mp3", await response.Content.ReadAsByteArrayAsync());

using RestSharp;

var client = new RestClient("https://wcode.net/api/gpt/v1/audio/speech");

var request = new RestRequest("", Method.Post);

request.AddHeader("Authorization", "Bearer API_KEY");  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net

request.AddHeader("content-type", "application/json");

request.AddParameter("application/json", "{\"model\":\"tongyi/cosyvoice-v3-flash\",\"input\":\"你好，今天天气怎么样。\",\"voice\":\"longanyang\",\"response_format\":\"mp3\"}", ParameterType.RequestBody);

var response = await client.ExecuteAsync(request);

if (response.IsSuccessful && response.RawBytes != null)
    await File.WriteAllBytesAsync("speech.mp3", response.RawBytes);

const axios = require('axios');
const fs = require('fs');

let data = JSON.stringify({
  "model": "tongyi/cosyvoice-v3-flash",
  "input": "你好，今天天气怎么样。",
  "voice": "longanyang",
  "response_format": "mp3"
});

let config = {
  method: 'post',
  maxBodyLength: Infinity,
  url: 'https://wcode.net/api/gpt/v1/audio/speech',
  responseType: 'arraybuffer',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer API_KEY'  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
  },
  data : data
};

axios.request(config).then((response) => {
  fs.writeFileSync('speech.mp3', response.data);
}).catch((error) => {
  console.log(error);
});

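import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class SpeechExample {
  public static void main(String[] args) throws IOException {
    OkHttpClient client = new OkHttpClient();

    MediaType mediaType = MediaType.parse("application/json");

    RequestBody body = RequestBody.create(mediaType, "{\"model\":\"tongyi/cosyvoice-v3-flash\",\"input\":\"你好，今天天气怎么样。\",\"voice\":\"longanyang\",\"response_format\":\"mp3\"}");

    Request request = new Request.Builder()
      .url("https://wcode.net/api/gpt/v1/audio/speech")
      .post(body)
      .addHeader("Authorization", "Bearer API_KEY")  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
      .addHeader("content-type", "application/json")
      .build();

    // try-with-resources closes the response body even if writing fails.
    try (Response response = client.newCall(request).execute()) {
      if (response.isSuccessful() && response.body() != null) {
        Files.write(Path.of("speech.mp3"), response.body().bytes());
      } else {
        System.err.println("Request failed: HTTP " + response.code());
      }
    }
  }
}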

<?php
require 'vendor/autoload.php';  // Composer autoloader for guzzlehttp/guzzle

$client = new \GuzzleHttp\Client();

$headers = [
  'Content-Type' => 'application/json',
  'Authorization' => 'Bearer API_KEY',  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
];

$body = '{
  "model": "tongyi/cosyvoice-v3-flash",
  "input": "你好，今天天气怎么样。",
  "voice": "longanyang",
  "response_format": "mp3"
}';

$request = new \GuzzleHttp\Psr7\Request('POST', 'https://wcode.net/api/gpt/v1/audio/speech', $headers, $body);

$response = $client->sendAsync($request)->wait();

file_put_contents('speech.mp3', $response->getBody()->getContents());

<?php
$curl = curl_init();

curl_setopt_array($curl, [
  CURLOPT_URL => "https://wcode.net/api/gpt/v1/audio/speech",
  CURLOPT_RETURNTRANSFER => true,
  CURLOPT_ENCODING => "",
  CURLOPT_MAXREDIRS => 5,
  CURLOPT_TIMEOUT => 300,
  CURLOPT_CUSTOMREQUEST => "POST",
  CURLOPT_POSTFIELDS => json_encode([
    'model' => 'tongyi/cosyvoice-v3-flash',
    'input' => '你好，今天天气怎么样。',
    'voice' => 'longanyang',
    'response_format' => 'mp3',
  ]),
  CURLOPT_HTTPHEADER => [
    "Authorization: Bearer API_KEY",  // TODO: replace API_KEY; get an API Key at https://platform.wcode.net
    "content-type: application/json",
  ],
]);

$response = curl_exec($curl);
$error = curl_error($curl);
$status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);

curl_close($curl);

if ($error) {
  echo "cURL Error #:" . $error;
} elseif ($status !== 200) {
  // A non-200 response carries an error body, not audio.
  echo "HTTP " . $status . ": " . $response;
} else {
  file_put_contents('speech.mp3', $response);
}

import requests

url = "https://wcode.net/api/gpt/v1/audio/speech"

payload = {
  "model": "tongyi/cosyvoice-v3-flash",
  "input": "你好，今天天气怎么样。",
  "voice": "longanyang",
  "response_format": "mp3"
}

headers = {
  "Authorization": "Bearer API_KEY",  # TODO: replace API_KEY; get an API Key at https://platform.wcode.net
  "content-type": "application/json"
}

response = requests.post(url, json=payload, headers=headers, timeout=60)
# Raise on HTTP errors so an error body is never saved as speech.mp3.
response.raise_for_status()

with open("speech.mp3", "wb") as f:
  f.write(response.content)

from openai import OpenAI

client = OpenAI(
  base_url="https://wcode.net/api/gpt/v1",
  api_key="API_KEY"  # TODO: replace API_KEY; get an API Key at https://platform.wcode.net
)

response = client.audio.speech.create(
  model="tongyi/cosyvoice-v3-flash",
  input="你好，今天天气怎么样。",
  voice="longanyang",
  response_format="mp3",
  # Parameters outside the OpenAI spec (e.g. sample_rate, volume) can be passed via extra_body={...}.
)

response.write_to_file("speech.mp3")
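Some of the samples above (the curl command, for instance) write the response body to `speech.mp3` without checking it, so an error response would end up saved as a broken audio file. Below is a minimal sketch of a guard for the default `mp3` format, assuming (this page does not document it) that error bodies are JSON. `looks_like_mp3` is a heuristic that accepts an ID3v2 tag or an MPEG frame-sync header, and `save_audio_or_raise` is a hypothetical helper that only writes the file when that check passes.

```python
import json


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: MP3 files usually start with an ID3v2 tag or an MPEG frame header."""
    if data[:3] == b"ID3":
        return True
    # MPEG audio frame sync: the first 11 bits of a frame header are all set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_audio_or_raise(data: bytes, path: str) -> None:
    """Write data to path if it looks like MP3; otherwise raise with the body as detail."""
    if looks_like_mp3(data):
        with open(path, "wb") as f:
            f.write(data)
        return
    # Assumption: error bodies are JSON; fall back to a short raw preview otherwise.
    try:
        detail = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        detail = data[:200]
    raise RuntimeError(f"response is not MP3 audio: {detail!r}")
```

For `wav`, `pcm`, or `opus` output the signature check would differ, so this guard applies only when `response_format` is `mp3`.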

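Important: because model architectures differ, some parameters may apply only to specific models.

model (model ID)

Parameter: model
Required, string

ID of the speech synthesis model. Use the model ID shown on the model's detail page.

input (input text)

Parameter: input
Required, string

The text to synthesize.

bit_rate (bit rate)

Parameter: bit_rate
Optional, integer, 6 to 510
Default: 32

Bit rate of the Opus encoding, in kbps.

Takes effect only when response_format is opus.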
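Because `bit_rate` only takes effect for `opus`, a client-side helper can refuse combinations whose `bit_rate` would otherwise be ignored. A minimal sketch follows; `build_payload` is a hypothetical helper, and raising an error instead of silently dropping the field is a design choice of this example, not documented API behavior.

```python
from typing import Optional


def build_payload(text: str, response_format: str = "mp3",
                  bit_rate: Optional[int] = None) -> dict:
    """Build a request body, attaching bit_rate only where the docs say it applies."""
    payload = {
        "model": "tongyi/cosyvoice-v3-flash",
        "input": text,
        "response_format": response_format,
    }
    if bit_rate is not None:
        if response_format != "opus":
            # Documented as effective only for opus: fail early rather than send a no-op field.
            raise ValueError("bit_rate is only effective when response_format is 'opus'")
        if not 6 <= bit_rate <= 510:
            raise ValueError("bit_rate must be between 6 and 510 kbps")
        payload["bit_rate"] = bit_rate
    return payload
```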

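sample_rate (sample rate)

Parameter: sample_rate
Optional, integer
Allowed values: 8000 | 16000 | 22050 | 24000 | 44100 | 48000
Default: 22050

Sample rate of the returned audio, in Hz.

enable_ssml (enable SSML)

Parameter: enable_ssml
Optional, boolean
Default: false

Whether to parse the input text as SSML.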
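With `enable_ssml` set to true, `input` carries SSML markup instead of plain text. Below is a minimal sketch of building such input, assuming the tag set of Alibaba Cloud's CosyVoice SSML support (a `<speak>` root element and `<break time="..."/>` for pauses); check the supported tags, and which voices accept SSML, against the official documentation before relying on it. `to_ssml` is a hypothetical helper. Escaping the text matters: a literal `&` or `<` in user text would otherwise make the SSML invalid.

```python
from typing import List
from xml.sax.saxutils import escape


def to_ssml(sentences: List[str], pause_ms: int = 300) -> str:
    """Join sentences into one SSML document with a pause between consecutive sentences.

    Assumes <speak> as the root element and <break time="..."/> for pauses.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    pause = f'<break time="{pause_ms}ms"/>'
    # escape() turns &, < and > into entities so plain text cannot break the markup.
    body = pause.join(escape(s) for s in sentences)
    return f"<speak>{body}</speak>"
```

The result is sent as `input` together with `"enable_ssml": true` in the request body.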

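voice (voice ID)

Parameter: voice
Optional, string
Default: longanyang

ID of the voice used for synthesis.

CosyVoice voice list: https://help.aliyun.com/zh/model-studio/cosyvoice-voice-list

stream (streaming response)

Parameter: stream
Optional, boolean
Allowed values: true | false
Default: false

Whether to return the audio data as a stream.

This endpoint does not currently support streaming responses; do not set stream: true, or an error will be returned.

seed (random seed)

Parameter: seed
Optional, integer, 0 to 65535
Default: 0

Controls the randomness of synthesis; the same seed with the same parameter combination reproduces the same audio.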
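Because the same `seed` with the same parameters is documented to reproduce the same audio, a client can cache synthesized files keyed on the full request body. Below is a minimal sketch, assuming that reproducibility holds for the parameters in use; `cache_key` is a hypothetical helper that hashes a canonical JSON form (sorted keys, no extra whitespace) so that key order in the dict does not change the result.

```python
import hashlib
import json


def cache_key(payload: dict) -> str:
    """Stable SHA-256 key for a request body; identical parameters (including seed) give the same key."""
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A file such as `cache/<key>.mp3` can then be reused instead of calling the API again; changing `seed` or any other field produces a different key.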

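volume (loudness)

Parameter: volume
Optional, integer, 0 to 100
Default: 50

Volume of the synthesized speech.

pitch (voice pitch)

Parameter: pitch
Optional, float, 0.5 to 2.0
Default: 1.0

Pitch of the synthesized speech.

speed (speaking rate)

Parameter: speed
Optional, float, 0.5 to 2.0
Default: 1.0

Speaking rate of the synthesized speech.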
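A minimal client-side check of the numeric ranges documented in this section (`sample_rate`, `seed`, `volume`, `pitch`, `speed`), so that out-of-range values are caught before a request is sent. The bounds are copied from the parameter list; `validate_params` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
# Bounds copied from the parameter list on this page.
RANGES = {
    "volume": (0, 100),
    "pitch": (0.5, 2.0),
    "speed": (0.5, 2.0),
    "seed": (0, 65535),
}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}


def validate_params(params: dict) -> list:
    """Return human-readable problems; an empty list means every checked value is in range."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            errors.append(f"{name}={params[name]} outside [{low}, {high}]")
    if "sample_rate" in params and params["sample_rate"] not in SAMPLE_RATES:
        errors.append(f"sample_rate={params['sample_rate']} not in {sorted(SAMPLE_RATES)}")
    return errors
```

Only keys present in `params` are checked, so the function also works on a partial request body.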

response_format (audio format)

Parameter: response_format
Optional, string
Allowed values: mp3 | pcm | wav | opus
Default: mp3

Encoding format of the returned audio.
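With `response_format` set to `pcm`, the body is raw samples without a header, so most players cannot open it directly. Below is a minimal sketch that wraps it in a WAV container with Python's `wave` module, assuming (this document does not state it) 16-bit signed little-endian mono samples at the requested `sample_rate`; if the actual PCM layout differs, adjust `channels` and `sample_width`. `pcm_to_wav` and `pcm_duration_seconds` are hypothetical helpers.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV container (assumes 16-bit mono unless told otherwise)."""
    frame_size = channels * sample_width
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    # wave writes the RIFF/WAVE header; it does not close a file object it did not open.
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 22050, channels: int = 1, sample_width: int = 2) -> float:
    """Duration = bytes / (sample_rate * channels * bytes per sample)."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

The default of 22050 Hz mirrors the documented `sample_rate` default; pass the value you actually requested.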

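The documentation above covers the standard API and can be used directly for project development and system integration. If the standard API does not meet your needs and you require a custom API, please contact our IT technical support engineers.

(Discuss requirements ✅ → Confirm technical approach ✅ → Agree on cost and timeline ✅ → Development & testing ✅ → Acceptance & delivery ✅ → Maintenance & upgrades ✅)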
