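Overview
The DeepSeek model family is available on the TokenHub large-model service platform through two protocols, OpenAI Chat Completions and Anthropic, so developers can integrate without switching SDKs. This document covers general calling examples and the core DeepSeek-specific capabilities, including thinking mode and Function Calling.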
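Prerequisites
You have registered a Tencent Cloud account and activated the TokenHub service.
You have obtained an API Key from the TokenHub console.
You have installed the SDK for your language, or can send HTTP requests directly.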
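Supported models
Model ID | Type | Thinking | Context window | Max input | Max output |
deepseek-v4-flash | General-purpose chat model | Switchable (on by default) | 1M | 1M | 384K |
deepseek-v4-pro | General-purpose chat model | Switchable (on by default) | 1M | 1M | 384K |
deepseek-v3.2 | General-purpose chat model | Switchable (on by default) | 128K | 96K | 32K |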
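Note:
DeepSeek V4-Flash, V4-Pro, and V3.2 each serve as both a chat model and a thinking model. Unlike with some other vendors, there is no need to switch model IDs between a standard model and a thinking model; the thinking parameter alone controls whether thinking is enabled.
Key differences from other models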
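Dimension | DeepSeek V4-Flash / V4-Pro / V3.2 | OpenAI / Claude / GLM, etc. |
Thinking switch | Controlled explicitly via the thinking.type parameter | Usually controlled by switching model or via a separate reasoning parameter |
Reasoning field | reasoning_content is returned as a separate field in the response | Most models do not expose the reasoning process |
Accessing the reasoning field via the OpenAI SDK | Use hasattr / getattr in Python (the field is not in the SDK type definitions) | - |
temperature | 0~2, default 1, freely adjustable | Freely adjustable within 0~2 by default |
max_tokens recommended value | General tasks: 1024~4096; thinking mode: ≥ 2048 recommended | 1024~4096 is usually enough |
Context window | Up to 1M tokens | Typically 128K tokens |
Max output | Up to 384K tokens | Typically 16K tokens |
Writing messages back in multi-turn conversations | Write back content only; reasoning_content is not needed | Usually write back content only |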
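To make the first row of the table concrete, the sketch below builds two request bodies that differ only in the thinking field while the model ID stays the same. build_payload is a hypothetical helper written for this illustration, not part of any SDK; the field names and max_tokens values follow the examples and recommendations in this document.

```python
from typing import Any


def build_payload(prompt: str, thinking: bool, model: str = "deepseek-v4-flash") -> dict[str, Any]:
    """Build a Chat Completions request body; the model ID is the same in both modes."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Thinking content shares the token budget with the answer, so allow more room.
        "max_tokens": 2048 if thinking else 1024,
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }


fast = build_payload("What is 2 + 2?", thinking=False)
deep = build_payload("Prove that the square root of 2 is irrational.", thinking=True)
print(fast["thinking"], deep["thinking"])
```

With the OpenAI Python SDK, the thinking entry would be passed through extra_body, as the examples in this document do.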
Quick start
The following examples show the simplest single-turn call. Replace YOUR_API_KEY with the API Key you created.
curl:
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Hello, please introduce yourself"}],
    "max_tokens": 1024
  }'
Python:
# pip install openai
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello, please introduce yourself"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
Node.js:
// npm install openai
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Hello, please introduce yourself" }],
  max_tokens: 1024,
});
console.log(response.choices[0].message.content);
Java:
// Uses OkHttp. Add the dependency: implementation("com.squareup.okhttp3:okhttp:4.12.0")
// The org.json library must also be on the classpath.
import okhttp3.*;
import org.json.*;

public class QuickStart {
    public static void main(String[] args) throws Exception {
        OkHttpClient httpClient = new OkHttpClient();

        // Build the request body
        JSONObject body = new JSONObject();
        body.put("model", "deepseek-v4-flash");
        body.put("max_tokens", 1024);
        JSONArray messages = new JSONArray();
        JSONObject userMsg = new JSONObject();
        userMsg.put("role", "user");
        userMsg.put("content", "Hello, please introduce yourself");
        messages.put(userMsg);
        body.put("messages", messages);

        Request request = new Request.Builder()
            .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
            .addHeader("Authorization", "Bearer YOUR_API_KEY")
            .addHeader("Content-Type", "application/json")
            .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
            .build();

        // Send the request and print the reply text
        try (Response response = httpClient.newCall(request).execute()) {
            JSONObject result = new JSONObject(response.body().string());
            System.out.println(result.getJSONArray("choices")
                .getJSONObject(0)
                .getJSONObject("message")
                .getString("content"));
        }
    }
}
Go:
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := map[string]interface{}{
		"model": "deepseek-v4-flash",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello, please introduce yourself"},
		},
		"max_tokens": 1024,
	}
	data, _ := json.Marshal(body)

	req, _ := http.NewRequest("POST",
		"https://tokenhub.tencentmaas.com/v1/chat/completions",
		bytes.NewBuffer(data))
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err) // resp is nil on a network error, so stop before touching resp.Body
	}
	defer resp.Body.Close()

	respBody, _ := io.ReadAll(resp.Body)
	var result map[string]interface{}
	json.Unmarshal(respBody, &result)

	// Navigate choices[0].message.content in the decoded JSON
	choices := result["choices"].([]interface{})
	msg := choices[0].(map[string]interface{})["message"].(map[string]interface{})
	fmt.Println(msg["content"])
}
General calling examples
Basic chat
Send a single-turn request and get the model's reply.
curl:
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Give a brief introduction to large language models"}],
    "max_tokens": 1024,
    "thinking": {"type": "disabled"}
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Give a brief introduction to large language models"}],
    max_tokens=1024,
    extra_body={"thinking": {"type": "disabled"}},  # Disable thinking mode to reduce token usage
)
print(response.choices[0].message.content)
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Give a brief introduction to large language models" }],
  max_tokens: 1024,
  // @ts-ignore - thinking is a DeepSeek extension field
  thinking: { type: "disabled" },
});
console.log(response.choices[0].message.content);
Java:
import okhttp3.*;
import org.json.*;

// Place the statements below inside a method that declares `throws IOException`.
OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 1024);
JSONArray messages = new JSONArray();
JSONObject userMsg = new JSONObject();
userMsg.put("role", "user");
userMsg.put("content", "Give a brief introduction to large language models");
messages.put(userMsg);
body.put("messages", messages);

// Disable thinking mode
JSONObject thinking = new JSONObject();
thinking.put("type", "disabled");
body.put("thinking", thinking);

Request request = new Request.Builder()
    .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();

try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    System.out.println(result.getJSONArray("choices")
        .getJSONObject(0)
        .getJSONObject("message")
        .getString("content"));
}
Go:
body := map[string]interface{}{
	"model": "deepseek-v4-flash",
	"messages": []map[string]string{
		{"role": "user", "content": "Give a brief introduction to large language models"},
	},
	"max_tokens": 1024,
	"thinking":   map[string]string{"type": "disabled"},
}
// ... the rest of the request code is the same as in the quick start example
Streaming output
Set stream to true to enable SSE streaming output. This suits long-text generation, helps avoid timeouts, and improves the user experience.
curl:
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Write a short poem about spring"}],
    "max_tokens": 512,
    "stream": true,
    "thinking": {"type": "disabled"}
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Write a short poem about spring"}],
    max_tokens=512,
    stream=True,
    extra_body={"thinking": {"type": "disabled"}},
)

# Print each content fragment as soon as it arrives
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Write a short poem about spring" }],
  max_tokens: 512,
  stream: true,
  // @ts-ignore
  thinking: { type: "disabled" },
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
Java:
// For streaming, OkHttp EventSource is recommended (requires the okhttp-sse artifact)
import okhttp3.*;
import okhttp3.sse.*;
import org.json.*;

OkHttpClient httpClient = new OkHttpClient();

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("stream", true);
JSONArray messages = new JSONArray();
JSONObject msg = new JSONObject();
msg.put("role", "user");
msg.put("content", "Write a short poem about spring");
messages.put(msg);
body.put("messages", messages);
body.put("thinking", new JSONObject().put("type", "disabled"));

Request request = new Request.Builder()
    .url("https://tokenhub.tencentmaas.com/v1/chat/completions")
    .addHeader("Authorization", "Bearer YOUR_API_KEY")
    .addHeader("Content-Type", "application/json")
    .post(RequestBody.create(body.toString(), MediaType.get("application/json")))
    .build();

EventSources.createFactory(httpClient).newEventSource(request, new EventSourceListener() {
    @Override
    public void onEvent(EventSource source, String id, String type, String data) {
        if ("[DONE]".equals(data)) return; // end-of-stream sentinel
        try {
            JSONObject json = new JSONObject(data);
            String content = json.getJSONArray("choices")
                .getJSONObject(0)
                .getJSONObject("delta")
                .optString("content", "");
            if (!content.isEmpty()) System.out.print(content);
        } catch (JSONException ignored) {
            // Skip events that do not carry a content delta
        }
    }
});
Go:
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	body := map[string]interface{}{
		"model":      "deepseek-v4-flash",
		"messages":   []map[string]string{{"role": "user", "content": "Write a short poem about spring"}},
		"max_tokens": 512,
		"stream":     true,
		"thinking":   map[string]string{"type": "disabled"},
	}
	data, _ := json.Marshal(body)

	req, _ := http.NewRequest("POST",
		"https://tokenhub.tencentmaas.com/v1/chat/completions",
		bytes.NewBuffer(data))
	req.Header.Set("Authorization", "Bearer YOUR_API_KEY")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Read the SSE stream line by line and print each content delta
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "data: ") || line == "data: [DONE]" {
			continue
		}
		var chunk map[string]interface{}
		json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &chunk)
		choices, ok := chunk["choices"].([]interface{})
		if !ok || len(choices) == 0 {
			continue // skip chunks without choices instead of panicking
		}
		delta, _ := choices[0].(map[string]interface{})["delta"].(map[string]interface{})
		if content, ok := delta["content"].(string); ok {
			fmt.Print(content)
		}
	}
}
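For Python clients that read the SSE stream without an SDK, the line-parsing logic of the Go example above can be sketched as follows. iter_sse_content is a hypothetical helper; it assumes each event arrives as a single `data: ` line and that the stream ends with `data: [DONE]`, as the examples above do. The sample lines are simulated, not captured from the service.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from the raw SSE lines of a streaming response."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # a chunk may carry no choices
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content


# Simulated stream; real lines would come from the HTTP response body.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Spring "}}]}',
    'data: {"choices": [{"delta": {"content": "rain"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

Writing the parser as a generator keeps it independent of the HTTP library, so the same function works with any iterable of decoded lines.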
System Prompt
Use a system role message to set the model's behavioral instructions and background information.
curl:
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "You are a professional Python programming assistant. Only answer Python-related questions, and keep answers concise."},
      {"role": "user", "content": "How do I read a CSV file?"}
    ],
    "max_tokens": 512,
    "thinking": {"type": "disabled"}
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {
            "role": "system",
            "content": "You are a professional Python programming assistant. Only answer Python-related questions, and keep answers concise.",
        },
        {"role": "user", "content": "How do I read a CSV file?"},
    ],
    max_tokens=512,
    extra_body={"thinking": {"type": "disabled"}},
)
print(response.choices[0].message.content)
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    {
      role: "system",
      content: "You are a professional Python programming assistant. Only answer Python-related questions, and keep answers concise.",
    },
    { role: "user", content: "How do I read a CSV file?" },
  ],
  max_tokens: 512,
  // @ts-ignore
  thinking: { type: "disabled" },
});
console.log(response.choices[0].message.content);
Java:
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("thinking", new JSONObject().put("type", "disabled"));

JSONArray messages = new JSONArray();
messages.put(new JSONObject()
    .put("role", "system")
    .put("content", "You are a professional Python programming assistant. Only answer Python-related questions, and keep answers concise."));
messages.put(new JSONObject()
    .put("role", "user")
    .put("content", "How do I read a CSV file?"));
body.put("messages", messages);
// ... send the request as in the previous examples
Go:
body := map[string]interface{}{
	"model": "deepseek-v4-flash",
	"messages": []map[string]string{
		{"role": "system", "content": "You are a professional Python programming assistant. Only answer Python-related questions, and keep answers concise."},
		{"role": "user", "content": "How do I read a CSV file?"},
	},
	"max_tokens": 512,
	"thinking":   map[string]string{"type": "disabled"},
}
// ... send the request as in the quick start example
Multi-turn conversations
Pass the message history along in the messages array to give the conversation context memory across turns.
curl:
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "My name is Xiao Ming and I like playing basketball"},
      {"role": "assistant", "content": "Hello, Xiao Ming! Basketball is a great sport."},
      {"role": "user", "content": "Do you remember my name and hobby?"}
    ],
    "max_tokens": 256,
    "thinking": {"type": "disabled"}
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

# Maintain the conversation history
conversation = [
    {"role": "system", "content": "You are a friendly AI assistant."},
]


def chat(user_input):
    conversation.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=conversation,
        max_tokens=1024,
        extra_body={"thinking": {"type": "disabled"}},
    )
    reply = response.choices[0].message.content
    # Append the reply so the next turn sees the full history
    conversation.append({"role": "assistant", "content": reply})
    return reply


print(chat("My name is Xiao Ming and I like playing basketball"))
print(chat("Do you remember my name and hobby?"))
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const conversation = [
  { role: "system", content: "You are a friendly AI assistant." },
];

async function chat(userInput) {
  conversation.push({ role: "user", content: userInput });
  const response = await client.chat.completions.create({
    model: "deepseek-v4-flash",
    messages: conversation,
    max_tokens: 1024,
    // @ts-ignore
    thinking: { type: "disabled" },
  });
  const reply = response.choices[0].message.content;
  conversation.push({ role: "assistant", content: reply });
  return reply;
}

console.log(await chat("My name is Xiao Ming and I like playing basketball"));
console.log(await chat("Do you remember my name and hobby?"));
Java:
// The core of multi-turn conversations: pass the accumulated messages array
JSONArray messages = new JSONArray();
messages.put(new JSONObject().put("role", "system").put("content", "You are a friendly AI assistant."));
messages.put(new JSONObject().put("role", "user").put("content", "My name is Xiao Ming and I like playing basketball"));
messages.put(new JSONObject().put("role", "assistant").put("content", "Hello, Xiao Ming! Basketball is a great sport."));
messages.put(new JSONObject().put("role", "user").put("content", "Do you remember my name and hobby?"));

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("messages", messages);
body.put("max_tokens", 1024);
body.put("thinking", new JSONObject().put("type", "disabled"));
// ... send the request as in the previous examples
Go:
body := map[string]interface{}{
	"model": "deepseek-v4-flash",
	"messages": []map[string]string{
		{"role": "system", "content": "You are a friendly AI assistant."},
		{"role": "user", "content": "My name is Xiao Ming and I like playing basketball"},
		{"role": "assistant", "content": "Hello, Xiao Ming! Basketball is a great sport."},
		{"role": "user", "content": "Do you remember my name and hobby?"},
	},
	"max_tokens": 1024,
	"thinking":   map[string]string{"type": "disabled"},
}
// ... send the request as in the quick start example
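Function Calling (tool calling)
Function Calling lets the model call external tools to obtain real-time data. The model does not execute functions itself; it returns the name and arguments of the function to call. Your code executes the function and passes the result back, and the model then produces the final natural-language answer.
Call flow:
1. The user asks a question → the model returns tool_calls (containing the function name and arguments).
2. Your code executes the function → the result is passed back as a role: tool message.
3. The model generates the final natural-language answer based on the function result.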
curl:
# Round 1: send the question and the tool definition
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "What is the weather like in Beijing today?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather information for a given city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string", "description": "City name, e.g. Beijing"}},
          "required": ["city"]
        }
      }
    }],
    "thinking": {"type": "disabled"}
  }'

# Round 2: pass the tool result back (replace tool_call_id with the id actually returned)
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "What is the weather like in Beijing today?"},
      {"role": "assistant", "tool_calls": [{"id": "call_xxx", "type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"}}]},
      {"role": "tool", "tool_call_id": "call_xxx", "content": "Sunny, 28°C, humidity 50%"}
    ],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather information for a given city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}],
    "thinking": {"type": "disabled"}
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

# Define the tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. Beijing"}
                },
                "required": ["city"],
            },
        },
    }
]

# Round 1: send the question
messages = [{"role": "user", "content": "What's the weather like in Beijing today?"}]
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    tools=tools,
    extra_body={"thinking": {"type": "disabled"}},
)
assistant_message = response.choices[0].message

# The model requested a tool call
if response.choices[0].finish_reason == "tool_calls":
    tool_call = assistant_message.tool_calls[0]
    print(f"Model called tool: {tool_call.function.name}, arguments: {tool_call.function.arguments}")

    # Execute the tool (simulated result here)
    tool_result = "Sunny, 28°C, humidity 50%"

    # Round 2: pass the tool result back to the model
    messages.append(assistant_message)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": tool_result,
    })
    final_response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=messages,
        tools=tools,
        extra_body={"thinking": {"type": "disabled"}},
    )
    print(final_response.choices[0].message.content)
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get weather information for a given city",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name, e.g. Beijing" },
        },
        required: ["city"],
      },
    },
  },
];

// Round 1
const messages = [{ role: "user", content: "What's the weather like in Beijing today?" }];
const response1 = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages,
  tools,
  // @ts-ignore
  thinking: { type: "disabled" },
});
const assistantMsg = response1.choices[0].message;

if (response1.choices[0].finish_reason === "tool_calls") {
  const toolCall = assistantMsg.tool_calls[0];
  console.log(`Tool call: ${toolCall.function.name}, arguments: ${toolCall.function.arguments}`);

  // Simulated tool result
  const toolResult = "Sunny, 28°C, humidity 50%";

  // Round 2: pass the tool result back to the model
  messages.push(assistantMsg);
  messages.push({ role: "tool", tool_call_id: toolCall.id, content: toolResult });
  const response2 = await client.chat.completions.create({
    model: "deepseek-v4-flash",
    messages,
    tools,
    // @ts-ignore
    thinking: { type: "disabled" },
  });
  console.log(response2.choices[0].message.content);
}
Java:
JSONObject toolFunc = new JSONObject()
    .put("name", "get_weather")
    .put("description", "Get weather information for a given city")
    .put("parameters", new JSONObject()
        .put("type", "object")
        .put("properties", new JSONObject()
            .put("city", new JSONObject()
                .put("type", "string")
                .put("description", "City name")))
        .put("required", new JSONArray().put("city")));

JSONArray tools = new JSONArray()
    .put(new JSONObject().put("type", "function").put("function", toolFunc));

JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "user").put("content", "What's the weather like in Beijing today?")));
body.put("tools", tools);
body.put("thinking", new JSONObject().put("type", "disabled"));
// ... send the request, parse tool_calls, execute the tool, and build the second-round request
Go:
body := map[string]interface{}{
	"model": "deepseek-v4-flash",
	"messages": []map[string]string{
		{"role": "user", "content": "What's the weather like in Beijing today?"},
	},
	"tools": []map[string]interface{}{{
		"type": "function",
		"function": map[string]interface{}{
			"name":        "get_weather",
			"description": "Get weather information for a given city",
			"parameters": map[string]interface{}{
				"type": "object",
				"properties": map[string]interface{}{
					"city": map[string]string{"type": "string", "description": "City name"},
				},
				"required": []string{"city"},
			},
		},
	}},
	"thinking": map[string]string{"type": "disabled"},
}
// ... send the request, parse tool_calls, and build the second-round request
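Step 2 of the call flow (executing the function and building the role: tool message) can be sketched in Python as below. TOOL_REGISTRY and run_tool_calls are hypothetical names for this illustration, the get_weather body is a stand-in that returns a fixed string, and tool calls are assumed to be plain dicts in the JSON shape shown in the curl example.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Stand-in tool; a real implementation would query a weather service."""
    return f"{city}: sunny, 28°C, humidity 50%"


# Maps tool names (as declared in `tools`) to local functions.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Execute each tool call and return the matching role: tool messages."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        func = TOOL_REGISTRY.get(name)
        if func is None:
            content = f"error: unknown tool {name}"
        else:
            # `arguments` is a JSON-encoded string, not a dict.
            args = json.loads(call["function"]["arguments"])
            content = func(**args)
        results.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return results


calls = [{
    "id": "call_xxx",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"city\": \"Beijing\"}"},
}]
print(run_tool_calls(calls))
```

Returning an error string for an unknown tool, rather than raising, is one possible design choice: it lets the model see the failure and respond to it in the next round.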
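Thinking mode
DeepSeek models let you control thinking mode through the thinking parameter, with no need to switch model IDs. When thinking mode is on, the model reasons internally before giving the final answer, which suits complex tasks that require precise reasoning.
thinking parameter reference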
Field | Type | Default | Allowed values | Description |
type | string | "enabled" | "enabled" / "disabled" | Turns thinking mode on or off |
reasoning_effort | string | "high" | "high" / "max" | Thinking depth. max suits complex Agent scenarios; low/medium are mapped to high, and xhigh is mapped to max |
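The reasoning_effort mapping described in the table is applied by the service. A client that wants to log or validate the effective value before sending could mirror it with a sketch like the following. normalize_reasoning_effort is a hypothetical helper, and rejecting values outside the five listed ones is an assumption of this sketch, since the table does not say how the service treats them.

```python
# Mapping taken from the table above: low/medium -> high, xhigh -> max.
_EFFORT_ALIASES = {
    "low": "high",
    "medium": "high",
    "high": "high",
    "xhigh": "max",
    "max": "max",
}


def normalize_reasoning_effort(value: str) -> str:
    """Return the effective reasoning_effort for a requested value."""
    try:
        return _EFFORT_ALIASES[value]
    except KeyError:
        # Assumption: anything not listed in the table is treated as invalid here.
        raise ValueError(f"unsupported reasoning_effort: {value!r}") from None


for requested in ("low", "medium", "high", "xhigh", "max"):
    print(requested, "->", normalize_reasoning_effort(requested))
```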
Enabling or disabling thinking
curl:
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}],
    "max_tokens": 2048,
    "thinking": {"type": "enabled", "reasoning_effort": "high"}
  }'
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"}],
    max_tokens=2048,
    extra_body={"thinking": {"type": "enabled", "reasoning_effort": "high"}},
)
msg = response.choices[0].message

# Read the reasoning process (a field specific to thinking mode)
reasoning = getattr(msg, "reasoning_content", None)
if reasoning:
    print("=== Reasoning ===")
    print(reasoning)

print("=== Final answer ===")
print(msg.content)
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Solve the equation x^2 - 5x + 6 = 0" }],
  max_tokens: 2048,
  // @ts-ignore
  thinking: { type: "enabled", reasoning_effort: "high" },
});

const msg = response.choices[0].message;
// reasoning_content is not in the SDK type definitions, hence the cast
const reasoning = (msg as any).reasoning_content;
if (reasoning) {
  console.log("=== Reasoning ===");
  console.log(reasoning);
}
console.log("=== Final answer ===");
console.log(msg.content);
Java:
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 2048);
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "user").put("content", "Solve the equation x^2 - 5x + 6 = 0")));
body.put("thinking", new JSONObject().put("type", "enabled").put("reasoning_effort", "high"));

// ... build and send the request as in the previous examples
try (Response response = httpClient.newCall(request).execute()) {
    JSONObject result = new JSONObject(response.body().string());
    JSONObject message = result.getJSONArray("choices")
        .getJSONObject(0)
        .getJSONObject("message");
    String reasoning = message.optString("reasoning_content", "");
    String content = message.getString("content");
    System.out.println("Reasoning: " + reasoning);
    System.out.println("Final answer: " + content);
}
Go:
body := map[string]interface{}{
	"model":      "deepseek-v4-flash",
	"max_tokens": 2048,
	"messages": []map[string]string{
		{"role": "user", "content": "Solve the equation x^2 - 5x + 6 = 0"},
	},
	"thinking": map[string]string{"type": "enabled", "reasoning_effort": "high"},
}
// ... send the request, then parse the reasoning_content and content fields from the response
Response structure example
When thinking mode is on, the message in the response includes a reasoning_content field:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "reasoning_content": "I need to solve the quadratic equation x^2 - 5x + 6 = 0.\nFactoring: (x-2)(x-3) = 0\nSo x = 2 or x = 3.",
        "content": "The solutions of x² - 5x + 6 = 0 are **x = 2** or **x = 3**"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "completion_tokens": 120,
    "completion_tokens_details": {
      "reasoning_tokens": 80
    }
  }
}
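The usage block in the example response can be used to see how much of the output budget went to reasoning. The sketch below assumes that reasoning_tokens is counted inside completion_tokens, which matches the statement elsewhere in this document that thinking content and the answer share the token quota; split_completion_tokens is a hypothetical helper.

```python
def split_completion_tokens(usage: dict) -> tuple[int, int]:
    """Return (reasoning_tokens, answer_tokens) from a usage object."""
    completion = usage.get("completion_tokens", 0)
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    # Assumption: reasoning tokens are included in completion_tokens.
    return reasoning, completion - reasoning


# Values taken from the example response above.
usage = {"completion_tokens": 120, "completion_tokens_details": {"reasoning_tokens": 80}}
reasoning_tokens, answer_tokens = split_completion_tokens(usage)
print(reasoning_tokens, answer_tokens)  # 80 40
```

Under that assumption, a max_tokens value that is too small can be consumed largely by reasoning, which is one reason for the ≥ 2048 recommendation in thinking mode.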
Streaming thinking output
With streaming enabled, both reasoning_content and content are returned as incremental deltas and must be handled separately:
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Analyze the advantages and challenges of quantum computing"}],
    max_tokens=2048,
    stream=True,
    extra_body={"thinking": {"type": "enabled"}},
)

print("=== Reasoning (live) ===")
answer_started = False
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # reasoning_content is not in the SDK type definitions, so use getattr
    reasoning_delta = getattr(delta, "reasoning_content", None)
    if reasoning_delta:
        print(reasoning_delta, end="", flush=True)
    if delta.content:
        if not answer_started:
            print("\n\n=== Final answer (live) ===")
            answer_started = True
        print(delta.content, end="", flush=True)
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Analyze the advantages and challenges of quantum computing" }],
  max_tokens: 2048,
  stream: true,
  // @ts-ignore
  thinking: { type: "enabled" },
});

let answerStarted = false;
process.stdout.write("=== Reasoning (live) ===\n");
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (!delta) continue;
  const reasoning = (delta as any).reasoning_content;
  if (reasoning) process.stdout.write(reasoning);
  if (delta.content) {
    if (!answerStarted) {
      process.stdout.write("\n\n=== Final answer (live) ===\n");
      answerStarted = true;
    }
    process.stdout.write(delta.content);
  }
}
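When the full text is needed after streaming ends (for example, to store the answer in the conversation history), the two delta streams can be accumulated into separate buffers. The following is a minimal sketch: collect_stream and fake_chunk are hypothetical names, and the chunks are simulated with SimpleNamespace objects shaped like the SDK chunks used in the examples above.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate reasoning and answer deltas into two separate strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # a chunk may carry no choices
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(**delta_fields):
    """Build a stand-in for a streamed chunk (for illustration only)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


chunks = [
    fake_chunk(reasoning_content="Factor: ", content=None),
    fake_chunk(reasoning_content="(x-2)(x-3)=0", content=None),
    SimpleNamespace(choices=[]),
    fake_chunk(content="x = 2 or x = 3"),
]
reasoning, answer = collect_stream(chunks)
print(reasoning)
print(answer)
```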
Using thinking mode in multi-turn conversations
In a multi-turn conversation, there is no need to send the previous turn's reasoning_content back to the model; sending back the content field is enough.
Python:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

messages = [{"role": "user", "content": "What is the 10th term of the Fibonacci sequence?"}]
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    max_tokens=1024,
    extra_body={"thinking": {"type": "enabled"}},
)
assistant_msg = response.choices[0].message
print("First-turn answer:", assistant_msg.content)

# Multi-turn: send back content only, not reasoning_content
messages.append({"role": "assistant", "content": assistant_msg.content})
messages.append({"role": "user", "content": "What about the 20th term?"})
response2 = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=messages,
    max_tokens=1024,
    extra_body={"thinking": {"type": "enabled"}},
)
print("Second-turn answer:", response2.choices[0].message.content)
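If the history is stored as raw response dicts (for example, when calling the HTTP API directly), reasoning_content can be removed in one pass before the next request. This is a small sketch; strip_reasoning is a hypothetical helper, and the sample reasoning text is made up for illustration.

```python
from typing import Any


def strip_reasoning(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of the history without reasoning_content on any message."""
    return [
        {key: value for key, value in message.items() if key != "reasoning_content"}
        for message in messages
    ]


history = [
    {"role": "user", "content": "What is the 10th term of the Fibonacci sequence?"},
    {
        "role": "assistant",
        "reasoning_content": "1, 1, 2, 3, 5, 8, 13, 21, 34, 55 ...",
        "content": "The 10th term is 55.",
    },
    {"role": "user", "content": "What about the 20th term?"},
]
print(strip_reasoning(history)[1])
```

Other fields, such as tool_calls, pass through unchanged, and the original list is not modified.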
Note:
When writing assistant messages back in a multi-turn conversation, pass only the content field; reasoning_content is not needed.
JSON mode
Setting response_format to json_object ensures that the model outputs a valid JSON string, which suits scenarios that need structured data.
Caution:
When using JSON mode, you must explicitly ask the model for JSON output in the system or user message; otherwise the model may keep returning empty content.
curl:
curl https://tokenhub.tencentmaas.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "system", "content": "Please return the result in JSON format."},
      {"role": "user", "content": "Return information about three Chinese cities, each with name, province, and population fields"}
    ],
    "max_tokens": 512,
    "response_format": {"type": "json_object"},
    "thinking": {"type": "disabled"}
  }'
Python:
import json

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://tokenhub.tencentmaas.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Please return the result in JSON format."},
        {
            "role": "user",
            "content": "Return information about three Chinese cities, each with name, province, and population fields",
        },
    ],
    max_tokens=512,
    response_format={"type": "json_object"},
    extra_body={"thinking": {"type": "disabled"}},
)

# Parse the JSON string returned by the model and pretty-print it
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, ensure_ascii=False, indent=2))
Node.js:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://tokenhub.tencentmaas.com/v1",
});

const response = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "Please return the result in JSON format." },
    {
      role: "user",
      content: "Return information about three Chinese cities, each with name, province, and population fields",
    },
  ],
  max_tokens: 512,
  response_format: { type: "json_object" },
  // @ts-ignore
  thinking: { type: "disabled" },
});

const result = JSON.parse(response.choices[0].message.content);
console.log(JSON.stringify(result, null, 2));
Java:
JSONObject body = new JSONObject();
body.put("model", "deepseek-v4-flash");
body.put("max_tokens", 512);
body.put("response_format", new JSONObject().put("type", "json_object"));
body.put("thinking", new JSONObject().put("type", "disabled"));
body.put("messages", new JSONArray()
    .put(new JSONObject().put("role", "system").put("content", "Please return the result in JSON format."))
    .put(new JSONObject().put("role", "user").put("content",
        "Return information about three Chinese cities, each with name, province, and population fields")));
// ... send the request, then parse the returned JSON string
Go:
body := map[string]interface{}{
	"model":           "deepseek-v4-flash",
	"max_tokens":      512,
	"response_format": map[string]string{"type": "json_object"},
	"thinking":        map[string]string{"type": "disabled"},
	"messages": []map[string]string{
		{"role": "system", "content": "Please return the result in JSON format."},
		{"role": "user", "content": "Return information about three Chinese cities, each with name, province, and population fields"},
	},
}
// ... send the request
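JSON mode yields syntactically valid JSON, but the caller still has to handle the empty-output case mentioned in the caution above and check that the expected fields are present. The sketch below shows one way to do that in Python. parse_cities is a hypothetical helper, the top-level "cities" key is an assumption about how the model might wrap the list, and the sample entry is a placeholder rather than real data.

```python
import json
from typing import Optional

REQUIRED_FIELDS = {"name", "province", "population"}


def parse_cities(raw: Optional[str]) -> list:
    """Parse model output and verify each city entry has the required fields."""
    if not raw or not raw.strip():
        raise ValueError("empty model output; check that the prompt asks for JSON")
    data = json.loads(raw)
    # Assumption: the list sits under a "cities" key when the top level is an object.
    cities = data.get("cities", []) if isinstance(data, dict) else data
    for city in cities:
        missing = REQUIRED_FIELDS - city.keys()
        if missing:
            raise ValueError(f"city entry missing fields: {sorted(missing)}")
    return cities


# Placeholder output for illustration, not real data.
sample_output = '{"cities": [{"name": "Example City", "province": "Example Province", "population": 1000000}]}'
print(parse_cities(sample_output))
```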
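Recommended parameters and best practices
Parameter / practice | Recommendation | Notes |
max_tokens | General tasks: 1024~4096; thinking mode: ≥ 2048 recommended | Thinking content and the answer share the token quota |
thinking | Use disabled for simple Q&A; use enabled for logical reasoning and math problems | Sensible use can reduce cost |
stream | Enable for long-text generation | Avoids request timeouts and improves responsiveness |
temperature | Usually leave at the default of 1 | Raise to 1.3-1.5 for creative writing; lower to 0.2-0.5 for code generation |
Multi-turn conversations | Send back content only, not reasoning_content | Reduces token usage |
Accessing the reasoning field via SDK | Python: getattr(msg, "reasoning_content", None); Node.js: (msg as any).reasoning_content | The OpenAI SDK type definitions do not include this field |
Model selection | deepseek-v4-flash for everyday tasks; deepseek-v4-pro for high-precision tasks | Flash has a higher concurrency limit (2500 vs 500) |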
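Usage limits
Limit | Description |
Thinking mode and JSON mode | Enabling thinking.type=enabled together with response_format.type=json_object is not recommended. |
frequency_penalty / presence_penalty | Deprecated; passing them has no effect. |
Timeout risk | Responses take longer when thinking mode is on; use stream=true to avoid timeouts. |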
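The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, assuming the request body is a plain dict in the shape used throughout this document; check_request is a hypothetical helper, and treating a missing thinking field as enabled follows the default listed in the thinking parameter table.

```python
def check_request(payload: dict) -> tuple[dict, list[str]]:
    """Return a cleaned copy of the payload plus warnings about the usage limits."""
    cleaned = dict(payload)
    warnings = []

    # Deprecated parameters have no effect, so drop them.
    for key in ("frequency_penalty", "presence_penalty"):
        if key in cleaned:
            cleaned.pop(key)
            warnings.append(f"{key} is deprecated and has no effect; removed")

    # Thinking is on by default when the field is absent.
    thinking_on = (cleaned.get("thinking") or {}).get("type", "enabled") == "enabled"
    json_mode = (cleaned.get("response_format") or {}).get("type") == "json_object"

    if thinking_on and json_mode:
        warnings.append("thinking mode and JSON mode are both enabled; not recommended")
    if thinking_on and not cleaned.get("stream"):
        warnings.append("thinking mode without stream=true may time out")
    return cleaned, warnings


payload = {
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Return the result as JSON."}],
    "thinking": {"type": "enabled"},
    "response_format": {"type": "json_object"},
    "frequency_penalty": 0.5,
}
cleaned, warnings = check_request(payload)
for warning in warnings:
    print("warning:", warning)
```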
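Related documentation
Language model calling overview: the general TokenHub language model calling guide, covering BaseURL, API Key, multi-turn conversations, Function Calling, the Anthropic protocol, and other common topics.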