Embedding

Last updated: 2025-08-06 14:36:42

Interface definition

The atomicEmbedding() interface converts input text into feature vectors using the specified Embedding model.
public AtomicEmbeddingRes atomicEmbedding(AtomicEmbeddingParam param) {
    return this.stub.atomicEmbedding(param);
}

Usage example

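import java.util.Arrays;

import com.tencent.tcvectordb.client.VectorDBClient;
import com.tencent.tcvectordb.model.param.database.ConnectParam;
import com.tencent.tcvectordb.model.param.enums.ReadConsistencyEnum;
import com.tencent.tcvectordb.client.RPCVectorDBClient;
// AtomicEmbeddingParam, ModelParam, and AtomicEmbeddingRes must also be imported from the SDK.

// `client` is assumed to be a VectorDBClient that has already been connected.
// Build the request: one text, embedded with BAAI/bge-m3, returning dense and sparse vectors.
AtomicEmbeddingParam param = AtomicEmbeddingParam.newBuilder()
        .withModel("BAAI/bge-m3")
        .withDataType("text")
        .withData(Arrays.asList("什么是腾讯云向量数据库"))
        .withModelParam(ModelParam.newBuilder()
                .withRetrieveDenseVector(true)
                .withRetrieveSparseVector(true)
                .build())
        .build();
// Send the request and print the response.
AtomicEmbeddingRes atomicEmbeddingRes = client.atomicEmbedding(param);
System.out.println(atomicEmbeddingRes.toString());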
The output is the response object printed as a string. A sample response appears in the output section below.

Input parameters

Parameter name
Meaning
Required
Configuration and limits
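Model
Name of the Embedding model to use.
Choose a model based on the language of your data and the vector dimension you need. For details, see the Embedding introduction. Supported values:
bge-large-zh-v1.5: Chinese, 1024 dimensions, recommended.
bge-base-zh-v1.5: Chinese, 768 dimensions.
bge-large-zh: Chinese, 1024 dimensions.
bge-base-zh: Chinese, 768 dimensions.
m3e-base: Chinese, 768 dimensions.
e5-large-v2: English, 1024 dimensions.
text2vec-large-chinese: Chinese, 1024 dimensions.
multilingual-e5-base: multiple languages, 768 dimensions.
BAAI/bge-m3: multiple languages, 1024 dimensions, supports sparse vector generation.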
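ModelParam
Model parameters. Only some models require them.
RetrieveDenseVector: whether to return the dense vector. Defaults to true.
RetrieveSparseVector: whether to return the sparse vector. Defaults to false. Only the BAAI/bge-m3 model supports sparse vector generation.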
DataType
Type of the input data.
Currently only text is supported.
Data
The data to vectorize.
Currently only text is supported. Multiple items can be passed in one request, up to a maximum of 100.
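
Because Data accepts at most 100 items per request, a larger collection of texts has to be split before calling atomicEmbedding(). The following is a minimal sketch of that splitting step in plain Java. The class name EmbeddingBatcher and the method partition are hypothetical helpers for illustration and are not part of the SDK. Only the 100-item limit comes from the table above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits inputs into batches that respect the Data limit. */
final class EmbeddingBatcher {
    // Limit taken from the parameter table: at most 100 items per request.
    static final int MAX_ITEMS_PER_REQUEST = 100;

    /** Returns consecutive slices of `items`, each holding at most `maxBatchSize` elements. */
    static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            texts.add("document " + i);
        }
        for (List<String> batch : partition(texts, MAX_ITEMS_PER_REQUEST)) {
            // In real use, each batch would go into its own request via withData(batch).
            System.out.println("batch of " + batch.size() + " texts");
        }
    }
}
```

With 250 input texts this yields three batches of 100, 100, and 50 items, and each batch can then be sent in a separate request.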

Output parameters

{"code":0,"msg":"Operation success","warning":null,"count":0,"requestId":"387a7deead425418e38a923291234acb","tokenUsed":7,"denseVector":[[-0.07324566,-0.025056118,-0.03692802,0.010666399,-0.0011940568,0.025147676,8.821906E-6,-0.029633973,0.0035650036,0.0065196264,-0.003332296,0.0068324464,3.7552704E-4,0.012283907,0.022248369,-0.0034734465,-0.0075267544,-0.03479169,0.031983938,-0.03531051,-0.019791586,0.021332799,0.028352173,0.045168158,0.016053006,0.019822106,-0.02531553,-0.01724325,-0.01188716,-0.0018120671,0.01658709,0.009735568,0.053255696,-0.031068366,-0.0487694,-0.03500532,-8.720573E-5,-0.064089954,-0.03253328,0.017792592,0.032106012,-0.032044977,0.020386709,-0.045564905,0.07385604,-0.04751812,0.008743701,-0.030610582,-0.049562894,-0.0115896,-0.029954422,-0.03155667,0.0079959845,0.0030175685,0.090031125,0.024873005,-0.060488705,-0.018311415,-0.07233009,-0.054506976,0.0038110632,-0.022599338,0.036470234,0.032106012,0.022965565,0.041505873,0.005127196,0.005031824,0.0074809757,-0.019898403,0.025132416,0.001894087,-0.018174078,0.008781849,-0.019761069,-0.0035955226,-0.0067408895,0.001955125,0.045595422,-0.018036744,0.025819095,-0.026337918,-0.040681858,-0.01316133,-0.023896396,0.025468126,-0.010971589,0.017456882,0.017929927,0.08728441,0.0029870495,0.03317418,-0.0011206204,-0.04409999,-0.041750025,-0.018418232,-0.016129304,0.032930028,0.026047988,0.022263627,0.008209618,0.016678646,0.02282823,0.0117498245,-0.0012093163,-0.0018311414,0.010719807,-0.004039956,-0.06988856,0.015625741,-0.015152696,0.04037667,0.012627246,0.031068366,-0.035798814,0.0017042968,0.0035077804,-0.01753318,0.010155206,0.0193184,0.02472041,0.008400361,0.0055468325,0.01032306,-0.06360165,0.031068366,-0.025788575,-0.004757153,-0.0082859155,0.010078908,0.018326674,-0.011406485,0.046724625,0.010902922,-0.026994077,-0.08301175,-0.004822006,-0.016754944,0.020752937,-0.005596426,-0.01311555,-0.026780443,-0.07440538,0.048738882,0.077701434,-0.01844875,0.0023385203,-0.020691898,-0.044191547,-0.002363317
,0.023255497,0.029191447,-0.022065254,0.013825118,0.0034791688,-0.0073970486],"sparseVector":[{"云":0.2199707,"什么是":0.16760254,"向":0.28076172,"数据库":0.29223633,"腾讯":0.23168945,"量":0.28833008}]}
Parameter name
Meaning
tokenUsed
Number of tokens consumed by the Embedding request.
denseVector
Dense vectors.
sparseVector
Sparse vectors. Returned only when the model supports sparse vector generation and RetrieveSparseVector is set to true.
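
Once the vectors are available, a common next step is to score two texts against each other. The sketch below shows cosine similarity for dense vectors and a dot product for sparse vectors in plain Java. It assumes the dense vector has been read into a List<Double> and the sparse vector into a Map<String, Double> of token weights, mirroring the shape of the sample response. The class VectorScores and its methods are hypothetical, and the SDK's own return types may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: similarity scores over plain Java collections. */
final class VectorScores {

    /** Cosine similarity of two dense vectors of equal length; 0.0 if either is all zeros. */
    static double cosine(List<Double> a, List<Double> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            double x = a.get(i), y = b.get(i);
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Dot product of two sparse vectors keyed by token; only shared tokens contribute. */
    static double sparseDot(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map to keep the number of lookups low.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<String, Double> entry : small.entrySet()) {
            Double weight = large.get(entry.getKey());
            if (weight != null) {
                sum += entry.getValue() * weight;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Toy dense vectors; real ones have 768 or 1024 dimensions.
        System.out.println(cosine(Arrays.asList(1.0, 0.0), Arrays.asList(1.0, 1.0)));

        // Two token weights copied from the sample response, scored against a toy query.
        Map<String, Double> doc = new HashMap<>();
        doc.put("数据库", 0.29223633);
        doc.put("腾讯", 0.23168945);
        Map<String, Double> query = new HashMap<>();
        query.put("数据库", 1.0);
        System.out.println(sparseDot(doc, query));
    }
}
```

Tokens that appear in only one of the two sparse vectors contribute nothing to the score, which is why the sparse result depends on exact token overlap while the dense score does not.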