Exploring Tencent Cloud VectorDB: Getting Started

Original article
Author: 不惑
Published: 2023-11-23 07:54:12
Column: Goboy

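Overview

A vector database is a database built specifically to store and query vector data. A vector is typically a one-dimensional array of numbers (usually floating-point values) that encode the position, features, or attributes of an object or data point in a multidimensional space. For example:

- In natural language processing, an article can be represented by a vector built from word vectors, where each word is mapped to a position in the word-vector space.
- In image processing, an image can be represented as a pixel vector, with three numbers per pixel for its RGB values.
- In recommender systems, a user can be represented as a user vector in which each dimension captures an interest or behavioral preference.

Tencent Cloud VectorDB is a fully managed, self-developed, enterprise-grade distributed database service. A single index supports vectors at the scale of one billion, with up to millions of QPS and millisecond-level query latency. Besides improving the accuracy of large-model answers, it is widely applicable to recommender systems, natural language processing, and similar fields.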

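What is a vector database

A vector database is a database system specialized in storing, retrieving, and computing over vectors. By representing data as vectors (a mathematical data structure), it can handle tasks such as similarity search and clustering efficiently. Databases of this kind are typically used for large-scale, high-dimensional data such as images, text, and audio, where they make complex data analysis and pattern recognition more practical.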

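An example

Imagine you have a set of movies, each represented by a vector that encodes features of the film such as genre, director, and cast. You now want to find the movies most similar to a given one.

With a vector database, each movie is stored as a vector such as [3, 1, 4, 2, ...], where each number is the value of one feature. By computing the similarity between vectors, you can find the movie vectors closest to the vector of the given movie.

This way, looking for movies similar to a given one does not require scanning the whole movie library: the vector database uses its index to locate the best matches quickly, which makes the similarity computation far more efficient.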
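To make the similarity computation concrete, here is a minimal sketch that ranks a few hand-made movie vectors by cosine similarity. The movie names and feature values are invented for illustration, and the search is a plain linear scan; a real vector database avoids that scan by using an index such as HNSW.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Brute-force nearest-neighbour lookup over a few hand-made movie vectors. */
final class MovieSimilarityDemo {

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the key of the stored vector most similar to the query. */
    static String mostSimilar(Map<String, double[]> movies, double[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : movies.entrySet()) {
            double score = cosine(e.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical feature vectors; the values carry no real meaning.
        Map<String, double[]> movies = new LinkedHashMap<>();
        movies.put("movieA", new double[] {3, 1, 4, 2});
        movies.put("movieB", new double[] {0, 5, 0, 1});
        movies.put("movieC", new double[] {3, 1, 3, 2});
        System.out.println(mostSimilar(movies, new double[] {3, 1, 4, 1})); // prints movieA
    }
}
```

Cosine similarity is also the metric the SDK example later in this article selects with MetricType.COSINE.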

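Inverted index

Vector databases and inverted indexes have some things in common, especially where similarity search is concerned.

An inverted index is a mapping from each keyword (or feature) to the documents that contain it. At search time, you can quickly find the documents containing a given keyword without scanning the entire document collection.

Similarly, a vector database represents each data point (an image or a movie, say) as a vector and searches by computing the similarity between vectors. This makes finding similar items in a high-dimensional space more efficient, because items that lie far away in the vector space can be ruled out quickly.

Both techniques therefore exist to make search efficient, but they work differently: an inverted index focuses on matching keywords, while a vector database focuses on finding similar vectors in a high-dimensional space.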
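For comparison with the vector approach, the keyword-to-documents mapping of an inverted index can be sketched in a few lines. This is an illustrative toy with whitespace tokenization and made-up documents, not how any particular search engine implements it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: each term maps to the ids of the documents containing it. */
final class InvertedIndexDemo {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Tokenizes on whitespace and records the document id under every term. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents containing the term, without scanning the documents. */
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        List<String> docs = Arrays.asList(
                "vector database search", "inverted index search", "vector index");
        for (int i = 0; i < docs.size(); i++) {
            index.add(i, docs.get(i));
        }
        System.out.println(index.lookup("search")); // prints [0, 1]
        System.out.println(index.lookup("vector")); // prints [0, 2]
    }
}
```

The lookup is an exact match on a term, which is the key difference from a vector search that returns the nearest items even when nothing matches exactly.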

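Product architecture

Tencent Cloud VectorDB uses a distributed deployment architecture in which nodes communicate and coordinate with one another to store and retrieve data. Client requests are distributed to the nodes through a load balancer.

Load balancer (LB): a service that distributes traffic across multiple backend servers. A VectorDB cluster has at least 3 nodes, and access is balanced automatically through the load balancer.

Distributed Storage Node: a VectorDB cluster consists of multiple nodes. Every node can serve reads and writes directly and handles both computation and storage. A Collection is the basic unit for organizing vector data: it is split into multiple shards that are assigned to different nodes for storage and processing, and each shard is also replicated to other nodes to provide scalability and high availability.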
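The idea of splitting a Collection into shards and copying each shard to other nodes can be pictured with the sketch below. The article does not describe the actual placement algorithm, so the hash-modulo routing and the "next nodes in order" replica placement are assumptions chosen purely for illustration; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative shard and replica placement; NOT the algorithm VectorDB actually uses. */
final class ShardPlacementSketch {

    /** Maps a document id to a shard index in [0, shardNum) by hashing (assumed scheme). */
    static int shardOf(String docId, int shardNum) {
        return Math.floorMod(docId.hashCode(), shardNum);
    }

    /** Nodes hosting a shard: one primary plus replicaNum copies on the following nodes. */
    static List<Integer> nodesFor(int shard, int replicaNum, int nodeCount) {
        if (replicaNum + 1 > nodeCount) {
            throw new IllegalArgumentException("not enough nodes for the requested replicas");
        }
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i <= replicaNum; i++) {
            nodes.add((shard + i) % nodeCount);
        }
        return nodes;
    }

    public static void main(String[] args) {
        int shardNum = 2;   // same shard count as the SDK example later in the article
        int replicaNum = 1; // hypothetical value
        int nodeCount = 3;  // the minimum cluster size stated above
        for (String id : new String[] {"0001", "0002", "0003"}) {
            int shard = shardOf(id, shardNum);
            System.out.println(id + " -> shard " + shard
                    + " on nodes " + nodesFor(shard, replicaNum, nodeCount));
        }
    }
}
```

Placing each copy on a different node is what lets the cluster keep serving a shard when one node fails.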

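Meta Server: the cluster management module, made up of a group of Master nodes. It stores cluster metadata such as node information and shard information.

Embedding Service: a service that converts unstructured data (text, images, audio, and so on) into vector representations so that it can be analyzed, clustered, and otherwise processed. For details, see the Embedding introduction.

Split Service: a service that splits text into phrases, sentences, or similar units.

Note: the Split Service model capability of Tencent Cloud VectorDB is still under development and testing. For the release date, follow the product announcements.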
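Since the Split Service is not yet available, text has to be split on the client before it is embedded. The sketch below is one simple way to do that locally, cutting after sentence-ending punctuation. It is an assumption about what such a step could look like, not a description of how the Split Service works.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive client-side sentence splitter; unrelated to the Split Service implementation. */
final class SentenceSplitSketch {

    /**
     * Splits after each sentence-ending mark and keeps the mark with its sentence.
     * U+3002, U+FF01 and U+FF1F are the full-width Chinese period, exclamation mark
     * and question mark; the ASCII '.', '!' and '?' are handled as well.
     */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split("(?<=[\u3002\uFF01\uFF1F.!?])")) {
            String sentence = part.trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("Vectors are stored. Queries are fast! Is that all?"));
        // prints [Vectors are stored., Queries are fast!, Is that all?]
    }
}
```

A splitter this simple also cuts inside decimals and abbreviations such as "3.14", so production text usually needs a more careful tokenizer.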

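Object Service: imports data in bulk into a specified collection and supports multiple import formats.

Object Storage: stores and manages the data files uploaded through the data import service.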

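Comparison of database types

The differences are summarized below, followed by some concrete examples:

| Aspect | Relational database | Non-relational database | Vector database |
| --- | --- | --- | --- |
| Data organization | Tables, for example a Students table and a Courses table | Key-value pairs, for example user configuration stored as key-value pairs | Vector representations, for example image vectors |
| Data structure | Tables, rows, columns | Key-value pairs, documents, column families, and so on | High-dimensional vectors in which each dimension represents a feature |
| Query language | SQL | Usually no unified query language | Query interfaces focused on similarity search |
| Consistency and transactions | Emphasizes consistency and transaction processing | Emphasizes distribution and horizontal scaling | Emphasizes similarity computation and search |
| Use cases | Complex queries, transaction processing | Large-scale, distributed, dynamic data | Similarity search, recommender systems, image recognition, and so on |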

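Examples:

Relational database:

- _Tabular data:_ a Students table holds each student's ID and name; a Courses table holds course information.
- _SQL query:_ `SELECT * FROM Students WHERE Grade > 90;`

Non-relational database:

- _Key-value data:_ user configuration such as `{"username": "user1", "email": "user1@example.com"}`.
- _Dynamic data:_ real-time user updates on social media.

Vector database:

- _Vector data:_ an image can be represented as a high-dimensional vector in which each dimension captures some feature of the image.
- _Similarity search:_ finding other image vectors similar to a given image vector.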

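Quick start

Purchasing a database instance

Scenario

Follow this section to purchase and configure your first Tencent Cloud VectorDB instance.

Regions

Currently supported: Beijing, Shanghai, Guangzhou, Shanghai Autonomous Driving Cloud, Hong Kong (China), and Singapore. Other regions are being planned.

Prerequisites

You have registered a Tencent Cloud account and completed real-name verification.

To register a Tencent Cloud account, click Register a Tencent Cloud account.

To complete real-name verification, click Real-name verification.

You have planned the specifications the database instance must meet. For details, see Product Specifications.

You have planned the VPC and security group for the database instance; see VPC and Security Groups.

Procedure

  1. Log in to the VectorDB console with your Tencent Cloud account.
  2. Click Create to open the page for creating a VectorDB instance.
  3. Configure the instance parameters and complete the purchase.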

Logging in over the private network

Creating a database

Enabling public network access

Testing the connection

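HTTP API

Tencent Cloud VectorDB handles data writes, queries, and other operations over HTTP. Put the request message, in JSON format, in the HTTP request body and send it to the VectorDB HTTP API address. VectorDB parses the JSON in the request body and carries out the corresponding operation, such as storing the data in the database.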
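As a rough illustration of such a request, the sketch below builds (but does not send) a POST to /database/create using the JDK's java.net.http API, which requires Java 11 or later. The JSON body matches the one visible in the run log later in this article; the Authorization header name and the placeholder address are assumptions, with the header value following the "Bearer account=...&api_key=..." form that the SDK prints in its debug log.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds a /database/create request by hand; a sketch, not the SDK's own code. */
final class HttpApiRequestSketch {

    /**
     * Assumes the database name contains no characters that need JSON escaping,
     * and that credentials go into an Authorization header (assumed header name).
     */
    static HttpRequest createDatabaseRequest(String baseUrl, String account,
                                             String apiKey, String database) {
        String body = "{\"database\":\"" + database + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/database/create"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer account=" + account + "&api_key=" + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder address and credentials; replace them with those of a real instance.
        HttpRequest request = createDatabaseRequest("http://10.0.0.1:80", "root", "xxx", "book");
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

For anything beyond a quick experiment, the Java SDK described below is the more convenient route, since it builds these requests for you.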

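API list

| Level | Endpoint | Description | Method | URL |
| --- | --- | --- | --- | --- |
| Database | /database/create | Create a database | POST | http://{instance private IP}:{instance port}/database/create |
| Database | /database/drop | Drop a database | POST | http://{instance private IP}:{instance port}/database/drop |
| Database | /database/list | List all databases | GET | http://{instance private IP}:{instance port}/database/list |
| Collection | /collection/create | Create a collection | POST | http://{instance private IP}:{instance port}/collection/create |
| Collection | /collection/drop | Drop a collection | POST | http://{instance private IP}:{instance port}/collection/drop |
| Collection | /collection/list | List collections | POST | http://{instance private IP}:{instance port}/collection/list |
| Collection | /collection/describe | Describe a specific collection | POST | http://{instance private IP}:{instance port}/collection/describe |
| Collection | /collection/truncate | Truncate (empty) a collection | POST | http://{instance private IP}:{instance port}/collection/truncate |
| Alias | /alias/set | Set an alias for a collection | POST | http://{instance private IP}:{instance port}/alias/set |
| Alias | /alias/delete | Delete a collection alias | POST | http://{instance private IP}:{instance port}/alias/delete |
| Document | /document/upsert | Insert data | POST | http://{instance private IP}:{instance port}/document/upsert |
| Document | /document/query | Look up data by exact match | POST | http://{instance private IP}:{instance port}/document/query |
| Document | /document/search | Search for similar vectors | POST | http://{instance private IP}:{instance port}/document/search |
| Document | /document/delete | Delete data | POST | http://{instance private IP}:{instance port}/document/delete |
| Document | /document/update | Update data | POST | http://{instance private IP}:{instance port}/document/update |
| Index | /index/rebuild | Rebuild the index | POST | http://{instance private IP}:{instance port}/index/rebuild |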
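When calling the HTTP API directly, it can help to keep the method and path of each endpoint in one place. The sketch below encodes a handful of rows from the API list as an enum and assembles the URL from the instance address, following the URL pattern in the list. The class and constant names are hypothetical, and the address in main is a placeholder.

```java
/** A few rows of the API list as data; names are illustrative only. */
final class EndpointTableSketch {

    /** HTTP method plus path for a subset of the listed endpoints. */
    enum Endpoint {
        DATABASE_CREATE("POST", "/database/create"),
        DATABASE_LIST("GET", "/database/list"),
        COLLECTION_CREATE("POST", "/collection/create"),
        DOCUMENT_UPSERT("POST", "/document/upsert"),
        DOCUMENT_SEARCH("POST", "/document/search");

        final String method;
        final String path;

        Endpoint(String method, String path) {
            this.method = method;
            this.path = path;
        }

        /** Joins the instance address and the path: http://{ip}:{port}{path}. */
        String url(String host, int port) {
            return "http://" + host + ":" + port + path;
        }
    }

    public static void main(String[] args) {
        // 10.0.0.8 and 80 are placeholder values, not a real instance address.
        for (Endpoint e : Endpoint.values()) {
            System.out.println(e.method + " " + e.url("10.0.0.8", 80));
        }
    }
}
```

Note that /database/list is the only GET in the list; every other endpoint takes a JSON body via POST.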

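Java SDK

The Tencent Cloud VectorDB Java SDK wraps the HTTP API in easy-to-use Java classes and methods, so developers can operate the database more conveniently.

SDK information

| Language | Version | SDK download | SDK source | API |
| --- | --- | --- | --- | --- |
| Java | Java 8 or later | vectordb-sdk-java.tar.gz (latest SDK version: 1.0.3) | | Create a client: VectorDBClient(); manage databases: createDatabase(); manage Collections: createCollection(); manage Documents: upsert() |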

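Adding the SDK to a project

The following shows how to add the latest SDK version, 1.0.3, as a dependency in Gradle and Maven projects. Use whichever fits your project.

Gradle

Add the following dependency to the build.gradle file of your Gradle project.

dependencies {
    implementation 'com.tencent.tcvectordb:vectordatabase-sdk-java:1.0.3'
}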

Maven

Add the following dependencies to the pom.xml file of your Maven project.

<dependency>
    <groupId>com.tencent.tcvectordb</groupId>
    <artifactId>vectordatabase-sdk-java</artifactId>
    <version>1.0.3</version>
</dependency>

<dependency>
    <groupId>org.web3j</groupId>
    <artifactId>core</artifactId>
    <version>5.0.0</version>
</dependency>

<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.9.2</version>
</dependency>

okhttp3 is pinned to version 4.9.2 because that is the version vectordatabase-sdk-java 1.0.3 references.

Code example

package com.example.demo.controller;
import com.tencent.tcvectordb.client.VectorDBClient;
import com.tencent.tcvectordb.exception.VectorDBException;
import com.tencent.tcvectordb.model.Collection;
import com.tencent.tcvectordb.model.Database;
import com.tencent.tcvectordb.model.DocField;
import com.tencent.tcvectordb.model.Document;
import com.tencent.tcvectordb.model.param.collection.*;
import com.tencent.tcvectordb.model.param.database.ConnectParam;
import com.tencent.tcvectordb.model.param.dml.*;
import com.tencent.tcvectordb.model.param.entity.AffectRes;
import com.tencent.tcvectordb.model.param.enums.ReadConsistencyEnum;


import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
/**
 * VectorDB Java SDK usage example
 */
public class VectorDBExample {

    private static final String DBNAME = "book";
    private static final String COLL_NAME = "book_segments";
    private static final String COLL_NAME_ALIAS = "collection_alias";

    public static void main(String[] args) throws InterruptedException {
        example();
    }

    public static void example() throws InterruptedException {
        // Create the VectorDB client
        ConnectParam connectParam = initConnectParam();
        VectorDBClient client = new VectorDBClient(connectParam, ReadConsistencyEnum.EVENTUAL_CONSISTENCY);

        // Clean up the environment before the test
        System.out.println("---------------------- clear before test ----------------------");
        anySafe(() -> clear(client));
        createDatabaseAndCollection(client);
        upsertData(client);
        queryData(client);
        updateAndDelete(client);
        deleteAndDrop(client);
        testFilter();
    }

    /**
     * init connect parameter
     * @return {@link ConnectParam}
     */
    private static ConnectParam initConnectParam() {
//        System.out.println("\tvdb_url: " + System.getProperty("vdb_url"));
//        System.out.println("\tvdb_key: " + System.getProperty("vdb_key"));
        return ConnectParam.newBuilder()
                .withUrl("http://lb-xxxxxxxxxx.com:30000")
                .withUsername("your-username")
                .withKey("your-api-key")
                .withTimeout(30)
                .build();
    }

    /**
     * Runs the {@link Runnable} and logs any {@link VectorDBException} instead of rethrowing it
     * @param runnable {@link Runnable}
     */
    private static void anySafe(Runnable runnable) {
        try {
            runnable.run();
        } catch (VectorDBException e) {
            System.err.println(e);
            e.printStackTrace();
        }
    }

    private static void createDatabaseAndCollection(VectorDBClient client) {
        // 1. Create the database
        System.out.println("---------------------- createDatabase ----------------------");
        Database db = client.createDatabase(DBNAME);

        // 2. List all databases
        System.out.println("---------------------- listCollections ----------------------");
        List<String> database = client.listDatabase();
        for (String s : database) {
            System.out.println("\tres: " + s);
        }

        // 3. Create the collection
        System.out.println("---------------------- createCollection ----------------------");
        CreateCollectionParam collectionParam = initCreateCollectionParam(COLL_NAME);
        db.createCollection(collectionParam);

        // 4. List all collections
        System.out.println("---------------------- listCollections ----------------------");
        List<Collection> cols = db.listCollections();
        for (Collection col : cols) {
            System.out.println("\tres: " + col.toString());
        }

        // 5. Set an alias for the collection
        System.out.println("---------------------- setAlias ----------------------");
        AffectRes affectRes = db.setAlias(COLL_NAME, COLL_NAME_ALIAS);
        System.out.println("\tres: " + affectRes.toString());

        // 6. describe collection
        System.out.println("---------------------- describeCollection ----------------------");
        Collection descCollRes = db.describeCollection(COLL_NAME);
        System.out.println("\tres: " + descCollRes.toString());

        // 7. delete alias
        System.out.println("---------------------- deleteAlias ----------------------");
        AffectRes affectRes1 = db.deleteAlias(COLL_NAME_ALIAS);
        System.out.println("\tres: " + affectRes1);

        // 8. describe collection
        System.out.println("---------------------- describeCollection ----------------------");
        Collection descCollRes1 = db.describeCollection(COLL_NAME);
        System.out.println("\tres: " + descCollRes1.toString());
    }

    private static void upsertData(VectorDBClient client) throws InterruptedException {
        Database database = client.database(DBNAME);
        Collection collection = database.describeCollection(COLL_NAME);
        List<Document> documentList = new ArrayList<>(Arrays.asList(
                Document.newBuilder()
                        .withId("0001")
                        .withVector(Arrays.asList(0.2123, 0.21, 0.213))
                        .addDocField(new DocField("bookName", "西游记"))
                        .addDocField(new DocField("author", "吴承恩"))
                        .addDocField(new DocField("page", 21))
                        .addDocField(new DocField("segment", "富贵功名,前缘分定,为人切莫欺心。"))
                        .build(),
                Document.newBuilder()
                        .withId("0002")
                        .withVector(Arrays.asList(0.2123, 0.22, 0.213))
                        .addDocField(new DocField("bookName", "西游记"))
                        .addDocField(new DocField("author", "吴承恩"))
                        .addDocField(new DocField("page", 22))
                        .addDocField(new DocField("segment",
                                "正大光明,忠良善果弥深。些些狂妄天加谴,眼前不遇待时临。"))
                        .build(),
                Document.newBuilder()
                        .withId("0003")
                        .withVector(Arrays.asList(0.2123, 0.23, 0.213))
                        .addDocField(new DocField("bookName", "三国演义"))
                        .addDocField(new DocField("author", "罗贯中"))
                        .addDocField(new DocField("page", 23))
                        .addDocField(new DocField("segment", "细作探知这个消息,飞报吕布。"))
                        .build(),
                Document.newBuilder()
                        .withId("0004")
                        .withVector(Arrays.asList(0.2123, 0.24, 0.213))
                        .addDocField(new DocField("bookName", "三国演义"))
                        .addDocField(new DocField("author", "罗贯中"))
                        .addDocField(new DocField("page", 24))
                        .addDocField(new DocField("segment", "富贵功名,前缘分定,为人切莫欺心。"))
                        .build(),
                Document.newBuilder()
                        .withId("0005")
                        .withVector(Arrays.asList(0.2123, 0.25, 0.213))
                        .addDocField(new DocField("bookName", "三国演义"))
                        .addDocField(new DocField("author", "罗贯中"))
                        .addDocField(new DocField("page", 25))
                        .addDocField(new DocField("segment",
                                "布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。"))
                        .build()));
        System.out.println("---------------------- upsert ----------------------");
        InsertParam insertParam = InsertParam.newBuilder().addAllDocument(documentList).withBuildIndex(true).build();
        collection.upsert(insertParam);

        // Note: upserted data may take a moment to become available
        Thread.sleep(1000 * 5);
    }

    private static void queryData(VectorDBClient client) {
        Database database = client.database(DBNAME);
        Collection collection = database.describeCollection(COLL_NAME);

        System.out.println("---------------------- query ----------------------");
        List<String> documentIds = Arrays.asList("0001", "0002", "0003", "0004", "0005");
        Filter filterParam = new Filter("bookName=\"三国演义\"");
        List<String> outputFields = Arrays.asList("id", "bookName");
        QueryParam queryParam = QueryParam.newBuilder()
                .withDocumentIds(documentIds)
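                // Filter the data with a filter expression
                .withFilter(filterParam)
                // limit caps the number of rows returned; the valid range is 1 to 16384
                .withLimit(2)
                // Offset into the result set
                .withOffset(1)
                // Fields to return
                .withOutputFields(outputFields)
                // Whether to return the vector data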
                .withRetrieveVector(false)
                .build();
        List<Document> qdos = collection.query(queryParam);
        for (Document doc : qdos) {
            System.out.println("\tres: " + doc.toString());
        }

        // searchById
        // 1. searchById searches by document id
        // 2. Results can be narrowed with a filter
        // 3. If only some fields are needed, output_fields names the fields to return; by default all fields are returned
        // 4. limit applies to each search unit separately: with three query vectors and limit 3, each vector returns its own top 3 most similar results

        System.out.println("---------------------- searchById ----------------------");
        SearchByIdParam searchByIdParam = SearchByIdParam.newBuilder()
                .withDocumentIds(documentIds)
                // With an HNSW index the ef parameter is required; a larger ef raises recall but slows the search
                .withParams(new HNSWSearchParams(100))
                // K for the Top K results
                .withLimit(2)
                // Filter the results
                .withFilter(filterParam)
                .build();
        List<List<Document>> siDocs = collection.searchById(searchByIdParam);
        int i = 0;
        for (List<Document> docs : siDocs) {
            System.out.println("\tres: " + i++);
            for (Document doc : docs) {
                System.out.println("\tres: " + doc.toString());
            }
        }

        // search
        // 1. search searches by vector
        // 2. The other options work as in searchById

        System.out.println("---------------------- search ----------------------");
        SearchByVectorParam searchByVectorParam = SearchByVectorParam.newBuilder()
                .addVector(Arrays.asList(0.2123, 0.23, 0.213))
                // With an HNSW index the ef parameter is required; a larger ef raises recall but slows the search
                .withParams(new HNSWSearchParams(100))
                // K for the Top K results
                .withLimit(2)
                // Filter the results
                .withFilter(filterParam)
                .build();
        // The result is a two-dimensional list: one inner list per query vector passed to search, in the same order
        List<List<Document>> svDocs = collection.search(searchByVectorParam);
        i = 0;
        for (List<Document> docs : svDocs) {
            System.out.println("\tres: " + i);
            i++;
            for (Document doc : docs) {
                System.out.println("\tres: " + doc.toString());
            }
        }
    }

    private static void updateAndDelete(VectorDBClient client) throws InterruptedException {
        Database database = client.database(DBNAME);
        Collection collection = database.describeCollection(COLL_NAME);

        System.out.println("---------------------- update ----------------------");
        // update
        // 1. update modifies selected fields, or adds new non-indexed fields, on documents matched by [primary key] and [Filter]

        // The filter restricts the update to id = "0003"
        Filter filterParam = new Filter("bookName=\"三国演义\"");
        List<String> documentIds = Arrays.asList("0001", "0003");
        UpdateParam updateParam = UpdateParam
                .newBuilder()
                .addAllDocumentId(documentIds)
                .withFilter(filterParam)
                .build();
        Document updateDoc = Document
                .newBuilder()
                .addDocField(new DocField("page", 100))
                // New fields can be added as well
                .addDocField(new DocField("extend", "extendContent"))
                .build();
        collection.update(updateParam, updateDoc);


        System.out.println("---------------------- delete ----------------------");
        // delete
        // 1. delete removes documents matched by [primary key] and [Filter]
        // 2. Deletion depends on the collection's index type; some index types do not support it

        // The filter restricts the delete to id = "0001"
        filterParam = new Filter("bookName=\"西游记\"");
        DeleteParam build = DeleteParam
                .newBuilder()
                .addAllDocumentId(documentIds)
                .withFilter(filterParam)
                .build();
        collection.delete(build);

        // rebuild index
        System.out.println("---------------------- rebuildIndex ----------------------");

        RebuildIndexParam rebuildIndexParam = RebuildIndexParam
                .newBuilder()
                .withDropBeforeRebuild(false)
                .withThrottle(1)
                .build();
        collection.rebuildIndex(rebuildIndexParam);

        Thread.sleep(1000 * 5);

        // query
        System.out.println("----------------------  query ----------------------");
        documentIds = Arrays.asList("0001", "0002", "0003", "0004", "0005");
        List<String> outputFields = Arrays.asList("id", "bookName", "page", "extend");
        QueryParam queryParam = QueryParam.newBuilder()
                .withDocumentIds(documentIds)
                // Fields to return
                .withOutputFields(outputFields)
                // Whether to return the vector data
                .withRetrieveVector(false)
                .build();
        List<Document> qdos = collection.query(queryParam);
        for (Document doc : qdos) {
            System.out.println("\tres: " + doc.toString());
        }

        // truncate clears all data in the Collection, including its indexes
        System.out.println("---------------------- truncate collection ----------------------");
        AffectRes affectRes = database.truncateCollections(COLL_NAME);
        System.out.println("\tres: " + affectRes.toString());

        // Note: the effect of a delete may take a moment to become visible
        Thread.sleep(1000 * 5);
    }

    private static void deleteAndDrop(VectorDBClient client) {
        Database database = client.database(DBNAME);

        // Drop the collection
        System.out.println("---------------------- dropCollection ----------------------");
        database.dropCollection(COLL_NAME);

        // Drop the database
        System.out.println("---------------------- dropDatabase ----------------------");
        client.dropDatabase(DBNAME);
    }

    // Drops every database in the instance; run this only against a test instance
    private static void clear(VectorDBClient client) {
        List<String> databases = client.listDatabase();
        for (String database : databases) {
            client.dropDatabase(database);
        }
    }

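    /**
     * Builds the parameters for creating a Collection.
     * The addField calls define indexes; they do not define the Collection's schema.
     * <ol>
     *     <li>[Important] Do not index the text field that a vector was derived from; doing so uses a
     *     lot of memory and serves no purpose.</li>
     *     <li>[Required indexes] The primary key id and the vector field vector are currently fixed and
     *     required; see the example below.</li>
     *     <li>[Other indexes] Fields used as query conditions. To filter by author, for instance, the
     *     author field needs an index; otherwise queries cannot filter on it. Fields that are never
     *     filtered on should not be indexed, since that wastes memory.</li>
     *     <li>The vector database supports a dynamic schema: any field can be written without being
     *     defined in advance, similar to MongoDB.</li>
     *     <li>The example indexes book segments whose fields are {id, vector, segment, bookName, author,
     *     page}. id is the primary key and must be globally unique, segment is the text fragment, and
     *     vector needs a vector index. Queries that target a specific book filter on bookName, which
     *     is why it carries a filter index below.</li>
     * </ol>
     * @param collName name of the Collection to create
     * @return the {@link CreateCollectionParam} passed to createCollection
     */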
    private static CreateCollectionParam initCreateCollectionParam(String collName) {
        return CreateCollectionParam.newBuilder()
                .withName(collName)
                .withShardNum(2)
                .withReplicaNum(0)
                .withDescription("test collection0")
                .addField(new FilterIndex("id", FieldType.String, IndexType.PRIMARY_KEY))
                .addField(new VectorIndex("vector", 3, IndexType.HNSW,
                        MetricType.COSINE, new HNSWParams(16, 200)))
                .addField(new FilterIndex("bookName", FieldType.String, IndexType.FILTER))
                .addField(new FilterIndex("author", FieldType.String, IndexType.FILTER))
                .build();
    }

    /**
     * Exercises the Filter builder
     */
    public static void testFilter() {
        System.out.println("---------------------- testFilter ----------------------");
        System.out.println("\tres: " + new Filter("author=\"jerry\"")
                .and("a=1")
                .or("r=\"or\"")
                .orNot("rn=2")
                .andNot("an=\"andNot\"")
                .getCond());
        System.out.println("\tres: " + Filter.in("key", Arrays.asList("v1", "v2", "v3")));
        System.out.println("\tres: " + Filter.in("key", Arrays.asList(1, 2, 3)));
    }
}
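
The example above waits with fixed Thread.sleep(1000 * 5) calls after upsert and delete. An alternative worth considering is to poll until the data is actually visible. The helper below is a generic sketch that uses only the JDK; the condition passed in is a stand-in, and in real use it would check the collection state, for instance the documentCount or indexedCount values that a describe call returns in the run log. How to read those values through the SDK is not shown here.

```java
import java.util.function.BooleanSupplier;

/** Generic poll-until helper; a sketch that does not depend on the VectorDB SDK. */
final class PollUntilSketch {

    /**
     * Checks the condition every intervalMillis until it holds or timeoutMillis elapses.
     * Returns true if the condition became true, false on timeout or interruption.
     */
    static boolean pollUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after roughly 300 ms.
        boolean ready = pollUntil(() -> System.currentTimeMillis() - start >= 300, 5_000, 100);
        System.out.println("ready = " + ready);
    }
}
```

Compared with a fixed sleep, polling returns as soon as the data is ready and fails visibly when it never becomes ready.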

     

Execution complete. The run log follows:

15:30:56.058 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - header: Bearer account=root&api_key=xxx
---------------------- clear before test ----------------------
15:30:56.539 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/list, code=200, msg=OK, result={"code":0,"msg":"operation success","databases":["book"],"info":{"book":{"createTime":"2023-11-23 15:29:46","dbType":"BASE"}}}
15:30:56.594 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/drop, body={"database":"book"}
15:30:56.656 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/drop, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- createDatabase ----------------------
15:30:56.657 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/create, body={"database":"book"}
15:30:56.717 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/create, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- listCollections ----------------------
15:30:56.776 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/list, code=200, msg=OK, result={"code":0,"msg":"operation success","databases":["book"],"info":{"book":{"createTime":"2023-11-23 15:30:56","dbType":"BASE"}}}
	res: book
---------------------- createCollection ----------------------
15:30:56.890 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/create, body={"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"author","fieldType":"string","indexType":"filter"}],"documentCount":0}
15:31:00.478 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/create, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- listCollections ----------------------
15:31:00.478 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/list, body={"database":"book"}
15:31:00.537 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/list, code=200, msg=OK, result={"code":0,"msg":"operation success","collections":[{"database":"book","collection":"book_segments","documentCount":0,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}]}
	res: {"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3,"indexedCount":0},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"createTime":"2023-11-23 15:30:56","documentCount":0,"indexStatus":{"status":"ready"}}
---------------------- setAlias ----------------------
15:31:00.625 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/alias/set, body={"database":"book","collection":"book_segments","alias":"collection_alias"}
15:31:00.684 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/alias/set, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
	res: AffectRes{affectedCount=1, code=0, msg='operation success'}
---------------------- describeCollection ----------------------
15:31:00.687 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:00.746 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":0,"alias":["collection_alias"],"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}}
	res: {"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3,"indexedCount":0},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"createTime":"2023-11-23 15:30:56","documentCount":0,"indexStatus":{"status":"ready"},"alias":["collection_alias"]}
---------------------- deleteAlias ----------------------
15:31:00.759 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/alias/delete, body={"database":"book","alias":"collection_alias"}
15:31:00.820 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/alias/delete, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
	res: AffectRes{affectedCount=1, code=0, msg='operation success'}
---------------------- describeCollection ----------------------
15:31:00.822 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:00.881 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":0,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"}],"indexStatus":{"status":"ready","startTime":""}}}
	res: {"database":"book","collection":"book_segments","replicaNum":0,"shardNum":2,"description":"test collection0","indexes":[{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","metricType":"COSINE","params":{"efConstruction":200,"M":16},"dimension":3,"indexedCount":0},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"}],"createTime":"2023-11-23 15:30:56","documentCount":0,"indexStatus":{"status":"ready"}}
15:31:00.894 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:00.956 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":0,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":0,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}}],"indexStatus":{"status":"ready","startTime":""}}}
---------------------- upsert ----------------------
15:31:00.977 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/upsert, body={"database":"book","collection":"book_segments","buildIndex":true,"documents":[{"id":"0001","vector":[0.2123,0.21,0.213],"bookName":"西游记","author":"吴承恩","page":21,"segment":"富贵功名,前缘分定,为人切莫欺心。"},{"id":"0002","vector":[0.2123,0.22,0.213],"bookName":"西游记","author":"吴承恩","page":22,"segment":"正大光明,忠良善果弥深。些些狂妄天加谴,眼前不遇待时临。"},{"id":"0003","vector":[0.2123,0.23,0.213],"bookName":"三国演义","author":"罗贯中","page":23,"segment":"细作探知这个消息,飞报吕布。"},{"id":"0004","vector":[0.2123,0.24,0.213],"bookName":"三国演义","author":"罗贯中","page":24,"segment":"富贵功名,前缘分定,为人切莫欺心。"},{"id":"0005","vector":[0.2123,0.25,0.213],"bookName":"三国演义","author":"罗贯中","page":25,"segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。"}]}
15:31:01.049 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/upsert, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":5}
15:31:06.051 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:06.111 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":5,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"bookName","fieldType":"string","indexType":"filter"},{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":5,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}}
---------------------- query ----------------------
15:31:06.130 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/query, body={"database":"book","collection":"book_segments","readConsistency":"eventualConsistency","query":{"filter":"bookName=\"三国演义\"","documentIds":["0001","0002","0003","0004","0005"],"retrieveVector":false,"limit":2,"offset":1,"outputFields":["id","bookName"]}}
15:31:06.189 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/query, code=200, msg=OK, result={"code":0,"msg":"operation success","count":2,"documents":[{"id":"0003","bookName":"三国演义"},{"id":"0004","bookName":"三国演义"}]}
	res: {"id":"0003","bookName":"三国演义"}
	res: {"id":"0004","bookName":"三国演义"}
---------------------- searchById ----------------------
15:31:06.202 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/search, body={"database":"book","collection":"book_segments","search":{"params":{"ef":100},"filter":"bookName=\"三国演义\"","retrieveVector":false,"limit":2,"documentIds":["0001","0002","0003","0004","0005"]},"readConsistency":"eventualConsistency"}
15:31:06.264 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/search, code=200, msg=OK, result={"code":0,"msg":"operation success","documents":[[{"id":"0003","score":0.999062,"bookName":"三国演义","author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23},{"id":"0004","score":0.997955,"bookName":"三国演义","page":24,"author":"罗贯中","segment":"富贵功名,前缘分定,为人切莫欺心。"}],[{"id":"0003","score":0.999773,"author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23,"bookName":"三国演义"},{"id":"0004","score":0.99912,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}],[{"id":"0003","score":1.0,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。","author":"罗贯中","page":23},{"id":"0004","score":0.999787,"page":24,"bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中"}],[{"id":"0004","score":1.0,"segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中","page":24,"bookName":"三国演义"},{"id":"0005","score":0.9998,"bookName":"三国演义","page":25,"segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","author":"罗贯中"}],[{"id":"0005","score":1.0,"author":"罗贯中","bookName":"三国演义","segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","page":25},{"id":"0004","score":0.9998,"segment":"富贵功名,前缘分定,为人切莫欺心。","bookName":"三国演义","page":24,"author":"罗贯中"}]]}
	res: 0
	res: {"id":"0003","score":0.999062,"bookName":"三国演义","author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23}
	res: {"id":"0004","score":0.997955,"bookName":"三国演义","page":24,"author":"罗贯中","segment":"富贵功名,前缘分定,为人切莫欺心。"}
	res: 1
	res: {"id":"0003","score":0.999773,"author":"罗贯中","segment":"细作探知这个消息,飞报吕布。","page":23,"bookName":"三国演义"}
	res: {"id":"0004","score":0.99912,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}
	res: 2
	res: {"id":"0003","score":1.0,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。","author":"罗贯中","page":23}
	res: {"id":"0004","score":0.999787,"page":24,"bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中"}
	res: 3
	res: {"id":"0004","score":1.0,"segment":"富贵功名,前缘分定,为人切莫欺心。","author":"罗贯中","page":24,"bookName":"三国演义"}
	res: {"id":"0005","score":0.9998,"bookName":"三国演义","page":25,"segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","author":"罗贯中"}
	res: 4
	res: {"id":"0005","score":1.0,"author":"罗贯中","bookName":"三国演义","segment":"布大惊,与陈宫商议。宫曰:“闻刘玄德新领徐州,可往投之。","page":25}
	res: {"id":"0004","score":0.9998,"segment":"富贵功名,前缘分定,为人切莫欺心。","bookName":"三国演义","page":24,"author":"罗贯中"}
---------------------- search ----------------------
15:31:06.277 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/search, body={"database":"book","collection":"book_segments","search":{"params":{"ef":100},"filter":"bookName=\"三国演义\"","retrieveVector":false,"limit":2,"vectors":[[0.2123,0.23,0.213]]},"readConsistency":"eventualConsistency"}
15:31:06.336 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/search, code=200, msg=OK, result={"code":0,"msg":"operation success","documents":[[{"id":"0003","score":1.0,"author":"罗贯中","page":23,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。"},{"id":"0004","score":0.999787,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}]]}
	res: 0
	res: {"id":"0003","score":1.0,"author":"罗贯中","page":23,"bookName":"三国演义","segment":"细作探知这个消息,飞报吕布。"}
	res: {"id":"0004","score":0.999787,"author":"罗贯中","bookName":"三国演义","segment":"富贵功名,前缘分定,为人切莫欺心。","page":24}
15:31:06.337 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, body={"database":"book","collection":"book_segments"}
15:31:06.396 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/describe, code=200, msg=OK, result={"code":0,"msg":"operation success","collection":{"database":"book","collection":"book_segments","documentCount":5,"replicaNum":0,"shardNum":2,"createTime":"2023-11-23 15:30:56","description":"test collection0","indexes":[{"fieldName":"vector","fieldType":"vector","indexType":"HNSW","indexedCount":5,"dimension":3,"metricType":"COSINE","params":{"M":16,"efConstruction":200}},{"fieldName":"id","fieldType":"string","indexType":"primaryKey"},{"fieldName":"author","fieldType":"string","indexType":"filter"},{"fieldName":"bookName","fieldType":"string","indexType":"filter"}],"indexStatus":{"status":"ready","startTime":""}}}
---------------------- update ----------------------
15:31:06.411 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/update, body={"database":"book","collection":"book_segments","query":{"filter":"bookName=\"三国演义\"","documentIds":["0001","0003"]},"update":{"page":100,"extend":"extendContent"}}
15:31:06.471 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/update, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- delete ----------------------
15:31:06.478 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/delete, body={"database":"book","collection":"book_segments","query":{"filter":"bookName=\"西游记\"","documentIds":["0001","0003"]}}
15:31:06.549 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/delete, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- rebuildIndex ----------------------
15:31:06.554 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/index/rebuild, body={"database":"book","collection":"book_segments","dropBeforeRebuild":false,"throttle":1}
15:31:06.627 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/index/rebuild, code=200, msg=OK, result={"code":0,"msg":"operation success"}
----------------------  query ----------------------
15:31:11.637 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/query, body={"database":"book","collection":"book_segments","readConsistency":"eventualConsistency","query":{"documentIds":["0001","0002","0003","0004","0005"],"retrieveVector":false,"limit":10,"offset":0,"outputFields":["id","bookName","page","extend"]}}
15:31:11.695 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/document/query, code=200, msg=OK, result={"code":0,"msg":"operation success","count":4,"documents":[{"id":"0002","bookName":"西游记","page":22},{"id":"0003","extend":"extendContent","bookName":"三国演义","page":100},{"id":"0004","page":24,"bookName":"三国演义"},{"id":"0005","page":25,"bookName":"三国演义"}]}
	res: {"id":"0002","bookName":"西游记","page":22}
	res: {"id":"0003","extend":"extendContent","bookName":"三国演义","page":100}
	res: {"id":"0004","page":24,"bookName":"三国演义"}
	res: {"id":"0005","page":25,"bookName":"三国演义"}
---------------------- truncate collection ----------------------
15:31:11.697 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/truncate, body={"database":"book","collection":"book_segments"}
15:31:11.788 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/truncate, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
	res: AffectRes{affectedCount=1, code=0, msg='operation success'}
---------------------- truncate collection ----------------------
15:31:16.803 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/drop, body={"database":"book","collection":"book_segments"}
15:31:16.874 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/collection/drop, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- truncate collection ----------------------
15:31:16.874 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/drop, body={"database":"book"}
15:31:16.935 [main] DEBUG com.tencent.tcvectordb.service.HttpStub - Query http://lb-baqexj1q-r8cmvkl9h0wsp52u.clb.ap-guangzhou.tencentclb.com:30000/database/drop, code=200, msg=OK, result={"code":0,"msg":"operation success","affectedCount":1}
---------------------- testFilter ----------------------
	res: author="jerry" and a=1 or r="or" or not rn=2 and not an="andNot"
	res: key in ("v1","v2","v3")
	res: key in (1,2,3)

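Similarity Computation

The similarity metric is the foundation of vector retrieval: it measures how similar two high-dimensional vectors are. When you create a Collection, choose the metric that fits the characteristics of your data. The table below shows how the widely used similarity metrics map to input data formats and to the index types of Tencent Cloud VectorDB.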

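| Similarity metric | Data format | Index types |
| --- | --- | --- |
| Inner product (IP) | Floating point | FLAT, HNSW, IVF family |
| Euclidean distance (L2) | Floating point | FLAT, HNSW, IVF family |
| Cosine similarity (COSINE) | Floating point | FLAT, HNSW, IVF family |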

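Inner Product (IP)

IP stands for Inner Product, also known as the dot product. The result is a single number (a scalar). The formula is shown below, where a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) are two points in n-dimensional space. The larger the value, the more similar the vector is to the query vector.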

$$\mathrm{IP}(a, b) = a \cdot b = \sum_{i=1}^{n} a_i b_i$$
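As a rough illustration of the inner product, here is a minimal Java sketch. The class and method names (`InnerProductSketch`, `innerProduct`) are made up for this example and are not part of the Tencent Cloud VectorDB SDK; the database computes the score on the server side.

```java
// Illustrative sketch only: hypothetical names, not the Tencent Cloud VectorDB SDK.
final class InnerProductSketch {

    /** Returns the inner (dot) product of two vectors of the same dimension. */
    static double innerProduct(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i]; // accumulate a_i * b_i
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 2.0, 3.0};
        double[] doc = {4.0, 5.0, 6.0};
        // 1*4 + 2*5 + 3*6 = 32
        System.out.println(innerProduct(query, doc));
    }
}
```

A larger result means the candidate is ranked as more similar to the query.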

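Euclidean Distance (L2)

L2 refers to the Euclidean distance: the straight-line distance between two vector points in space. The formula is shown below, where a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) are two points in n-dimensional space. It is the most commonly used distance metric. The smaller the value, the more similar the vector is to the query vector. L2 works well in low-dimensional spaces, but in high-dimensional spaces its effectiveness gradually degrades because of the curse of dimensionality.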

$$d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$
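The Euclidean distance can be sketched in Java as follows. Again, `EuclideanDistanceSketch` and `l2Distance` are hypothetical names chosen for this example, not SDK calls.

```java
// Illustrative sketch only: hypothetical names, not the Tencent Cloud VectorDB SDK.
final class EuclideanDistanceSketch {

    /** Returns the straight-line (L2) distance between two points of the same dimension. */
    static double l2Distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff; // accumulate (a_i - b_i)^2
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] origin = {0.0, 0.0};
        double[] point = {3.0, 4.0};
        // sqrt(3^2 + 4^2) = 5
        System.out.println(l2Distance(origin, point));
    }
}
```

Unlike IP and COSINE, a smaller L2 value means a closer match, so results are ranked in ascending order of distance.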

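Cosine Similarity (COSINE)

Cosine similarity is a commonly used measure of text similarity. It gauges how similar two vectors are by the cosine of the angle between them in a multi-dimensional space. The formula is shown below, where a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) are two points in n-dimensional space, |a| and |b| are the lengths (Euclidean norms) of a and b, and cosθ is the cosine of the angle between a and b. The larger the value, the more similar the vector is to the query vector. The value range is [-1, 1].

$$\cos\theta = \frac{a \cdot b}{|a|\,|b|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$$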
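To make the cosine formula concrete, the sketch below applies it to two of the vectors upserted in the log above (documents 0003 and 0004). The names `CosineSimilaritySketch` and `cosine` are hypothetical and only for illustration; the collection in the log uses `metricType` COSINE, so the result should land close to the 0.999787 score reported for document 0004 in the search output.

```java
// Illustrative sketch only: hypothetical names, not the Tencent Cloud VectorDB SDK.
final class CosineSimilaritySketch {

    /** Returns cos(theta) for the angle between two non-zero vectors of the same dimension. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double squaredNormA = 0.0;
        double squaredNormB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            squaredNormA += a[i] * a[i];
            squaredNormB += b[i] * b[i];
        }
        if (squaredNormA == 0.0 || squaredNormB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(squaredNormA) * Math.sqrt(squaredNormB));
    }

    public static void main(String[] args) {
        double[] doc0003 = {0.2123, 0.23, 0.213}; // vector of document 0003 in the upsert log
        double[] doc0004 = {0.2123, 0.24, 0.213}; // vector of document 0004 in the upsert log
        System.out.println(cosine(doc0003, doc0004));
    }
}
```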

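Note:

Once the vectors are normalized to unit length, the inner product and cosine similarity are equivalent. Cosine similarity depends only on the angle between the vectors, whereas the inner product depends on both the angle and the lengths of the vectors.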
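The note above can be checked with a small Java sketch. All names here (`NormalizedEquivalenceSketch`, `dot`, `normalize`, `cosine`) are hypothetical helpers written for this example, not SDK methods.

```java
// Illustrative sketch only: hypothetical names, not the Tencent Cloud VectorDB SDK.
final class NormalizedEquivalenceSketch {

    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Scales a non-zero vector to unit length. */
    static double[] normalize(double[] v) {
        double length = Math.sqrt(dot(v, v));
        if (length == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        double[] unit = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = v[i] / length;
        }
        return unit;
    }

    static double cosine(double[] a, double[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    public static void main(String[] args) {
        double[] a = {3.0, 4.0};
        double[] b = {4.0, 3.0};
        System.out.println("raw inner product:        " + dot(a, b));
        System.out.println("cosine similarity:        " + cosine(a, b));
        System.out.println("inner product (unit len): " + dot(normalize(a), normalize(b)));
    }
}
```

The raw inner product (24 here) reflects the lengths of the vectors, while the last two values agree up to floating-point rounding (24 / 25 = 0.96).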

For more details on each operation, please read the official API documentation carefully.

API documentation: https://cloud.tencent.com/document/product/1709/95110


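Original content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

To report infringement, contact cloudcommunity@tencent.com for removal.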
