Framework download: https://github.com/OpenSPG/KAG
Apply for a DeepSeek API key
Server installation
Create docker-compose.yml with the following content:
version: "3.7"
services:
  server:
    restart: always
    image: spg-registry.cn-hangzhou.cr.aliyuncs.com/spg/openspg-server:latest
    container_name: release-openspg-server
    ports:
      - "8887:8887"
    depends_on:
      - mysql
      - neo4j
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /home/ubuntu/Downloads/mysql:/var/lib/mysql
    environment:
      TZ: Asia/Shanghai
      LANG: C.UTF-8
    command: [
      "java",
      "-Dfile.encoding=UTF-8",
      "-Xms2048m",
      "-Xmx8192m",
      "-jar",
      "arks-sofaboot-0.0.1-SNAPSHOT-executable.jar",
      '--server.repository.impl.jdbc.host=mysql',
      '--server.repository.impl.jdbc.password=openspg',
      '--builder.model.execute.num=5',
      '--cloudext.graphstore.url=neo4j://release-openspg-neo4j:7687?user=neo4j&password=neo4j@openspg&database=neo4j',
      '--cloudext.searchengine.url=neo4j://release-openspg-neo4j:7687?user=neo4j&password=neo4j@openspg&database=neo4j'
    ]
  mysql:
    restart: always
    image: spg-registry.cn-hangzhou.cr.aliyuncs.com/spg/openspg-mysql:latest
    container_name: release-openspg-mysql
    volumes:
      - /etc/localtime:/etc/localtime:ro
    environment:
      TZ: Asia/Shanghai
      LANG: C.UTF-8
      MYSQL_ROOT_PASSWORD: openspg
      MYSQL_DATABASE: openspg
    ports:
      - "3306:3306"
    command: [
      '--character-set-server=utf8mb4',
      '--collation-server=utf8mb4_general_ci'
    ]
  neo4j:
    image: spg-registry.cn-hangzhou.cr.aliyuncs.com/spg/openspg-neo4j:latest
    container_name: release-openspg-neo4j
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      - TZ=Asia/Shanghai
      - NEO4J_AUTH=neo4j/neo4j@openspg
      - NEO4J_PLUGINS=["apoc"]
      - NEO4J_server_memory_heap_initial__size=1G
      - NEO4J_server_memory_heap_max__size=4G
      - NEO4J_server_memory_pagecache_size=1G
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_apoc_import_file_enabled=true
      - NEO4J_dbms_security_procedures_unrestricted=*
      - NEO4J_dbms_security_procedures_allowlist=*
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - $HOME/dozerdb/logs:/logs
      - /home/ubuntu/Downloads/neo4j:/data
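Before starting, create two folders named mysql and neo4j under the local Downloads directory (/home/ubuntu/Downloads in the volume mappings).
Run: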
docker compose -f docker-compose.yml up -d
After startup, check that the containers are running:
docker ps
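As a complement to docker ps, the following is a minimal sketch that checks whether the host ports published by the compose file accept TCP connections. The helper names (port_open, check_services) are made up for this illustration, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host ports published by the compose file (server, mysql, neo4j).
OPENSPG_PORTS = {
    "openspg-server": 8887,
    "mysql": 3306,
    "neo4j-http": 7474,
    "neo4j-bolt": 7687,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its published port is reachable."""
    return {name: port_open(host, port) for name, port in OPENSPG_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'down'}")
```

The server container can take a while to finish booting after its port opens, so a short retry loop around check_services may be useful in practice.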
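Accessing the web UI
Open http://127.0.0.1:8887 in a browser to reach the web UI, then click Chinese to switch to the Chinese interface.
Click Create Knowledge Base.
Chinese name: 我的知识库 ("My Knowledge Base"; fill in whatever suits you)
English name: MyKAG0806
Graph store configuration: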
{
"uri":"neo4j://release-openspg-neo4j:7687",
"user":"neo4j"
}
Model configuration:
{
"client_type": "maas",
"model": "deepseek-chat",
"base_url": "https://api.deepseek.com",
"api_key": "******"
}
Vectorizer configuration:
{
"vectorizer":"kag.common.vectorizer.OpenAIVectorizer",
"model":"BAAI/bge-m3",
"base_url":"https://api.siliconflow.cn/v1",
"api_key":"******"
}
Prompt language configuration (Chinese or English):
{
"biz_scene":"default",
"language":"zh"
}
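Since these four snippets are pasted into the form as raw JSON, a quick syntax check before submitting can save a round trip. The sketch below is only an illustration: the required-key sets are copied from the snippets in this walkthrough and are an assumption, not the official KAG schema, and the function name missing_keys is hypothetical.

```python
import json

# Keys that appear in the four snippets of this walkthrough.
# Assumption: this is NOT an official schema, only what the examples show.
REQUIRED_KEYS = {
    "graph_store": {"uri", "user"},
    "llm": {"client_type", "model", "base_url", "api_key"},
    "vectorizer": {"vectorizer", "model", "base_url", "api_key"},
    "prompt": {"biz_scene", "language"},
}


def missing_keys(kind: str, raw_json: str) -> set:
    """Parse raw_json (raises ValueError on bad syntax) and return absent keys."""
    config = json.loads(raw_json)
    return REQUIRED_KEYS[kind] - set(config)


if __name__ == "__main__":
    prompt_cfg = '{"biz_scene": "default", "language": "zh"}'
    print(missing_keys("prompt", prompt_cfg))  # empty set when nothing is missing
```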
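Once created, the knowledge base appears as shown in the figure below.
Using the knowledge base
Click Knowledge Base Management.
Click Create Task.
Task name: 车辆使用 (Vehicle Usage)
Upload file: test_pdf.pdf
Click Next.
Maximum segment length: enter 200 here.
Click Next.
For the extraction model choose Default, then click Finish.
Knowledge extraction now starts and takes some time; you can follow the extraction log. When it completes, the result looks as shown below.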
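To give a feel for what a maximum segment length of 200 means, here is a rough sketch of length-bounded splitting. It is not KAG's splitter: the sentence-boundary rule and the name split_into_segments are assumptions made for illustration, and splitting on "." is naive (it also breaks decimals and URLs).

```python
import re


def split_into_segments(text: str, max_len: int = 200) -> list[str]:
    """Pack whole sentences into segments of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    # Split right after Chinese or ASCII sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[。!?.!?])", text) if s]
    segments, current = [], ""
    for sentence in sentences:
        # A single sentence longer than max_len is cut at fixed offsets.
        while len(sentence) > max_len:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_len])
            sentence = sentence[max_len:]
        if len(current) + len(sentence) > max_len:
            segments.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for segment in split_into_segments("第一句。第二句!Third sentence. Fourth?", max_len=12):
        print(repr(segment))
```

Smaller segments give the extraction model less context per call but more precise provenance for each extracted entity; larger segments trade the other way.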
Viewing the results
Log in to the Neo4j browser at http://127.0.0.1:7474.
Select the database.
View all nodes:
MATCH (n) RETURN n;
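The same query can also be sent from a script. The sketch below builds a request for Neo4j's transactional HTTP endpoint (/db/<database>/tx/commit) using only the standard library. It assumes the HTTP API is enabled in the openspg-neo4j image and reuses the credentials from the compose file; the function names are hypothetical.

```python
import base64
import json
import urllib.request


def build_cypher_request(
    statement: str,
    base_url: str = "http://127.0.0.1:7474",
    database: str = "neo4j",
    user: str = "neo4j",
    password: str = "neo4j@openspg",  # value of NEO4J_AUTH in the compose file
) -> urllib.request.Request:
    """Build a POST request that runs one Cypher statement and commits."""
    body = json.dumps({"statements": [{"statement": statement}]}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/db/{database}/tx/commit",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def run_cypher(statement: str) -> dict:
    """Send the statement to the local Neo4j instance and return the JSON reply."""
    with urllib.request.urlopen(build_cypher_request(statement), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Counting nodes is cheaper than returning every node of a large graph.
    print(run_cypher("MATCH (n) RETURN count(n) AS nodes"))
```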
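Zoom in on part of the graph.
Reasoning Q&A
Return to the home page, then click Knowledge Base Q&A.
This opens the Knowledge Base Q&A page.
Enter the question to ask:
如果丢了钥匙该怎么办? (What should I do if I lose my key?)
The system runs through a series of reasoning steps and then answers the question.
It first performs vector retrieval in the Neo4j store and then produces the final answer.
Question: What should I do if I lose my key?
Answer: If you lose your key, take the following steps:
1. **Deactivate the lost key**: first have the lost key deactivated at a qualified professional service center to prevent unauthorized use.
2. **Notify the insurer**: report the loss to the vehicle's insurance company immediately so that insurance support is available if needed.
3. **Consider replacing the locking mechanism**: if you are worried about security, consider replacing the locks to keep the vehicle or property safe.
4. **Check the key battery**: check the state of the key battery and replace it promptly if it is low.
5. **Use the emergency key unit**: in an emergency, the emergency key unit can lock or unlock the vehicle.
6. **Have the key fully inspected**: have a professional service center inspect the key thoroughly to rule out other potential problems.
**Rationale**: these steps follow the recommendations in the cited material. They are meant to handle the inconvenience and risk of a lost key, keep the vehicle or property secure, and obtain insurance support where needed.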
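The vector retrieval step mentioned above can be pictured with a toy example. This is not how KAG queries Neo4j (that goes through the database's own index); it is a pure-Python sketch of ranking stored chunks by cosine similarity, with made-up two-dimensional vectors standing in for real 1024-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], indexed: dict, k: int = 3) -> list:
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed.items()]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical chunk texts and toy embeddings.
    index = {
        "deactivate lost key": [1.0, 0.0],
        "replace key battery": [0.0, 1.0],
        "notify insurer": [1.0, 1.0],
    }
    for score, text in top_k([1.0, 0.0], index, k=2):
        print(f"{score:.3f}  {text}")
```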
conda create -n kag python=3.10
conda activate kag
cd KAG-master
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install -e .
Edit kag/examples/example.cfg under the project root so that it contains:
[project]
namespace = KagDemo
host_addr = http://localhost:8887
# vectorizer served through an OpenAI-compatible API (SiliconFlow here, not Ollama)
[vectorizer]
vectorizer = kag.common.vectorizer.OpenAIVectorizer
model = BAAI/bge-m3
api_key = ******
base_url = https://api.siliconflow.cn/v1
vector_dimensions = 1024
[llm]
client_type = maas
base_url = https://api.deepseek.com
api_key = ******
model = deepseek-chat
[prompt]
biz_scene = default
language = zh
[log]
level = INFO
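Before handing the file to knext, it can be sanity-checked with Python's configparser, since the file uses plain INI syntax. The sketch below only verifies that the sections shown above exist; the list of expected sections is taken from this example and is an assumption rather than a documented requirement, and load_kag_config is a hypothetical helper.

```python
import configparser

# Sections present in the example.cfg shown in this walkthrough (assumption).
EXPECTED_SECTIONS = ["project", "vectorizer", "llm", "prompt", "log"]


def load_kag_config(text: str) -> configparser.ConfigParser:
    """Parse INI text and fail early if an expected section is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = [name for name in EXPECTED_SECTIONS if not parser.has_section(name)]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return parser


if __name__ == "__main__":
    with open("example.cfg", encoding="utf-8") as handle:
        cfg = load_kag_config(handle.read())
    print(cfg["project"]["namespace"], cfg["project"]["host_addr"])
    print(cfg.getint("vectorizer", "vector_dimensions"))
```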
Then run:
cd kag/examples
knext project create --config_path ./example.cfg
Note that changing namespace in example.cfg lets you create as many new projects as you like:
[project]
namespace = KagDemo
The project created here uses:
[project]
namespace = KagCar
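If several projects are created from the same base file, the namespace swap can be scripted. The following is a small sketch with configparser; with_namespace is a hypothetical helper, and note that configparser drops comment lines when it writes the file back out.

```python
import configparser
import io


def with_namespace(cfg_text: str, namespace: str) -> str:
    """Return cfg_text with [project] namespace replaced by the given value."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    parser["project"]["namespace"] = namespace
    buffer = io.StringIO()
    parser.write(buffer)  # comments from the input are not preserved
    return buffer.getvalue()


if __name__ == "__main__":
    base = "[project]\nnamespace = KagDemo\nhost_addr = http://localhost:8887\n"
    print(with_namespace(base, "KagCar"))
```

The result can be saved under a new file name and passed to knext project create through --config_path.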
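After the knext command finishes, the output is as shown in the figure below.
The overall project structure is shown in the figure below.
What should I watch out for when using transactions?
How do I create an SQL index?
What security issues should I watch for when writing code?
When writing code, how long can a single line be?
Can you write an example for me?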
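LightRAG performs three processing steps on each document.
After these three steps it builds the knowledge graph. Each entity records its name, type, description, and source (the text chunk of the document it came from). Each relation records its head entity, tail entity, description, source, and the relation keywords that were extracted. For question answering, LightRAG extracts two kinds of keywords from the question: low-level and high-level. Low-level keywords usually cover the entities mentioned in the question, while high-level keywords capture broader themes and concepts. It then retrieves related content from the graph database, the relational database, and the vector database, and finally assembles that content into a prompt that is submitted to the LLM.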
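The record layout and the two keyword levels described above can be sketched as follows. This is an illustration under stated assumptions, not LightRAG's implementation: the class and function names are invented here, and exact string matching stands in for the real retrieval against the graph and vector stores.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    entity_type: str
    description: str
    source_id: str  # id of the text chunk the entity was extracted from


@dataclass
class Relation:
    source: str  # head entity
    target: str  # tail entity
    description: str
    keywords: list[str]
    source_id: str


def retrieve(entities, relations, low_level, high_level):
    """Match low-level keywords to entity names and high-level ones to relation keywords."""
    low = {k.lower() for k in low_level}
    high = {k.lower() for k in high_level}
    hit_entities = [e for e in entities if e.name.lower() in low]
    hit_relations = [r for r in relations if high & {k.lower() for k in r.keywords}]
    return hit_entities, hit_relations


def build_prompt(question, hit_entities, hit_relations):
    """Assemble the retrieved records and the question into one prompt string."""
    lines = ["Context:"]
    lines += [f"- {e.name} ({e.entity_type}): {e.description}" for e in hit_entities]
    lines += [f"- {r.source} -> {r.target}: {r.description}" for r in hit_relations]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    ents = [Entity("SCROOGE", "PERSON", "A miserly businessman.", "chunk-1")]
    rels = [Relation("GHOST (JACOB MARLEY)", "SCROOGE", "The ghost warns Scrooge.",
                     ["warning", "redemption"], "chunk-2")]
    found = retrieve(ents, rels, low_level=["Scrooge"], high_level=["Redemption"])
    print(build_prompt("Who warns Scrooge?", *found))
```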
Download: https://github.com/HKUDS/LightRAG
Environment setup
After downloading, unpack the archive and enter the LightRAG root directory.
conda create -n lightrag python=3.10
conda activate lightrag
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install -e .
No database installation is involved at this point; that is discussed later.
Use Ollama to serve Qwen2.5:14b and bge-m3 (the embedding model):
conda activate ollama
ollama run qwen2.5:14b
ollama pull bge-m3
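To confirm that both models are available before running the demo, one option is to ask the local Ollama server for its model list through the /api/tags endpoint. The sketch below assumes Ollama listens on its default address, http://127.0.0.1:11434; the helper names are hypothetical.

```python
import json
import urllib.request


def installed_models(tags_payload: dict) -> set:
    """Extract model names from the JSON returned by GET /api/tags."""
    return {model["name"] for model in tags_payload.get("models", [])}


def missing_models(tags_payload: dict, required: list) -> list:
    """Return required models that are not installed, with tags made explicit."""
    names = installed_models(tags_payload)
    # Ollama lists an untagged pull such as "bge-m3" as "bge-m3:latest".
    normalized = [name if ":" in name else f"{name}:latest" for name in required]
    return [name for name in normalized if name not in names]


def fetch_tags(host: str = "http://127.0.0.1:11434") -> dict:
    """Query the local Ollama server for its installed models."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(missing_models(fetch_tags(), ["qwen2.5:14b", "bge-m3"]))
```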
helloworld
Go to the examples folder and create lightrag_ollama_demo_test.py with the following content:
import asyncio
import os
import inspect
import logging

from lightrag import LightRAG, QueryParam
from lightrag.llm import ollama_model_complete, ollama_embed
from lightrag.utils import EmbeddingFunc

WORKING_DIR = "./dickens"

logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.INFO)

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=ollama_model_complete,
    llm_model_name="qwen2.5:14b",
    llm_model_max_async=4,
    llm_model_max_token_size=32768,
    llm_model_kwargs={"host": "http://127.0.0.1:11434", "options": {"num_ctx": 32768}},
    embedding_func=EmbeddingFunc(
        embedding_dim=1024,
        max_token_size=8192,
        func=lambda texts: ollama_embed(
            texts=texts, embed_model="bge-m3:latest", host="http://127.0.0.1:11434"
        ),
    ),
)

with open("./book.txt", "r", encoding="utf-8") as f:
    rag.insert(f.read())

# Perform naive search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="naive"))
)

# Perform local search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="local"))
)

# Perform global search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="global"))
)

# Perform hybrid search
print(
    rag.query("What are the top themes in this story?", param=QueryParam(mode="hybrid"))
)

# stream response
resp = rag.query(
    "What are the top themes in this story?",
    param=QueryParam(mode="hybrid", stream=True),
)


async def print_stream(stream):
    async for chunk in stream:
        print(chunk, end="", flush=True)


if inspect.isasyncgen(resp):
    asyncio.run(print_stream(resp))
else:
    print(resp)
The script uses four query modes: naive, local, global, and hybrid.
The question asked in every mode is:
What are the top themes in this story?
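Because the four calls differ only in the mode string, they can be collapsed into a loop. The helper below is a sketch: query_all_modes is a hypothetical name, and it relies only on the rag.query(question, param=...) call shape used in the script, with the parameter object supplied by a factory so the snippet runs without LightRAG installed.

```python
def query_all_modes(rag, question, make_param,
                    modes=("naive", "local", "global", "hybrid")):
    """Run the same question in every mode and collect answers keyed by mode."""
    return {mode: rag.query(question, param=make_param(mode)) for mode in modes}


if __name__ == "__main__":
    # Stand-in object so the example is runnable on its own.
    class EchoRag:
        def query(self, question, param=None):
            return f"[{param}] {question}"

    answers = query_all_modes(EchoRag(), "What are the top themes in this story?",
                              make_param=lambda mode: mode)
    for mode, answer in answers.items():
        print(mode, "->", answer)
```

With the real objects, the call would be query_all_modes(rag, question, lambda mode: QueryParam(mode=mode)).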
Results
Output of the first mode (naive):
The document excerpts primarily revolve around several key themes that permeate Charles Dickens' "A Christmas Carol," focusing on transformation, compassion, and social critique:
1. **Transformation and Redemption**: Ebenezer Scrooge undergoes a significant personal transformation through his encounters with the Ghosts of Christmas Past, Present, and Yet to Come. These supernatural visitations reveal past regrets, present realities, and potential future consequences, prompting Scrooge's moral awakening and redemption.
2. **Compassion and Community**: The story underscores the importance of empathy and community spirit. Scrooge’s nephew, Fred, embodies a sense of goodwill and compassion towards others, which contrasts sharply with Scrooge's initial isolationism. This theme encourages readers to consider their own relationships and responsibilities within society.
3. **Social Critique**: Dickens uses Scrooge's character to critique the societal ills of his time, particularly the neglect and exploitation of the poor and marginalized. The scenes involving Old Joe and other characters highlight the broader context of poverty and social inequality that Dickens sought to address through his narrative.
4. **Materialism vs. Spirituality**: There is a strong theme contrasting material wealth with spiritual richness. Scrooge's initial disdain for Christmas reflects an overemphasis on material success and self-interest at the expense of deeper human connections and moral values. His journey towards enlightenment involves rediscovering the true meaning of life beyond mere accumulation of goods.
5. **Memory and Legacy**: The Ghost of Christmas Past reminds Scrooge of his own childhood and early adulthood, revealing how past choices have shaped his current character. This theme emphasizes the importance of one's actions in determining their legacy and future.
6. **Hope and Change**: Throughout the novella, there is an undercurrent of hope for personal change and societal improvement. Even as Scrooge faces grim futures, he also experiences glimpses of joy and potential redemption, encouraging a belief that it is never too late to alter one’s course.
These themes collectively weave together to deliver a powerful message about the redemptive power of compassion, the importance of community, and the broader critique of societal values.
Output of the second mode (local):
### Top Themes in "A Christmas Carol"
1. **Redemption and Transformation**
- Ebenezer Scrooge undergoes a profound moral transformation after encountering Jacob Marley's ghost and the three spirits (Ghost of Christmas Past, Present, and Yet to Come). These encounters force him to reflect on his past actions and their consequences, leading to a significant change in his character as he becomes more generous and compassionate.
2. **Family and Social Relationships**
- Scrooge’s relationships with family members such as Fred, Belle, and Fan are integral to the story. The interactions between characters highlight themes of familial bonds and social connections. For example, Fred's persistent efforts to reconcile with his uncle illustrate the importance of maintaining positive family ties despite differences.
3. **Reflection on Past Actions**
- A major theme involves Scrooge’s examination of his past actions through visits from the Ghost of Christmas Past. This reflection underscores how one’s past choices shape their present character and future destiny, emphasizing the idea that personal development is rooted in understanding history.
4. **Poverty and Social Inequality**
- The story highlights stark contrasts between wealth and poverty, particularly through Bob Cratchit's family and Tiny Tim. These portrayals reflect societal issues of inequality and the impact of economic hardship on individuals and families.
5. **The Meaning of Christmas**
- Scrooge’s initial disdain for Christmas transforms into a deep appreciation for its meaning as a time of joy, charity, and goodwill towards others. This theme challenges readers to reconsider the true spirit of the holiday season beyond commercialism and tradition.
6. **Moral Responsibility**
- The ghosts guide Scrooge through various visions to underscore his moral responsibilities both past and present. These include his duty to care for Tiny Tim’s well-being, reflecting a broader societal message about personal accountability in addressing social issues like poverty and neglect.
7. **Imagination and Childhood Memories**
- Recollections of childhood Christmases bring warmth and nostalgia, exemplified by memories involving Ali Baba and other imaginative tales. These elements highlight the importance of retaining childlike wonder and imagination throughout life.
8. **Ghostly Guidance as Spiritual Journey**
- The supernatural entities that visit Scrooge serve not just to show him visions but also to guide him spiritually towards self-improvement and redemption, illustrating a transformative spiritual journey.
9. **Impact of Personal Choices on Others**
- Throughout the story, Scrooge is made to confront how his past decisions have affected others negatively, leading to an awareness that one’s actions can ripple through society, influencing many lives beyond oneself.
These themes intertwine throughout "A Christmas Carol," creating a rich narrative that explores personal transformation and broader societal concerns.
Output of the third mode (global):
### Top Themes in "A Christmas Carol"
1. **Redemption and Moral Transformation**
- A central theme is Scrooge's journey from a miserly, cold-hearted businessman to a more compassionate and generous individual after encountering the spirits who show him visions of his past, present, and future.
- The Ghost of Christmas Past guides Scrooge through childhood memories that evoke deep emotional responses, prompting introspection and reflection on his life choices. This theme is supported by relationships such as those between Jacob Marley's ghost and Scrooge (ID 3), the Ghost of Christmas Past and Scrooge (IDs 0 and 28), and the Ghost of Christmas Present showing Tiny Tim's future survival to Scrooge (ID 4).
2. **Family Dynamics**
- The story emphasizes the warmth, joy, and importance of family gatherings during the holiday season. This is exemplified by Fred inviting his uncle Scrooge to dinner on Christmas Day.
- Family members engage in discussions about their relationships with Scrooge and express both criticism and hope for change. For instance, Bob Cratchit expresses conflicted feelings towards his employer while maintaining decorum at the Christmas dinner.
3. **Social Commentary**
- The narrative highlights social issues such as poverty (Ignorance and Want), societal indifference, and the harsh realities faced by those less fortunate.
- Scrooge’s encounters with various spirits during Christmas Eve provide insights into societal problems, emphasizing themes of neglect, exploitation, and human suffering.
4. **Nostalgia and Childhood Memories**
- Recollections of past Christmases evoke a sense of nostalgia and childhood wonder, contrasting Scrooge's current state with his happier youth.
- Nostalgic scenes involve characters like Ali Baba and the Sultan’s Groom, reflecting on imaginative tales from Scrooge’s childhood.
5. **Spiritual Guidance**
- The appearance of supernatural beings (the spirits) guide Scrooge through visions meant to help him understand the consequences of his actions and inspire moral transformation.
- These guides symbolize different aspects of time—past, present, and future—and each plays a crucial role in Scrooge’s spiritual journey.
6. **Philosophical Disagreements**
- The story highlights ideological conflicts between characters, such as the difference in perspectives on Christmas and its significance held by Fred and his uncle.
- These disagreements underscore the importance of personal beliefs and values in shaping one's behavior and societal interactions.
### Summary
The top themes in "A Christmas Carol" revolve around Scrooge’s moral transformation, family dynamics, social commentary, nostalgia, spiritual guidance, and philosophical disagreements. Each theme is intricately woven into the narrative to emphasize the broader message about the importance of compassion, empathy, and personal responsibility in society.
{
"high_level_keywords": ["Story themes", "Narrative analysis"],
"low_level_keywords": []
}
Output of the fourth mode (hybrid):
Top Themes in "A Christmas Carol"
### 1. Redemption and Transformation
**Summary:** Ebenezer Scrooge undergoes a profound moral transformation through his encounters with three spirits—Christmas Past, Present, and Future—who show him scenes from his past, present, and potential future. This journey prompts Scrooge to reflect on his life choices and the consequences of his actions, leading to a significant change in his character.
**Supporting Details:**
- **Past Reflections:** The Ghost of Christmas Past guides Scrooge through memories that evoke deep emotions and help him understand how he became the person he is.
- **Current Realizations:** The Ghost of Christmas Present shows Scrooge the current state of society, emphasizing poverty and social issues like Ignorance and Want.
- **Future Consequences:** The Ghost of Christmas Yet to Come reveals potential future events, including dire predictions about his own fate unless he changes his ways.
### 2. Social Responsibility
**Summary:** Through various visions, Scrooge comes to realize the impact of his actions on society, especially regarding poverty and neglect. He learns that individuals have a responsibility to care for others less fortunate.
**Supporting Details:**
- **Poverty and Want:** The spirits illustrate the struggles faced by poor families like Bob Cratchit's, highlighting Tiny Tim’s precarious health.
- **Societal Issues:** Scrooge is shown symbols of social issues such as Ignorance (a boy) and Want (a girl), representing broader societal problems.
### 3. Family Dynamics
**Summary:** The story explores the importance of family bonds, particularly in contrast to Scrooge's isolation. His relationships with his nephew Fred, Tiny Tim, and sister Fan play a crucial role in his transformation.
**Supporting Details:**
- **Nephew Fred’s Influence:** Fred invites Scrooge to Christmas dinner, illustrating familial warmth despite their differences.
- **Tiny Tim’s Impact:** Scrooge is moved by the Ghost of Christmas Present's discussion about Tiny Tim’s future and decides to be a father figure for him.
- **Sister Fan’s Legacy:** The memory of his sister Fan reminds Scrooge of childhood connections that could have influenced his character positively.
### 4. Reflection on Past Actions
**Summary:** A significant theme is the reflection on past choices and their consequences, as illustrated through visions of events from Scrooge's youth and early adulthood.
**Supporting Details:**
- **Ghost of Christmas Past:** Shows scenes from Scrooges’ childhood that include his sister Fan, highlighting missed opportunities for kindness.
- **Romantic Regrets:** The spirit reveals past relationships that have had lasting impacts on Scrooge’s psyche, such as a relationship with Belle.
### 5. Isolation vs Community
**Summary:** There is a clear contrast drawn between Scrooge's isolation and the warmth of community during Christmas celebrations. This duality serves to highlight the importance of social connections.
**Supporting Details:**
- **Isolated Life:** Initially portrayed as solitary and indifferent, Scrooge’s life contrasts sharply with scenes of family gatherings.
- **Community Celebrations:** Scenes depicting joyful Christmas gatherings emphasize the spirit of unity and charity contrasted against his usual demeanor.
These themes are integral to the narrative and underscore the story's message about personal accountability and societal responsibility.
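Now let's look at the entity nodes and relation edges extracted from the document.
Nodes: the node id is the entity name, and each node carries the entity type (key=d0), the description (key=d1), and the ids of the source text chunks (key=d2).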
<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key id="d6" for="edge" attr.name="source_id" attr.type="string" />
<key id="d5" for="edge" attr.name="keywords" attr.type="string" />
<key id="d4" for="edge" attr.name="description" attr.type="string" />
<key id="d3" for="edge" attr.name="weight" attr.type="double" />
<key id="d2" for="node" attr.name="source_id" attr.type="string" />
<key id="d1" for="node" attr.name="description" attr.type="string" />
<key id="d0" for="node" attr.name="entity_type" attr.type="string" />
<graph edgedefault="undirected">
<node id="&quot;CHILDREN&quot;">
<data key="d0">"CATEGORY"</data>
<data key="d1">"A category of people, specifically a group of children causing chaos and joy in the scene described."</data>
<data key="d2">chunk-775fc5369b9cca918713b31650cbc33e</data>
</node>
<node id="&quot;THE GENTLEMAN&quot;">
<data key="d0">"PERSON"</data>
<data key="d1">"A gentleman who approaches Scrooge to solicit donations for the poor, highlighting societal issues of poverty and inequality."</data>
<data key="d2">chunk-117db22af734e3ab9fb5ffd55214df54</data>
</node>
<node id="&quot;MR. SCROOGE&quot;">
<data key="d0">"PERSON"</data>
<data key="d1">"Mr. Scrooge is portrayed as an odious, stingy, and unfeeling man who is the subject of criticism by Bob Cratchit's family during their Christmas gathering."</data>
<data key="d2">chunk-a7f6570ee3d412b524f8e1b7b6c7381e</data>
</node>
<node id="&quot;SPIRITS OF PAST, PRESENT, FUTURE&quot;">
<data key="d0">"CATEGORY"</data>
<data key="d1">"The Spirits represent different aspects of time (past, present, future) guiding Scrooge towards self-reflection and transformation."</data>
<data key="d2">chunk-18a6526256779365eaca106e86bd46c6</data>
</node>
<node id="&quot;POOR LAW&quot;">
<data key="d0">"CATEGORY"</data>
<data key="d1">"The Poor Law refers to legal systems that provided relief to the poor through workhouses and other means."</data>
<data key="d2">chunk-117db22af734e3ab9fb5ffd55214df54</data>
</node>
<node id="&quot;PLUMP SISTER&quot;">
<data key="d0">"PERSON"</data>
<data key="d1">"A member of the family mentioned in the text who participates in guessing what Scrooge is like and contributes to the familial atmosphere."&lt;SEP&gt;"The plump sister, referred to by someone watching her with interest, wears a lace tucker and is involved in a conversation about a man lying somewhere."</data>
<data key="d2">chunk-deafc911d18186c59f224e0cfb1bcfbc&lt;SEP&gt;chunk-406765ed969000b436eeeb7c19fd17a2</data>
</node>
<node id="&quot;GRUEL&quot;">
<data key="d0">"CATEGORY"</data>
<data key="d1">"A type of porridge or thick soup that was mentioned as part of the setting, reflecting Scrooge's previous lifestyle."</data>
<data key="d2">chunk-18a6526256779365eaca106e86bd46c6</data>
</node>
<node id="&quot;CHURCHES&quot;">
<data key="d0">"ORGANIZATION"</data>
<data key="d1">"Churches are institutions where bells ring out on Christmas Day to celebrate the holiday."</data>
<data key="d2">chunk-18a6526256779365eaca106e86bd46c6</data>
</node>
<node id="&quot;QUEENS OF SHEBA&quot;">
<data key="d0">"CATEGORY"</data>
<data key="d1">"Illustrations showing Queens of Sheba on the Dutch tiles in Scrooge’s fireplace."</data>
<data key="d2">chunk-c539fea32b6d59c4cd2a5786a0af8db6</data>
</node>
<node id="&quot;CRATCHIT'S HOUSE&quot;">
<data key="d0">"LOCATION"</data>
<data key="d1">"Cratchit's house is where Bob Cratchit's family lives, a place of quiet reflection after news of financial relief arrives due to their creditor's death."</data>
<data key="d2">chunk-9d2eba099f63b6246b7a6dc5b2bfaf58</data>
</node>
<node id="&quot;CHRISTMAS DAY&quot;">
<data key="d0">"EVENT"</data>
<data key="d1">"Christmas Day marks a turning point for Scrooge after his encounters with the Spirits, symbolizing renewal and redemption."&lt;SEP&gt;"The text describes activities and interactions happening on Christmas Day as Scrooge transforms his demeanor."</data>
<data key="d2">chunk-18a6526256779365eaca106e86bd46c6&lt;SEP&gt;chunk-61797aea02cf85b58a1086194fc64694</data>
</node>
<node id="&quot;BOY IN SUNDAY CLOTHES&quot;">
<data key="d0">"PERSON"</data>
<data key="d1">"A young boy encountered by Scrooge who helps him buy a turkey and celebrates Christmas with joy."</data>
<data key="d2">chunk-18a6526256779365eaca106e86bd46c6</data>
</node>
<node id="&quot;ANGELIC MESSENGERS&quot;">
<data key="d0">"CATEGORY"</data>
<data key="d1">"Illustrations of angels descending through clouds like feather-beds on the Dutch tiles in Scrooge's fireplace."</data>
<data key="d2">chunk-c539fea32b6d59c4cd2a5786a0af8db6</data>
</node>
<node id="&quot;CHRISTMAS EVE&quot;">
<data key="d0">"EVENT"</data>
<data key="d1">"A night in which the story takes place, when supernatural occurrences happen and characters make critical decisions or experiences."&lt;SEP&gt;"Christmas Eve is the evening or entire day before Christmas Day, significant as it marks a time when Scrooge encounters the ghost."&lt;SEP&gt;"Christmas Eve marks the setting where Scrooge is depicted as indifferent to the festive spirit."&lt;SEP&gt;"The celebration and festivities associated with Christmas are referenced in the text, particularly how Bob Cratchit celebrates by sliding down Cornhill twenty times."</data>
<data key="d2">chunk-c539fea32b6d59c4cd2a5786a0af8db6&lt;SEP&gt;chunk-3fe36cf9bd25e7ca80b50f7929cc6bb0&lt;SEP&gt;chunk-b150e1523f6ced4362a8b8ce00e301ab&lt;SEP&gt;chunk-57a9276b182b8ac3b909d7020e5803cd</data>
</node>
<node id="&quot;POULTERER'S SHOP&quot;">
<data key="d0">"LOCATION"</data>
<data key="d1">"The location where the prize turkey is sold and purchased by Scrooge for Bob Cratchit."</data>
<data key="d2">chunk-18a6526256779365eaca106e86bd46c6</data>
</node>
<node id="&quot;PARKS&quot;">
<data key="d0">"GEO"</data>
<data key="d1">"Parks is mentioned as the location where people were attired fashionably during the holidays."</data>
<data key="d2">chunk-98e717d39891a61eadbbe096034bcc6c</data>
</node>
<node id="&quot;GHOST (JACOB MARLEY)&quot;">
<data key="d0">"CONCEPT"</data>
<data key="d1">"The Ghost is Jacob Marley's apparition that comes to warn Scrooge about his life choices and their consequences in the afterlife."</data>
<data key="d2">chunk-3fe36cf9bd25e7ca80b50f7929cc6bb0</data>
</node>
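A GraphML file like this can be read back with the standard library. The sketch below maps the key ids (d0, d1, ...) to their attribute names and returns plain dictionaries; read_graphml is a hypothetical helper, and the file name in the usage block is an assumption about where LightRAG writes the graph inside the working directory.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def read_graphml(xml_data):
    """Return (nodes, edges) from GraphML text or bytes.

    nodes maps node id -> {attr_name: value}
    edges is a list of (source, target, {attr_name: value})
    """
    root = ET.fromstring(xml_data)
    # <key id="d0" attr.name="entity_type" .../> tells us what each data key means.
    key_names = {
        key.get("id"): key.get("attr.name")
        for key in root.findall(f"{{{GRAPHML_NS}}}key")
    }

    def props(element):
        return {
            key_names.get(data.get("key"), data.get("key")): (data.text or "")
            for data in element.findall(f"{{{GRAPHML_NS}}}data")
        }

    nodes = {node.get("id"): props(node) for node in root.iter(f"{{{GRAPHML_NS}}}node")}
    edges = [
        (edge.get("source"), edge.get("target"), props(edge))
        for edge in root.iter(f"{{{GRAPHML_NS}}}edge")
    ]
    return nodes, edges


if __name__ == "__main__":
    from pathlib import Path

    # Assumed file name inside WORKING_DIR; adjust to what is actually on disk.
    data = Path("./dickens/graph_chunk_entity_relation.graphml").read_bytes()
    nodes, edges = read_graphml(data)
    print(len(nodes), "nodes,", len(edges), "edges")
```

Note that the ids and values keep their surrounding double quotes, and merged descriptions are separated by the literal marker <SEP>, so both may need stripping or splitting after parsing.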
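Edges: the source and target attributes give the head and tail nodes, and each edge carries a weight (key=d3), a description (key=d4), relation keywords (key=d5), and the text chunk it came from (key=d6).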
<edge source="&quot;CHILDREN&quot;" target="&quot;PORTERS&quot;">
<data key="d3">8.0</data>
<data key="d4">"Porters are the recipients of enthusiastic and affectionate treatment from children eager to take their Christmas gifts."</data>
<data key="d5">"childhood joy, festive interaction"</data>
<data key="d6">chunk-da2472b4ef2a535b62908f14d0fb0ca9</data>
</edge>
<edge source="&quot;THE GENTLEMAN&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">18.0</data>
<data key="d4">"Scrooge interacts with the gentleman but refuses to donate or participate in charitable actions for the poor."</data>
<data key="d5">"refusal, charity"</data>
<data key="d6">chunk-117db22af734e3ab9fb5ffd55214df54</data>
</edge>
<edge source="&quot;MR. SCROOGE&quot;" target="&quot;CRATCHIT FAMILY&quot;">
<data key="d3">7.0</data>
<data key="d4">"The Cratchit family is economically dependent on Mr. Scrooge but expresses resentment towards him during their Christmas gathering."</data>
<data key="d5">"economic dependency, social tension"</data>
<data key="d6">chunk-a7f6570ee3d412b524f8e1b7b6c7381e</data>
</edge>
<edge source="&quot;MR. SCROOGE&quot;" target="&quot;MRS. CRATCHIT&quot;">
<data key="d3">7.0</data>
<data key="d4">"Mrs. Cratchit expresses strong disapproval towards her husband's employer during the family gathering."</data>
<data key="d5">"resentment, social dynamics"</data>
<data key="d6">chunk-a7f6570ee3d412b524f8e1b7b6c7381e</data>
</edge>
<edge source="&quot;MR. SCROOGE&quot;" target="&quot;BOB CRATCHIT&quot;">
<data key="d3">6.0</data>
<data key="d4">"Bob works for Mr. Scrooge and speaks about him in a mixture of criticism and acceptance during their Christmas celebration."</data>
<data key="d5">"employment, conflict"</data>
<data key="d6">chunk-a7f6570ee3d412b524f8e1b7b6c7381e</data>
</edge>
<edge source="&quot;POOR LAW&quot;" target="&quot;TREADMILL&quot;">
<data key="d3">24.0</data>
<data key="d4">"The treadmill is part of the Poor Law's enforcement mechanism, symbolizing labor requirements for relief recipients."</data>
<data key="d5">"enforcement, work requirement"</data>
<data key="d6">chunk-117db22af734e3ab9fb5ffd55214df54</data>
</edge>
<edge source="&quot;PLUMP SISTER&quot;" target="&quot;JOE&quot;">
<data key="d3">5.0</data>
<data key="d4">"Joe discusses the removal of bed-curtains from a room where a person lies with the plump sister."</data>
<data key="d5">"conflict resolution, communication breakdown"</data>
<data key="d6">chunk-deafc911d18186c59f224e0cfb1bcfbc</data>
</edge>
<edge source="&quot;PLUMP SISTER&quot;" target="&quot;FRED&quot;">
<data key="d3">7.0</data>
<data key="d4">"Both participate in guessing what Scrooge is like during a family gathering."</data>
<data key="d5">"familial interaction, humor"</data>
<data key="d6">chunk-406765ed969000b436eeeb7c19fd17a2</data>
</edge>
<edge source="&quot;CHURCHES&quot;" target="&quot;BOY IN SUNDAY CLOTHES&quot;">
<data key="d3">6.0</data>
<data key="d4">"The boy is dressed for church on Christmas Day, reflecting religious celebration traditions."</data>
<data key="d5">"religious observance, festive attire"</data>
<data key="d6">chunk-18a6526256779365eaca106e86bd46c6</data>
</edge>
<edge source="&quot;CHURCHES&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">7.0</data>
<data key="d4">"Scrooge hears the bells ringing out and feels renewed spirit after his encounters with the Spirits."</data>
<data key="d5">"celebratory atmosphere, spiritual awakening"</data>
<data key="d6">chunk-18a6526256779365eaca106e86bd46c6</data>
</edge>
<edge source="&quot;CRATCHIT'S HOUSE&quot;" target="&quot;CHRISTMAS EVE DEATH&quot;">
<data key="d3">7.0</data>
<data key="d4">"The news that reaches Cratchit's house about the death of their creditor leads to improved financial stability and emotional relief for its inhabitants."</data>
<data key="d5">"home impact, family well-being"</data>
<data key="d6">chunk-9d2eba099f63b6246b7a6dc5b2bfaf58</data>
</edge>
<edge source="&quot;CHRISTMAS DAY&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">7.0</data>
<data key="d4">"Scrooge actively participates in Christmas celebrations on this day, showing a significant shift from his usual demeanor."</data>
<data key="d5">"celebration, transformation"</data>
<data key="d6">chunk-61797aea02cf85b58a1086194fc64694</data>
</edge>
<edge source="&quot;BOY IN SUNDAY CLOTHES&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">8.0</data>
<data key="d4">"Scrooge sends the boy to purchase a turkey for Bob Cratchit as an act of generosity."</data>
<data key="d5">"generosity, festive spirit"</data>
<data key="d6">chunk-18a6526256779365eaca106e86bd46c6</data>
</edge>
<edge source="&quot;CHRISTMAS EVE&quot;" target="&quot;BOB CRATCHIT&quot;">
<data key="d3">8.0</data>
<data key="d4">"Bob Cratchit celebrates Christmas Eve by sliding down Cornhill twenty times in honor of the holiday."</data>
<data key="d5">"celebratory activity, seasonal tradition"</data>
<data key="d6">chunk-b150e1523f6ced4362a8b8ce00e301ab</data>
</edge>
<edge source="&quot;CHRISTMAS EVE&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">15.0</data>
<data key="d4">"On Christmas Eve, Scrooge embodies his cold and indifferent nature towards the festive season."&lt;SEP&gt;"On Christmas Eve, Scrooge's indifference towards the festive season becomes particularly evident in his behavior and attitudes."</data>
<data key="d5">"attitude, setting"&lt;SEP&gt;"attitude, time of year"</data>
<data key="d6">chunk-57a9276b182b8ac3b909d7020e5803cd</data>
</edge>
<edge source="&quot;CHRISTMAS EVE&quot;" target="&quot;GHOST (JACOB MARLEY)&quot;">
<data key="d3">14.0</data>
<data key="d4">"The Ghost visits Scrooge specifically on Christmas Eve to warn him of the consequences of his actions."</data>
<data key="d5">"spiritual intervention, seasonal significance"</data>
<data key="d6">chunk-3fe36cf9bd25e7ca80b50f7929cc6bb0</data>
</edge>
<edge source="&quot;CHRISTMAS EVE&quot;" target="&quot;SCROOGE'S COUNTING-HOUSE&quot;">
<data key="d3">7.0</data>
<data key="d4">"The setting of Scrooge’s counting-house on Christmas Eve emphasizes his detachment from the festive spirit prevalent outside."</data>
<data key="d5">"workplace dynamics, festive atmosphere contrast"</data>
<data key="d6">chunk-57a9276b182b8ac3b909d7020e5803cd</data>
</edge>
<edge source="&quot;POULTERER'S SHOP&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">7.0</data>
<data key="d4">"Scrooge buys a large prize turkey from the poulterer’s shop and has it sent to Bob Cratchit's home as a gift."</data>
<data key="d5">"gift giving, transformation"</data>
<data key="d6">chunk-18a6526256779365eaca106e86bd46c6</data>
</edge>
<edge source="&quot;PARKS&quot;" target="&quot;CRATCHITS FAMILY&quot;">
<data key="d3">6.0</data>
<data key="d4">"The Cratchit family is associated with dressing fashionably for Christmas celebrations at The Parks."</data>
<data key="d5">"celebratory atmosphere, social gathering"</data>
<data key="d6">chunk-98e717d39891a61eadbbe096034bcc6c</data>
</edge>
<edge source="&quot;GHOST (JACOB MARLEY)&quot;" target="&quot;CHAIN (OF FATE)&quot;">
<data key="d3">9.0</data>
<data key="d4">"The Ghost explains that the chain it wears is a result of its own choices made during life and now represents its eternal suffering."</data>
<data key="d5">"consequences, guilt"</data>
<data key="d6">chunk-3fe36cf9bd25e7ca80b50f7929cc6bb0</data>
</edge>
<edge source="&quot;GHOST (JACOB MARLEY)&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">16.0</data>
<data key="d4">"Scrooge interacts with Jacob Marley's ghost, who warns him about his future based on his past actions."</data>
<data key="d5">"warning, redemption"</data>
<data key="d6">chunk-3fe36cf9bd25e7ca80b50f7929cc6bb0</data>
</edge>
<edge source="&quot;MR. SCROOGE'S NEPHEW&quot;" target="&quot;CRATCHIT FAMILY&quot;">
<data key="d3">18.0</data>
<data key="d4">"Mr. Scrooge's nephew showed extraordinary kindness towards the Cratchit family, offering his support and expressing sympathy for their situation."</data>
<data key="d5">"kindness, support"</data>
<data key="d6">chunk-806ac9ed3665e746b3d907a3e08e3cde</data>
</edge>
<edge source="&quot;CRATCHITS FAMILY&quot;" target="&quot;BAKER'S SHOP&quot;">
<data key="d3">7.0</data>
<data key="d4">"The Cratchit children become excited upon smelling the cooking goose from the Baker's shop, indicating a connection to local festivities."</data>
<data key="d5">"local celebration, anticipation"</data>
<data key="d6">chunk-98e717d39891a61eadbbe096034bcc6c</data>
</edge>
<edge source="&quot;CRATCHITS FAMILY&quot;" target="&quot;MRS. CRATCHIT&quot;">
<data key="d3">18.0</data>
<data key="d4">"Mrs. Cratchit is actively involved in welcoming her children back home and preparing for their family dinner."</data>
<data key="d5">"maternal care, family unity"</data>
<data key="d6">chunk-98e717d39891a61eadbbe096034bcc6c</data>
</edge>
<edge source="&quot;CRATCHITS FAMILY&quot;" target="&quot;MARTHA CRATCHIT&quot;">
<data key="d3">16.0</data>
<data key="d4">"Martha participates enthusiastically in the Christmas activities, showing a strong sense of familial bond."</data>
<data key="d5">"family engagement, celebration"</data>
<data key="d6">chunk-98e717d39891a61eadbbe096034bcc6c</data>
</edge>
<edge source="&quot;WORLDLY MIND&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">6.0</data>
<data key="d4">"Scrooge embodies a worldly mind, focusing on business and material wealth, which contrasts with the spiritual guidance he receives from the Ghosts."</data>
<data key="d5">"materialism vs spirituality"</data>
<data key="d6">chunk-3fe36cf9bd25e7ca80b50f7929cc6bb0</data>
</edge>
<edge source="&quot;PETER&quot;" target="&quot;MRS. CRATCHIT&quot;">
<data key="d3">7.0</data>
<data key="d4">"Peter participates in discussions with his mother about the impact of Scrooge’s nephew's kindness on their lives."</data>
<data key="d5">"family discussion, support"</data>
<data key="d6">chunk-806ac9ed3665e746b3d907a3e08e3cde</data>
</edge>
<edge source="&quot;BUSINESSMEN (GENERIC)&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">5.0</data>
<data key="d4">"Businessmen discuss Scrooge's death and speculate about his wealth distribution, reflecting on their lack of empathy for him."</data>
<data key="d5">"society, indifference"</data>
<data key="d6">chunk-8c866d033a3bead10c3ae756a08c9918</data>
</edge>
<edge source="&quot;SPIRIT, GHOST&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">9.0</data>
<data key="d4">"The Spirit guides Scrooge through various scenes during Christmas Eve to teach him moral lessons about humanity and empathy."</data>
<data key="d5">"transformation, guidance, moral lesson"</data>
<data key="d6">chunk-406765ed969000b436eeeb7c19fd17a2</data>
</edge>
<edge source="&quot;SPIRIT, GHOST&quot;" target="&quot;CHILDREN'S TWELFTH-NIGHT PARTY&quot;">
<data key="d3">8.0</data>
<data key="d4">"The Spirit is present at a children's gathering during the Twelfth Night celebration."</data>
<data key="d5">"supernatural presence, moral lesson"</data>
<data key="d6">chunk-406765ed969000b436eeeb7c19fd17a2</data>
</edge>
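The edge records above can be read back with Python's standard library. The following is a minimal sketch, not LightRAG's own loader. The meaning of the `d3` to `d6` slots (weight, description, keywords, source chunk ID) is an assumption inferred from the sample values, because the `<key>` declarations are not part of this excerpt; `EDGE_FIELDS`, `SAMPLE`, and `parse_edges` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Assumed meaning of each <data key="..."> slot, inferred from the sample values.
EDGE_FIELDS = {"d3": "weight", "d4": "description", "d5": "keywords", "d6": "source_id"}

# One edge copied from the dump, wrapped in a <graph> element so it parses on its own.
SAMPLE = """<graph edgedefault="undirected">
<edge source="&quot;GHOST (JACOB MARLEY)&quot;" target="&quot;SCROOGE&quot;">
<data key="d3">16.0</data>
<data key="d4">"Scrooge interacts with Jacob Marley's ghost, who warns him about his future based on his past actions."</data>
<data key="d5">"warning, redemption"</data>
<data key="d6">chunk-3fe36cf9bd25e7ca80b50f7929cc6bb0</data>
</edge>
</graph>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}edge' -> 'edge'."""
    return tag.rsplit("}", 1)[-1]


def parse_edges(xml_text):
    """Return one dict per <edge>, with the wrapping quotes stripped."""
    root = ET.fromstring(xml_text)
    edges = []
    for element in root.iter():
        if local_name(element.tag) != "edge":
            continue
        record = {
            "source": element.get("source").strip('"'),
            "target": element.get("target").strip('"'),
        }
        for child in element:
            if local_name(child.tag) != "data":
                continue
            name = EDGE_FIELDS.get(child.get("key"), child.get("key"))
            record[name] = (child.text or "").strip('"')
        record["weight"] = float(record.get("weight", 0.0))
        edges.append(record)
    return edges


edges = parse_edges(SAMPLE)
print(edges[0]["source"], "->", edges[0]["target"], edges[0]["weight"])
```

Matching on the local tag name keeps the sketch usable on a complete GraphML file, where elements normally carry a default XML namespace.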
How it works
Algorithm flowchart
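The indexing path that `ainsert` follows in the source listing (deduplicate, hash each document into an ID, skip documents already processed, split into overlapping chunks) can be sketched in a few lines. This is a simplified stand-in, not the library's code: `mdhash_id` assumes that `compute_mdhash_id` is a prefix plus an MD5 hex digest, whitespace-separated words replace tiktoken tokens, and entity extraction is left out. All function names here are hypothetical.

```python
from hashlib import md5


def mdhash_id(content, prefix=""):
    # Assumption: mirrors compute_mdhash_id as prefix + MD5 hex digest.
    return prefix + md5(content.encode()).hexdigest()


def chunk_by_words(text, max_size=1200, overlap=100):
    """Split text into windows of max_size words that overlap by `overlap` words.

    Simplification: words stand in for the tiktoken tokens used by LightRAG.
    """
    if overlap >= max_size:
        raise ValueError("overlap must be smaller than max_size")
    words = text.split()
    step = max_size - overlap
    return [" ".join(words[i : i + max_size]) for i in range(0, len(words), step)]


def index_documents(docs, already_seen, max_size=1200, overlap=100):
    """Return {doc_id: {chunk_id: chunk_text}} for documents not seen before."""
    unique = {doc.strip() for doc in docs}  # 1. drop duplicates
    new_docs = {mdhash_id(d, "doc-"): d for d in unique}  # 2. assign IDs
    result = {}
    for doc_id, content in new_docs.items():
        if doc_id in already_seen:  # 3. skip processed documents
            continue
        chunks = chunk_by_words(content, max_size, overlap)  # 4. chunk
        result[doc_id] = {mdhash_id(c, "chunk-"): c for c in chunks}
    return result


docs = ["a b c d e f", "a b c d e f ", "x y"]
indexed = index_documents(docs, already_seen=set(), max_size=4, overlap=1)
for doc_id, chunks in indexed.items():
    print(doc_id, list(chunks.values()))
```

Because IDs are content hashes, inserting the same text twice yields the same `doc-` ID, which is what makes the "already processed" filter possible.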
Retrieval query flowchart
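Going by the `aquery` method in the source listing, retrieval branches on `param.mode`: `local`, `global`, and `hybrid` go to the knowledge-graph query, `naive` goes to plain chunk retrieval, and `mix` combines both. A minimal sketch of that routing follows. The three `*_stub` coroutines are hypothetical placeholders for `kg_query`, `naive_query`, and `mix_kg_vector_query`, which need real storage backends to run.

```python
import asyncio


# Hypothetical placeholders for the real query functions.
async def kg_query_stub(query, mode):
    return f"kg[{mode}]: {query}"


async def naive_query_stub(query, mode):
    return f"naive: {query}"


async def mix_query_stub(query, mode):
    return f"mix: {query}"


# Mode -> handler table, following the if/elif chain in aquery.
ROUTES = {
    "local": kg_query_stub,
    "global": kg_query_stub,
    "hybrid": kg_query_stub,
    "naive": naive_query_stub,
    "mix": mix_query_stub,
}


async def route_query(query, mode):
    """Dispatch a query to the handler registered for its mode."""
    handler = ROUTES.get(mode)
    if handler is None:
        raise ValueError(f"Unknown mode {mode}")
    return await handler(query, mode)


print(asyncio.run(route_query("Who warns Scrooge?", "hybrid")))
```

A lookup table is used here only for brevity; the listing itself uses an `if`/`elif` chain, which behaves the same way for these five modes.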
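# Import the required libraries
import asyncio  # asynchronous programming
import os  # operating-system utilities
from tqdm.asyncio import tqdm as tqdm_async  # async progress bars
from dataclasses import asdict, dataclass, field  # dataclass definitions
from datetime import datetime  # dates and times
from functools import partial  # partial function application
from typing import Type, cast, Dict  # type annotations

# LLM helpers from the local package
from .llm import (
    gpt_4o_mini_complete,  # GPT-4o mini completion function
    openai_embedding,  # OpenAI embedding function
)

# Core operations from the local package
from .operate import (
    chunking_by_token_size,  # split text into chunks by token count
    extract_entities,  # entity extraction
    kg_query,  # knowledge-graph query
    naive_query,  # plain vector query
    mix_kg_vector_query,  # combined knowledge-graph and vector query
)

# Utility functions from the local package
from .utils import (
    EmbeddingFunc,  # embedding function type
    compute_mdhash_id,  # compute a hash-based ID
    limit_async_func_call,  # cap concurrent calls to an async function
    convert_response_to_json,  # convert a response to JSON
    logger,  # logger instance
    set_logger,  # configure the logger
)

# Base classes from the local package
from .base import (
    BaseGraphStorage,  # base graph storage
    BaseKVStorage,  # base key-value storage
    BaseVectorStorage,  # base vector storage
    StorageNameSpace,  # storage namespace
    QueryParam,  # query parameters
    DocStatus,  # document status
)

# Storage implementations from the local package
from .storage import (
    JsonKVStorage,  # JSON key-value storage
    NanoVectorDBStorage,  # Nano vector database storage
    NetworkXStorage,  # NetworkX graph storage
    JsonDocStatusStorage,  # JSON document-status storage
)

# Separator used when several values are joined in one graph field
from .prompt import GRAPH_FIELD_SEP

# future KG integrations
# from .kg.ArangoDB_impl import (
#     GraphStorage as ArangoDBStorage
# )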
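def lazy_external_import(module_name: str, class_name: str):
    """Lazily import a class from an external module.

    Args:
        module_name: module name (may be relative, e.g. ".kg.neo4j_impl")
        class_name: name of the class to load from that module

    Returns:
        A function that imports the module when called and returns an
        instance of the class built from the given arguments.
    """
    import inspect

    # Resolve the caller's module and package so relative imports work
    caller_frame = inspect.currentframe().f_back
    module = inspect.getmodule(caller_frame)
    package = module.__package__ if module else None

    def import_class(*args, **kwargs):
        import importlib

        # Import the module with importlib
        module = importlib.import_module(module_name, package=package)
        # Look up the class in the module and instantiate it
        cls = getattr(module, class_name)
        return cls(*args, **kwargs)

    return import_class

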
Neo4JStorage = lazy_external_import(".kg.neo4j_impl", "Neo4JStorage")
OracleKVStorage = lazy_external_import(".kg.oracle_impl", "OracleKVStorage")
OracleGraphStorage = lazy_external_import(".kg.oracle_impl", "OracleGraphStorage")
OracleVectorDBStorage = lazy_external_import(".kg.oracle_impl", "OracleVectorDBStorage")
MilvusVectorDBStorge = lazy_external_import(".kg.milvus_impl", "MilvusVectorDBStorge")
MongoKVStorage = lazy_external_import(".kg.mongo_impl", "MongoKVStorage")
ChromaVectorDBStorage = lazy_external_import(".kg.chroma_impl", "ChromaVectorDBStorage")
TiDBKVStorage = lazy_external_import(".kg.tidb_impl", "TiDBKVStorage")
TiDBVectorDBStorage = lazy_external_import(".kg.tidb_impl", "TiDBVectorDBStorage")
TiDBGraphStorage = lazy_external_import(".kg.tidb_impl", "TiDBGraphStorage")
PGKVStorage = lazy_external_import(".kg.postgres_impl", "PGKVStorage")
PGVectorStorage = lazy_external_import(".kg.postgres_impl", "PGVectorStorage")
AGEStorage = lazy_external_import(".kg.age_impl", "AGEStorage")
PGGraphStorage = lazy_external_import(".kg.postgres_impl", "PGGraphStorage")
GremlinStorage = lazy_external_import(".kg.gremlin_impl", "GremlinStorage")
PGDocStatusStorage = lazy_external_import(".kg.postgres_impl", "PGDocStatusStorage")


def always_get_an_event_loop() -> asyncio.AbstractEventLoop:
    """Ensure that an event loop is always available.

    Tries to get the current event loop. If it is closed or does not exist,
    creates a new event loop and sets it as the current one.

    Returns:
        asyncio.AbstractEventLoop: the current or newly created event loop
    """
    try:
        # Try to get the current event loop
        current_loop = asyncio.get_event_loop()
        if current_loop.is_closed():
            raise RuntimeError("Event loop is closed.")
        return current_loop
    except RuntimeError:
        # No event loop, or it is closed: create a new one
        logger.info("Creating a new event loop in main thread.")
        new_loop = asyncio.new_event_loop()
        asyncio.set_event_loop(new_loop)
        return new_loop


@dataclass
class LightRAG:
    """Main LightRAG class: document storage, retrieval, and knowledge-graph construction.

    Attributes:
        working_dir (str): path of the working directory
        embedding_cache_config (dict): embedding cache configuration
        kv_storage (str): key-value storage type
        vector_storage (str): vector storage type
        graph_storage (str): graph storage type
        log_level (str): logging level
        chunk_token_size (int): token size of each text chunk
        chunk_overlap_token_size (int): token overlap between adjacent chunks
        tiktoken_model_name (str): model name passed to tiktoken
        entity_extract_max_gleaning (int): maximum number of gleaning passes during entity extraction
        entity_summary_to_max_tokens (int): maximum number of tokens in an entity summary
        node_embedding_algorithm (str): node embedding algorithm
        node2vec_params (dict): parameters of the node2vec algorithm
        embedding_func (EmbeddingFunc): embedding function
        embedding_batch_num (int): embedding batch size
        embedding_func_max_async (int): maximum concurrent calls to the embedding function
        llm_model_func (callable): LLM function
        llm_model_name (str): LLM name
        llm_model_max_token_size (int): maximum token size of the LLM
        llm_model_max_async (int): maximum concurrent calls to the LLM
        llm_model_kwargs (dict): extra arguments for the LLM
        vector_db_storage_cls_kwargs (dict): arguments for the vector database storage class
        enable_llm_cache (bool): whether to cache LLM responses
        enable_llm_cache_for_entity_extract (bool): whether to cache LLM responses during entity extraction
        addon_params (dict): additional parameters
        convert_response_to_json_func (callable): function that converts a response to JSON
        doc_status_storage (str): document-status storage type
    """

    # Working directory
    working_dir: str = field(
        default_factory=lambda: f"./lightrag_cache_{datetime.now().strftime('%Y-%m-%d-%H:%M:%S')}"
    )

    # Embedding cache configuration, disabled by default
    embedding_cache_config: dict = field(
        default_factory=lambda: {
            "enabled": False,
            "similarity_threshold": 0.95,
            "use_llm_check": False,
        }
    )

    # Storage types
    kv_storage: str = field(default="JsonKVStorage")
    vector_storage: str = field(default="NanoVectorDBStorage")
    graph_storage: str = field(default="NetworkXStorage")

    # Logging level
    current_log_level = logger.level
    log_level: str = field(default=current_log_level)

    # Text chunking
    chunk_token_size: int = 1200  # target number of tokens per chunk
    chunk_overlap_token_size: int = 100  # tokens shared by adjacent chunks
    tiktoken_model_name: str = "gpt-4o-mini"  # model name used for token counting

    # Entity extraction
    entity_extract_max_gleaning: int = 1  # maximum number of gleaning passes
    entity_summary_to_max_tokens: int = 500  # maximum tokens in an entity summary

    # Node embedding
    node_embedding_algorithm: str = "node2vec"  # node embedding algorithm
    node2vec_params: dict = field(
        default_factory=lambda: {
            "dimensions": 1536,  # embedding vector dimension
            "num_walks": 10,  # random walks per node
            "walk_length": 40,  # length of each random walk
            "window_size": 2,  # context window size
            "iterations": 3,  # training iterations
            "random_seed": 3,  # random seed
        }
    )

    # Embedding function
    embedding_func: EmbeddingFunc = field(default_factory=lambda: openai_embedding)  # OpenAI embeddings by default
    embedding_batch_num: int = 32  # embedding batch size
    embedding_func_max_async: int = 16  # maximum concurrent embedding calls

    # LLM
    llm_model_func: callable = gpt_4o_mini_complete  # GPT-4o mini completion by default
    llm_model_name: str = "meta-llama/Llama-3.2-1B-Instruct"  # name of the language model
    llm_model_max_token_size: int = 32768  # maximum number of tokens the model supports
    llm_model_max_async: int = 16  # maximum concurrent LLM calls
    llm_model_kwargs: dict = field(default_factory=dict)  # extra arguments for the LLM

    # Storage
    vector_db_storage_cls_kwargs: dict = field(default_factory=dict)  # init arguments for the vector storage class

    # Caching
    enable_llm_cache: bool = True  # cache LLM responses
    # Entity extraction sometimes fails; with this flag a retry can continue
    # from cached responses without paying for the LLM calls again
    enable_llm_cache_for_entity_extract: bool = True

    # Extensions
    addon_params: dict = field(default_factory=dict)  # additional parameters
    convert_response_to_json_func: callable = convert_response_to_json  # converts a response to JSON

    # Document-status storage
    doc_status_storage: str = field(default="JsonDocStatusStorage")

    def __post_init__(self):
        """Post-initialization hook of the LightRAG instance.

        Called automatically after the instance is created; sets up logging
        and creates the storage instances.
        """
        # Set the log file and log level
        log_file = os.path.join("lightrag.log")
        set_logger(log_file)
        logger.setLevel(self.log_level)

        # Record the initialization
        logger.info(f"Logger initialized for working directory: {self.working_dir}")

        # Print the detailed configuration
        _print_config = ",\n  ".join([f"{k} = {v}" for k, v in asdict(self).items()])
        logger.debug(f"LightRAG init with param:\n  {_print_config}\n")

        # TODO: move all storage setup here so it can use the initial
        # parameters attached to self
        # Resolve the storage classes named in the configuration
        self.key_string_value_json_storage_cls: Type[BaseKVStorage] = (
            self._get_storage_class()[self.kv_storage]
        )
        self.vector_db_storage_cls: Type[BaseVectorStorage] = self._get_storage_class()[
            self.vector_storage
        ]
        self.graph_storage_cls: Type[BaseGraphStorage] = self._get_storage_class()[
            self.graph_storage
        ]

        # Make sure the working directory exists
        if not os.path.exists(self.working_dir):
            logger.info(f"Creating working directory {self.working_dir}")
            os.makedirs(self.working_dir)

        # LLM response cache
        self.llm_response_cache = self.key_string_value_json_storage_cls(
            namespace="llm_response_cache",
            global_config=asdict(self),
            embedding_func=None,
        )

        # Cap the concurrency of the embedding function
        self.embedding_func = limit_async_func_call(self.embedding_func_max_async)(
            self.embedding_func
        )

        # Document-related storage instances
        self.full_docs = self.key_string_value_json_storage_cls(
            namespace="full_docs",  # full documents
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )
        self.text_chunks = self.key_string_value_json_storage_cls(
            namespace="text_chunks",  # text chunks
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )
        self.chunk_entity_relation_graph = self.graph_storage_cls(
            namespace="chunk_entity_relation",  # chunk-entity-relation graph
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )

        # Vector database storage for entities, relationships, and chunks
        self.entities_vdb = self.vector_db_storage_cls(
            namespace="entities",  # entity vectors
            global_config=asdict(self),
            embedding_func=self.embedding_func,
            meta_fields={"entity_name"},  # entity name kept as metadata
        )
        self.relationships_vdb = self.vector_db_storage_cls(
            namespace="relationships",  # relationship vectors
            global_config=asdict(self),
            embedding_func=self.embedding_func,
            meta_fields={"src_id", "tgt_id"},  # source and target IDs kept as metadata
        )
        self.chunks_vdb = self.vector_db_storage_cls(
            namespace="chunks",
            global_config=asdict(self),
            embedding_func=self.embedding_func,
        )

        # Cap the concurrency of the LLM function and attach the cache
        self.llm_model_func = limit_async_func_call(self.llm_model_max_async)(
            partial(
                self.llm_model_func,
                hashing_kv=self.llm_response_cache
                if self.llm_response_cache
                and hasattr(self.llm_response_cache, "global_config")
                else self.key_string_value_json_storage_cls(
                    namespace="llm_response_cache",
                    global_config=asdict(self),
                    embedding_func=None,
                ),
                **self.llm_model_kwargs,
            )
        )

        # Document-status storage
        self.doc_status_storage_cls = self._get_storage_class()[self.doc_status_storage]
        self.doc_status = self.doc_status_storage_cls(
            namespace="doc_status",
            global_config=asdict(self),
            embedding_func=None,
        )

    def _get_storage_class(self) -> dict:
        """Return all available storage classes.

        Returns:
            dict: mapping from storage class name to class
        """
        return {
            # Key-value storage
            "JsonKVStorage": JsonKVStorage,
            "OracleKVStorage": OracleKVStorage,
            "MongoKVStorage": MongoKVStorage,
            "TiDBKVStorage": TiDBKVStorage,
            # Vector storage
            "NanoVectorDBStorage": NanoVectorDBStorage,
            "OracleVectorDBStorage": OracleVectorDBStorage,
            "MilvusVectorDBStorge": MilvusVectorDBStorge,
            "ChromaVectorDBStorage": ChromaVectorDBStorage,
            "TiDBVectorDBStorage": TiDBVectorDBStorage,
            # Graph storage (the PG* entries below also cover KV, vector,
            # and document-status storage)
            "NetworkXStorage": NetworkXStorage,
            "Neo4JStorage": Neo4JStorage,
            "OracleGraphStorage": OracleGraphStorage,
            "AGEStorage": AGEStorage,
            "PGGraphStorage": PGGraphStorage,
            "PGKVStorage": PGKVStorage,
            "PGDocStatusStorage": PGDocStatusStorage,
            "PGVectorStorage": PGVectorStorage,
            "TiDBGraphStorage": TiDBGraphStorage,
            "GremlinStorage": GremlinStorage,
            # "ArangoDBStorage": ArangoDBStorage
            "JsonDocStatusStorage": JsonDocStatusStorage,
        }

    def insert(self, string_or_strings, split_by_character=None):
        """Insert documents synchronously.

        Args:
            string_or_strings: a single document string or a list of them
            split_by_character: if not None, split documents on this character
        """
        loop = always_get_an_event_loop()
        return loop.run_until_complete(
            self.ainsert(string_or_strings, split_by_character)
        )

    async def ainsert(self, string_or_strings, split_by_character=None):
        """Insert documents asynchronously, with checkpoint support.

        Args:
            string_or_strings: a single document string or a list of them
            split_by_character: if not None, split documents on this character
        """
        # Wrap a single string in a list
        if isinstance(string_or_strings, str):
            string_or_strings = [string_or_strings]

        # 1. Remove duplicate contents from the list
        unique_contents = list(set(doc.strip() for doc in string_or_strings))

        # 2. Generate document IDs and the initial status
        new_docs = {
            compute_mdhash_id(content, prefix="doc-"): {
                "content": content,  # document content
                "content_summary": self._get_content_summary(content),  # short summary
                "content_length": len(content),  # document length
                "status": DocStatus.PENDING,  # starts as pending
                "created_at": datetime.now().isoformat(),  # creation time
                "updated_at": datetime.now().isoformat(),  # last update time
            }
            for content in unique_contents
        }

        # 3. Filter out documents that were already processed
        _add_doc_keys = await self.doc_status.filter_keys(list(new_docs.keys()))
        new_docs = {k: v for k, v in new_docs.items() if k in _add_doc_keys}

        if not new_docs:
            logger.info("All documents have been processed or are duplicates")
            return

        logger.info(f"Processing {len(new_docs)} new unique documents")

        # Process documents in batches
        batch_size = self.addon_params.get("insert_batch_size", 10)
        for i in range(0, len(new_docs), batch_size):
            batch_docs = dict(list(new_docs.items())[i : i + batch_size])

            # Process each document of the batch behind a progress bar
            for doc_id, doc in tqdm_async(
                batch_docs.items(), desc=f"Processing batch {i // batch_size + 1}"
            ):
                try:
                    # Update the status to processing
                    doc_status = {
                        "content_summary": doc["content_summary"],
                        "content_length": doc["content_length"],
                        "status": DocStatus.PROCESSING,
                        "created_at": doc["created_at"],
                        "updated_at": datetime.now().isoformat(),
                    }
                    await self.doc_status.upsert({doc_id: doc_status})

                    # Generate chunks from the document
                    chunks = {
                        compute_mdhash_id(dp["content"], prefix="chunk-"): {
                            **dp,
                            "full_doc_id": doc_id,
                        }
                        for dp in chunking_by_token_size(
                            doc["content"],
                            split_by_character=split_by_character,
                            overlap_token_size=self.chunk_overlap_token_size,
                            max_token_size=self.chunk_token_size,
                            tiktoken_model=self.tiktoken_model_name,
                        )
                    }

                    # Update the status with chunk information
                    doc_status.update(
                        {
                            "chunks_count": len(chunks),
                            "updated_at": datetime.now().isoformat(),
                        }
                    )
                    await self.doc_status.upsert({doc_id: doc_status})

                    try:
                        # Store the chunks in the vector database
                        await self.chunks_vdb.upsert(chunks)

                        # Extract and store entities and relationships
                        maybe_new_kg = await extract_entities(
                            chunks,
                            knowledge_graph_inst=self.chunk_entity_relation_graph,
                            entity_vdb=self.entities_vdb,
                            relationships_vdb=self.relationships_vdb,
                            llm_response_cache=self.llm_response_cache,
                            global_config=asdict(self),
                        )

                        # Check that entity extraction succeeded
                        if maybe_new_kg is None:
                            raise Exception(
                                "Failed to extract entities and relationships"
                            )

                        self.chunk_entity_relation_graph = maybe_new_kg

                        # Store the original document and its chunks
                        await self.full_docs.upsert(
                            {doc_id: {"content": doc["content"]}}
                        )
                        await self.text_chunks.upsert(chunks)

                        # Update the status to processed
                        doc_status.update(
                            {
                                "status": DocStatus.PROCESSED,
                                "updated_at": datetime.now().isoformat(),
                            }
                        )
                        await self.doc_status.upsert({doc_id: doc_status})

                    except Exception as e:
                        # Mark the document as failed if any step fails
                        doc_status.update(
                            {
                                "status": DocStatus.FAILED,
                                "error": str(e),
                                "updated_at": datetime.now().isoformat(),
                            }
                        )
                        await self.doc_status.upsert({doc_id: doc_status})
                        raise e

                except Exception as e:
                    # Log the error and move on to the next document
                    import traceback

                    error_msg = f"Failed to process document {doc_id}: {str(e)}\n{traceback.format_exc()}"
                    logger.error(error_msg)
                    continue
                finally:
                    # Update all indexes after every document
                    await self._insert_done()

    async def _insert_done(self):
        """Finish an insert operation.

        Runs the index-done callback of every storage instance.
        """
        tasks = []
        for storage_inst in [
            self.full_docs,
            self.text_chunks,
            self.llm_response_cache,
            self.entities_vdb,
            self.relationships_vdb,
            self.chunks_vdb,
            self.chunk_entity_relation_graph,
        ]:
            if storage_inst is None:
                continue
            tasks.append(cast(StorageNameSpace, storage_inst).index_done_callback())
        await asyncio.gather(*tasks)

    def insert_custom_kg(self, custom_kg: dict):
        """Insert a custom knowledge graph synchronously.

        Args:
            custom_kg: dictionary with the custom knowledge-graph data:
                - chunks: list of chunks, each with content and source_id
                - entities: list of entities, each with entity_name, entity_type,
                  description, and source_id
                - relationships: list of relationships, each with src_id, tgt_id,
                  description, keywords, weight, and source_id
        """
        loop = always_get_an_event_loop()
        return loop.run_until_complete(self.ainsert_custom_kg(custom_kg))

    async def ainsert_custom_kg(self, custom_kg: dict):
        """Insert a custom knowledge graph asynchronously.

        Inserts predefined knowledge-graph data: chunks, entities, and
        relationships. Every user-supplied source_id is remapped to the
        generated ID of the chunk that carries the same source_id.

        Args:
            custom_kg: dictionary with the custom knowledge-graph data
                (same structure as in insert_custom_kg)

        Notes:
            - Entity names are upper-cased and wrapped in double quotes
            - A warning is logged when the source_id of an entity or
              relationship cannot be resolved
            - All data is written to both the graph store and the vector stores
        """
        update_storage = False
        try:
            # Insert chunks into vector storage
            all_chunks_data = {}  # all chunk data, keyed by chunk ID
            chunk_to_source_map = {}  # maps user-supplied source_id -> chunk_id
            for chunk_data in custom_kg.get("chunks", []):
                chunk_content = chunk_data["content"]  # chunk text
                source_id = chunk_data["source_id"]  # user-supplied source ID
                # Generate a unique ID for the chunk
                chunk_id = compute_mdhash_id(chunk_content.strip(), prefix="chunk-")

                # Prepare the chunk entry
                chunk_entry = {
                    "content": chunk_content.strip(),
                    "source_id": source_id,
                }
                all_chunks_data[chunk_id] = chunk_entry
                chunk_to_source_map[source_id] = chunk_id
                update_storage = True

            # Store the chunks in the vector database and in the chunk store
            if self.chunks_vdb is not None and all_chunks_data:
                await self.chunks_vdb.upsert(all_chunks_data)
            if self.text_chunks is not None and all_chunks_data:
                await self.text_chunks.upsert(all_chunks_data)

            # Insert entities into the knowledge graph
            all_entities_data = []
            for entity_data in custom_kg.get("entities", []):
                entity_name = f'"{entity_data["entity_name"].upper()}"'  # normalized name
                entity_type = entity_data.get("entity_type", "UNKNOWN")
                description = entity_data.get("description", "No description provided")
                source_chunk_id = entity_data.get("source_id", "UNKNOWN")  # user-supplied source ID
                source_id = chunk_to_source_map.get(source_chunk_id, "UNKNOWN")  # generated chunk ID

                # Log a warning when the source cannot be resolved
                if source_id == "UNKNOWN":
                    logger.warning(
                        f"Entity '{entity_name}' has an UNKNOWN source_id. Please check the source mapping."
                    )

                # Prepare the node data and insert it into the graph
                node_data = {
                    "entity_type": entity_type,
                    "description": description,
                    "source_id": source_id,
                }
                await self.chunk_entity_relation_graph.upsert_node(entity_name, node_data)
                # Keep the entity name alongside the node data for vector storage
                node_data["entity_name"] = entity_name
                all_entities_data.append(node_data)
                update_storage = True

            # Insert relationships into the knowledge graph
            all_relationships_data = []
            for relationship_data in custom_kg.get("relationships", []):
                src_id = f'"{relationship_data["src_id"].upper()}"'  # source entity
                tgt_id = f'"{relationship_data["tgt_id"].upper()}"'  # target entity
                description = relationship_data["description"]
                keywords = relationship_data["keywords"]
                weight = relationship_data.get("weight", 1.0)  # defaults to 1.0
                source_chunk_id = relationship_data.get("source_id", "UNKNOWN")  # user-supplied source ID
                source_id = chunk_to_source_map.get(source_chunk_id, "UNKNOWN")  # generated chunk ID

                # Log a warning when the source cannot be resolved
                if source_id == "UNKNOWN":
                    logger.warning(
                        f"Relationship from '{src_id}' to '{tgt_id}' has an UNKNOWN source_id. Please check the source mapping."
                    )

                # Make sure both endpoints exist in the graph
                for need_insert_id in [src_id, tgt_id]:
                    if not (
                        await self.chunk_entity_relation_graph.has_node(need_insert_id)
                    ):
                        # Create a placeholder node when it is missing
                        await self.chunk_entity_relation_graph.upsert_node(
                            need_insert_id,
                            node_data={
                                "source_id": source_id,
                                "description": "UNKNOWN",
                                "entity_type": "UNKNOWN",
                            },
                        )

                # Insert the edge into the graph
                await self.chunk_entity_relation_graph.upsert_edge(
                    src_id,
                    tgt_id,
                    edge_data={
                        "weight": weight,
                        "description": description,
                        "keywords": keywords,
                        "source_id": source_id,
                    },
                )
                edge_data = {
                    "src_id": src_id,
                    "tgt_id": tgt_id,
                    "description": description,
                    "keywords": keywords,
                }
                all_relationships_data.append(edge_data)
                update_storage = True

            # Insert the entities into vector storage
            if self.entities_vdb is not None:
                data_for_vdb = {
                    compute_mdhash_id(dp["entity_name"], prefix="ent-"): {
                        "content": dp["entity_name"] + dp["description"],
                        "entity_name": dp["entity_name"],
                    }
                    for dp in all_entities_data
                }
                await self.entities_vdb.upsert(data_for_vdb)

            # Insert the relationships into vector storage
            if self.relationships_vdb is not None:
                data_for_vdb = {
                    compute_mdhash_id(dp["src_id"] + dp["tgt_id"], prefix="rel-"): {
                        "src_id": dp["src_id"],
                        "tgt_id": dp["tgt_id"],
                        "content": dp["keywords"]
                        + dp["src_id"]
                        + dp["tgt_id"]
                        + dp["description"],
                    }
                    for dp in all_relationships_data
                }
                await self.relationships_vdb.upsert(data_for_vdb)
        finally:
            if update_storage:
                await self._insert_done()

    def query(self, query: str, param: QueryParam = QueryParam()):
        """Run a query synchronously.

        Args:
            query: query string
            param: query parameters; defaults to QueryParam()

        Returns:
            The query result
        """
        loop = always_get_an_event_loop()
        return loop.run_until_complete(self.aquery(query, param))

    async def aquery(self, query: str, param: QueryParam = QueryParam()):
        """Run a query asynchronously.

        Args:
            query: query string
            param: query parameters; defaults to QueryParam()

        Returns:
            The query result
        """
        if param.mode in ["local", "global", "hybrid"]:
            # Knowledge-graph query
            response = await kg_query(
                query,
                self.chunk_entity_relation_graph,
                self.entities_vdb,
                self.relationships_vdb,
                self.text_chunks,
                param,
                asdict(self),
                hashing_kv=self.llm_response_cache
                if self.llm_response_cache
                and hasattr(self.llm_response_cache, "global_config")
                else self.key_string_value_json_storage_cls(
                    namespace="llm_response_cache",
                    global_config=asdict(self),
                    embedding_func=None,
                ),
            )
        elif param.mode == "naive":
            # Plain vector query over chunks
            response = await naive_query(
                query,
                self.chunks_vdb,
                self.text_chunks,
                param,
                asdict(self),
                hashing_kv=self.llm_response_cache
                if self.llm_response_cache
                and hasattr(self.llm_response_cache, "global_config")
                else self.key_string_value_json_storage_cls(
                    namespace="llm_response_cache",
                    global_config=asdict(self),
                    embedding_func=None,
                ),
            )
        elif param.mode == "mix":
            # Mixed query (knowledge graph + vectors)
            response = await mix_kg_vector_query(
                query,
                self.chunk_entity_relation_graph,
                self.entities_vdb,
                self.relationships_vdb,
                self.chunks_vdb,
                self.text_chunks,
                param,
                asdict(self),
                hashing_kv=self.llm_response_cache
                if self.llm_response_cache
                and hasattr(self.llm_response_cache, "global_config")
                else self.key_string_value_json_storage_cls(
                    namespace="llm_response_cache",
                    global_config=asdict(self),
                    embedding_func=None,
                ),
            )
        else:
            raise ValueError(f"Unknown mode {param.mode}")
        await self._query_done()
        return response

    async def _query_done(self):
        """Finish a query.

        Runs the index-done callback of the LLM response cache.
        """
        tasks = []
        for storage_inst in [self.llm_response_cache]:
            if storage_inst is None:
                continue
            tasks.append(cast(StorageNameSpace, storage_inst).index_done_callback())
        await asyncio.gather(*tasks)

    def delete_by_entity(self, entity_name: str):
        """Delete an entity synchronously.

        Args:
            entity_name: name of the entity to delete
        """
        loop = always_get_an_event_loop()
        return loop.run_until_complete(self.adelete_by_entity(entity_name))

    async def adelete_by_entity(self, entity_name: str):
        """Delete an entity asynchronously.

        Args:
            entity_name: name of the entity to delete
        """
        # Upper-case the entity name and wrap it in double quotes
        entity_name = f'"{entity_name.upper()}"'

        try:
            # Remove the entity and its relationships from the vector
            # databases and from the knowledge graph
            await self.entities_vdb.delete_entity(entity_name)
            await self.relationships_vdb.delete_entity_relation(entity_name)
            await self.chunk_entity_relation_graph.delete_node(entity_name)

            logger.info(
                f"Entity '{entity_name}' and its relationships have been deleted."
            )
            await self._delete_by_entity_done()
        except Exception as e:
            logger.error(f"Error while deleting entity '{entity_name}': {e}")

    async def _delete_by_entity_done(self):
        """Finish an entity deletion.

        Runs the index-done callback of the affected storage instances.
        """
        tasks = []
        for storage_inst in [
            self.entities_vdb,
            self.relationships_vdb,
            self.chunk_entity_relation_graph,
        ]:
            if storage_inst is None:
                continue
            tasks.append(cast(StorageNameSpace, storage_inst).index_done_callback())
        await asyncio.gather(*tasks)

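    def _get_content_summary(self, content: str, max_length: int = 100) -> str:
        """Return a short summary of a document.

        Args:
            content: original document content
            max_length: maximum length of the summary

        Returns:
            The stripped content, truncated to max_length characters and
            followed by "..." when it is longer than that
        """
        content = content.strip()
        if len(content) <= max_length:
            return content
        return content[:max_length] + "..."

    async def get_processing_status(self) -> Dict[str, int]:
        """Return the number of documents in each processing status.

        Returns:
            Dictionary mapping each status to its count
        """
        return await self.doc_status.get_status_counts()
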
    async def adelete_by_doc_id(self, doc_id: str):
        """Delete a document and all of its related data asynchronously.

        Args:
            doc_id: ID of the document to delete
        """
        try:
            # 1. Get the document status and related data
            doc_status = await self.doc_status.get(doc_id)
            if not doc_status:
                logger.warning(f"Document {doc_id} not found")
                return

            logger.debug(f"Starting deletion for document {doc_id}")

            # 2. Get all related chunks
            chunks = await self.text_chunks.filter(
                lambda x: x.get("full_doc_id") == doc_id
            )
            chunk_ids = list(chunks.keys())
            logger.debug(f"Found {len(chunk_ids)} chunks to delete")

            # 3. Before deleting, check the entities and relationships of these chunks
            for chunk_id in chunk_ids:
                # Check entities
                entities = [
                    dp
                    for dp in self.entities_vdb.client_storage["data"]
                    if dp.get("source_id") == chunk_id
                ]
                logger.debug(f"Chunk {chunk_id} has {len(entities)} related entities")

                # Check relationships
                relations = [
                    dp
                    for dp in self.relationships_vdb.client_storage["data"]
                    if dp.get("source_id") == chunk_id
                ]
                logger.debug(f"Chunk {chunk_id} has {len(relations)} related relations")

            # 4. Delete the chunks from the vector database and the chunk store
            if chunk_ids:
                await self.chunks_vdb.delete(chunk_ids)
                await self.text_chunks.delete(chunk_ids)

            # 5. Find and process entities and relationships sourced from these chunks
            # Get all nodes and edges of the graph (relies on the NetworkX-backed _graph)
            nodes = self.chunk_entity_relation_graph._graph.nodes(data=True)
            edges = self.chunk_entity_relation_graph._graph.edges(data=True)

            # Track what has to be deleted or updated
            entities_to_delete = set()  # entities to delete
            entities_to_update = {}  # entity name -> new source_id
            relationships_to_delete = set()  # relationships to delete
            relationships_to_update = {}  # (src, tgt) -> new source_id

            # Process entities
            for node, data in nodes:
                if "source_id" in data:
                    # Split source_id on GRAPH_FIELD_SEP
                    sources = set(data["source_id"].split(GRAPH_FIELD_SEP))
                    sources.difference_update(chunk_ids)
                    if not sources:
                        entities_to_delete.add(node)
                        logger.debug(
                            f"Entity {node} marked for deletion - no remaining sources"
                        )
                    else:
                        new_source_id = GRAPH_FIELD_SEP.join(sources)
                        entities_to_update[node] = new_source_id
                        logger.debug(
                            f"Entity {node} will be updated with new source_id: {new_source_id}"
                        )

            # Process relationships
            for src, tgt, data in edges:
                if "source_id" in data:
                    # Split source_id on GRAPH_FIELD_SEP
                    sources = set(data["source_id"].split(GRAPH_FIELD_SEP))
                    sources.difference_update(chunk_ids)
                    if not sources:
                        relationships_to_delete.add((src, tgt))
                        logger.debug(
                            f"Relationship {src}-{tgt} marked for deletion - no remaining sources"
                        )
                    else:
                        new_source_id = GRAPH_FIELD_SEP.join(sources)
                        relationships_to_update[(src, tgt)] = new_source_id
                        logger.debug(
                            f"Relationship {src}-{tgt} will be updated with new source_id: {new_source_id}"
                        )

            # Delete entities
            if entities_to_delete:
                for entity in entities_to_delete:
                    await self.entities_vdb.delete_entity(entity)
                    logger.debug(f"Deleted entity {entity} from vector DB")
                self.chunk_entity_relation_graph.remove_nodes(list(entities_to_delete))
                logger.debug(f"Deleted {len(entities_to_delete)} entities from graph")

            # Update entities
            for entity, new_source_id in entities_to_update.items():
                node_data = self.chunk_entity_relation_graph._graph.nodes[entity]
                node_data["source_id"] = new_source_id
                await self.chunk_entity_relation_graph.upsert_node(entity, node_data)
                logger.debug(
                    f"Updated entity {entity} with new source_id: {new_source_id}"
                )

            # Delete relationships
            if relationships_to_delete:
                for src, tgt in relationships_to_delete:
                    rel_id_0 = compute_mdhash_id(src + tgt, prefix="rel-")
                    rel_id_1 = compute_mdhash_id(tgt + src, prefix="rel-")
                    await self.relationships_vdb.delete([rel_id_0, rel_id_1])
                    logger.debug(f"Deleted relationship {src}-{tgt} from vector DB")
                self.chunk_entity_relation_graph.remove_edges(
                    list(relationships_to_delete)
                )
                logger.debug(
                    f"Deleted {len(relationships_to_delete)} relationships from graph"
                )

            # Update relationships
            for (src, tgt), new_source_id in relationships_to_update.items():
                edge_data = self.chunk_entity_relation_graph._graph.edges[src, tgt]
                edge_data["source_id"] = new_source_id
                await self.chunk_entity_relation_graph.upsert_edge(src, tgt, edge_data)
                logger.debug(
                    f"Updated relationship {src}-{tgt} with new source_id: {new_source_id}"
                )

            # Delete the original document and its status
            await self.full_docs.delete([doc_id])
            await self.doc_status.delete([doc_id])

            # Make sure all indexes are updated
            await self._insert_done()

            logger.info(
                f"Successfully deleted document {doc_id} and related data. "
                f"Deleted {len(entities_to_delete)} entities and {len(relationships_to_delete)} relations. "
                f"Updated {len(entities_to_update)} entities and {len(relationships_to_update)} relations."
            )

            # Verification step
            async def verify_deletion():
                # Verify that the document has been deleted
                if await self.full_docs.get_by_id(doc_id):
                    logger.error(f"Document {doc_id} still exists in full_docs")

                # Verify that the chunks have been deleted
                remaining_chunks = await self.text_chunks.filter(
                    lambda x: x.get("full_doc_id") == doc_id
                )
                if remaining_chunks:
                    logger.error(f"Found {len(remaining_chunks)} remaining chunks")

                # Verify entities and relationships
                for chunk_id in chunk_ids:
                    # Check entities
                    entities_with_chunk = [
                        dp
                        for dp in self.entities_vdb.client_storage["data"]
                        if chunk_id
                        in (dp.get("source_id") or "").split(GRAPH_FIELD_SEP)
                    ]
                    if entities_with_chunk:
                        logger.error(
                            f"Found {len(entities_with_chunk)} entities still referencing chunk {chunk_id}"
                        )

                    # Check relationships
                    relations_with_chunk = [
                        dp
                        for dp in self.relationships_vdb.client_storage["data"]
                        if chunk_id
                        in (dp.get("source_id") or "").split(GRAPH_FIELD_SEP)
                    ]
                    if relations_with_chunk:
                        logger.error(
                            f"Found {len(relations_with_chunk)} relations still referencing chunk {chunk_id}"
                        )

            await verify_deletion()

        except Exception as e:
            logger.error(f"Error while deleting document {doc_id}: {e}")

    def delete_by_doc_id(self, doc_id: str):
        """Delete a document synchronously.

        Args:
            doc_id: ID of the document to delete
        """
        return asyncio.run(self.adelete_by_doc_id(doc_id))

    async def get_entity_info(
        self, entity_name: str, include_vector_data: bool = False
    ):
        """Get detailed information about an entity.

        Args:
            entity_name: entity name (without quotes)
            include_vector_data: whether to include data from the vector database

        Returns:
            dict: entity information, including:
                - entity_name: entity name
                - source_id: source chunk ID(s) stored on the node
                - graph_data: full node data from the graph store
                - vector_data: (optional) data from the vector database
        """
        # Upper-case the entity name and wrap it in double quotes
        entity_name = f'"{entity_name.upper()}"'

        # Get the information from the graph
        node_data = await self.chunk_entity_relation_graph.get_node(entity_name)
        source_id = node_data.get("source_id") if node_data else None

        result = {
            "entity_name": entity_name,
            "source_id": source_id,
            "graph_data": node_data,
        }

        # Optional: get the vector database information
        if include_vector_data:
            entity_id = compute_mdhash_id(entity_name, prefix="ent-")
            vector_data = self.entities_vdb._client.get([entity_id])
            result["vector_data"] = vector_data[0] if vector_data else None

        return result

    def get_entity_info_sync(self, entity_name: str, include_vector_data: bool = False):
        """Synchronous version of get_entity_info.

        Args:
            entity_name: entity name (without quotes)
            include_vector_data: whether to include data from the vector database
        """
        try:
            import tracemalloc

            tracemalloc.start()
            return asyncio.run(self.get_entity_info(entity_name, include_vector_data))
        finally:
            tracemalloc.stop()

    async def get_relation_info(
        self, src_entity: str, tgt_entity: str, include_vector_data: bool = False
    ):
        """Get detailed information about a relationship.

        Args:
            src_entity: source entity name (without quotes)
            tgt_entity: target entity name (without quotes)
            include_vector_data: whether to include data from the vector database

        Returns:
            dict: relationship information, including:
                - src_entity: source entity name
                - tgt_entity: target entity name
                - source_id: source chunk ID(s) stored on the edge
                - graph_data: full edge data from the graph store
                - vector_data: (optional) data from the vector database
        """
        src_entity = f'"{src_entity.upper()}"'
        tgt_entity = f'"{tgt_entity.upper()}"'

        # Get the information from the graph
        edge_data = await self.chunk_entity_relation_graph.get_edge(src_entity, tgt_entity)
        source_id = edge_data.get("source_id") if edge_data else None

        result = {
            "src_entity": src_entity,
            "tgt_entity": tgt_entity,
            "source_id": source_id,
            "graph_data": edge_data,
        }

        # Optional: get the vector database information
        if include_vector_data:
            rel_id = compute_mdhash_id(src_entity + tgt_entity, prefix="rel-")
            vector_data = self.relationships_vdb._client.get([rel_id])
            result["vector_data"] = vector_data[0] if vector_data else None

        return result

    def get_relation_info_sync(
        self, src_entity: str, tgt_entity: str, include_vector_data: bool = False
    ):
        """Synchronous version of get_relation_info.

        Args:
            src_entity: source entity name (without quotes)
            tgt_entity: target entity name (without quotes)
            include_vector_data: whether to include data from the vector database
        """
        try:
            import tracemalloc

            tracemalloc.start()
            return asyncio.run(
                self.get_relation_info(src_entity, tgt_entity, include_vector_data)
            )
        finally:
            tracemalloc.stop()
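
The bookkeeping at the heart of `adelete_by_doc_id` is set arithmetic on each node's or edge's `source_id`: remove the deleted chunk IDs, then delete the item if nothing is left, otherwise write back the shortened list. The sketch below isolates that step. It is an illustration, not the library's code: `SEP = "<SEP>"` is an assumed stand-in for `GRAPH_FIELD_SEP` (its value is not shown in the listing), `prune_sources` is a hypothetical helper, and a list is used instead of a set so that the order of the remaining sources stays deterministic.

```python
SEP = "<SEP>"  # assumed stand-in for GRAPH_FIELD_SEP


def prune_sources(items, deleted_chunk_ids):
    """Decide which items to delete and which to update.

    items maps an entity (or relationship) name to its SEP-joined source_id.
    Returns (to_delete, to_update): a set of names with no sources left,
    and a dict of name -> new SEP-joined source_id for the rest.
    """
    deleted = set(deleted_chunk_ids)
    to_delete = set()
    to_update = {}
    for name, source_id in items.items():
        remaining = [s for s in source_id.split(SEP) if s not in deleted]
        if not remaining:
            to_delete.add(name)  # every source chunk is gone
        else:
            to_update[name] = SEP.join(remaining)  # keep the surviving sources
    return to_delete, to_update


nodes = {
    '"SCROOGE"': "chunk-a<SEP>chunk-b",
    '"PARKS"': "chunk-a",
    '"PETER"': "chunk-c",
}
to_delete, to_update = prune_sources(nodes, ["chunk-a"])
print(sorted(to_delete))
print(to_update)
```

As in the listing, items that do not reference a deleted chunk still land in the update map with an unchanged `source_id`; skipping those would save writes but is a behaviour change.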