首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >专栏 >ChatGPT|AI搜索(Tiny Search)上线

ChatGPT|AI搜索(Tiny Search)上线

作者头像
用户1904552
发布2025-02-27 10:31:56
发布2025-02-27 10:31:56
7800
代码可运行
举报
文章被收录于专栏:周末程序猿周末程序猿
运行总次数:0
代码可运行

1、简介

AI搜索(Tiny Search)是类似Perplexity AI,实现会话搜索引擎,通过将搜索的内容聚合,然后通过AI进行整合从而提升搜索效率,之前贾扬清也开源了Perplexity的源码,项目:

代码语言:javascript
代码运行次数:0
运行
复制
https://github.com/leptonai/search_with_lepton

为了适配国内的搜索引擎,于是用python重新实现一个版本,大家可以通过访问(目前是beta版本,查询可能会有点慢):

代码语言:javascript
代码运行次数:0
运行
复制
https://service-mpjvpuxa-1251014631.gz.apigw.tencentcs.com/static/index.html

2、架构

架构

2.1、searxng

SearXNG 是一个免费的互联网元搜索引擎,整合了各种搜索服务的结果,可用通过docker镜像搭建,这里建议大家可以直接搭建在各个云的serverless平台上,其中搭建脚本:

代码语言:javascript
代码运行次数:0
运行
复制
docker run --rm \
             -d -p 8080:8080 \
             -v "${PWD}/searxng:/etc/searxng" \
             -e "BASE_URL=http://0.0.0.0:8080/" \
             -e "INSTANCE_NAME=searxng" \
             searxng/searxng

相关的配置:

代码语言:javascript
代码运行次数:0
运行
复制
# 支持返回的数据
formats:
    - html
    - json 

# 配置支持的搜索引擎,目前平台上支持几十个引擎,但是每个引擎存在访问延时,可以配置自己能接受的搜索返回延时
engines:
  - name: bing
    engine: bing
    shortcut: bi
    disabled: false

2.2、定义Prompt

Prompt是AI搜索的核心,定义了AI搜索的意图,包括两个部分:

  • 通用的RAG的Prompt,是针对返回的内容,让LLM如何进行总结(页面上的"AI回答")。 可以看看 search_with_lepton 开源项目的如何实现的,这里定义了通用的Prompt:
代码语言:javascript
代码运行次数:0
运行
复制
You are a large language AI assistant built by Lepton AI. You are given a user question, and please write clean, concise and accurate answer to the question. You will be given a set of related contexts to the question, each starting with a reference number like [[citation:x]], where x is a number. Please use the context and cite the context at the end of each sentence if applicable.

Your answer must be correct, accurate and written by an expert using an unbiased and professional tone. Please limit to 1024 tokens. Do not give any information that is not related to the question, and do not repeat. Say "information is missing on" followed by the related topic, if the given context do not provide sufficient information.

Please cite the contexts with the reference numbers, in the format [citation:x]. If a sentence comes from multiple contexts, please list all applicable citations, like [citation:3][citation:5]. Other than code and specific names and citations, your answer must be written in the same language as the question.

Here are the set of contexts:

{context}

Remember, don't blindly repeat the contexts verbatim. And here is the user question:
  • 联想的问题Prompt,是针对当前问题和返回的搜索,联想更多相关的问题(页面上的"相关问题")。 也可以看看 search_with_lepton 中的如何实现的:
代码语言:javascript
代码运行次数:0
运行
复制
You are a helpful assistant that helps the user to ask related questions, based on user's original question and the related contexts. Please identify worthwhile topics that can be follow-ups, and write questions no longer than 20 words each. Please make sure that specifics, like events, names, locations, are included in follow up questions so they can be asked standalone. For example, if the original question asks about "the Manhattan project", in the follow up question, do not just say "the project", but use the full name "the Manhattan project". Your related questions must be in the same language as the original question.

Here are the contexts of the question:

{context}

Remember, based on the original question and related contexts, suggest three such further questions. Do NOT repeat the original question. Each related question should be no longer than 20 words. Here is the original question:

3、Beta版本

目前是beta版本,后续会持续优化搜索速度和返回内容的准确性(以下是入口的截图)。

访问地址:https://service-mpjvpuxa-1251014631.gz.apigw.tencentcs.com/static/index.html

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2024-05-21,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 周末程序猿 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 1、简介
  • 2、架构
    • 2.1、searxng
    • 2.2、定义Prompt
  • 3、Beta版本
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档