首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >MYsql全文查询产生意想不到的排名;为什么?

MYsql全文查询产生意想不到的排名;为什么?
EN

Stack Overflow用户
提问于 2012-09-11 05:59:24
回答 2查看 1.3K关注 0票数 3

我正在尝试全文搜索与标签,但它不能正常工作,我请检查附件图片

查询为:

代码语言:javascript
运行
复制
 SELECT *, 
         MATCH(tags) AGAINST ('tag3 tag6 tag4') AS score 
    FROM items
ORDER BY score DESC

为什么分数在正确的顺序字段中没有排序?如果您选中,第二行包含我搜索的所有标签,而第一个字段没有tag3关键字。

我的意思是id字段的顺序应该是: 5,1,2 ..etc而不是1,5,2..etc

我的错误在哪里?

然后我想先在标签字段中搜索,如果没有结果,我想在描述字段中搜索与全文相同的关键字,这样用户将在标签和描述中搜索如果标签不匹配,是否可以在相同的查询中或我需要两个单独的查询?

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2012-09-11 06:23:47

在这个文档http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html中,它说“对于非常小的表格,单词分布不能充分反映它们的语义值,这个模型有时可能会产生奇怪的结果。”

如果您的items表很小--例如一个示例表--那么您可能遇到了这个问题,并得到了一个“奇怪”的结果。

您可能希望尝试此查询IN BOOLEAN MODE,看看您的结果是否与您的预测相符。尝尝这个。

代码语言:javascript
运行
复制
    SELECT *, 
           MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) AS score 
      FROM items
  ORDER BY score DESC

布尔模式禁用单词分布排名。请注意,您应该了解自然语言模式和布尔模式之间的区别,一旦您有了合适大小的表,就应该明智地选择使用哪种模式。如果您正在搜索博客中包含的标签类型,那么Boolean可能是个不错的选择。

票数 2
EN

Stack Overflow用户

发布于 2012-09-11 06:58:06

首先,这是在我的Windows7机器上加载到MySQL 5.5.12中的示例数据

代码语言:javascript
运行
复制
mysql> DROP DATABASE IF EXISTS lspuk;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE DATABASE lspuk;
Query OK, 1 row affected (0.00 sec)

mysql> USE lspuk
Database changed
mysql> CREATE TABLE items
    -> (
    ->     id int not null auto_increment,
    ->     description VARCHAR(30),
    ->     tags VARCHAR(30),
    ->     primary key (id),
    ->     FULLTEXT tags_ftndx (tags)
    -> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO items (description,tags) VALUES
    -> ('the first' ,'tag1 tag3 tag4'),
    -> ('the second','tag5 tag1 tag2'),
    -> ('the third' ,'tag5 tag1 tag9'),
    -> ('the fourth','tag5 tag6 tag2'),
    -> ('the fifth' ,'tag4 tag3 tag6'),
    -> ('the sixth' ,'tag2 tag3 tag6');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql>

请看一下MySQL中标签填充的方式:

代码语言:javascript
运行
复制
mysql> SELECT 'tag1',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag1%' UNION
    -> SELECT 'tag2',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag2%' UNION
    -> SELECT 'tag3',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag3%' UNION
    -> SELECT 'tag4',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag4%' UNION
    -> SELECT 'tag5',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag5%' UNION
    -> SELECT 'tag6',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag6%' UNION
    -> SELECT 'tag9',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag9%';
+------+-----------+
| tag1 | tag_count |
+------+-----------+
| tag1 |         3 |
| tag2 |         3 |
| tag3 |         3 |
| tag4 |         2 |
| tag5 |         3 |
| tag6 |         3 |
| tag9 |         1 |
+------+-----------+
7 rows in set (0.00 sec)

mysql>

仔细观察,请注意以下事实:

  1. 每行恰好有3个标签
  2. 请求标签的顺序与每个标签的存在数量似乎决定了

的得分

如果删除tag4并运行查询,则根本得不到任何分数

代码语言:javascript
运行
复制
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags           | score |
+----+-------------+----------------+-------+
|  1 | the first   | tag1 tag3 tag4 |     0 |
|  2 | the second  | tag5 tag1 tag2 |     0 |
|  3 | the third   | tag5 tag1 tag9 |     0 |
|  4 | the fourth  | tag5 tag6 tag2 |     0 |
|  5 | the fifth   | tag4 tag3 tag6 |     0 |
|  6 | the sixth   | tag2 tag3 tag6 |     0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)

评估方法似乎基于令牌字段的平均数量和特定顺序中特定值的存在和/或不存在影响评分。如果您应用不同的评分和标签规范样式,请注意不同的分数:

代码语言:javascript
运行
复制
mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4') as score FROM items ORDER BY score DESC;
+----+-------------+----------------+--------------------+
| id | description | tags           | score              |
+----+-------------+----------------+--------------------+
|  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |
|  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |
|  2 | the second  | tag5 tag1 tag2 |                  0 |
|  3 | the third   | tag5 tag1 tag9 |                  0 |
|  4 | the fourth  | tag5 tag6 tag2 |                  0 |
|  6 | the sixth   | tag2 tag3 tag6 |                  0 |
+----+-------------+----------------+--------------------+
6 rows in set (0.00 sec)

mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags           | score |
+----+-------------+----------------+-------+
|  5 | the fifth   | tag4 tag3 tag6 |     3 |
|  1 | the first   | tag1 tag3 tag4 |     2 |
|  6 | the sixth   | tag2 tag3 tag6 |     2 |
|  4 | the fourth  | tag5 tag6 tag2 |     1 |
|  2 | the second  | tag5 tag1 tag2 |     0 |
|  3 | the third   | tag5 tag1 tag9 |     0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)

mysql> SELECT *,MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
+----+-------------+----------------+-------+
| id | description | tags           | score |
+----+-------------+----------------+-------+
|  5 | the fifth   | tag4 tag3 tag6 |     1 |
|  1 | the first   | tag1 tag3 tag4 |     0 |
|  2 | the second  | tag5 tag1 tag2 |     0 |
|  3 | the third   | tag5 tag1 tag9 |     0 |
|  4 | the fourth  | tag5 tag6 tag2 |     0 |
|  6 | the sixth   | tag2 tag3 tag6 |     0 |
+----+-------------+----------------+-------+
6 rows in set (0.00 sec)

mysql>

解决方案似乎是先计算布尔模式分数,然后再计算非布尔模式分数,如下所示:

代码语言:javascript
运行
复制
SELECT *,
MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
FROM items ORDER BY score2 DESC, score1 DESC;

以下是针对示例数据的结果:

代码语言:javascript
运行
复制
mysql> SELECT *,
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
    -> MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
    -> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags           | score1             | score2 |
+----+-------------+----------------+--------------------+--------+
|  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |      1 |
|  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |      0 |
|  2 | the second  | tag5 tag1 tag2 |                  0 |      0 |
|  3 | the third   | tag5 tag1 tag9 |                  0 |      0 |
|  4 | the fourth  | tag5 tag6 tag2 |                  0 |      0 |
|  6 | the sixth   | tag2 tag3 tag6 |                  0 |      0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)

mysql>

或者你可以试着不使用加号

代码语言:javascript
运行
复制
mysql> SELECT *,
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
    -> MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score2
    -> FROM items ORDER BY score2 DESC, score1 DESC;
+----+-------------+----------------+--------------------+--------+
| id | description | tags           | score1             | score2 |
+----+-------------+----------------+--------------------+--------+
|  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |      3 |
|  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |      2 |
|  6 | the sixth   | tag2 tag3 tag6 |                  0 |      2 |
|  4 | the fourth  | tag5 tag6 tag2 |                  0 |      1 |
|  2 | the second  | tag5 tag1 tag2 |                  0 |      0 |
|  3 | the third   | tag5 tag1 tag9 |                  0 |      0 |
+----+-------------+----------------+--------------------+--------+
6 rows in set (0.00 sec)

mysql>

无论哪种方式,您都必须同时合并布尔模式和非布尔模式。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/12360047

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档