首先KM和TAPD都是通过后台给前端生成一个js文件，由前端去做模糊查找，这样比较经济实惠，没有后台查询损耗，定期更新js文件即可。但是我们的业务场景有通过 企业服务号 使用的需求，虽然我们已接入了 移动网关 但毕竟在外网可访问的前提下，前端能够获取所有员工信息太危险了，最后决定用接口实现。

Staffs表结构&查询接口

将员工数据导入到MySQL之后，共有 59066 条数据，大家重点关注 rtx，pinyin，chn_name 这三个字段，其他都是辅助字段，不参与查询业务。

mysql> select * from staffs limit 5;
+----+----------+--------------+----------+------------+---------------------+
| id | rtx      | pinyin       | chn_name | searchable | updated_at          |
+----+----------+--------------+----------+------------+---------------------+
|  1 | 5001     | TAPDkefu     | TAPD??   |          1 | 2018-05-11 13:32:44 |
|  2 | aachen   | chenfangying | ???      |          1 | 2018-05-11 13:32:44 |
|  3 | aalizzlu | lujianbin    | ???      |          1 | 2018-05-11 13:32:44 |
|  4 | aamli    | lixinyun     | ???      |          1 | 2018-05-11 13:32:44 |
|  5 | aaqin    | qinjian      | ??       |          1 | 2018-05-11 13:32:44 |
+----+----------+--------------+----------+------------+---------------------+
5 rows in set (0.00 sec)

接口设计

1526286695_53_w1407_h621.png

成功示例：

{
    "ret": 0,
    "msg": "ok",
    "result": [
        {
            "rtx": "andychai",
            "pinyin": "chaiyanlin",
            "chn_name": "柴延林"
        },
        {
            "rtx": "andyche",
            "pinyin": "chemaocheng",
            "chn_name": "车懋成"
        },
        {
            "rtx": "candychcao",
            "pinyin": "caohui",
            "chn_name": "曹慧"
        },
        {
            "rtx": "mandycheng",
            "pinyin": "chengwen",
            "chn_name": "程雯"
        },
        {
            "rtx": "nandychen",
            "pinyin": "chennan",
            "chn_name": "陈楠"
        }
    ]
}

简单粗暴一把搜（LIKE %%）

当用户请求：

GET /staffs?q=andy

后台执行sql:

select * from staffs where rtx like '%andy%' or pinyin like '%andy%' or chn_name like '%andy%' limit 5;

别笑！真的别笑！你以为很慢吗？

mysql> select * from staffs where rtx like '%andy%' or pinyin like '%andy%' or chn_name like '%andy%' limit 5;
+------+-------------+-----------+----------+------------+---------------------+
| id   | rtx         | pinyin    | chn_name | searchable | updated_at          |
+------+-------------+-----------+----------+------------+---------------------+
|  214 | acandysui   | suiyanli  | ???      |          1 | 2018-05-11 13:32:44 |
|  871 | alee        | AndyLee   | AndyLee  |          1 | 2018-05-11 13:32:44 |
| 1895 | andyabhuang | huanganbu | ???      |          1 | 2018-05-11 13:32:44 |
| 1896 | andyao      | aonaiyong | ???      |          1 | 2018-05-11 13:32:44 |
| 1897 | andyawang   | wangang   | ??       |          1 | 2018-05-11 13:32:44 |
+------+-------------+-----------+----------+------------+---------------------+
5 rows in set (0.01 sec)

没错用了LIKE，而且是%%，这种查询只能遍历全表：

explain select * from staffs where rtx like '%andy%' or pinyin like '%andy%' or chn_name like '%andy%' limit 5;
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | staffs | ALL  | NULL          | NULL | NULL    | NULL | 59091 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+-------+-------------+

数据库设计准则里，几乎都在不厌其烦的强调不要用LIKE！那么你能怎么办呢？我总不能为了这去去6万不到数据去弄个搜索引擎进来吧？这就是传说中的大炮打蚊子啊！

MySQL全文索引

首先检查你用的MySQL的版本，最好是5.6+。因为InnoDB引擎对FULLTEXT索引的支持是MySQL5.6新引入的特性，之前只有MyISAM引擎支持FULLTEXT索引。

MySQL支持三种模式的全文检索模式：自然语言模式（IN NATURAL LANGUAGE MODE），即通过MATCH AGAINST 传递某个特定的字符串来进行检索。布尔模式（IN BOOLEAN MODE），可以为检索的字符串增加操作符，例如“+”表示必须包含，“-”表示不包含，“*”表示通配符（这种情况，即使传递的字符串较小或出现在停词中，也不会被过滤掉），其他还有很多特殊的布尔操作符，可以通过如下参数控制：查询扩展模式（WITH QUERY EXPANSION）, 这种模式是自然语言模式下的一个变种，会执行两次检索，第一次使用给定的短语进行检索，第二次是结合第一次相关性比较高的行进行检索。

但是！对于中文的分词问题，依然没有解决，英文的分词是能够通过空格去区分，而中文名是没空格，所以全文索引只能帮助我们加快rtx和pinyin的查询，中文名依旧无奈

建立全文索引

现在我们来看看加上全文索引之后的表结构：

CREATE TABLE IF NOT EXISTS `ieg_briefs`.`staffs` (
  `id` BIGINT(13) NOT NULL AUTO_INCREMENT,
  `rtx` VARCHAR(45) NOT NULL COMMENT '员工rtx',
  `pinyin` VARCHAR(45) NOT NULL COMMENT '员工姓名拼音',
  `chn_name` VARCHAR(45) NOT NULL COMMENT '员工中文名',
  `searchable` TINYINT(3) NOT NULL DEFAULT 1 COMMENT '是否可检索：1可以，0不可以',
  `updated_at` TIMESTAMP NOT NULL COMMENT '数据更新时间',
  PRIMARY KEY (`id`),
  UNIQUE INDEX `rtx_UNIQUE` (`rtx` ASC),
  FULLTEXT INDEX `fuzzy_search` (`rtx` ASC, `pinyin` ASC),
  INDEX `searchable` (`searchable` ASC),
  INDEX `updated_at` (`updated_at` ASC))
ENGINE = InnoDB

mysql> show indexes from staffs;
+--------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| staffs |          0 | PRIMARY      |            1 | id          | A         |       59091 |     NULL | NULL   |      | BTREE      |         |               |
| staffs |          0 | rtx_UNIQUE   |            1 | rtx         | A         |       59091 |     NULL | NULL   |      | BTREE      |         |               |
| staffs |          1 | searchable   |            1 | searchable  | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| staffs |          1 | updated_at   |            1 | updated_at  | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| staffs |          1 | fuzzy_search |            1 | rtx         | NULL      |       59091 |     NULL | NULL   |      | FULLTEXT   |         |               |
| staffs |          1 | fuzzy_search |            2 | pinyin      | NULL      |       59091 |     NULL | NULL   |      | FULLTEXT   |         |               |
+--------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

为了看的清晰，我把建表语句也贴出来了，大家主需要看重点 FULLTEXT INDEX fuzzy_search (**rtx** ASC, pinyin ASC)，我把rtx和拼音字段做了一个全文索引，命名为fuzzy_search。那么在这个时候对应的查询语句就是：

mysql> select * from staffs where match(rtx, pinyin) against('*andy*' IN BOOLEAN MODE) limit 5;
+------+-------------+-----------+----------+------------+---------------------+
| id   | rtx         | pinyin    | chn_name | searchable | updated_at          |
+------+-------------+-----------+----------+------------+---------------------+
|  871 | alee        | AndyLee   | AndyLee  |          1 | 2018-05-11 13:32:44 |
| 1895 | andyabhuang | huanganbu | ???      |          1 | 2018-05-11 13:32:44 |
| 1896 | andyao      | aonaiyong | ???      |          1 | 2018-05-11 13:32:44 |
| 1897 | andyawang   | wangang   | ??       |          1 | 2018-05-11 13:32:44 |
| 1898 | andybi      | bisheng   | ??       |          1 | 2018-05-11 13:32:44 |
+------+-------------+-----------+----------+------------+---------------------+
5 rows in set (0.00sec)

mysql> explain select * from staffs where match(rtx, pinyin) against('*andy*' IN BOOLEAN MODE) limit 5;
+----+-------------+--------+----------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table  | type     | possible_keys | key          | key_len | ref  | rows | Extra       |
+----+-------------+--------+----------+---------------+--------------+---------+------+------+-------------+
|  1 | SIMPLE      | staffs | fulltext | fuzzy_search  | fuzzy_search | 0       | NULL |    1 | Using where |
+----+-------------+--------+----------+---------------+--------------+---------+------+------+-------------+
1 row in set (0.01 sec)

模糊匹配策略

如上，索引完美命中，效率喜人，那么中文怎么办？全文索引解决不了中文分词，不过人是活的嘛~要会变通~在代码层，我们可做出如下逻辑：

如果用户输入参数不包含中文，则默认其搜索rtx或拼音，使用全文索引查询；
如果用户输入参数包含中文，则使用LIKE %%查询中文名字段；
关键字查询接口返回结果做Redis缓存，缓存时间为120分钟；

所以我的解决方式：使用全文索引优化rtx和拼音的模糊查询，中文的模糊查询继续使用**LIKE %%**，最后再加一个Redis缓存。介于IEG市场部也就200多号人，这已经完全够用了。

<?php

namespace App\Criteria;

use Prettus\Repository\Criteria\RequestCriteria;
use Prettus\Repository\Contracts\RepositoryInterface;

/**
 * Class StaffSearchCriteriaCriteria.
 *
 * @package namespace App\Criteria;
 */
class StaffSearchCriteriaCriteria extends RequestCriteria
{
    /**
     * Apply criteria in query repository
     *
     * @param string              $model
     * @param RepositoryInterface $repository
     *
     * @return mixed
     */
    public function apply($model, RepositoryInterface $repository)
    {
        $str = $this->request->get('q', null);
        $model = $model->where('searchable', 1);
        // 含中文只搜索chn_name
        if (preg_match("/[\x7f-\xff]/", $str)) {
            $model = $model->where('chn_name', 'like', "%$str%");
        } else {
            // IN BOOLEAN MODE
            $str = "*$str*";
            $model = $model->whereRaw("MATCH (rtx, pinyin) AGAINST (? IN BOOLEAN MODE)", [$str]);
        }
        return $model->select(['rtx', 'pinyin', 'chn_name'])->take(5);
    }
}

最后前端同学别忘了加上debounce防抖动函数~

坚持不用LIKE的方案

如果你说我坚持不用LIKE！好兄弟！有理想！

字段内容是中文，没法做全文索引，但是有变通的办法，就是将整句的中文分词，并按urlencode、区位码、base64、拼音等进行编码使之以"字母+数字"的方式存储于数据库中。

你有张表里面有个title字段，可以再加一个字段叫title_pinyin：

e.g.

| title | title_pinyin

| ------------ |

| 刺激战场 | ci ji zhan chang

如此你便可以继续使用全文索引了，查询的时候需要将中文转成拼音，再使用全文索引match的方式进行查询。

还是渐进式扩展的好

相信大多数的情况是，前期数据量少，即使用LIKE也无伤大雅，但随着数据量增多，扫描全表必然不是长久之计。所以我个人还是建议，我们开发不要过度设计，思考可以很长远，但做可以只做一点点。中后期引入搜索引擎，一劳永逸的解决问题。

搜索引擎

这里帮助大家罗列能够快速入门的全文检索引擎。

mysql性能调优

MySQL模糊查询性能优化

结论写在最前面

业务场景分析

业务背景

Staffs表结构&查询接口

简单粗暴一把搜（LIKE %%）

MySQL全文索引

建立全文索引

模糊匹配策略

坚持不用LIKE的方案

还是渐进式扩展的好

搜索引擎

Sphinx

Elasticsearch

RediSearch

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐