【JavaScript 算法】KMP算法：高效的字符串匹配

空白诗

发布于 2024-07-20 12:44:58

1540

发布于 2024-07-20 12:44:58

KMP算法（Knuth-Morris-Pratt Algorithm）是一种用于在文本中高效查找子串的字符串匹配算法。它通过预处理模式字符串，构建部分匹配表（又称为失配表），在匹配过程中避免重复扫描，从而提高匹配效率。本文将详细介绍KMP算法的原理、实现及其应用。

一、算法原理

KMP算法的核心思想是在匹配过程中利用已经匹配的部分信息来避免重复匹配。其主要步骤如下：

构建部分匹配表：对于模式字符串中的每个位置，计算在该位置之前的子串的最大前缀和后缀的长度。
字符串匹配：利用部分匹配表，在文本中查找模式字符串，如果发生失配，根据部分匹配表跳过一定的字符，而不是逐个字符地重新匹配。

部分匹配表的构建

部分匹配表记录了每个位置之前的子串的最大前缀和后缀的长度，用于在发生失配时跳过重复匹配的字符。

二、算法实现

构建部分匹配表

/**
 * 构建部分匹配表
 * @param {string} pattern - 模式字符串
 * @return {number[]} - 部分匹配表
 */
function buildPartialMatchTable(pattern) {
  const m = pattern.length;
  const table = Array(m).fill(0);
  let j = 0;

  for (let i = 1; i < m; i++) {
    while (j > 0 && pattern[i] !== pattern[j]) {
      j = table[j - 1];
    }
    if (pattern[i] === pattern[j]) {
      j++;
    }
    table[i] = j;
  }

  return table;
}

// 示例
const pattern = "ABABC";
const partialMatchTable = buildPartialMatchTable(pattern);
console.log(partialMatchTable); // 输出: [0, 0, 1, 2, 0]

KMP字符串匹配

/**
 * KMP字符串匹配算法
 * @param {string} text - 文本字符串
 * @param {string} pattern - 模式字符串
 * @return {number} - 模式字符串在文本中的起始位置，未找到返回 -1
 */
function kmpSearch(text, pattern) {
  const n = text.length;
  const m = pattern.length;
  const table = buildPartialMatchTable(pattern);
  let j = 0;

  for (let i = 0; i < n; i++) {
    while (j > 0 && text[i] !== pattern[j]) {
      j = table[j - 1];
    }
    if (text[i] === pattern[j]) {
      j++;
    }
    if (j === m) {
      return i - m + 1; // 匹配成功，返回起始位置
    }
  }

  return -1; // 未找到匹配
}

// 示例
const text = "ABABDABACDABABCABAB";
const result = kmpSearch(text, pattern);
console.log(result); // 输出: 10

注释说明：

构建部分匹配表：
- buildPartialMatchTable(pattern)：构建模式字符串的部分匹配表，返回一个数组，记录每个位置之前的子串的最大前缀和后缀的长度。
- let j = 0;：初始化前缀长度。
- for (let i = 1; i < m; i++)：遍历模式字符串，从第二个字符开始。
- while (j > 0 && pattern[i] !== pattern[j])：如果字符不匹配，更新前缀长度。
- if (pattern[i] === pattern[j])：如果字符匹配，前缀长度加1。
- table[i] = j;：更新部分匹配表。
KMP字符串匹配：
- kmpSearch(text, pattern)：在文本字符串中查找模式字符串，返回模式字符串在文本中的起始位置，未找到返回-1。
- const table = buildPartialMatchTable(pattern);：构建模式字符串的部分匹配表。
- for (let i = 0; i < n; i++)：遍历文本字符串。
- while (j > 0 && text[i] !== pattern[j])：如果字符不匹配，更新前缀长度。
- if (text[i] === pattern[j])：如果字符匹配，前缀长度加1。
- if (j === m)：如果前缀长度等于模式字符串长度，表示匹配成功，返回起始位置。