# 编辑距离

``` class LevenshteinDistanceFunction {

private final boolean isCaseSensitive;

public LevenshteinDistanceFunction(boolean isCaseSensitive) {
this.isCaseSensitive = isCaseSensitive;
}

public int distance(CharSequence left, CharSequence right) {
int leftLength = left.length(), rightLength = right.length();

// special cases.
if (leftLength == 0)
return rightLength;
if (rightLength == 0)
return leftLength;

// Use the iterative matrix method.
int[] currentRow = new int[rightLength + 1];
int[] nextRow    = new int[rightLength + 1];

// Fill first row with all edit counts.
for (int i = 0; i <= rightLength; i++)
currentRow[i] = i;

for (int i = 1; i <= leftLength; i++) {
nextRow[0] = i;

for(int j = 1; j <= rightLength; j++) {
int subDistance = currentRow[j - 1]; // Distance without insertions or deletions.
if (!charEquals(left.charAt(i - 1), right.charAt(j - 1), isCaseSensitive))
subDistance++; // Add one edit if letters are different.
nextRow[j] = Math.min(Math.min(nextRow[j - 1], currentRow[j]) + 1, subDistance);
}

// Swap rows, use last row for next row.
int[] t = currentRow;
currentRow = nextRow;
nextRow = t;
}

return currentRow[rightLength];
}

}```

# BK树

1973年，Burkhard和Keller提出的BK树有效地解决了这个问题。BK树的核心思想是：

```令d(x,y)表示字符串x到y的Levenshtein距离，那么显然：
d(x,y) = 0 当且仅当 x=y （Levenshtein距离为0 <==> 字符串相等）
d(x,y) = d(y,x) （从x变到y的最少步数就是从y变到x的最少步数）
d(x,y) + d(y,z) >= d(x,z) （从x变到z所需的步数不会超过x先变成y再变成z的步数）```

## BK 实现

```public boolean add(T t) {
if (t == null)
throw new NullPointerException();

if (rootNode == null) {
rootNode = new Node<>(t);
length = 1;
modCount++; // Modified tree by adding root.
return true;
}

Node<T> parentNode = rootNode;
Integer distance;
while ((distance = distanceFunction.distance(parentNode.item, t)) != 0
|| !t.equals(parentNode.item)) {
Node<T> childNode = parentNode.children.get(distance);
if (childNode == null) {
parentNode.children.put(distance, new Node<>(t));
length++;
modCount++; // Modified tree by adding a child.
return true;
}
parentNode = childNode;
}

return false;
}```

``` public List<SearchResult<T>> search(T t, int radius) {
if (t == null)
return Collections.emptyList();
ArrayList<SearchResult<T>> searchResults = new ArrayList<>();
ArrayDeque<Node<T>> nextNodes = new ArrayDeque<>();
if (rootNode != null)

while(!nextNodes.isEmpty()) {
Node<T> nextNode = nextNodes.poll();
int distance = distanceFunction.distance(nextNode.item, t);
int lowBound = Math.max(0, distance - radius), highBound = distance + radius;
for (Integer i = lowBound; i <= highBound; i++) {
if (nextNode.children.containsKey(i))
}
}

searchResults.trimToSize();
Collections.sort(searchResults);
return Collections.unmodifiableList(searchResults);
}```

# 使用BK树做文本纠错

```  static void outputSearchResult( List<SearchResult<CharSequence>> results){
for(SearchResult<CharSequence> item : results){
System.out.println(item.item);
}
}

static void test(BKTree<CharSequence> tree,String word){
System.out.println(word+"的最相近结果：");
outputSearchResult(tree.search(word,Math.max(1,word.length()/4)));
}

public static void main(String[] args) {

BKTree<CharSequence> tree = new BKTree(DistanceFunctions.levenshteinDistance());
System.out.println("词典条数："+testStrings.size());
long startTime = System.currentTimeMillis();
for(String testStr: testStrings){
}
System.out.println("建树耗时："+(System.currentTimeMillis()-startTime)+"ms");
startTime = System.currentTimeMillis();
String[] testWords = new String[]{
"湄公河凶案",
"葫芦丝兄弟",
"少林足球"
};

for (String testWord: testWords){
test(tree,testWord);
}
System.out.println("测试耗时："+(System.currentTimeMillis()-startTime)+"ms");
}```

```词典条数：18513

102 篇文章37 人订阅

0 条评论

## 相关文章

### 2843 拯救炜哥

2843 拯救炜哥 时间限制: 2 s 空间限制: 128000 KB 题目等级 : 黄金 Gold 题目描述 Description 有一天，炜...

2755

3065

4608

2724

33312

### Python OJ 从入门到入门基础练习 10 题

OJ 是 Online Judge 系统的简称，用来在线检测程序源代码的正确性 1、天天向上的力量： 一年365天，以第1天的能力值为基数，记为1.0。当好好学...

47411

4043

3337

1753

3316