What is the best API for getting the edits required to convert string A into string B?
A: This is a sentence
B: This was a sentences
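Something like this.
Edit:
I know I could implement Levenshtein distance myself to do this, but if anyone knows of an NLP/string utility or API that I can use in Java, please help. I would like to use an industry standard.
spaCy ERRANT is an option, but it is for Python. I am looking for something similar for Java.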
Posted on 2022-02-01 08:22:43
Option 1
Use the diff-match-patch library, which provides diff, match, and patch operations on plain text.
Example:
import java.util.LinkedList;
import name.fraser.neil.plaintext.diff_match_patch;

public class Test {
    public static void main(String args[]) {
        diff_match_patch dmp = new diff_match_patch();
        // compute a character-level diff between the two strings
        LinkedList<diff_match_patch.Diff> diff = dmp.diff_main("This is a sentence", "This was a sentences");
        // merge the raw diff into more human-readable chunks
        dmp.diff_cleanupSemantic(diff);
        System.out.println(diff);
    }
}
Output:
[Diff(EQUAL,"This "), Diff(DELETE,"i"), Diff(INSERT,"wa"), Diff(EQUAL,"s a sentence"), Diff(INSERT,"s")]
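To make the output above easier to read, here is a minimal, library-free sketch. The `Op` enum and `Edit` class are hypothetical stand-ins for the library's own types, not its API. It illustrates the usual reading of such an edit list: the EQUAL and DELETE chunks together spell string A, and the EQUAL and INSERT chunks together spell string B.

```java
import java.util.Arrays;
import java.util.List;

final class EditScriptDemo {
    // Hypothetical stand-ins for the library's operation and diff types.
    enum Op { EQUAL, DELETE, INSERT }

    static final class Edit {
        final Op op;
        final String text;

        Edit(Op op, String text) {
            this.op = op;
            this.text = text;
        }
    }

    // The edit list printed above, written out by hand.
    static List<Edit> sample() {
        return Arrays.asList(
                new Edit(Op.EQUAL, "This "),
                new Edit(Op.DELETE, "i"),
                new Edit(Op.INSERT, "wa"),
                new Edit(Op.EQUAL, "s a sentence"),
                new Edit(Op.INSERT, "s"));
    }

    // Rebuild string A: keep EQUAL and DELETE chunks, skip INSERT chunks.
    static String source(List<Edit> edits) {
        StringBuilder sb = new StringBuilder();
        for (Edit e : edits) {
            if (e.op != Op.INSERT) {
                sb.append(e.text);
            }
        }
        return sb.toString();
    }

    // Rebuild string B: keep EQUAL and INSERT chunks, skip DELETE chunks.
    static String target(List<Edit> edits) {
        StringBuilder sb = new StringBuilder();
        for (Edit e : edits) {
            if (e.op != Op.DELETE) {
                sb.append(e.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(source(sample())); // This is a sentence
        System.out.println(target(sample())); // This was a sentences
    }
}
```

Walking the list in order and acting on each chunk is also how such an edit script would be applied to A to produce B.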
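Option 2
Use the DiffUtils library, an actively maintained fork of java-diff-utils from the Google Code Archive. DiffUtils supports comparison operations between texts: computing diffs, applying patches, and so on.
Example (the library's documentation has more):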
import com.github.difflib.text.*;
import java.util.*;

public class Main {
    public static void main(String[] args) {
        // create a configured DiffRowGenerator
        DiffRowGenerator generator = DiffRowGenerator.create()
                .showInlineDiffs(true)
                .mergeOriginalRevised(true)
                .inlineDiffByWord(true)
                .oldTag(f -> "~")   // introduce markdown style for strikethrough
                .newTag(f -> "**")  // introduce markdown style for bold
                .build();
        // compute the differences for two test texts
        List<DiffRow> rows = generator.generateDiffRows(
                Arrays.asList("This is a sentence"),
                Arrays.asList("This was a sentences"));
        System.out.println(rows.get(0).getOldLine());
    }
}
Output:
This ~is~**was** a ~sentence~**sentences**
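For readers who want to see what a word-level inline diff involves without pulling in a dependency, the following is a rough sketch using a word-level LCS table. It is not how the library is implemented; the class `WordDiffSketch` and method `inlineWordDiff` are hypothetical names, and the sketch assumes words are separated by single spaces.

```java
import java.util.ArrayList;
import java.util.List;

final class WordDiffSketch {
    // Returns one merged line: deleted words are wrapped in ~...~ and
    // inserted words in **...**, based on a word-level LCS table.
    static String inlineWordDiff(String original, String revised) {
        String[] a = original.split(" ");
        String[] b = revised.split(" ");

        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length || j < b.length) {
            if (i < a.length && j < b.length && a[i].equals(b[j])) {
                out.add(a[i]); // common word, copied unchanged
                i++;
                j++;
                continue;
            }
            // Collect a run of changed words up to the next common word.
            List<String> removed = new ArrayList<>();
            List<String> added = new ArrayList<>();
            while ((i < a.length || j < b.length)
                    && !(i < a.length && j < b.length && a[i].equals(b[j]))) {
                if (j == b.length || (i < a.length && lcs[i + 1][j] >= lcs[i][j + 1])) {
                    removed.add(a[i++]);
                } else {
                    added.add(b[j++]);
                }
            }
            StringBuilder chunk = new StringBuilder();
            if (!removed.isEmpty()) {
                chunk.append('~').append(String.join(" ", removed)).append('~');
            }
            if (!added.isEmpty()) {
                chunk.append("**").append(String.join(" ", added)).append("**");
            }
            out.add(chunk.toString());
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // prints: This ~is~**was** a ~sentence~**sentences**
        System.out.println(inlineWordDiff("This is a sentence", "This was a sentences"));
    }
}
```

Diffing by word rather than by character gives coarser but more readable edits, which is why the result here differs from the character-level output of Option 1.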
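Option 3
Write your own diff utility, as described in an external tutorial. The code below is taken from that source and solves the problem with the longest common subsequence (LCS), which is closely related to edit distance (Levenshtein distance): it corresponds to an edit distance in which only insertions and deletions are allowed.
Example: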
public class Main {
    // Function to display the differences between two strings
    public static void diff(String X, String Y, int m, int n, int[][] lookup) {
        // if the last character of `X` and `Y` matches
        if (m > 0 && n > 0 && X.charAt(m - 1) == Y.charAt(n - 1)) {
            diff(X, Y, m - 1, n - 1, lookup);
            System.out.print(" " + X.charAt(m - 1));
        }
        // if the current character of `Y` is not present in `X`
        else if (n > 0 && (m == 0 || lookup[m][n - 1] >= lookup[m - 1][n])) {
            diff(X, Y, m, n - 1, lookup);
            System.out.print(" +" + Y.charAt(n - 1));
        }
        // if the current character of `X` is not present in `Y`
        else if (m > 0 && (n == 0 || lookup[m][n - 1] < lookup[m - 1][n])) {
            diff(X, Y, m - 1, n, lookup);
            System.out.print(" -" + X.charAt(m - 1));
        }
    }

    // Function to fill the lookup table by finding the length of LCS
    // of substring X[0…m-1] and Y[0…n-1]
    public static int[][] findLCS(String X, String Y, int m, int n) {
        // lookup[i][j] stores the length of LCS of substring X[0…i-1] and Y[0…j-1]
        int[][] lookup = new int[X.length() + 1][Y.length() + 1];

        // first column of the lookup table will be all 0
        for (int i = 0; i <= m; i++) {
            lookup[i][0] = 0;
        }

        // first row of the lookup table will be all 0
        for (int j = 0; j <= n; j++) {
            lookup[0][j] = 0;
        }

        // fill the lookup table in a bottom-up manner
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                // if current character of `X` and `Y` matches
                if (X.charAt(i - 1) == Y.charAt(j - 1)) {
                    lookup[i][j] = lookup[i - 1][j - 1] + 1;
                }
                // otherwise, if the current character of `X` and `Y` don't match
                else {
                    lookup[i][j] = Integer.max(lookup[i - 1][j], lookup[i][j - 1]);
                }
            }
        }
        return lookup;
    }

    // Implement diff utility in Java
    public static void main(String[] args) {
        String X = "This is a sentence";
        String Y = "This was a sentences";

        // lookup[i][j] stores the length of LCS of substring X[0…i-1] and Y[0…j-1]
        int[][] lookup = findLCS(X, Y, X.length(), Y.length());

        // find the difference
        diff(X, Y, X.length(), Y.length(), lookup);
    }
}
Output:
T h i s -i +w +a s a s e n t e n c e +s
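Since the question mentions Levenshtein distance, here is a small sketch of the classic dynamic-programming version for comparison with the LCS approach. Unlike LCS, it also counts a substitution as a single edit. The class and method names (`LevenshteinSketch`, `distance`) are illustrative only, and this computes just the distance, not the edit script.

```java
final class LevenshteinSketch {
    // Classic two-row dynamic programme: the minimum number of single-character
    // insertions, deletions, and substitutions needed to turn `a` into `b`.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of `a` into b[0..j) takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            // Turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                curr[j] = Math.min(substitution, Math.min(insertion, deletion));
            }
            // Reuse the two rows instead of allocating a full table.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // prints 3: substitute i -> w, insert a, insert the trailing s
        System.out.println(distance("This is a sentence", "This was a sentences"));
    }
}
```

Recovering the actual edits would require keeping the full table and backtracking through it, much as `diff` does with the LCS table above.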
https://stackoverflow.com/questions/70931155