Consider the following. I have a very large array containing all the road names of a given country, sorted by string length, like this:
$roadNames = ['Ivy Lane','East Road','The Maltings','Greenhill Road', 'Woodlands Close']; //And many, many more
Now I want to find an exact match inside a long string:
$string = "
..... ALOT OF TEXT .....
..... ALOT OF TEXT .....
You can find us at: Greenhill Road 1, 11111, The City
..... ALOT OF TEXT .....
..... ALOT OF TEXT .....
";
Finding an exact match is fairly easy; I just do the following:
foreach ($roadNames as $roadName) {
    if (stripos($string, $roadName) !== false) {
        echo 'Exact match: ' . $roadName;
        break;
    }
}
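But what if the road name is misspelled by one character, e.g. an extra space or a missing space, one letter too few or too many, or one wrong letter? For example "GreenhillRoad", or a variant of "Greenhill Road" with a dropped, doubled, or substituted letter. If the road name in $string is one of these, how can I find the best match among all the road names in the array? Is there some mathematical method? Or could I perhaps use a regex?
I was thinking of something like this, although it seems like overkill (and it does not work).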
foreach ($roadNames as $roadName) {
    if (stripos($string, $roadName)) {
        echo 'Exact match: ' . $roadName;
        break;
    } else {
        $alphabet = range('a', 'z');
        $alphabet[] = ' ';
        $roadName_split = str_split($roadName);
        $test_array = array();
        foreach ($roadName_split as $strpos => $letter) {
            foreach ($alphabet as $letter_in_alphabet) {
                $test_array[] = $letter_in_alphabet;
            }
            foreach ($test_array as $key => $value) {
                $test_array[$key] .= substr($roadName, $strpos, 1);
            }
        }
        echo '<pre>';
        print_r($test_array);
        echo '</pre>';
        die;
        foreach ($test_array as $misplled_value) {
            if (stripos($string, $misplled_value)) {
                echo 'close match found: ' . $roadName;
                break;
            }
        }
        // OR Some kind of a regex, dont know how it should be
        // $roadName_split = str_split($roadName);
        // $re = '';
        // foreach ($roadName_split as $strpos => $letter) {
        //     $re .= "$letter?";
        // }
    }
}
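Posted on 2020-01-14 23:38:00
I finally found a solution, inspired by this answer: https://stackoverflow.com/a/1720798/2192013
From each roadName I build a regex that tolerates a single error: one wrong letter, one missing letter or space, or one extra letter or space. This function generates it: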
function generateRegex($word) {
    // Build one alternative per tolerated single-character edit. preg_quote()
    // keeps characters such as "." or "/" in a road name from being read as regex syntax.
    $regex = '/((' . preg_quote($word, '/') . ')';
    $len = strlen($word);
    for ($pos = 0; $pos < $len; $pos++) {
        $head = preg_quote(substr($word, 0, $pos), '/');
        $tail = preg_quote((string) substr($word, $pos + 1), '/');
        $letter = preg_quote($word[$pos], '/');
        $regex .= "|($head.$tail)";        // any character in place of the letter at $pos
        $regex .= "|($head$tail)";         // the letter at $pos is missing
        $regex .= "|($head.$letter$tail)"; // one extra character before the letter at $pos
    }
    return $regex . ')/mi';
}
The rest of the final script, which calls generateRegex() as defined above, looks like this:
function findBestMatch($roadNames, $string) {
    foreach ($roadNames as $roadName) {
        // stripos() returns 0 for a match at the very start of $string,
        // so compare against false instead of relying on truthiness.
        if (stripos($string, $roadName) !== false) {
            return 'Exact match found: ' . $roadName;
        }
        if (preg_match(generateRegex($roadName), $string)) {
            return 'Close match found: ' . $roadName;
        }
    }
    return 'Match not found';
}
$roadNames = ['Greenhill Road', 'Ivy Lane', 'East Road', 'The Maltings', 'Woodlands Close']; //And many, many more
$misspelled_examples = ['Greenhill Road', 'Crennhill Road', 'Green hill Road', 'Greennhill Road', 'GreenhillRoad', 'Grenhill Road', 'GrenhilRoad'];
$strings = [];
foreach ($misspelled_examples as $value) {
    $strings[] = "
..... ALOT OF TEXT .....
..... ALOT OF TEXT .....
You can find us at: $value 1, 11111, The City
..... ALOT OF TEXT .....
..... ALOT OF TEXT .....
";
}
foreach ($strings as $key => $string) {
    echo 'Input road-name: ' . $misspelled_examples[$key];
    echo '<br>';
    echo findBestMatch($roadNames, $string);
    echo '<hr>';
}
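The question also asked whether there is a "mathematical" way to do this. As a rough sketch of that idea (not the method used in the answer above), PHP's built-in levenshtein() can score windows of the text against each road name. The function name findClosestRoad and the $maxDistance parameter are made up for this illustration, and the loop is far too slow for a very large array without some pre-filtering.

```php
<?php
// Hypothetical helper, not part of the original answer: returns the road name
// that has a window of $text within $maxDistance edits of it, or null if none does.
function findClosestRoad(array $roadNames, string $text, int $maxDistance = 1): ?string
{
    $haystack = strtolower($text);
    $haystackLen = strlen($haystack);
    $best = null;
    $bestDistance = $maxDistance + 1; // anything at or above this is not a match

    foreach ($roadNames as $roadName) {
        $needle = strtolower($roadName);
        $needleLen = strlen($needle);
        // Windows slightly shorter and longer than the name let a missing
        // or an extra character still line up with it.
        $minLen = max(1, $needleLen - $maxDistance);
        $maxLen = $needleLen + $maxDistance;
        for ($len = $minLen; $len <= $maxLen; $len++) {
            for ($start = 0; $start + $len <= $haystackLen; $start++) {
                $distance = levenshtein($needle, substr($haystack, $start, $len));
                if ($distance < $bestDistance) {
                    $best = $roadName;
                    $bestDistance = $distance;
                    if ($distance === 0) {
                        return $best; // exact match, nothing can beat it
                    }
                }
            }
        }
    }
    return $best;
}
```

Calling findClosestRoad($roadNames, $string) returns the road name itself rather than a message, so the caller decides how to report it. Raising $maxDistance tolerates more typos but also makes false positives on short names more likely.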
https://stackoverflow.com/questions/59724674