Q: How can I find an exact or the closest match from a large array in a large string?

Stack Overflow user

Asked 2020-01-13 21:50:24

Answers: 1 · Views: 147 · Follows: 0 · Votes: 0

Consider the following. I have a very large array containing all road names in a given country, sorted by string length, like this:

$roadNames = ['Ivy Lane','East Road','The Maltings','Greenhill Road', 'Woodlands Close']; //And many, many more

Now I want to look for an exact match in a long string:

$string = "
..... ALOT OF TEXT ..... 
..... ALOT OF TEXT ..... 

You can find us at: Greenhill Road 1, 11111, The City 

..... ALOT OF TEXT .....
..... ALOT OF TEXT ..... 
";

Finding an exact match is fairly easy. I just do the following:

foreach ($roadNames as $roadName) {
    if(stripos($string, $roadName) !== false){
        echo 'Exact match: '.$roadName;
        break;
    }
}
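Two properties of `stripos()` matter for this loop. It returns the match offset, so a name at the very start of the string yields `0`, which is falsy; that is why the comparison must be `!== false`, as it is here. It also matches plain substrings, so "East Road" is found inside "Northeast Road". A minimal sketch of a word-boundary check follows; `containsRoadName` is a hypothetical helper, not part of the original question, and it assumes road names start and end with a letter or digit (otherwise `\b` behaves differently).

```php
<?php
// stripos() returns the match offset, so a match at offset 0 is falsy.
var_dump(stripos('Ivy Lane 3, The City', 'Ivy Lane')); // int(0)

// Hypothetical helper: exact, case-insensitive match that must start and end
// at a word boundary. Assumes the name starts and ends with a letter or digit.
function containsRoadName(string $text, string $roadName): bool
{
    // preg_quote() escapes regex metacharacters such as "." or "(".
    $pattern = '/\b' . preg_quote($roadName, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(stripos('Northeast Road 7', 'East Road') !== false);       // bool(true)
var_dump(containsRoadName('Northeast Road 7', 'East Road'));        // bool(false)
var_dump(containsRoadName('We are on east road 7', 'East Road'));   // bool(true)
```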

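But what if the road name is misspelled by one letter, for example one extra or one missing space, one letter too few or too many, or one wrong letter? For example "GreenhillRoad", "Green hill Road", "Grenhill Road" or "Greennhill Road". If the road name in $string is one of these, how can I find the best match among all the road names in the array? Is there some mathematical way of doing this? Or could I use a regex?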
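The usual "mathematical" answer is edit (Levenshtein) distance: the number of single-character insertions, deletions and substitutions needed to turn one string into another, which covers exactly the typos listed above. PHP's built-in `levenshtein()` compares two whole strings, so it cannot find a name inside a long text on its own. The sketch below uses a known variant of the Levenshtein dynamic program (Sellers' approximate substring matching) that returns the smallest distance between a road name and any substring of the text. The function names and the `$maxDistance = 1` threshold are assumptions for illustration. Comparison is case-insensitive and byte-based, so it assumes ASCII road names, and it costs O(name length × text length) per road name, which may be slow for a very large array.

```php
<?php
// Hypothetical helper: smallest Levenshtein distance between $pattern and ANY
// substring of $text (Sellers' approximate substring matching).
// Case-insensitive and byte-based, so it assumes ASCII road names.
function substringEditDistance(string $pattern, string $text): int
{
    $p = strtolower($pattern);
    $t = strtolower($text);
    $m = strlen($p);
    $n = strlen($t);
    // Row 0 is all zeros: a match may start at any offset in the text.
    $prev = array_fill(0, $n + 1, 0);
    for ($i = 1; $i <= $m; $i++) {
        // Column 0: matching $i pattern characters against nothing costs $i.
        $curr = [$i];
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($p[$i - 1] === $t[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // letter/space missing from the text
                $curr[$j - 1] + 1,     // extra letter/space in the text
                $prev[$j - 1] + $cost  // same letter, or one wrong letter
            );
        }
        $prev = $curr;
    }
    // A match may also end anywhere, so take the minimum of the last row.
    return min($prev);
}

// Hypothetical helper: the road name with the smallest distance, or null if
// none is within $maxDistance edits. Ties go to the earlier array entry.
function findClosestRoad(array $roadNames, string $text, int $maxDistance = 1): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($roadNames as $roadName) {
        $distance = substringEditDistance($roadName, $text);
        if ($distance < $bestDistance) {
            $best = $roadName;
            $bestDistance = $distance;
            if ($distance === 0) {
                break; // exact match, nothing can beat it
            }
        }
    }
    return $best;
}
```

Unlike a first-match loop, this ranks every candidate by distance. A fixed threshold of 1 is proportionally more lenient for short names such as "Ivy Lane", so a threshold scaled to the name's length may suit real data better.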

I was thinking of something like this, although it seems a bit overkill (and it doesn't work):

foreach ($roadNames as $roadName) {
    if (stripos($string, $roadName)) {
        echo 'Exact match: ' . $roadName;
        break;
    } else {
        $alphabet = range('a', 'z');
        $alphabet[] = ' ';
        $roadName_split = str_split($roadName);
        $test_array = array();
        foreach ($roadName_split as $strpos => $letter) {
            foreach ($alphabet as $letter_in_alphabet) {
                $test_array[] = $letter_in_alphabet;
            }
            foreach ($test_array as $key => $value) {
                $test_array[$key] .= substr($roadName, $strpos, 1);
            }
        }
        echo '<pre>';
        print_r($test_array);
        echo '</pre>';
        die;
        foreach ($test_array as $misplled_value) {
            if (stripos($string, $misplled_value)) {
                echo 'close match found: ' . $roadName;
                break;
            }
        }
//        OR Some kind of a regex, dont know how it should be
//        $roadName_split = str_split($roadName);
//        $re = '';
//        foreach ($roadName_split as $strpos => $letter) {
//            $re .= "$letter?";
//        }
    }
}

Tags: php, math

1 answer

Stack Overflow user

Accepted answer

Posted 2020-01-14 23:38:00

I finally found a solution, inspired by https://stackoverflow.com/a/1720798/2192013

From each roadName I build a regex that allows one misspelled letter, or one letter or space too few or too many, using this function:

function generateRegex($word) {
    // Alternative 1: the exact word. preg_quote() escapes regex
    // metacharacters (".", "(", "/", ...) that a road name may contain.
    $regex = '/((' . preg_quote($word, '/') . ')';
    $len = strlen($word);
    for ($strpos = 0; $strpos < $len; $strpos++) {
        $head = preg_quote(substr($word, 0, $strpos), '/');
        $letter = preg_quote($word[$strpos], '/');
        $tail = preg_quote(substr($word, $strpos + 1), '/');
        $temp = $head . '.' . $tail;            // one wrong letter: "." replaces it
        $temp1 = $head . $tail;                 // one letter/space too few: letter removed
        $temp2 = $head . '.' . $letter . $tail; // one letter/space too many: "." inserted before it
        $regex .= "|($temp)|($temp1)|($temp2)";
    }
    return $regex . ')/i';
}

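$temp handles one misspelled letter by replacing a letter with a dot (which matches any character), at every possible position.
$temp1 handles one letter/space too few (e.g. Grenhill Road or GreenhillRoad) by removing a letter, at every possible position.
$temp2 handles one letter/space too many (e.g. Green hill Road or Greennhill Road) by inserting a dot before a letter, at every possible position.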
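Because the generated pattern is not anchored, the deletion alternatives can also match inside longer words: for "East Road", removing the first letter gives the alternative `ast Road`, which matches "Coast Road". A minimal sketch of one way around this, assuming road names start and end with a letter or digit: build the same three families of alternatives, escape them with `preg_quote()`, and wrap the whole alternation in `\b` word boundaries. `generateBoundedRegex` is a hypothetical name, not part of the original answer. Like the original, it works on bytes, so a typo in a multi-byte (non-ASCII) letter may not be caught.

```php
<?php
// Hypothetical variant of generateRegex(): same one-edit alternatives,
// but the match must start and end at a word boundary.
function generateBoundedRegex(string $word): string
{
    $alternatives = [preg_quote($word, '/')]; // the exact word
    $len = strlen($word);
    for ($pos = 0; $pos < $len; $pos++) {
        $head = preg_quote(substr($word, 0, $pos), '/');
        $letter = preg_quote($word[$pos], '/');
        $tail = preg_quote(substr($word, $pos + 1), '/');
        $alternatives[] = $head . '.' . $tail;           // one wrong letter
        $alternatives[] = $head . $tail;                 // one letter/space missing
        $alternatives[] = $head . '.' . $letter . $tail; // one extra letter/space
    }
    // (?:...) groups without capturing; \b on both sides anchors to word edges.
    return '/\b(?:' . implode('|', $alternatives) . ')\b/i';
}

var_dump(preg_match(generateBoundedRegex('East Road'), 'Visit us on Coast Road 5'));                // int(0)
var_dump(preg_match(generateBoundedRegex('Greenhill Road'), 'You can find us at: Greennhill Road 1')); // int(1)
```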

The final script looks like this:

function generateRegex($word) {
    // Alternative 1: the exact word. preg_quote() escapes regex
    // metacharacters (".", "(", "/", ...) that a road name may contain.
    $regex = '/((' . preg_quote($word, '/') . ')';
    $len = strlen($word);
    for ($strpos = 0; $strpos < $len; $strpos++) {
        $head = preg_quote(substr($word, 0, $strpos), '/');
        $letter = preg_quote($word[$strpos], '/');
        $tail = preg_quote(substr($word, $strpos + 1), '/');
        $temp = $head . '.' . $tail;            // one wrong letter: "." replaces it
        $temp1 = $head . $tail;                 // one letter/space too few: letter removed
        $temp2 = $head . '.' . $letter . $tail; // one letter/space too many: "." inserted before it
        $regex .= "|($temp)|($temp1)|($temp2)";
    }
    return $regex . ')/i';
}

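function findBestMatch($roadNames, $string) {
    // First pass: exact (case-insensitive) match. stripos() returns 0 when the
    // name is at the very start of $string, so compare with false explicitly.
    foreach ($roadNames as $roadName) {
        if (stripos($string, $roadName) !== false) {
            return 'Exact match found: ' . $roadName;
        }
    }
    // Second pass: allow one wrong, missing or extra letter/space.
    // An exact match anywhere in the array now wins over a close match.
    foreach ($roadNames as $roadName) {
        if (preg_match(generateRegex($roadName), $string)) {
            return 'Close match found: ' . $roadName;
        }
    }
    return 'Match not found';
}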

$roadNames = ['Greenhill Road', 'Ivy Lane', 'East Road', 'The Maltings', 'Woodlands Close']; //And many, many more
$misspelled_examples = ['Greenhill Road', 'Crennhill Road', 'Green hill Road', 'Greennhill Road', 'GreenhillRoad', 'Grenhill Road', 'GrenhilRoad'];

foreach ($misspelled_examples as $value) {
    $strings[] = "
..... ALOT OF TEXT ..... 
..... ALOT OF TEXT ..... 

You can find us at: $value 1, 11111, The City 

..... ALOT OF TEXT .....
..... ALOT OF TEXT ..... 
";
}

foreach ($strings as $key => $string) {
    echo 'Input road-name: ' . $misspelled_examples[$key];
    echo '<br>';
    echo findBestMatch($roadNames, $string);
    echo '<hr>';
}

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/59724674
