I have a text in which I want to count the occurrences of the phrase "lorem ipsum dolor".
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ipsum lorem dolor Curabitur ac risus nunc. Dolor ipsum lorem.
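The algorithm should count an occurrence even when the words of the search phrase appear in a different order. The expected matches are "Lorem ipsum dolor", "Ipsum lorem dolor" and "Dolor ipsum lorem". Is there a better way to do this than using a regex with every possible combination?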
In this case the result should be 3.
The phrase will be about 3-4 words long, and the string will be the content of a web page.
Posted on 2014-01-06 19:10:52
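$haystack = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ipsum lorem dolor Curabitur ac risus nunc. Dolor ipsum lorem.';
$needle = 'Lorem ipsum dolor';

// Split both strings into lowercase word lists (format 1 returns an array of words).
$hayWords = str_word_count(strtolower($haystack), 1);
$needleWords = str_word_count(strtolower($needle), 1);
$needleWordsCount = count($needleWords);

// Keep only the haystack words that occur in the needle.
// array_intersect preserves keys, so each key is the word's position in the text.
$foundWords = array_intersect($hayWords, $needleWords);

// Count every position that starts a run of $needleWordsCount consecutive needle words.
// Note: a run that repeats one needle word (e.g. "lorem lorem lorem") is counted as well.
$count = array_reduce(
    array_keys($foundWords),
    function ($counter, $item) use ($foundWords, $needleWordsCount) {
        for ($i = $item; $i < $item + $needleWordsCount; ++$i) {
            if (!isset($foundWords[$i])) {
                return $counter;
            }
        }
        return $counter + 1;
    },
    0
);

var_dump($count); // int(3)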
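If runs such as "lorem lorem lorem" must not be counted, one possible variation is to compare each window of words against the needle as a sorted list. The following is only a sketch under that assumption; the function name `countPhraseAnyOrder` is hypothetical and not part of the answer above, and it counts non-overlapping matches only.

```php
<?php
// Hypothetical helper: counts non-overlapping windows of the haystack whose
// words are exactly the needle's words, in any order.
function countPhraseAnyOrder(string $haystack, string $needle): int
{
    $hayWords = str_word_count(strtolower($haystack), 1);
    $needleWords = str_word_count(strtolower($needle), 1);
    $n = count($needleWords);
    if ($n === 0) {
        return 0;
    }
    sort($needleWords); // canonical order for comparison

    $count = 0;
    $i = 0;
    $last = count($hayWords) - $n;
    while ($i <= $last) {
        $window = array_slice($hayWords, $i, $n);
        sort($window);
        if ($window === $needleWords) {
            ++$count;
            $i += $n; // skip past the match so matches do not overlap
        } else {
            ++$i;
        }
    }
    return $count;
}

$haystack = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ipsum lorem dolor Curabitur ac risus nunc. Dolor ipsum lorem.';
echo countPhraseAnyOrder($haystack, 'Lorem ipsum dolor'), "\n"; // 3
```

Sorting each window costs little for a phrase of 3-4 words. Like the original, it relies on `str_word_count` to split words, so punctuation between words is ignored.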
Posted on 2014-01-06 18:36:16
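You can try a regex:
/(?:(?:(?:lorem|ipsum|dolor)\s?)+)/i
Use it with preg_match_all (which already finds every match, so PHP needs no g flag), then count the number of matches. From your sample, you should get 3 matches.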
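As a rough illustration of that suggestion, the sketch below feeds the sample text to `preg_match_all` and uses its return value as the count. The variable names are made up for the example. Note that the pattern accepts any run of the three words, so a lone "lorem" also counts as a match.

```php
<?php
$text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ipsum lorem dolor Curabitur ac risus nunc. Dolor ipsum lorem.';

// Only the i modifier is used; preg_match_all scans the whole string by itself.
$pattern = '/(?:(?:(?:lorem|ipsum|dolor)\s?)+)/i';

// preg_match_all returns the number of full-pattern matches.
$matchCount = preg_match_all($pattern, $text, $matches);
echo $matchCount, "\n"; // 3

// Caveat: isolated words are matched too, so this prints 2.
$looseCount = preg_match_all($pattern, 'lorem and ipsum');
echo $looseCount, "\n";
```

This works for the sample because the three words only ever appear together there. On real web page content the count may be too high.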
I'm not very good at algorithms, or at PHP, but here is my attempt:
<?php
$string = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ipsum lorem dolor Curabitur ac risus nunc. Dolor ipsum lorem.';
$lower_string = strtolower($string);
$text = array('lorem', 'ipsum', 'dolor');

// Build every ordering of the search words, then count each ordering as a plain substring.
$perms = AllPermutations($text);
$result = 0;
foreach ($perms as $piece) {
    $phrase = join(' ', $piece);
    $result += substr_count($lower_string, $phrase);
}

# From http://stackoverflow.com/a/12749950/1578604
// Recursively collects all permutations of $InArray;
// $InProcessedArray holds the words already chosen for the current permutation.
function AllPermutations($InArray, $InProcessedArray = array())
{
    $ReturnArray = array();
    foreach ($InArray as $Key => $value)
    {
        $CopyArray = $InProcessedArray;
        $CopyArray[$Key] = $value;
        $TempArray = array_diff_key($InArray, $CopyArray);
        if (count($TempArray) == 0)
        {
            $ReturnArray[] = $CopyArray;
        }
        else
        {
            $ReturnArray = array_merge($ReturnArray, AllPermutations($TempArray, $CopyArray));
        }
    }
    return $ReturnArray;
}

echo $result; // 3
?>
ideone demo
Posted on 2014-01-06 18:45:22
I think you are looking for this: substr_count
$text = 'This is a test';
echo strlen($text); // 14
echo substr_count($text, 'is'); // 2
// the string is reduced to 's is a test', so it prints 1
echo substr_count($text, 'is', 3);
// the text is reduced to 's i', so it prints 0
echo substr_count($text, 'is', 3, 3);
// generates a warning because 5+10 > 14
echo substr_count($text, 'is', 5, 10);
// prints only 1, because it doesn't count overlapped substrings
$text2 = 'gcdgcdgcd';
echo substr_count($text2, 'gcdgcd');
https://stackoverflow.com/questions/20956713