PHP - How to find groups of repeated values in an array

Stack Overflow user
Asked 2014-01-23 06:29:38
8 answers, 2K views, 0 followers, 18 votes

I have an array of string values that sometimes form repeating value patterns ('a', 'b', 'c', 'd'):

$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

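I would like to find the repeated patterns based on the array order and group them in that same order, so that the original order is maintained.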

$patterns = array(
    array('number' => 2, 'values' => array('a', 'b', 'c', 'd')),
    array('number' => 1, 'values' => array('c')),
    array('number' => 1, 'values' => array('d'))
);

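Note that [a, b], [b, c] and [c, d] are not patterns by themselves, because they sit inside the larger [a, b, c, d] pattern. The last [c, d] set appears only once, so it is not a pattern either, just the individual values 'c' and 'd'.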
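To make the expected structure concrete: whatever grouping is chosen, repeating each 'values' list 'number' times and concatenating the results should give back the original array in its original order. A minimal sketch of that check follows; the helper name expandPatterns is hypothetical and is not part of the question.

```php
<?php
// Hypothetical helper: rebuild the flat array from a $patterns structure.
function expandPatterns(array $patterns): array
{
    $flat = [];
    foreach ($patterns as $pattern) {
        // Repeat the pattern's values 'number' times, keeping their order.
        for ($i = 0; $i < $pattern['number']; $i++) {
            foreach ($pattern['values'] as $value) {
                $flat[] = $value;
            }
        }
    }
    return $flat;
}

$patterns = [
    ['number' => 2, 'values' => ['a', 'b', 'c', 'd']],
    ['number' => 1, 'values' => ['c']],
    ['number' => 1, 'values' => ['d']],
];

// Prints the original ten values in order: a,b,c,d,a,b,c,d,c,d
echo implode(',', expandPatterns($patterns)), PHP_EOL;
```

A check like this only confirms that a grouping preserves order and content; it says nothing about whether the grouping uses the longest possible patterns.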

Another example:

$array = array(
    'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
  //[.......] [.] [[......]  [......]] [.]
);

which would produce

$patterns = array(
    array('number' => 2, 'values' => array('x')),
    array('number' => 1, 'values' => array('y')),
    array('number' => 2, 'values' => array('x', 'b')),
    array('number' => 1, 'values' => array('a'))
);

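How can I do this?

8 Answers

Stack Overflow user

Answered 2016-01-26 05:36:28

An array of characters is just a string, and regex is the king of string pattern matching. Add recursion and the solution is quite elegant, even with the conversion back and forth from character arrays (which assumes every value is a single character):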

function findPattern($str){
    $results = array();
    if(is_array($str)){
        $str = implode($str);
    }
    if(strlen($str) == 0){ //reached the end
        return $results;
    }
    if(preg_match_all('/^(.+)\1+(.*?)$/',$str,$matches)){ //pattern found
        $results[] = array('number' => (strlen($str) - strlen($matches[2][0])) / strlen($matches[1][0]), 'values' => str_split($matches[1][0]));
        return array_merge($results,findPattern($matches[2][0]));
    }
    //no pattern found
    $results[] = array('number' => 1, 'values' => array(substr($str, 0, 1)));
    return array_merge($results,findPattern(substr($str, 1)));
}

You can test it here: https://eval.in/507818 and https://eval.in/507815
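Because implode() and str_split() discard the boundaries between values, the function above only suits single-character values. The same greedy idea (take the longest leading run that repeats back to back, then continue with the rest) can be sketched directly on arrays. This is an illustrative variant, not the answer's own code: the name findPatternInArray is hypothetical, and PHP 7 or later is assumed for intdiv() and the type declarations.

```php
<?php
// Hypothetical array-based variant of the greedy "longest repeated prefix" idea.
function findPatternInArray(array $items): array
{
    $results = [];
    $offset = 0;
    $total = count($items);
    while ($offset < $total) {
        $found = false;
        // Longest candidate first, mirroring the greedy (.+) in the regex.
        for ($len = intdiv($total - $offset, 2); $len >= 1; $len--) {
            $pattern = array_slice($items, $offset, $len);
            $count = 1;
            // Count how many times the candidate repeats back to back.
            while (array_slice($items, $offset + $count * $len, $len) === $pattern) {
                $count++;
            }
            if ($count > 1) {
                $results[] = ['number' => $count, 'values' => $pattern];
                $offset += $count * $len;
                $found = true;
                break;
            }
        }
        if (!$found) {
            // Nothing repeats at this position: emit a single value.
            $results[] = ['number' => 1, 'values' => [$items[$offset]]];
            $offset++;
        }
    }
    return $results;
}

print_r(findPatternInArray(['x', 'x', 'y', 'x', 'b', 'x', 'b', 'a']));
```

For the question's second example this yields 2 x ['x'], 1 x ['y'], 2 x ['x', 'b'], 1 x ['a']. Like the regex version, it only looks for a repeat starting at the current position, so it can settle for a short pattern early even when a longer one starts a few values later.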

Votes: 7

Stack Overflow user

Answered 2014-01-23 07:19:56

If c and d can be grouped together, this is my code:

<?php
$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

$res = array();

foreach ($array AS $value) {
    if (!isset($res[$value])) {
        $res[$value] = 0;
    }
    $res[$value]++;
}

foreach ($res AS $key => $value) {
    $fArray[$value][] = $key;
    for ($i = $value - 1; $i > 0; $i--) {
        $fArray[$i][] = $key;
    }
}

$res = array();
foreach($fArray AS $key => $value) {
    if (!isset($res[serialize($value)])) {
        $res[serialize($value)] = 0;
    }
    $res[serialize($value)]++;
}
$fArray = array();
foreach($res AS $key => $value) {
    $fArray[] = array('number' => $value, 'values' => unserialize($key));
}

echo '<pre>';
var_dump($fArray);
echo '</pre>';

The final result is:

array (size=2)
  0 => 
    array (size=2)
      'number' => int 2
      'values' => 
        array (size=4)
          0 => string 'a' (length=1)
          1 => string 'b' (length=1)
          2 => string 'c' (length=1)
          3 => string 'd' (length=1)
  1 => 
    array (size=2)
      'number' => int 1
      'values' => 
        array (size=2)
          0 => string 'c' (length=1)
          1 => string 'd' (length=1)
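The first loop in the code above only tallies how often each value occurs, regardless of position. As a small aside, PHP's built-in array_count_values() produces the same tally; the sketch below shows the intermediate counts for the question's first example.

```php
<?php
$array = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'c', 'd'];

// Value => number of occurrences, keyed in order of first appearance.
$counts = array_count_values($array);

print_r($counts); // a => 2, b => 2, c => 3, d => 3
```

Since these counts ignore where each value sits in the array, grouping built on them cannot tell a trailing 'c', 'd' apart from one inside a repeated block, which is why the result above groups c and d together.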
Votes: 5

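Stack Overflow user

Answered 2016-01-25 08:09:57

OK, here is my take: the code below splits the whole original array into the longest adjacent non-overlapping chunks.

So in a situation like this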

'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd' 
[                 ] [                 ] [ ]  [  ]  <-- use 2 long groups
[      ] [        ] [      ]  [       ] [ ]  [  ]  <-- and not 4 short

it will prefer 2 long groups over 4 shorter ones.
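The two readings in the diagram above correspond to two chunk sizes. As a rough illustration (the helper name leadingRepeats is hypothetical, and PHP 7 or later is assumed for the type declarations), the sketch below counts how often the first chunk repeats back to back for a given size. Both sizes describe the same eight leading values; the approach described here keeps the larger one.

```php
<?php
// Hypothetical helper: how many times does the first chunk of $size
// elements repeat back to back at the start of $array?
function leadingRepeats(array $array, int $size): int
{
    $chunks = array_chunk($array, $size);
    $repeats = 1;
    while (isset($chunks[$repeats]) && $chunks[$repeats] === $chunks[0]) {
        $repeats++;
    }
    return $repeats;
}

$array = ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd'];

echo leadingRepeats($array, 4), PHP_EOL; // 2 chunks of ['a','b','a','b']
echo leadingRepeats($array, 2), PHP_EOL; // 4 chunks of ['a','b']
```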

Update: also tested with the examples from another answer, and it works for those cases too:

one, two, one, two, one, two, one, two
[one two one two], [one two one two]

'one' 'two' 'one' 'two' 'three' 'four' 'one' 'two' 'three' 'four'    
['one'] ['two'] ['one' 'two' 'three' 'four'] ['one' 'two' 'three' 'four']

Here is the code and the tests:

<?php

/*
 * Splits an $array into chunks of $chunk_size.
 * Returns number of repeats, start index and chunk which has
 * max number of adjacent repeats.
 */
function getRepeatCount($array, $chunk_size) {
    $parts = array_chunk($array, $chunk_size);
    $maxRepeats = 1;
    $maxIdx = 0;
    $repeats = 1;
    $len = count($parts);
    for ($i = 0; $i < $len-1; $i++) {
        if ($parts[$i] === $parts[$i+1]) {
            $repeats += 1;
            if ($repeats > $maxRepeats) {
                $maxRepeats = $repeats;
                $maxIdx = $i - ($repeats-2);
            }
        } else {
            $repeats = 1;
        }
    }
    return array($maxRepeats, $maxIdx*$chunk_size, $parts[$maxIdx]);
}

/*
 * Finds longest pattern in the $array.
 * Returns number of repeats, start index and pattern itself.
 */
function findLongestPattern($array) {
    $len = count($array);
    // Try the longest window first so that longer patterns win.
    for ($window = (int) floor($len/2); $window >= 1; $window--) {
      // Offsets 0 .. $window-1 cover every possible chunk alignment.
      for ($i = 0; $i < $window; $i++) {
        list($repeats, $idx, $pattern) = getRepeatCount(
          array_slice($array, $i), $window
        );
        if ($repeats > 1) {
            return array($repeats, $idx+$i, $pattern);
        }
      }
    }
    return array(1, 0, [$array[0]]);
}

/*
 * Splits $array into longest adjacent non-overlapping parts.
 */
function splitToPatterns($array) {
    if (count($array) < 1) {
        return $array;
    }
    list($repeats, $start, $pattern) = findLongestPattern($array);
    $end = $start + count($pattern) * $repeats;
    return array_merge(
            splitToPatterns(array_slice($array, 0, $start)),
            array(
                array('number'=>$repeats, 'values' => $pattern)
            ),
            splitToPatterns(array_slice($array, $end))
    );
}

Tests:

function isEquals($expected, $actual) {
    $exp_str = json_encode($expected);
    $act_str = json_encode($actual);
    $equals = $exp_str === $act_str;
    if (!$equals) {
        echo 'Equals check failed'.PHP_EOL;
        echo 'expected: '.$exp_str.PHP_EOL;
        echo 'actual  : '.$act_str.PHP_EOL;
    }
    return $equals;
}

assert(isEquals(
    array(1, 0, ['a']), getRepeatCount(['a','b','c'], 1)
));
assert(isEquals(
    array(1, 0, ['a']), getRepeatCount(['a','b','a','b','c'], 1)
));
assert(isEquals(
    array(2, 0, ['a','b']), getRepeatCount(['a','b','a','b','c'], 2)
));
assert(isEquals(
    array(1, 0, ['a','b','a']), getRepeatCount(['a','b','a','b','c'], 3)
));
assert(isEquals(
    array(3, 0, ['a','b']), getRepeatCount(['a','b','a','b','a','b','a'], 2)
));
assert(isEquals(
    array(2, 2, ['a','c']), getRepeatCount(['x','c','a','c','a','c'], 2)
));
assert(isEquals(
    array(1, 0, ['x','c','a']), getRepeatCount(['x','c','a','c','a','c'], 3)
));
assert(isEquals(
    array(2, 0, ['a','b','c','d']),
    getRepeatCount(['a','b','c','d','a','b','c','d','c','d'],4)
));

assert(isEquals(
    array(2, 2, ['a','c']), findLongestPattern(['x','c','a','c','a','c'])
));
assert(isEquals(
    array(1, 0, ['a']), findLongestPattern(['a','b','c'])
));
assert(isEquals(
    array(2, 2, ['c','a']),
    findLongestPattern(['a','b','c','a','c','a'])
));
assert(isEquals(
    array(2, 0, ['a','b','c','d']),
    findLongestPattern(['a','b','c','d','a','b','c','d','c','d'])
));


// Find longest adjacent non-overlapping patterns
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>1, 'values'=>array('b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','b','c'])
));
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>1, 'values'=>array('b')),
        array('number'=>2, 'values'=>array('c','a')),
    ),
    splitToPatterns(['a','b','c','a','c','a'])
));
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a','b','c','d')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
    ),
    splitToPatterns(['a','b','c','d','a','b','c','d','c','d'])
));
/*     'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd', */
/*     [                 ] [                 ] [ ]  [  ] */
/* NOT [      ] [        ] [      ]  [       ] [ ]  [  ] */
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
    ),
    splitToPatterns(['a','b','a','b','a','b','a','b','c','d'])
));

/*     'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a' */
/* //  [  ] [  ] [ ]  [       ] [      ]  [ ] */
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('x')),
        array('number'=>1, 'values'=>array('y')),
        array('number'=>2, 'values'=>array('x','b')),
        array('number'=>1, 'values'=>array('a')),
    ),
    splitToPatterns(['x','x','y','x','b','x','b','a'])
));
// one, two, one, two, one, two, one, two
// [                ] [                 ]
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('one', 'two', 'one', 'two')),
    ),
    splitToPatterns(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])
));
// 'one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three', 'four'
// [   ]  [   ]  [                           ]  [                           ]
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('one')),
        array('number'=>1, 'values'=>array('two')),
        array('number'=>2, 'values'=>array('one','two','three','four')),
    ),
    splitToPatterns(['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three','four'])
));

/*     'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/*     [  ] [                 ] [                 ] [ ]  */
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));

/* 'a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b' */
// [      ]  [      ]  [ ]  [ ]  [      ] [       ]  [      ]
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a', 'b')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
        array('number'=>3, 'values'=>array('a','b')),
    ),
    splitToPatterns(['a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b'])
));
/*     'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/*     [  ] [                 ] [                 ] [ ]  */
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));
Votes: 3
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/21295384
