PHP -如何在数组中找到重复的值分组

I have an array of string values which sometimes form repeating value patterns ('a', 'b', 'c', 'd')

我有一个字符串值数组，它们有时会形成重复的值模式('a'， 'b'， 'c'， 'd')

$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

I would like to find duplicate patterns based on the array order and group them by that same order (to maintain it).

我希望根据数组顺序找到重复的模式，并按相同的顺序对它们进行分组(以维护它)。

$patterns = array(
    array('number' => 2, 'values' => array('a', 'b', 'c', 'd')),
    array('number' => 1, 'values' => array('c'))
    array('number' => 1, 'values' => array('d'))
);

Notice that [a,b], [b,c], & [c,d] are not patterns by themselves because they are inside the larger [a,b,c,d] pattern and the last [c,d] set only appears once so it's not a pattern either - just the individual values 'c' and 'd'

请注意，[a,b]， [b,c]和[c,d]本身并不是模式，因为它们在较大的[a,b,c,d]模式中，最后一个[c,d]集合只出现一次，所以它也不是模式——只是单个值'c'和'd'

Another example:

另一个例子:

$array = array(
    'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
  //[.......] [.] [[......]  [......]] [.]
);

which produces

生产

$patterns = array(
    array('number' => 2, 'values' => array('x')),
    array('number' => 1, 'values' => array('y')),
    array('number' => 2, 'values' => array('x', 'b')),
    array('number' => 1, 'values' => array('a'))
);

How can I do this?

我该怎么做呢?

10 个解决方案

#1

Character arrays are just strings. Regex is the king of string pattern matching. Add recursion and the solution is pretty elegant, even with the conversion back and forth from character arrays:

字符数组只是字符串。Regex是字符串模式匹配的王者。添加递归，解决方案相当优雅，即使是字符数组的来回转换:

function findPattern($str){
    $results = array();
    if(is_array($str)){
        $str = implode($str);
    }
    if(strlen($str) == 0){ //reached the end
        return $results;
    }
    if(preg_match_all('/^(.+)\1+(.*?)$/',$str,$matches)){ //pattern found
        $results[] = array('number' => (strlen($str) - strlen($matches[2][0])) / strlen($matches[1][0]), 'values' => str_split($matches[1][0]));
        return array_merge($results,findPattern($matches[2][0]));
    }
    //no pattern found
    $results[] = array('number' => 1, 'values' => array(substr($str, 0, 1)));
    return array_merge($results,findPattern(substr($str, 1)));
}

You can test here : https://eval.in/507818 and https://eval.in/507815

您可以在这里测试:https://eval。/ 507818和https://eval.in/507815

#2

If c and d can be grouped, this is my code:

如果c和d可以分组，这是我的代码:

<?php
$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

$res = array();

foreach ($array AS $value) {
    if (!isset($res[$value])) {
        $res[$value] = 0;
    }
    $res[$value]++;
}

foreach ($res AS $key => $value) {
    $fArray[$value][] = $key;
    for ($i = $value - 1; $i > 0; $i--) {
        $fArray[$i][] = $key;
    }
}

$res = array();
foreach($fArray AS $key => $value) {
    if (!isset($res[serialize($value)])) {
        $res[serialize($value)] = 0;
    }
    $res[serialize($value)]++;
}
$fArray = array();
foreach($res AS $key => $value) {
    $fArray[] = array('number' => $value, 'values' => unserialize($key));
}

echo '<pre>';
var_dump($fArray);
echo '</pre>';

Final result is:

最后的结果是:

array (size=2)
  0 => 
    array (size=2)
      'number' => int 2
      'values' => 
        array (size=4)
          0 => string 'a' (length=1)
          1 => string 'b' (length=1)
          2 => string 'c' (length=1)
          3 => string 'd' (length=1)
  1 => 
    array (size=2)
      'number' => int 1
      'values' => 
        array (size=2)
          0 => string 'c' (length=1)
          1 => string 'd' (length=1)

#3

The following code will return the expected result, finding the longest portions with repeated values:

下面的代码将返回预期的结果，找到具有重复值的最长的部分:

function pepito($array) {
  $sz=count($array);
  $patterns=Array();
  for ($pos=0;$pos<$sz;$pos+=$len) {
    $nb=1;
    for ($len=floor($sz/2);$len>0;$len--) {
      while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
        $pos+=$len;
        $nb++;
      }
      if ($nb>1) break;
    }
    if (!$len) $len=1;
    $patterns[]=Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len));
  }
  return $patterns;
}

This will match with your examples:

这将与您的示例相匹配:

{['a', 'b', 'c', 'd'], ['a', 'b', 'c', 'd']}, ['c', 'd']

{[a,b,c,d '),[a,b,c,d ')},(' c ',' d ')

or {['x'], ['x']}, ['y'], {['x', 'b'], ['x', 'b']}, ['a']

或{[' x '],[x]},[y]{[' x ',' b '],[' x ',' b ']},[a]

The difficult part is more about examples like:

困难的部分更多的是像这样的例子:

{['one', 'one', 'two'], ['one', 'one', 'two']}

{【1】，【1】，【1】，【1】，【1】，【1】，【2】}

Or the most difficult choice to make:

或者是最困难的选择:

one, two, one, two, one, two, one, two

一，二，一，二，一，二，一，二

Because we can group this in both the form:

因为我们可以把它分成两种形式:

[one, two], [one, two], [one, two], [one, two]

[一，二]，[一，二]，[一，二]，[一，二]

[one, two, one, two], [one, two, one, two]

[1,2,1,2]，[1,2,1,2]

where there is no "evident" choice. My above algorithm will always consider the longest match, as this is the most easy implementation to consider any combination.

没有“明显”的选择。我上面的算法总是考虑最长的匹配，因为这是最容易考虑任何组合的实现。

EDIT: You should also consider cases where the longest match is after a shorter one:

编辑:你也应该考虑最长匹配在短匹配之后的情况:

Example:

例子:

'one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three', 'four'

'一',' 2 ',' 1 ',' 2 ',' 3 ',' 4 ',' 1 ',' 2 ',' 3 ',' 4 '

If you start from left to right, you might want to group as :

如果你从左到右开始，你可能想把它分成以下几类:

{['one', 'two'], ['one', 'two'],} 'three', 'four', 'one', 'two', 'three', 'four'

{[' 1 ',' 2 '],[' 1 ',' 2 '],} ' 3 ',' 4 ','一',' 2 ',' 3 ',' 4 '

when you could group like:

当你可以这样分组时:

'one', 'two', {['one', 'two', 'three', 'four'], ['one', 'two', 'three', 'four']}

“一”、“两个”,{[‘一个’,‘2’,‘三’,' 4 '],[“一”,“两个”,“三”、“四”)}

This situation has to be resolved with recursive calls to get the better solution but this will result in longer execution time:

这种情况必须通过递归调用来解决，以获得更好的解决方案，但这会导致更长的执行时间:

function pepito($array) {
  if (($sz=count($array))<1) return Array();
  $pos=0;
  $nb=1;
  for ($len=floor($sz/2);$len>0;$len--) {
    while (array_slice($array, $pos, $len)==array_slice($array, $pos+$len, $len)) {
      $pos+=$len;
      $nb++;
    }
    if ($nb>1) break;
  }

  if (!$len) $len=1;
  $rec1=pepito(array_slice($array, $pos+$len));
  $rec2=pepito(array_slice($array, 1));

  if (count($rec1)<count($rec2)+1) {
    return array_merge(Array(Array('number'=>$nb, 'values'=>array_slice($array, $pos, $len))), $rec1);
  }
  return array_merge(Array(Array('number'=>1, 'values'=>array_slice($array, 0, 1))), $rec2);
}

#4

OK, here is my take, the code below splits the whole original array into longest adjacent non-overlapping chunks.

下面的代码将原始数组分割成最长的不重叠块。

So in the situation like this

在这种情况下

'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd' 
[                 ] [                 ] [ ]  [  ]  <-- use 2 long groups
[      ] [        ] [      ]  [       ] [ ]  [  ]  <-- and not 4 short

it will prefer 2 long groups to 4 shorter groups.

它更喜欢2个长组而不是4个短组。

Update: also tested with examples from another answer, works for these cases too:

更新:用另一个答案中的例子进行测试，也适用于这些情况:

one, two, one, two, one, two, one, two
[one two one two], [one two one two]

'one' 'two' 'one' 'two' 'three' 'four' 'one' 'two' 'three' 'four'    
['one'] ['two'] ['one' 'two' 'three' 'four'] ['one' 'two' 'three' 'four']

Here is the code and tests:

这里是代码和测试:

<?php

/*
 * Splits an $array into chunks of $chunk_size.
 * Returns number of repeats, start index and chunk which has
 * max number of ajacent repeats.
 */
function getRepeatCount($array, $chunk_size) {
    $parts = array_chunk($array, $chunk_size);
    $maxRepeats = 1;
    $maxIdx = 0;
    $repeats = 1;
    $len = count($parts);
    for ($i = 0; $i < $len-1; $i++) {
        if ($parts[$i] === $parts[$i+1]) {
            $repeats += 1;
            if ($repeats > $maxRepeats) {
                $maxRepeats = $repeats;
                $maxIdx = $i - ($repeats-2);
            }
        } else {
            $repeats = 1;
        }
    }
    return array($maxRepeats, $maxIdx*$chunk_size, $parts[$maxIdx]);
}

/*
 * Finds longest pattern in the $array.
 * Returns number of repeats, start index and pattern itself.
 */
function findLongestPattern($array) {
    $len = count($array);
    for ($window = floor($len/2); $window >= 1; $window--) {
      $num_chunks = ceil($len/$window);
      for ($i = 0; $i < $num_chunks; $i++) {
        list($repeats, $idx, $pattern) = getRepeatCount(
          array_slice($array, $i), $window
        );
        if ($repeats > 1) {
            return array($repeats, $idx+$i, $pattern);
        }
      }
    }
    return array(1, 0, [$array[0]]);
}

/*
 * Splits $array into longest adjacent non-overlapping parts.
 */
function splitToPatterns($array) {
    if (count($array) < 1) {
        return $array;
    }
    list($repeats, $start, $pattern) = findLongestPattern($array);
    $end = $start + count($pattern) * $repeats;
    return array_merge(
            splitToPatterns(array_slice($array, 0, $start)),
            array(
                array('number'=>$repeats, 'values' => $pattern)
            ),
            splitToPatterns(array_slice($array, $end))
    );
}

Tests:

测试:

function isEquals($expected, $actual) {
    $exp_str = json_encode($expected);
    $act_str = json_encode($actual);
    $equals = $exp_str === $act_str;
    if (!$equals) {
        echo 'Equals check failed'.PHP_EOL;
        echo 'expected: '.$exp_str.PHP_EOL;
        echo 'actual  : '.$act_str.PHP_EOL;
    }
    return $equals;
}

assert(isEquals(
    array(1, 0, ['a']), getRepeatCount(['a','b','c'], 1)
));
assert(isEquals(
    array(1, 0, ['a']), getRepeatCount(['a','b','a','b','c'], 1)
));
assert(isEquals(
    array(2, 0, ['a','b']), getRepeatCount(['a','b','a','b','c'], 2)
));
assert(isEquals(
    array(1, 0, ['a','b','a']), getRepeatCount(['a','b','a','b','c'], 3)
));
assert(isEquals(
    array(3, 0, ['a','b']), getRepeatCount(['a','b','a','b','a','b','a'], 2)
));
assert(isEquals(
    array(2, 2, ['a','c']), getRepeatCount(['x','c','a','c','a','c'], 2)
));
assert(isEquals(
    array(1, 0, ['x','c','a']), getRepeatCount(['x','c','a','c','a','c'], 3)
));
assert(isEquals(
    array(2, 0, ['a','b','c','d']),
    getRepeatCount(['a','b','c','d','a','b','c','d','c','d'],4)
));

assert(isEquals(
    array(2, 2, ['a','c']), findLongestPattern(['x','c','a','c','a','c'])
));
assert(isEquals(
    array(1, 0, ['a']), findLongestPattern(['a','b','c'])
));
assert(isEquals(
    array(2, 2, ['c','a']),
    findLongestPattern(['a','b','c','a','c','a'])
));
assert(isEquals(
    array(2, 0, ['a','b','c','d']),
    findLongestPattern(['a','b','c','d','a','b','c','d','c','d'])
));


// Find longest adjacent non-overlapping patterns
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>1, 'values'=>array('b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','b','c'])
));
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>1, 'values'=>array('b')),
        array('number'=>2, 'values'=>array('c','a')),
    ),
    splitToPatterns(['a','b','c','a','c','a'])
));
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a','b','c','d')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
    ),
    splitToPatterns(['a','b','c','d','a','b','c','d','c','d'])
));
/*     'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'd', */
/*     [                 ] [                 ] [ ]  [  ] */
/* NOT [      ] [        ] [      ]  [       ] [ ]  [  ] */
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
    ),
    splitToPatterns(['a','b','a','b','a','b','a','b','c','d'])
));

/*     'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a' */
/* //  [  ] [  ] [ ]  [       ] [      ]  [ ] */
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('x')),
        array('number'=>1, 'values'=>array('y')),
        array('number'=>2, 'values'=>array('x','b')),
        array('number'=>1, 'values'=>array('a')),
    ),
    splitToPatterns(['x','x','y','x','b','x','b','a'])
));
// one, two, one, two, one, two, one, two
// [                ] [                 ]
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('one', 'two', 'one', 'two')),
    ),
    splitToPatterns(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])
));
// 'one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three', 'four'
// [   ]  [   ]  [                           ]  [                           ]
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('one')),
        array('number'=>1, 'values'=>array('two')),
        array('number'=>2, 'values'=>array('one','two','three','four')),
    ),
    splitToPatterns(['one', 'two', 'one', 'two', 'three', 'four', 'one', 'two', 'three','four'])
));

/*     'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/*     [  ] [                 ] [                 ] [ ]  */
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));

/* 'a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b' */
// [      ]  [      ]  [ ]  [ ]  [      ] [       ]  [      ]
assert(isEquals(
    array(
        array('number'=>2, 'values'=>array('a', 'b')),
        array('number'=>1, 'values'=>array('c')),
        array('number'=>1, 'values'=>array('d')),
        array('number'=>3, 'values'=>array('a','b')),
    ),
    splitToPatterns(['a', 'b', 'a', 'b', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b'])
));
/* 'a', 'c', 'd', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'c', */
/* [  ] [  ] [  ] [                 ] [                 ] [ ]  */
assert(isEquals(
    array(
        array('number'=>1, 'values'=>array('a')),
        array('number'=>2, 'values'=>array('a','b','a','b')),
        array('number'=>1, 'values'=>array('c')),
    ),
    splitToPatterns(['a','a','b','a','b','a','b','a','b','c'])
));

#5

Definitions:

定义:

Pattern base: The sequence of elements that repeat within a pattern. (ie. For [a,b,a,b,c], [a,b] is the pattern base and [a,b,a,b] is the pattern.

模式库:在模式中重复的元素序列。(即。对于[a,b,a,b,c]， [a,b]是模式基，[a,b,a,b]是模式。

We want to start searching for the longest pattern base, followed by the next longest, and so forth. It's important to understand that if we find a pattern, we don't need to check within it for the start of another pattern with a base of the same length.

我们想要开始搜索最长的模式基，然后是下一个最长的，等等。重要的是要理解，如果我们找到了一个模式，我们不需要在其中检查另一个具有相同长度基数的模式的开始。

Here's the proof.

这是证明。

Assume that A is a pattern base, and that we've encountered the pattern AA. Assume that B is a another pattern base, of the same length, that forms a pattern starting within A. Let Y be the overlapping elements. If A=XY, then AA=XYXY. Since B is the same length, it must be the case that B=YX because in order to complete B we must use the remaining elements in A. Moreover, since B forms a pattern we must have BB, which is YXYX. Since A starts before B, we have XYXYX=AAX=XBB. If B repeats again, we would have XBBB=XYXYXYX=AAAX. Therefore, B cannot repeat an additional time without A repeating an additional time. Thus, we do not need to check for longer patterns within those generated by A.

假设A是一个模式库，并且我们已经遇到了模式AA。假设B是另一个相同长度的模式基，它从a开始形成一个模式，让Y成为重叠的元素。如果一个= XY,那么AA = XYXY。因为B的长度是一样的，所以B一定是YX，因为为了完成B，我们必须使用a中剩余的元素。此外，由于B形成了一个图案，我们必须有BB，也就是YXYX。因为A在B之前开始，我们有xyx =AAX=XBB。如果B再次重复，我们会得到XBBB= xyxyxyxyxyx =AAAX。因此，B不能重复一个额外的时间而不重复一个额外的时间。因此，我们不需要在A生成的模式中检查更长的模式。

The longest pattern possible consists of half the elements in the whole list because the simplest pattern can occur exactly twice. Thus, we can start checking for patterns of half of the length and work our way down to patterns of size 2.

最长的模式可能由整个列表中的一半元素组成，因为最简单的模式可能出现两次。因此，我们可以开始检查长度的一半的模式，并按照大小为2的模式工作。

Assuming that we search the array from left to right, if a pattern is found, we only need to search on either side of it for additional patterns. To the left, there are no patterns with bases of the same length, or they would have been discovered beforehand. Thus, we search the left side for patterns using the next smallest base size. The elements to the right of the pattern haven't been searched so we continue searching for patterns using a base of the same size.

假设我们从左到右搜索数组，如果找到了一个模式，我们只需要在它的任何一边搜索其他的模式。在左边，没有相同长度的底的图案，否则它们会被事先发现。因此，我们使用下一个最小的基本大小搜索左侧的模式。模式右边的元素没有被搜索，因此我们继续使用相同大小的基数搜索模式。

The function to do this is as follows:

这样做的功能如下:

function get_patterns($arr, $len = null) {
    // The smallest pattern base length for which a pattern can be found
    $minlen = 2;

    // No pattern base length was specified
    if ($len === null) {
        // Use the longest pattern base length possible
        $maxlen = floor(count($arr) / 2);
        return get_patterns($arr, $maxlen);

    // Base length is too small to find any patterns
    } else if ($len < $minlen) {
        // Compile elements into lists consisting of one element

        $results = array();

        $num = 1;
        $elem = $arr[0];

        for ($i=1; $i < count($arr); $i++) {
            if ($elem === $arr[$i]) {
                $num++;
            } else {
                array_push($results, array(
                    'number' => $num,
                    'values' => array( $elem )
                ));

                $num = 1;
                $elem = $arr[$i];
            }
        }

        array_push($results, array(
            'number' => $num,
            'values' => array( $elem )
        ));

        return $results;
    }

    // Cycle through elements until there aren't enough elements to fit
    //  another repition.
    for ($i=0; $i < count($arr) - $len * 2 + 1; $i++) {
        // Number of times pattern base occurred
        $num_read = 1; // One means there is no pattern yet

        // Current pattern base we are attempting to match against
        $base = array_slice($arr, $i, $len);

        // Check for matches using segments of the same length for the elements
        //  following the current pattern base
        for ($j = $i + $len; $j < count($arr) - $len + 1; $j += $len) {
            // Elements being compared to pattern base
            $potential_match = array_slice($arr, $j, $len);

            // Match found
            if (has_same_elements($base, $potential_match)) {
                $num_read++;

            // NO match found
            } else {
                // Do not check again using currently selected elements
                break;
            }
        }

        // Patterns were encountered
        if ($num_read > 1) {
            // The total number of elements that make up the pattern
            $pattern_len = $num_read * $len;

            // The elements before the pattern
            $before = array_slice($arr, 0, $i);

            // The elements after the pattern
            $after = array_slice(
                $arr, $i + $pattern_len, count($arr) - $pattern_len - $i
            );

            $results = array_merge(
                // Patterns of a SMALLER length may exist beforehand
                count($before) > 0 ? get_patterns($before, $len-1) : array(),

                // Patterns that were found
                array(
                    array(
                        'number' => $num_read,
                        'values' => $base
                    )
                ),

                // Patterns of the SAME length may exist afterward
                count($after) > 0 ? get_patterns($after, $len) : array()
            );

            return $results;
        }
    }

    // No matches were encountered

    // Search for SMALLER patterns
    return get_patterns($arr, $len-1);
}

The function has_same_elements, which was used to check if arrays with primitive keys are identical, is as follows:

has_same_elements函数用于检查具有原语键的数组是否相同，如下所示:

// Returns true if two arrays have the same elements.
//
// Precondition: Elements must be primitive data types (ie. int, string, etc)
function has_same_elements($a1, $a2) {
    // There are a different number of elements
    if (count($a1) != count($a2)) {
        return false;
    }

    for ($i=0; $i < count($a1); $i++) {
        if ($a1[$i] !== $a2[$i]) {
            return false;
        }
    }

    return true;
}

In order to speed up the code, there are a few things that you could do. Instead of slicing the array, you can supply the function with indexes to the start and end position that are to be examined, along with the array. Also, using strings may be slow so you can create an array that maps strings to numbers and vice versa. Then you can convert the array of strings into an array of numbers and use it instead. After you get the result, you can convert the arrays of numbers back into strings.

为了加快代码的速度，您可以做一些事情。与切片数组不同，您可以向函数提供要检查的开始和结束位置的索引，以及数组。另外，使用字符串可能很慢，因此可以创建一个将字符串映射到数字的数组，反之亦然。然后，您可以将字符串数组转换成数字数组并使用它。得到结果后，可以将数字数组转换回字符串。

I tested the function using the following code:

我使用以下代码测试了这个函数:

$tests = array(
    'a,b,c,d',
    'a',
    'a,a,a,a',
    'a,a,a,a,a',
    'a,a,a,a,a,a',
    'b,a,a,a,a,c',
    'b,b,a,a,a,a,c,c',
    'b,b,a,a,d,a,a,c,c',
    'a,b,c,d,a,b,c,d,c,d',
    'x,x,y,x,b,x,b,a'
);

echo '<pre>';
foreach ($tests as $test) {
    echo '<div>';
    $arr = explode(',',$test);
    echo "$test<br /><br />";
    pretty_print(get_patterns($arr));
    echo '</div><br />';
}
echo '</pre>';

The function that I used to print the output, pretty_print is as follows:

我用来打印输出的函数pretty_print如下:

function pretty_print($results) {
    foreach ($results as $result) {
        $a = "array('" . implode("','", $result['values']) . "')";
        echo "array('number' => ${result['number']}, 'values' => $a)<br />";
    }
}

The output from the test code is as follows:

测试代码输出如下:

a,b,c,d

array('number' => 1, 'values' => array('a'))
array('number' => 1, 'values' => array('b'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))

a

array('number' => 1, 'values' => array('a'))

a,a,a,a

array('number' => 2, 'values' => array('a','a'))

a,a,a,a,a

array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('a'))

a,a,a,a,a,a

array('number' => 2, 'values' => array('a','a','a'))

b,a,a,a,a,c

array('number' => 1, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 1, 'values' => array('c'))

b,b,a,a,a,a,c,c

array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a','a'))
array('number' => 2, 'values' => array('c'))

b,b,a,a,d,a,a,c,c

array('number' => 2, 'values' => array('b'))
array('number' => 2, 'values' => array('a'))
array('number' => 1, 'values' => array('d'))
array('number' => 2, 'values' => array('a'))
array('number' => 2, 'values' => array('c'))

a,b,c,d,a,b,c,d,c,d

array('number' => 2, 'values' => array('a','b','c','d'))
array('number' => 1, 'values' => array('c'))
array('number' => 1, 'values' => array('d'))

x,x,y,x,b,x,b,a

array('number' => 2, 'values' => array('x'))
array('number' => 1, 'values' => array('y'))
array('number' => 2, 'values' => array('x','b'))
array('number' => 1, 'values' => array('a'))

#6

I started with this now but at the end my brain burn and I don't know where to start to compare the arrays... Enjoy!

我从这个开始，但最后我的大脑烧坏了，我不知道从哪里开始比较这些数组……享受吧!

$array = array(
    'x', 'x', 'y', 'x', 'b', 'x', 'b', 'a'
    //[.......] [.] [[......]  [......]] [.]
);

$arrayCount = count($array);

$res = array();
for($i = 0; $i < $arrayCount; $i++) {
    for($j = 1; $j < $arrayCount; $j++) {
        $res[$i][] = array_slice($array, $i, $j);
    }
}

//echo '<pre>';
//var_dump($res);
//echo '</pre>';
//
//die;


$resCount = count($res);
$oneResCount = count($res[0]);

#7

First make a function which will find the possible group matches in the array for a given group array, starting from a specific index in the array and will return the number of matches found.

首先创建一个函数，该函数将查找给定组数组中可能的组匹配项，从数组中的特定索引开始，并返回找到的匹配项的数量。

function findGroupMatch($group, $array, $startFrom) {
    $match = 0;
    while($group == array_slice($array, $startFrom, count($group))) {
        $match++;
        $startFrom += count($group);
    }
    return $match;
}

Now, we need to iterate through each item to find possible groups and then send it to findGroupMatch() function to check if any match for the group exists in the next items. The trick to find a possible group is finding an item which matches any of the previous items. If so, we find a possible group taking all the previous items starting from the matched item. Otherwise we just increase the list of unmatched items and at the end we enter all unmatched items as single item groups. (In the given example, we have a, b, c, d, a.... When we find 2nd a in the array, it matches the previous a, So, we consider a, b, c, d a possible group and send it to function findGroupMatch(), to check how many more groups we can find in the next items.)

现在，我们需要遍历每个条目，以找到可能的组，然后将其发送到findGroupMatch()函数，以检查下一个条目中是否存在该组的任何匹配。找到一个可能的组的诀窍是找到一个与前面的任何项匹配的项。如果是，我们会找到一个可能的组，从匹配项开始获取所有前面的项。否则，我们只增加不匹配项的列表，最后我们将所有不匹配项作为单个项组输入。(在给定的例子中,我们有a,b,c,d,一个....当我们在数组中找到第2个a时，它与前面的a匹配，因此，我们考虑a、b、c、d一个可能的组并将其发送到函数findGroupMatch()，以检查在下一个项目中可以找到多少组。

$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd',
);

$unMatchedItems = array();
$totalCount = count($array);
$groupsArray = array();

for($i=0; $i < $totalCount; $i++) {
    $item = $array[$i];

    if(in_array($item, $unMatchedItems)) {
        $matched_keys = array_keys($unMatchedItems, $item);
        foreach($matched_keys as $key) {
            $possibleGroup = array_slice($unMatchedItems, $key);

            $matches = findGroupMatch($possibleGroup, $array, $i);

            if ($matches) {
                //Insert the items before group as single item group
                if ($key > 0) {
                    for ($j = 0; $j < $key; $j++) {
                        $groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$j]));
                    }
                }
                //Insert the group array
                $groupsArray[] = array('number' => $matches + 1, 'values' => $possibleGroup); //number includes initial group also so $matches + 1
                $i += (count($possibleGroup) * $matches) - 1; //skip the matched items from next iteration
                //Empty the unmatched array to start with a new group search
                $unMatchedItems = array();
                break;
            }
        }
        //If there was no matches, add the item to the unMatched group
        if(!$matches) $unMatchedItems[] = $item;
    } else {
        $unMatchedItems[] = $item;
    }

}

//Insert the remaining items as single item group
for($k=0; $k<count($unMatchedItems); $k++) {
    $groupsArray[] = array('number' => 1, 'values' => array($unMatchedItems[$k]));
}

print_r($groupsArray);

The Result will be like this: (Check this PHP Fiddle for testing and also https://eval.in/507333 for your another input test.)

结果将如下所示:(检查这个PHP小提琴进行测试，以及https://eval。在/507333中进行另一个输入测试。)

Array
(
    [0] => Array
    (
        [number] => 2
        [values] => Array
        (
            [0] => a
            [1] => b
            [2] => c
            [3] => d
        )

    )

    [1] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => c
        )

    )

    [2] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => d
        )

    )

)

#8

The first example is super easy with recursion. The second example... not so easy.

第一个示例非常容易使用递归。第二个例子……不是那么容易。

The example below works for only the first example by assuming no pattern should ever contain two of the same element. This will also handle all individual element patterns at the end of the original array and keep the pattern order (of the first pattern occurrence).

下面的示例仅适用于第一个示例，假设任何模式都不应该包含两个相同的元素。这还将处理原始数组末尾的所有单个元素模式，并保持模式顺序(第一个模式出现时)。

function find_pattern($input, &$result) {
    $values = []; // currently processed elements
    $pattern = ''; // the current element pattern
    $dupe_found = false; // did we find a duplicate element?

    // search the values for the first that matches a previous value
    while ($next = array_shift($input)) {
        // check if the element was already found
        if (in_array($next, $values)) {
            // re-add the value back into the input, since the next call needs it
            array_unshift($input, $next);

            // add the resulting pattern
            add_pattern($pattern, $values, $result);

            // find the next pattern with a recursive call
            find_pattern($input, $result);

            // a duplicate element was found!
            $dupe_found = true;

            // the rest of the values are handled by recursion, break the while loop
            break;
        } else {
            // not already found, so store the element and keep going
            $values[] = $next;

            // use the element to produce a key for the result set
            $pattern .= $next;
        }
    }

    // if no duplicate was found, then each value should be an individual pattern
    if (!$dupe_found) {
        foreach ($values as $value) {
            add_pattern($value, [$value], $result);
        }
    }
}

function add_pattern($pattern, $values, &$result) {
    // increment the pattern count
    $result[$pattern]['number'] = isset($result[$pattern]['number']) ?
        result[$pattern]['number']+1 : 1;

    // add the current pattern to the result, if not already done
    if (!isset($result[$pattern]['values'])) {
        $result[$pattern]['values'] = $values;
    }
}

And an example usage:

和一个示例用法:

$input = [
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd'
];

$result = [];
find_pattern($input, $result);

echo "<pre>";
print_r($result);
echo "</pre>";

The example output:

示例输出:

Array
(
    [abcd] => Array
    (
        [number] => 2
        [values] => Array
        (
            [0] => a
            [1] => b
            [2] => c
            [3] => d
        )
    )

    [c] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => c
        )
    )

    [d] => Array
    (
        [number] => 1
        [values] => Array
        (
            [0] => d
        )
    )
)

#9

You could do something like this:

你可以这样做:

<?php
$array = array(
    'a', 'b', 'c', 'd',
    'a', 'b', 'c', 'd',
    'c', 'd'
);

// Call this function to get your patterns
function patternMatching(array $array) {
    $patterns = array();
    $belongsToPattern = array_fill(0, count($array), false);

    // Find biggest patterns first
    for ($size = (int) (count($array) / 2); $size > 0; $size--) {

        // for each pattern: start at every possible point in the array
        for($start=0; $start <= count($array) - $size; $start++) {

            $p = findPattern($array, $start, $size);

            if($p != null) {

                /* Before we can save the pattern we need to check, if we've found a
                 * pattern that does not collide with patterns we've already found */
                $hasConflict = false;
                foreach($p["positions"] as $idx => $pos) {
                    $PatternConflicts = array_slice($belongsToPattern, $pos, $p["size"]);
                    $hasConflict = $hasConflict || in_array(true, $PatternConflicts);
                }

                if(!$hasConflict) {

                    /* Since we have found a pattern, we don't want to find more 
                     * patterns for these positions */
                    foreach($p["positions"] as $idx => $pos) {
                        $replace = array_fill($pos, $p["size"], true);
                        $belongsToPattern = array_replace($belongsToPattern, $replace);
                    }

                    $patterns[] = $p;
                    // or only return number and values:
                    // $patterns[] = [ "number" => $p["number"], "values" => $p["values"]];
                }
            }
        }
    }

    return $patterns;
}


function findPattern(array $haystack, $patternStart, $patternSize ) {

    $size = count($haystack);
    $patternCandidate = array_slice($haystack, $patternStart, $patternSize);

    $patternCount = 1;
    $patternPositions = [$patternStart];

    for($i = $patternStart + $patternSize; $i <= $size - $patternSize; $i++) {

        $patternCheck = array_slice($haystack, $i, $patternSize);

        $diff = array_diff($patternCandidate, $patternCheck);

        if(empty($diff)) {
            $patternCount++;
            $patternPositions[] = $i;
        }

    }

    if($patternCount > 1 || $patternSize <= 1) {

        return [
            "number"    => $patternCount,
            "values"    => $patternCandidate,

            // Additional information needed for filtering, sorting, etc.
            "positions" => $patternPositions,
            "size"      => $patternSize
        ];
    } else {
        return null;
    }

}

$patterns = patternMatching($array);

print "<pre>";
print_r($patterns);
print "</pre>";

?>

The code might be far from being optimal in speed but it should do what you want to do for any sequence of strings in an array. patternMatching() returns the patterns ordered descending in size of the pattern and ascending by first occurence (You can use ['positions'][0] as a sorting criteria to achieve a different order).

代码在速度上可能远不是最优的，但它应该对数组中的任何字符串序列执行您想要的操作。模式匹配()返回按模式大小降序的模式，并按第一次出现的次数升序(您可以使用['position '][0]作为排序标准来实现不同的排序)。

#10

This should do it:

这应该这样做:

<?php

$array = array(
  'x', 'y', 'x', 'y', 'a',
  'ab', 'c', 'd',
  'a', 'b', 'c', 'd',
  'c', 'd', 'x', 'y', 'b',
  'x', 'y', 'b', 'c', 'd'
);


// convert the array to a string
$string = '';
foreach ($array as $a) {
  $l = strlen($a)-1;
  $string .= ($l) ? str_replace('::',':',$a[0] . ':' . substr($a,1,$l-1) . ':' . $a[$l]) . '-' : $a . '-';
}

// find patterns
preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $string, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
  $temp = str_replace('--','-',$m[2].'-');
  $patterns[] = ($temp[0]==='-') ? substr($temp,1) : $temp;
}

// remove empty values and duplicates
$patterns = array_keys(array_flip(array_filter($patterns)));

// sort patterns
foreach ($patterns as $p) {
  $sorted[$p] = strlen($p);
}
arsort($sorted);

// find double or more occurences
$stringClone = $string;
foreach ($sorted as $s=>$n) {
  $nA = substr_count($stringClone,':'.$s);
  $nZ = substr_count($stringClone,$s.':');
  $number = substr_count($stringClone,$s);
  $sub = explode('-',substr($stringClone,strpos($stringClone,$s),$n-1));
  $values = $sub;
  if($nA>0 || $nZ>0){
    $numberAdjusted = $number - $nA - $nZ;
    if($numberAdjusted > 1) {
      $temp = '';
      while($n--){
        $temp .= '#';
      }
      $position = strpos(str_replace(':'.$s,':'.$temp,str_replace($s.':',$temp.':',$string)),$s);
      $stringClone = str_replace(':'.$s,':'.$temp,$stringClone);
      $stringClone = str_replace($s.':',$temp.':',$stringClone);
      $result['p'.sprintf('%09d', $position)] = array('number'=>$numberAdjusted,'values'=>$values);
      $stringClone = str_replace($s,'',$stringClone);
      $stringClone = str_replace($temp,$s,$stringClone);
    }
  } else if($number>1){
    $position = strpos($string,$s);
    $result['p'.sprintf('%09d', $position)] = array('number'=>$number,'values'=>$values);
    $stringClone = str_replace($s,'',$stringClone);
  }
}

// add the remaining items
$remaining = array_flip(explode('-',substr($stringClone,0,-1)));
foreach ($remaining as $r=>$n) {
    $position = strpos($string,$r);
    $result['p'.sprintf('%09d', $position)] = array('number'=>1,'values'=>str_replace(':','',$r));
}

// sort results
ksort($result);
$result = array_values($result);

print_r($result);
?>

Working example here.

工作的例子。

#1