How do I search for part of a string in an array?

Time: 2020-12-01 07:08:36

I want to check whether a complete string, or part of one, occurs in the elements of an array. How can this be done in PHP?

Also, how can I use metaphone for the matching as well?

Example:

$array1 = array('India', 'USA', 'China');
$array2 = array('India is in east', 'United States of America is USA', 'Made in China');

If I search for the elements of array1 in array2, then:

'India' should match 'India is in east', and similarly 'USA' and 'China' should match the strings that contain them.

3 answers

#1


0  

$a1 = array('India','USA','China');
$a2 = array('India is in east','United States of America is USA','Made in China');

foreach ( $a2 as $a )
{
  foreach( $a1 as $b  )
  {
    // strpos() returns false when there is no match, and 0 for a match at the start
    if ( strpos( $a, $b ) !== false )
    {
      echo $a . " contains " . $b . "\n";
    }
  }
}
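
Since strpos() is case-sensitive, a search for 'india' would miss 'India is in east'. Here is a minimal sketch of a case-insensitive variant that collects the matches instead of echoing them. The function name findContaining is made up for illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original answer): for each search term,
// collect the haystack entries that contain it, ignoring case.
function findContaining(array $needles, array $haystack): array
{
    $found = array();
    foreach ($needles as $needle) {
        // array_filter() keeps the original keys of $haystack.
        $found[$needle] = array_filter(
            $haystack,
            function ($sentence) use ($needle) {
                // stripos() is the case-insensitive counterpart of strpos().
                return stripos($sentence, $needle) !== false;
            }
        );
    }
    return $found;
}

print_r(findContaining(
    array('india', 'USA'),
    array('India is in east', 'United States of America is USA', 'Made in China')
));
```

Keeping the original keys makes it easy to map each match back to its position in the searched array.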

#2


4  

$array1 = array('India','USA','China');
$array2 = array('India is in east','United States of America is USA','Made in China');
$found = array();

foreach ($array1 as $key => $value) {
    // Escape the search term so regex metacharacters in it are matched literally.
    $pattern = '/' . preg_quote($value, '/') . '/';
    // Thanks to @Andrea for this suggestion:
    $found[$value] = preg_grep($pattern, $array2);
    // Alternative:
    //$found = $found + preg_grep($pattern, $array2);
}

print_r($found);

Result (using the alternative line, which merges all matches into one flat array):

Array
(
    [0] => India is in east
    [1] => United States of America is USA
    [2] => Made in China
)

Using Metaphone is trickier. You will have to decide what constitutes a match. One way to do that is to use the Levenshtein distance between the Metaphone results for the two values being compared.

Update: See @Andrea's solution for a more sensible per-word Metaphone comparison.

Here's a rough example:

// Pair each string with its Metaphone key: array(metaphone => original).
// A closure replaces create_function(), which was removed in PHP 8.
$toMeta = function ($v) {
    return array(metaphone($v) => $v);
};

$meta1 = array_map($toMeta, $array1);
$meta2 = array_map($toMeta, $array2);

$threshold = 3;

foreach ($meta2 as $key2 => $value2) {

    $k2 = key($value2);
    $v2 = $value2[$k2];

    foreach ($meta1 as $key1 => $value1) {

        $k1  = key($value1);
        $v1  = $value1[$k1];
        $lev = levenshtein($k2, $k1);

        // Match on a plain substring hit or on phonetically close keys.
        if( strpos($v2, $v1) !== false || $lev <= $threshold ) {
            array_push( $found, $v2 );
        }
    }
}

...but it needs work. It produces duplicates if the threshold is too high. You may prefer to run the match in two passes: one to find simple matches, as in my first code example, and then another to match with Metaphone only if the first returns no matches.
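
The two-pass idea could be sketched roughly as follows. This is only an illustration under some assumptions: the function name twoPassMatch is invented here, the second pass compares Metaphone keys word by word rather than for whole sentences, and the default threshold of 0 (identical keys only) is a cautious choice, since short keys sit within a small edit distance of many unrelated words.

```php
<?php
// Hypothetical helper (not from the original answer): try plain substring
// matching first and fall back to Metaphone only when nothing was found.
function twoPassMatch(string $needle, array $haystack, int $threshold = 0): array
{
    $matches = array();

    // Pass 1: simple case-insensitive substring match.
    foreach ($haystack as $key => $sentence) {
        if (stripos($sentence, $needle) !== false) {
            $matches[$key] = $sentence;
        }
    }
    if ($matches) {
        return $matches;
    }

    // Pass 2: compare the Metaphone key of each word with the needle's key.
    $needleKey = metaphone($needle);
    foreach ($haystack as $key => $sentence) {
        $words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (levenshtein(metaphone($word), $needleKey) <= $threshold) {
                // Keyed by position, so a sentence is never added twice.
                $matches[$key] = $sentence;
                break;
            }
        }
    }
    return $matches;
}

print_r(twoPassMatch('India', array('Indiuh is in east', 'Made in China')));
```

Because the result is keyed by the sentence's position in the searched array, the duplicate problem mentioned above does not arise, whatever threshold is passed.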

#3


1  

The metaphone case could also follow the same structure that Mike proposed for the strict case.

I do not think that an additional similarity function is needed, because the purpose of metaphone is to give us a key that is common to words that sound the same.

$array1 = array('India','USA','China');
$array2 = array(
    'Indiuh is in east',
    'United States of America is USA',
    'Gandhi was born in India',
    'Made in China'
);
$found = array();
foreach ($array1 as $key => $value) {
    $found[$value] = preg_grep('/\b'.$value.'\b/i', $array2);
}

var_export($found);

echo "\n\n";

function meta( $sentence )
{
    return implode(' ', array_map('metaphone', explode(' ', $sentence)));
}

$array2meta = array_map('meta', $array2);
$foundmeta = array();
foreach ($array1 as $key => $value) {
    $valuemeta = meta($value);
    // Match whole Metaphone keys, then map the hits back to the original sentences by key.
    $foundmeta[$value] = preg_grep('/\b'.$valuemeta.'\b/', $array2meta);
    $foundmeta[$value] = array_intersect_key($array2, $foundmeta[$value]);
}

var_export($foundmeta);

The above code prints out:

array (
  'India' => 
  array (
    2 => 'Gandhi was born in India',
  ),
  'USA' => 
  array (
    1 => 'United States of America is USA',
  ),
  'China' => 
  array (
    3 => 'Made in China',
  ),
)

array (
  'India' => 
  array (
    0 => 'Indiuh is in east',
    2 => 'Gandhi was born in India',
  ),
  'USA' => 
  array (
    1 => 'United States of America is USA',
  ),
  'China' => 
  array (
    3 => 'Made in China',
  ),
)
