Search the values of one array for keywords from another array - php

Time: 2021-12-09 16:03:17

I have 2 arrays. One with bad keywords and the other with names of sites.

$bad_keywords = array('google', 
                      'twitter', 
                      'facebook');

$sites = array('youtube.com', 'google.com', 'm.google.co.uk', 'walmart.com', 'thezoo.com', 'etc.com');

Simple task: I need to go through the $sites array and remove every value that contains any keyword from the $bad_keywords array. At the end I need an array of clean values in which none of the bad keywords occur.

I have scoured the web and can't seem to find a simple solution for this. Here are several methods that I have tried:
1. using 2 foreach loops (feels slower - I think using built-in PHP functions will speed it up)
2. array_walk
3. array_filter
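
For reference, the nested-loop approach from item 1 might look roughly like the sketch below. This is an illustration only, not the exact code I tried; the function name filter_sites_loop is made up for this example.

```php
<?php
// Hypothetical helper (name invented for illustration):
// keep only the sites that contain none of the bad keywords.
function filter_sites_loop(array $sites, array $bad_keywords): array
{
    $clean = [];
    foreach ($sites as $site) {
        foreach ($bad_keywords as $keyword) {
            // strpos returns false when the keyword is absent;
            // 0 is a valid match position, so compare strictly.
            if (strpos($site, $keyword) !== false) {
                continue 2; // bad keyword found: move on to the next site
            }
        }
        $clean[] = $site; // no keyword matched, so the site is clean
    }
    return $clean;
}

$bad_keywords = ['google', 'twitter', 'facebook'];
$sites = ['youtube.com', 'google.com', 'm.google.co.uk', 'walmart.com', 'thezoo.com', 'etc.com'];

print_r(filter_sites_loop($sites, $bad_keywords));
```

Note that this version returns a re-indexed list, and the inner loop stops at the first keyword that matches.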

But I haven't managed to nail down the best, most efficient way. I want to have a tool that will filter through a list of 20k+ sites against a list of keywords that may be up to 1k long, so performance is paramount. Also, what would be the better method for the actual search in this case - regex or strpos?

What other options are there to do this and what would be the best way?

1 solution

#1

Short solution using preg_grep function:

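// Quote each keyword; PREG_GREP_INVERT keeps only the sites that do NOT match.
$quoted = array_map(fn($k) => preg_quote($k, '/'), $bad_keywords);
$result = preg_grep('/' . implode('|', $quoted) . '/', $sites, PREG_GREP_INVERT);
print_r($result);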

The output:

Array
(
    [0] => youtube.com
    [3] => walmart.com
    [4] => thezoo.com
    [5] => etc.com
)
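
On the regex-or-strpos part of the question, a strpos-based alternative built on array_filter could be sketched as below. This is not part of the original answer: the helper name filter_sites_strpos is hypothetical, and which variant is faster on 20k sites and 1k keywords is something to benchmark on real data rather than assume.

```php
<?php
// Hypothetical helper (not from the original answer):
// array_filter keeps a site only when the callback returns true.
function filter_sites_strpos(array $sites, array $bad_keywords): array
{
    $clean = array_filter($sites, function (string $site) use ($bad_keywords): bool {
        foreach ($bad_keywords as $keyword) {
            // Case-sensitive substring test; swap in stripos to ignore case.
            if (strpos($site, $keyword) !== false) {
                return false; // contains a bad keyword: drop it
            }
        }
        return true; // no keyword found: keep it
    });

    // array_filter, like preg_grep, preserves the original keys;
    // array_values re-indexes the result from 0.
    return array_values($clean);
}

$bad_keywords = ['google', 'twitter', 'facebook'];
$sites = ['youtube.com', 'google.com', 'm.google.co.uk', 'walmart.com', 'thezoo.com', 'etc.com'];

print_r(filter_sites_strpos($sites, $bad_keywords));
```

Unlike the preg_grep output above, the keys here run from 0 to 3 because of the array_values call; drop that call if the original positions matter.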
