php - Is strpos the fastest way to search for a string in a large body of text?

if (strpos(htmlentities($storage->getMessage($i)),'chocolate'))

Hi, I'm using gmail oauth access to find specific text strings in email addresses. Is there a way to find text instances quicker and more efficiently than using strpos in the above code? Should I be using a hash technique?

2 Solutions

#1

According to the PHP manual, yes: strpos() is the function it recommends for determining whether one string contains another.

Note:

If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.

This note appears time and again on the php.net pages for other string-search functions (I pulled this one from the strstr() page).

That said, there are two changes you should make to your statement:

if (strpos($storage->getMessage($i),'chocolate') !== FALSE)

This is because if (0) evaluates to false (so the block doesn't run), yet strpos() returns 0 when the needle is at the very beginning (position 0) of the haystack. Also, removing htmlentities() will make your code run faster. All htmlentities() does is replace certain characters with their HTML entity equivalents. For instance, it replaces every & with &amp;.
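To make the first point concrete, here is a minimal sketch of the difference between a loose and a strict check. The `$message` string is a made-up sample, not output from the asker's `$storage` object.

```php
<?php
// Hypothetical sample message; the needle sits at position 0.
$message = 'chocolate cake recipe';

$pos = strpos($message, 'chocolate');   // int(0), not false

// Loose check: 0 is falsy, so this reports a miss even though the needle is there.
$looseMatch = (bool) $pos;

// Strict check: only the boolean false means "not found".
$strictMatch = ($pos !== false);

var_dump($looseMatch);   // bool(false)
var_dump($strictMatch);  // bool(true)
```

If you are on PHP 8.0 or later, str_contains() returns a boolean directly, which sidesteps this pitfall.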

As you can imagine, checking every character in a string individually and replacing many of them takes extra memory and processor power. Not only that, but it's unnecessary if you plan on just doing a text comparison. For instance, compare the following statements:

strpos('Billy & Sally', '&'); // 6
strpos('Billy &amp; Sally', '&'); // 6
strpos('Billy & Sally', 'S'); // 8
strpos('Billy &amp; Sally', 'S'); // 12

Or, in the worst case, the encoding can turn a real match into a miss:

strpos('<img src...', '<'); // 0
strpos('&lt;img src...','<'); // FALSE

In order to circumvent this you'd end up using even more HTML entities.

strpos('&lt;img src...', '&lt;'); // 0

But this, as you can imagine, is not only annoying to code but also redundant. You're better off leaving HTML entities out entirely. HTML entities are usually needed only when you're outputting text, not when you're comparing it.
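As a rough sketch of that separation, the search can run on the raw text while the encoding happens only at the point of output. The `$raw` string here is an invented stand-in for whatever text the asker extracts from the message.

```php
<?php
// Hypothetical raw message body (an assumption, not real mail data).
$raw = '<p>Dark chocolate & sea salt</p>';

// Search the raw text directly: no encoding step is needed for comparison.
$found = strpos($raw, 'chocolate') !== false;

// Encode only when the text is about to be written into an HTML page.
$safeForHtml = htmlentities($raw, ENT_QUOTES, 'UTF-8');

if ($found) {
    echo $safeForHtml, PHP_EOL;
}
```

This way the needle never has to be written in entity form, and the encoding cost is paid only for messages that are actually displayed.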

#2

strpos is likely to be faster than preg_match and the other alternatives in this case. The best idea would be to run some benchmarks of your own with real example data and see what suits your needs, although that may be overkill. Don't worry too much about performance until it starts to become a problem.
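If you do want to measure it, a quick benchmark could look something like the sketch below. The haystack is synthetic filler and the round count is arbitrary, both assumptions for illustration; swap in real message bodies to get numbers that mean anything for your workload.

```php
<?php
// Synthetic haystack (an assumption): roughly 1 MB of filler with the needle at the end.
$haystack = str_repeat('lorem ipsum dolor sit amet ', 40000) . 'chocolate';
$needle   = 'chocolate';
$rounds   = 200; // arbitrary repeat count

// Time the plain substring search.
$start = microtime(true);
for ($i = 0; $i < $rounds; $i++) {
    $viaStrpos = strpos($haystack, $needle) !== false;
}
$strposSeconds = microtime(true) - $start;

// Time the regex equivalent; preg_quote() escapes any regex metacharacters in the needle.
$pattern = '/' . preg_quote($needle, '/') . '/';
$start = microtime(true);
for ($i = 0; $i < $rounds; $i++) {
    $viaPreg = preg_match($pattern, $haystack) === 1;
}
$pregSeconds = microtime(true) - $start;

printf("strpos:     %.4f s\n", $strposSeconds);
printf("preg_match: %.4f s\n", $pregSeconds);
```

The timings will vary by machine and PHP version, so treat the result as a relative comparison only.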
