I need to check a string to see whether any word in it occurs more than once. So basically I will accept:
"google makes love"
but I don't accept:
"google makes google love" or "google makes love love google" etc.
Any ideas? I really don't know how to approach this; any help would be greatly appreciated.
9 solutions
#1
Based on Wicked Flea's code:
function single_use_of_words($str) {
    $words = explode(' ', trim($str)); // Trim to avoid empty entries from leading/trailing blanks
    if (count(array_unique($words)) == count($words)) {
        return true; // Same number of words, so nothing was repeated
    }
    return false;
}
#2
Try this:
function single_use_of_words($str) {
    $words = explode(' ', $str);
    $words = array_unique($words);
    return implode(' ', $words);
}
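Note that this version returns the de-duplicated string rather than a yes/no answer. A minimal sketch of how it could be turned into the check the question asks for, assuming words are separated by exactly one space; the wrapper name `has_only_unique_words` is hypothetical and not part of the answer:

```php
<?php
// Function from the answer above: drops repeated words, keeps first occurrences.
function single_use_of_words($str) {
    $words = explode(' ', $str);
    $words = array_unique($words);
    return implode(' ', $words);
}

// Hypothetical wrapper: the input is acceptable when de-duplicating changes nothing.
// Assumes single-space separators; runs of spaces would be treated as repeated empty words.
function has_only_unique_words($str) {
    return single_use_of_words($str) === $str;
}

var_dump(has_only_unique_words('google makes love'));        // bool(true)
var_dump(has_only_unique_words('google makes google love')); // bool(false)
```

Comparing against the original string avoids counting words twice, at the cost of being sensitive to extra whitespace.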
#3
No need for loops or arrays:
<?php
$needle = 'cat';
$haystack = 'cat in the cat hat';
if ( occursMoreThanOnce($haystack, $needle) ) {
    echo 'Success';
}
function occursMoreThanOnce($haystack, $needle) {
    // First and last positions differ only when the needle appears at least twice
    return strpos($haystack, $needle) !== strrpos($haystack, $needle);
}
?>
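One caveat: `strpos`/`strrpos` match substrings, so 'cat' would also be found inside 'category'. A hedged sketch of a whole-word variant, assuming the needle consists only of word characters (letters, digits, underscore); the name `wordOccursMoreThanOnce` is made up for illustration:

```php
<?php
// Hypothetical helper: counts whole-word occurrences of $needle in $haystack.
// Assumes $needle starts and ends with a word character so that \b behaves as expected.
function wordOccursMoreThanOnce($haystack, $needle) {
    $pattern = '/\b' . preg_quote($needle, '/') . '\b/';
    return preg_match_all($pattern, $haystack) > 1;
}

var_dump(wordOccursMoreThanOnce('cat in the cat hat', 'cat'));  // bool(true)
var_dump(wordOccursMoreThanOnce('cat in the category', 'cat')); // bool(false)
```

Like the original, this still needs the word to be known in advance, so it does not by itself answer the question for arbitrary words.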
#4
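<?php
// Split on runs of non-word characters; the pattern needs delimiters,
// and the flag goes in the fourth argument (-1 means no limit on pieces).
$words = preg_split('/\W+/', $string, -1, PREG_SPLIT_NO_EMPTY);
$wordsUnique = array_unique($words);
if (count($words) != count($wordsUnique)) {
    echo 'Duplicate word found!';
}
?>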
#5
The regular expression way would definitely be my choice.
I did a little test on a string of 320 words with Veynom's function and a regular expression:
function preg( $txt ) {
    return !preg_match( '/\b(\w+)\b.*?\1/', $txt );
}
Here's the test:
$time['preg'] = microtime( true );
for( $i = 0; $i < 1000; $i++ ) {
    preg( $txt );
}
$time['preg'] = microtime( true ) - $time['preg'];

$time['veynom-thewickedflea'] = microtime( true );
for( $i = 0; $i < 1000; $i++ ) {
    single_use_of_words( $txt );
}
$time['veynom-thewickedflea'] = microtime( true ) - $time['veynom-thewickedflea'];

print_r( $time );
And here's the result I got:
Array
(
[preg] => 0.197616815567
[veynom-thewickedflea] => 0.487532138824
)
Which suggests that the RegExp solution, as well as being a lot more concise, is more than twice as fast (for a string of 320 words and 1000 iterations).
When I run the test over 10 000 iterations I get:
Array
(
[preg] => 1.51235699654
[veynom-thewickedflea] => 4.99487900734
)
The non-RegExp solution also uses a lot more memory.
So.. Regular Expressions for me cos they've got a full tank of gas
EDIT
The text I tested against has duplicate words. If it doesn't, the results may be different. I'll post another set of results.
Update
With the duplicates stripped out (now 186 words), the results for 1000 iterations are:
Array
(
[preg] => 0.235826015472
[veynom-thewickedflea] => 0.2528860569
)
About even.
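One thing to watch with the pattern in `preg()`: the backreference `\1` is not anchored to word boundaries, so a word that reappears inside a longer word (for example 'cat' in 'concatenate') counts as a repeat. A possible whole-word variant is sketched below; the name `preg_whole_words` is hypothetical, and the timings above were measured with the original pattern, not this one:

```php
<?php
// Hypothetical variant of preg(): returns true when no whole word repeats.
// \b around \1 requires the repeat to be a complete word;
// the s flag lets the dot cross line breaks in multi-line input.
function preg_whole_words($txt) {
    return !preg_match('/\b(\w+)\b.*?\b\1\b/s', $txt);
}

var_dump(preg_whole_words('google makes love'));        // bool(true)
var_dump(preg_whole_words('google makes google love')); // bool(false)
var_dump(preg_whole_words('cat concatenate'));          // bool(true)
```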
#6
function Accept($str)
{
    $words = explode(" ", trim($str));
    $len = count($words);
    for ($i = 0; $i < $len; $i++)
    {
        for ($p = 0; $p < $len; $p++)
        {
            if ($p != $i && $words[$i] == $words[$p])
            {
                return false;
            }
        }
    }
    return true;
}
EDIT
Entire test script. Note that when echoing false, PHP prints nothing, whereas true is printed as "1".
<?php
function Accept($str)
{
    $words = explode(" ", trim($str));
    $len = count($words);
    for ($i = 0; $i < $len; $i++)
    {
        for ($p = 0; $p < $len; $p++)
        {
            if ($p != $i && $words[$i] == $words[$p])
            {
                return false;
            }
        }
    }
    return true;
}
echo Accept("google makes love"), ", ", Accept("google makes google love"), ", ",
     Accept("google makes love love google"), ", ", Accept("babe health insurance babe");
?>
Prints the correct output:
1, , ,
#7
This seems fairly fast. It would be interesting to see (for all the answers) how the memory usage and time taken increase as you increase the length of the input string.
function check($str) {
    // collapse runs of spaces: replace two spaces with one until none are left
    $c = 1;
    while ($c) $str = str_replace('  ', ' ', $str, $c);
    // split into array of words
    $words = explode(' ', $str);
    foreach ($words as $key => $word) {
        // remove current word from array
        unset($words[$key]);
        // if it still exists in the array it must be duplicated
        if (in_array($word, $words)) {
            return false;
        }
    }
    return true;
}
Edit
Fixed issue with multiple spaces. I'm not sure whether it is better to remove these at the start (as I have) or to check that each word is non-empty in the foreach.
#8
The simplest method is to loop through each word and check against all previous words for duplicates.
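A rough sketch of that idea, using an associative array as a lookup of words already seen instead of rescanning the earlier words each time. The function name `has_no_repeated_words` is hypothetical, and splitting on whitespace is an assumption about what counts as a word:

```php
<?php
// Hypothetical helper: true when no word occurs more than once.
function has_no_repeated_words($str) {
    $seen = [];
    // Split on runs of whitespace and drop empty pieces from leading/trailing blanks.
    $words = preg_split('/\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (isset($seen[$word])) {
            return false; // this word appeared earlier in the string
        }
        $seen[$word] = true;
    }
    return true;
}

var_dump(has_no_repeated_words('google makes love'));             // bool(true)
var_dump(has_no_repeated_words('google makes love love google')); // bool(false)
```

The comparison is case-sensitive, so 'Google' and 'google' count as different words; lowercasing the input first would change that.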