In my PHP application, users can enter tags (like here when asking a question). I assumed a regexp was the way to go, and I used mb_split('\W+', $text) to split on non-word characters.
But I want to allow users to enter characters such as "-", "_", "+" and "#", which are common and valid in URLs.
Is there an existing solution for this, or are there best practices?
thanks.
8 answers
#1
23
Use the explode() function and separate by either spaces or commas. Example:
$string = 'tag1 tag-2 tag#3';
$tags = explode(' ', $string); //Tags will be an array
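The answer mentions splitting on spaces or commas, but the example only handles single spaces. A minimal sketch that accepts either, assuming tags may be separated by any mix of commas and whitespace (split_tags() is a hypothetical helper name, not a built-in):

```php
<?php
// split_tags() is a hypothetical helper, not part of PHP.
// Splits on any run of commas and/or whitespace; PREG_SPLIT_NO_EMPTY drops
// the empty strings that leading/trailing separators would otherwise produce.
function split_tags(string $input): array
{
    $parts = preg_split('/[\s,]+/u', $input, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false on failure (e.g. invalid UTF-8 with /u)
    return $parts === false ? [] : $parts;
}

$tags = split_tags(' tag1, tag-2  tag#3,,c++ '); // ['tag1', 'tag-2', 'tag#3', 'c++']
```

Unlike a plain explode(' ', ...), double spaces or a trailing comma do not produce empty tags here.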
#2
9
Split by whitespace (\s+) instead.
#3
3
Split on \s+ (whitespace) instead of \W+ (non-alphanumeric).
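To illustrate the difference, here is the same input split both ways. This sketch uses preg_split() instead of mb_split() so it runs without the mbstring extension; the patterns themselves are the ones discussed above.

```php
<?php
$text = 'php c# c++ ruby-on-rails';

// \W+ treats every non-word character as a separator, so '#', '+' and '-' are lost
$byNonWord = preg_split('/\W+/u', $text);     // ['php', 'c', 'c', 'ruby', 'on', 'rails']

// \s+ only splits on whitespace, so those characters stay inside the tags
$byWhitespace = preg_split('/\s+/u', $text);  // ['php', 'c#', 'c++', 'ruby-on-rails']
```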
#4
2
I suppose you could first try to clean up the string before splitting it into tags:
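# List characters that you want to exclude from your tags and clean the string.
# Order matters: the excluded characters are removed first, then runs of
# whitespace are collapsed into a single space.
$exclude = array('/[?&\/]/', '/\s+/');
$replacements = array('', ' ');
$tags = preg_replace($exclude, $replacements, $tags);
# Now split (trim first so leading/trailing spaces don't produce empty tags):
$tagsArray = explode(' ', trim($tags));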
You could also adopt a whitelist approach instead, listing the characters you accept in your pattern.
#5
2
You said you wanted it to work like the tagger on this site. That tagger splits tags on whitespace.
If you would like this to be your behavior as well, simply use:
mb_split('\s+', $text)
instead of:
mb_split('\W+', $text)
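One caveat with splitting on \s+: if the input starts or ends with whitespace, the result contains empty strings. A small sketch of the effect and one way around it, using preg_split() (which behaves the same as mb_split() here) so it runs without mbstring:

```php
<?php
$text = '  php c#  c++ ';

// Without flags, leading/trailing whitespace yields empty first/last elements
$raw = preg_split('/\s+/u', $text);            // ['', 'php', 'c#', 'c++', '']

// Trimming first (PREG_SPLIT_NO_EMPTY alone would also work) avoids them
$tags = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
```

With mb_split() the equivalent fix is simply mb_split('\s+', trim($text)).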
Good luck!
#6
1
I use this smart_explode () function to parse tags in my app:
function smart_explode ($exploder, $string, $sort = '') {
    if (trim ($string) != '') {
        $string = explode ($exploder, $string);
        foreach ($string as $i => $k) {
            // Trim first, then drop the entry if nothing is left
            $k = trim ($k);
            if ($k === '') {
                unset ($string[$i]);
            } else {
                $string[$i] = $k;
            }
        }
        // Remove duplicates and reindex so the keys run 0..n-1
        $u = array_values (array_unique ($string));
        if ('sort' == $sort) sort ($u);
        return $u;
    } else {
        return array ();
    }
}
It explodes $string into an array using $exploder as the separator (usually a comma), trims the spaces around each tag, drops empty entries, removes duplicates, and sorts the tags if $sort is 'sort'. It returns an empty array when $string is empty or contains only whitespace.
Usage:
$mytaglist = smart_explode (',', ' PHP, ,,regEx ,PHP');
The above will return:
array ('PHP', 'regEx')
To filter out characters you don't like, run the raw tag string through str_replace() before calling smart_explode(). The "bad" characters listed in the array are replaced with an underscore:
$rawTags = str_replace (array ('?', '$', '%'), '_', $rawTags);
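For comparison, the same steps (split, trim, drop empties, de-duplicate, optionally sort) can be chained with PHP's array functions. This is a sketch, not the answerer's code; split_tag_list() is a hypothetical name, and it assumes a non-empty separator (explode() rejects an empty one).

```php
<?php
// split_tag_list() is a hypothetical helper, not part of PHP.
// Assumes $separator is non-empty; explode() does not accept ''.
function split_tag_list(string $separator, string $input, bool $sort = false): array
{
    // Split and trim every piece
    $tags = array_map('trim', explode($separator, $input));
    // Drop pieces that were empty or whitespace-only
    $tags = array_filter($tags, function ($tag) { return $tag !== ''; });
    // De-duplicate and reindex so the keys run 0..n-1
    $tags = array_values(array_unique($tags));
    if ($sort) {
        sort($tags);
    }
    return $tags;
}

$mytaglist = split_tag_list(',', ' PHP, ,,regEx ,PHP'); // ['PHP', 'regEx']
```

Note that array_unique() is case-sensitive, so "PHP" and "php" would both be kept; lowercasing the tags first (for example with mb_strtolower()) is one option if that matters.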
#7
1
The correct approach to handling tags depends on your preferences on processing input: You can either remove the invalid tags entirely, or try and clean the tags so they become valid.
A whitelisting approach to defining valid characters should be used when cleaning the input: there are simply too many problematic characters to blacklist.
mb_internal_encoding('UTF-8');
// The mb_ereg_* functions use the regex encoding, so set it explicitly too
mb_regex_encoding('UTF-8');
$tags = 'to# do!"¤ fix-this str&ing';
$allowedLetters = '\w';
// Note that the hyphen must be first or last in a character class
// so that it matches a literal hyphen instead of specifying a range
$allowedSpecials = '_+#-';
The first approach removes invalid tags entirely:
// The first way: Ignoring invalid tags
$tagArray = mb_split(' ', $tags);
$pattern = '^[' . $allowedLetters . $allowedSpecials . ']+$';
$validTags = array();
foreach($tagArray as $tag)
{
$tag = trim($tag);
$isValid = mb_ereg_match($pattern, $tag);
if ($isValid)
$validTags[] = $tag;
}
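The mb_ereg_* functions depend on mbstring's regex support, which is not always compiled in. As an assumption-labeled alternative, the first approach can be sketched with PCRE instead; valid_tags() is a hypothetical name, and the whitelist mirrors $allowedLetters and $allowedSpecials above.

```php
<?php
// valid_tags() is a hypothetical helper using PCRE instead of mb_ereg_*.
// Whitelist: word characters (\w already includes _) plus + # and -.
function valid_tags(string $input): array
{
    $parts = preg_split('/\s+/u', $input, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        return []; // preg_split fails on invalid UTF-8 when /u is used
    }
    $valid = [];
    foreach ($parts as $tag) {
        // \A and \z anchor to the whole tag; any disallowed character rejects it
        if (preg_match('/\A[\w+#-]+\z/u', $tag) === 1) {
            $valid[] = $tag;
        }
    }
    return $valid;
}

$validTags = valid_tags('to# do!"¤ fix-this str&ing'); // ['to#', 'fix-this']
```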
The second approach tries to clean the tags:
// The second way: Cleaning up the tag input
// Replace non-whitelisted characters with a space
$pattern = '[^' . $allowedLetters . $allowedSpecials . ']';
$cleanTags = mb_ereg_replace($pattern, ' ', $tags);
// Collapse runs of whitespace into a single space
$pattern = '\s+';
$cleanTags = mb_ereg_replace($pattern, ' ', $cleanTags);
// Trim so leading/trailing spaces don't produce empty tags
$tags = mb_split(' ', trim($cleanTags));
Replacing illegal characters with whitespace can cause problems: for example, "str&ing" above is converted to "str ing". Removing the illegal characters entirely would give "string", which is more useful in some cases.
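The removal variant described above can be sketched with PCRE as follows. This is an illustration under stated assumptions, not the answerer's code: clean_tags() is a hypothetical name, whitespace is kept as the tag separator, and every other non-whitelisted character is deleted.

```php
<?php
// clean_tags() is a hypothetical helper. Non-whitelisted characters are
// deleted rather than replaced with a space, so "str&ing" becomes "string".
function clean_tags(string $input): array
{
    // Keep word characters, whitespace (the tag separator), and + # -
    $cleaned = preg_replace('/[^\w\s+#-]+/u', '', $input);
    if ($cleaned === null) {
        return []; // preg_replace returns null on error, e.g. invalid UTF-8
    }
    $tags = preg_split('/\s+/u', $cleaned, -1, PREG_SPLIT_NO_EMPTY);
    return $tags === false ? [] : $tags;
}

$cleanTags = clean_tags('to# do!"¤ fix-this str&ing'); // ['to#', 'do', 'fix-this', 'string']
```

The trade-off is the reverse of the space-replacement version: "c&a" becomes "ca" rather than two tags, so which behavior is better depends on what your users typically type.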
#8
0
Use preg_match_all().
$tags = array();
// $input is assumed to hold the raw tag string; each run of non-whitespace is one tag
if (preg_match_all('/\S+/', $input, $matches)) {
    $tags = $matches[0];
}
// now $tags holds an array of tags.
If the tags are in UTF-8, add the u modifier to the regexp.
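preg_match_all() can also do the whitelisting itself: match runs of allowed characters, and anything else acts as a separator. A minimal sketch, assuming the allowed set is word characters plus + # and - (extract_tags() is a hypothetical name):

```php
<?php
// extract_tags() is a hypothetical helper. Every run of whitelisted
// characters becomes a tag; spaces and punctuation such as ? or & split tags.
function extract_tags(string $input): array
{
    if (preg_match_all('/[\w+#-]+/u', $input, $matches) === false) {
        return []; // e.g. invalid UTF-8 input with the u modifier
    }
    return $matches[0];
}

$tags = extract_tags('c# php? c++ ruby-on-rails'); // ['c#', 'php', 'c++', 'ruby-on-rails']
```

Like the space-replacement approach above, this turns "str&ing" into two tags, "str" and "ing".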