Best way to create/split a string into tags

In my PHP application, users can enter tags (like here when asking a question). I assume this calls for a regexp, and I used one, mb_split('\W+', $text), to split on non-word characters.

But I want to allow users to enter characters such as "-", "_", "+" and "#", which are valid in URLs and commonly used.

Are there existing solutions for this, or perhaps best practices?

Thanks.

8 answers

#1

Use the explode() function and separate by either spaces or commas. Example:

$string = 'tag1 tag-2 tag#3';
$tags = explode(' ', $string); //Tags will be an array
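
explode() takes a single fixed separator, so accepting both spaces and commas in the same input needs something else. A minimal sketch using preg_split() (a swapped-in function, not part of the snippet above), with a made-up input string:

```php
<?php
// Hypothetical input mixing commas and spaces as separators.
$string = 'tag1, tag-2 tag#3,c++';

// Split on any run of commas and/or whitespace; PREG_SPLIT_NO_EMPTY drops
// the empty strings that leading or trailing separators would produce.
$tags = preg_split('/[\s,]+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($tags); // tag1, tag-2, tag#3, c++
```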

#2

Split by whitespace \s+ instead.

#3

Split on \s+ (whitespace) instead of \W+ (non-word characters).

#4

I suppose you could first try to clean up the string before splitting it into tags:

# List the characters you want to exclude from your tags and clean the string
$exclude = array('/[?&\/]/', '/\s+/');
$replacements = array('', ' ');
# trim() keeps leading or trailing spaces from producing empty tags
$tags = trim(preg_replace($exclude, $replacements, $tags));

# Now split:
$tagsArray = explode(' ', $tags);

You could also take a whitelist approach and list the characters you accept in your pattern instead.
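
A rough sketch of what such a whitelist might look like. The allowed set (word characters plus +, # and -) and the sample input are assumptions for illustration, not something this answer specifies:

```php
<?php
// Hypothetical raw input; adjust the allowed set to your own rules.
$input = 'c++ c#! node.js? "tag-2"';

// Whitelist: delete every character that is neither allowed nor whitespace.
// The hyphen is placed last in the class so it is taken literally.
$clean = preg_replace('/[^\w\s+#-]/u', '', $input);

// Split on whitespace, skipping empty pieces.
$tagsArray = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

print_r($tagsArray); // c++, c#, nodejs, tag-2
```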

#5

You said you want it to work like the tagger on this site, which splits tags on the whitespace character " ".

If you want the same behavior, simply use:

mb_split('\s+', $text)

instead of:

mb_split('\W+', $text)
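
To see what the two patterns do to the same input, here is a small sketch. It uses preg_split() rather than mb_split() so that it runs without the mbstring extension, and the sample text is made up:

```php
<?php
// Hypothetical user input containing characters that \W treats as separators.
$text = 'c++ c# tag-2';

// \W+ splits on every run of non-word characters, so +, # and - are lost.
$byNonWord = preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);

// \s+ splits on whitespace only, so the tags stay intact.
$byWhitespace = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($byNonWord);    // c, c, tag, 2
print_r($byWhitespace); // c++, c#, tag-2
```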

Good luck!

#6

I use this smart_explode () function to parse tags in my app:

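function smart_explode ($exploder, $string, $sort = '') {
  if (trim ($string) != '') {
    $string = explode ($exploder, $string);
    foreach ($string as $i => $k) {
      $string[$i] = trim ($k);
      // Test the trimmed value so whitespace-only entries are removed too
      if ($string[$i] == '') unset ($string[$i]);
    }
    // array_values () reindexes the keys after unset () and array_unique ()
    $u = array_values (array_unique ($string));
    if ('sort' == $sort) sort ($u);
    return $u;
  } else {
    return array ();
  }
}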

It explodes $string into an array using $exploder as the separator (usually a comma), trims the spaces around tags, removes duplicates, and sorts the tags if $sort is 'sort'. It returns an empty array when $string is empty or contains only whitespace.

Usage:

$mytaglist = smart_explode (',', '  PHP,  ,,regEx ,PHP');

The above will return:

array ('PHP', 'regEx')

To filter out characters you don't want, run

$mytaglist = str_replace (array ('?', '$', '%'), '_', $mytaglist);

on the input string before calling smart_explode (). Each "bad" character listed in the array is replaced with an underscore.

#7

The right way to handle tags depends on how you prefer to process input: you can either discard invalid tags entirely or try to clean them so that they become valid.

Use a whitelist of valid characters when cleaning the input; there are simply too many problematic characters to blacklist.

mb_internal_encoding('UTF-8');
// mb_split() and the mb_ereg_* functions take their encoding from mb_regex_encoding()
mb_regex_encoding('UTF-8');

$tags = 'to# do!"¤ fix-this str&ing';
$allowedLetters = '\w';
// The hyphen must come first or last in a character class to match a
// literal hyphen; anywhere else it would define a character range
$allowedSpecials = '_+#-';

The first approach removes invalid tags entirely:

// The first way: Ignoring invalid tags

$tagArray = mb_split(' ', $tags);

$pattern = '^[' . $allowedLetters . $allowedSpecials . ']+$';

$validTags = array();
foreach($tagArray as $tag)
{
    $tag = trim($tag);
    $isValid = mb_ereg_match($pattern, $tag);
    if ($isValid)
        $validTags[] = $tag;
}

The second approach tries to clean the tags:

// The second way: Cleaning up the tag input

// Replace non-whitelisted characters with a space
$pattern = '[^' . $allowedLetters . $allowedSpecials . ']';

$cleanTags = mb_ereg_replace($pattern, ' ', $tags);

// Collapse runs of whitespace and strip it from both ends
$pattern = '\s+';
$cleanTags = trim(mb_ereg_replace($pattern, ' ', $cleanTags));

$tags = mb_split(' ', $cleanTags);

Replacing illegal characters with whitespace sometimes causes problems: for example, "str&ing" above is converted to "str ing", which then splits into two tags. Removing the illegal characters entirely would give "string", which is more useful in some cases.
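
The difference between the two choices can be sketched as follows. This uses preg_replace() and preg_split() instead of the mb_* functions, and an ASCII-only variant of the sample input; both are assumptions made to keep the example self-contained:

```php
<?php
// ASCII-only variant of the sample input above.
$tags = 'to# do! fix-this str&ing';

// Characters outside this class are treated as illegal. Whitespace stays in
// the class because it is still the tag separator.
$illegal = '/[^\w+#\s-]/u';

// Variant 1: replace illegal characters with a space, so "str&ing" splits in two.
$replaced = preg_split('/\s+/', preg_replace($illegal, ' ', $tags), -1, PREG_SPLIT_NO_EMPTY);

// Variant 2: delete illegal characters, so "str&ing" becomes "string".
$removed = preg_split('/\s+/', preg_replace($illegal, '', $tags), -1, PREG_SPLIT_NO_EMPTY);

print_r($replaced); // to#, do, fix-this, str, ing
print_r($removed);  // to#, do, fix-this, string
```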

#8

Use preg_match_all.

$tags = array();
// $text holds the raw input; \S+ matches each run of non-whitespace characters
if (preg_match_all('/\S+/', $text, $matches)) $tags = $matches[0];
// $tags now holds an array of tags.

If the tags are in UTF-8, add the u modifier to the regexp.
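
A minimal sketch of preg_match_all() with the u modifier. The \S+ pattern and the sample input are assumptions chosen for illustration:

```php
<?php
// Hypothetical UTF-8 input with non-ASCII letters and uneven spacing.
$text = 'naïve  c++ über-tag';

$tags = array();
// \S+ matches each run of non-whitespace characters; the u modifier makes
// PCRE treat both the pattern and the subject as UTF-8.
if (preg_match_all('/\S+/u', $text, $matches)) {
    $tags = $matches[0];
}

print_r($tags); // naïve, c++, über-tag
```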
