How do I remove diacritics from text?

Time: 2021-09-27 22:16:09

I am building a Swedish website, and the Swedish letters are å, ä, and ö.

I need to make a user-entered string URL-safe with PHP.

Basically, I need to convert all characters to underscores, all EXCEPT these:

 A-Z, a-z, 1-9

and all Swedish letters should be converted like this:

'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).

The rest should become underscores as I said.

I'm not good at regular expressions, so I would appreciate the help, guys!

Thanks

NOTE: NOT URLENCODE... I need to store it in a database, etc., so urlencode won't work for me.

9 answers

#1


13  

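// decompose accented letters (NFD) using PHP's *intl* extension, then strip the combining marks;
// the default form (NFC) would leave the accents in place
$data = normalizer_normalize($data, Normalizer::FORM_D);
$data = preg_replace('/\p{Mn}/u', '', $data);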

// replace everything NOT in the sets you specified with an underscore
$data = preg_replace("#[^A-Za-z1-9]#","_", $data);

#2


18  

This should be useful; it handles almost all cases.

function Unaccent($string)
{
    return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_COMPAT, 'UTF-8'));
}
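
One caveat with the entity-based approach: htmlentities() also encodes characters such as & and ", and the regex leaves those entities untouched. The following is a minimal sketch, not the answerer's code, that decodes any leftover entities and then replaces everything else with underscores. The names unaccent_entities and to_url_safe are hypothetical, and allowing 0 is an assumption (the question lists 1-9).

```php
<?php
// Same idea as Unaccent() above: encode to HTML entities, keep only the base letter(s).
function unaccent_entities(string $string): string
{
    $encoded = htmlentities($string, ENT_COMPAT, 'UTF-8');
    return preg_replace(
        '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i',
        '$1',
        $encoded
    );
}

// Hypothetical helper: strip accents, restore leftover entities such as &amp;,
// then turn every character outside A-Z, a-z, 0-9 into an underscore.
function to_url_safe(string $string): string
{
    $plain = html_entity_decode(unaccent_entities($string), ENT_COMPAT, 'UTF-8');
    return preg_replace('/[^A-Za-z0-9]/', '_', $plain);
}

echo to_url_safe('Räksmörgås & köttbullar'), "\n"; // Raksmorgas___kottbullar
```

Without the html_entity_decode() step, the ampersand would survive as "&amp;" and end up as "_amp_" in the result.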

#3


18  

Use iconv to convert strings from a given encoding to ASCII, then replace non-alphanumeric characters using preg_replace:

$input = 'räksmörgås och köttbullar'; // UTF8 encoded
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
$input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
echo $input;

Result:

raksmorgas_och_kottbullar

#4


7  

and all Swedish letters should be converted like this:

'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).

Use normalizer_normalize() with Normalizer::FORM_D to decompose accented letters, then strip the combining marks to get rid of the diacritics. (The default form, NFC, leaves the accents in place.)

The rest should become underscores as I said.

Use preg_replace() with the pattern \W (in other words, any character that is not a letter, digit, or underscore) to replace the remaining characters with underscores.

Final result should look like:

$data = preg_replace('/\p{Mn}/u', '', normalizer_normalize($data, Normalizer::FORM_D));
$data = preg_replace('/\W/', '_', $data);

#5


4  

If you're just interested in making things URL safe, then you want urlencode.

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that, for historical reasons, spaces are encoded as plus (+) signs.

If you really want to strip all non A-Z, a-z, 1-9 (what's wrong with 0, by the way?), then you want:

$mynewstring = preg_replace('/[^A-Za-z1-9]/', '', $str);
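
To make the difference concrete, here is a small sketch of what urlencode() and rawurlencode() produce for a Swedish string. The sample input is borrowed from another answer and is assumed to be UTF-8 encoded.

```php
<?php
$input = 'räksmörgås och köttbullar'; // assumed to be UTF-8 encoded

// urlencode() percent-encodes each UTF-8 byte and turns spaces into '+'.
$encoded = urlencode($input);
echo $encoded, "\n"; // r%C3%A4ksm%C3%B6rg%C3%A5s+och+k%C3%B6ttbullar

// rawurlencode() encodes spaces as %20 instead.
$rawEncoded = rawurlencode($input);
echo $rawEncoded, "\n"; // r%C3%A4ksm%C3%B6rg%C3%A5s%20och%20k%C3%B6ttbullar

// Both are reversible, unlike replacing characters with underscores.
$decoded = urldecode($encoded);
```

The encoded form keeps the original letters recoverable, but it is much less readable than an underscore slug, which may be why the asker ruled it out.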

#6


2  

as simple as

 $str = str_replace(array('å', 'ä', 'ö'), array('a', 'a', 'o'), $str); 
 $str = preg_replace('/[^a-z0-9]+/', '_', strtolower($str));

assuming you use the same encoding for your data and your code.

#7


1  

One simple solution is to use the str_replace function with arrays of search and replacement letters.
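
A minimal sketch of that idea, assuming the source file and the input are both UTF-8 and the letters arrive in precomposed form. The function name swedish_to_url_safe is hypothetical, and allowing 0 is an assumption (the question lists 1-9).

```php
<?php
// Hypothetical helper: map the Swedish letters by hand, then replace
// everything outside A-Z, a-z, 0-9 with an underscore.
function swedish_to_url_safe(string $str): string
{
    $search  = array('å', 'ä', 'ö', 'Å', 'Ä', 'Ö');
    $replace = array('a', 'a', 'o', 'A', 'A', 'O');

    // str_replace() compares byte sequences, so it is safe for multibyte
    // letters as long as the code and the data share one encoding.
    $str = str_replace($search, $replace, $str);

    return preg_replace('/[^A-Za-z0-9]/', '_', $str);
}

echo swedish_to_url_safe('Räksmörgås och köttbullar!'), "\n"; // Raksmorgas_och_kottbullar_
```

This only covers the letters listed in the arrays; any other accented letter falls through to the underscore replacement.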

#8


0  

You don't need fancy regexps to filter the Swedish characters; just use the strtr function to "translate" them, like:

$your_URL = "www.mäåö.com";
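// the array form is multibyte-safe; the two-string form works byte by byte and breaks on UTF-8 input
$good_URL = strtr($your_URL, array("ä" => "a", "å" => "a", "ö" => "o", "ë" => "e")); // etc.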
echo $good_URL;

->output: www.maao.com :)

#9


0  

If the intl PHP extension is enabled, you can use Transliterator like this:

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC;');
    return $transliterator->transliterate($string);
}

To also convert special characters that are not just letters with diacritics, such as 'æ':

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::createFromRules(
        ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
        \Transliterator::FORWARD
    );
    return $transliterator->transliterate($string);
}
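
Putting this together with the underscore requirement from the question, a possible sketch looks like the following. It assumes the compound ID 'Any-Latin; Latin-ASCII' is available in the installed ICU, falls back to a small hand-written map when intl is missing, and allows 0 as well as 1-9. The function name to_ascii_slug and the fallback map are hypothetical.

```php
<?php
// Hypothetical wrapper: transliterate to ASCII, then replace everything
// outside A-Z, a-z, 0-9 with an underscore.
function to_ascii_slug(string $string): string
{
    $ascii = null;

    // Prefer the intl Transliterator when the extension is loaded.
    if (class_exists('Transliterator')) {
        $transliterator = \Transliterator::create('Any-Latin; Latin-ASCII');
        if ($transliterator !== null) {
            $result = $transliterator->transliterate($string);
            if ($result !== false) {
                $ascii = $result;
            }
        }
    }

    // Fallback (assumption): a hand-written map covering only a few letters.
    if ($ascii === null) {
        $ascii = strtr($string, array(
            'å' => 'a', 'ä' => 'a', 'ö' => 'o',
            'Å' => 'A', 'Ä' => 'A', 'Ö' => 'O',
            'æ' => 'ae', 'Æ' => 'AE',
        ));
    }

    return preg_replace('/[^A-Za-z0-9]/', '_', $ascii);
}

echo to_ascii_slug('räksmörgås och köttbullar'), "\n"; // raksmorgas_och_kottbullar
```

The fallback is far less complete than the Transliterator path, so results for other alphabets will differ depending on whether intl is installed.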
