Possible Duplicate:
PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string
I will be dealing with an external source that may give me data back like: Tōkyō à á â ã
Is there a way to convert the accented characters to standard a-z A-Z, for example: Tokyo a a a a
Any other characters that don't match a letter of the alphabet can be ignored.
Is a large regex with all the from and to values the only way, or is there a simpler way to go about it?
1 solution
#1
3
Something like this (taken from the Symphony CMS project) should get you started:
$transliterations = array(
// Alphabetical
'/À/' => 'A', '/Á/' => 'A', '/Â/' => 'A', '/Ã/' => 'A', '/Ä/' => 'Ae',
'/Å/' => 'A', '/Ā/' => 'A', '/Ą/' => 'A', '/Ă/' => 'A', '/Æ/' => 'Ae',
'/Ç/' => 'C', '/Ć/' => 'C', '/Č/' => 'C', '/Ĉ/' => 'C', '/Ċ/' => 'C',
'/Ď/' => 'D', '/Đ/' => 'D', '/Ð/' => 'D', '/È/' => 'E', '/É/' => 'E',
'/Ê/' => 'E', '/Ë/' => 'E', '/Ē/' => 'E', '/Ę/' => 'E', '/Ě/' => 'E',
'/Ĕ/' => 'E', '/Ė/' => 'E', '/Ĝ/' => 'G', '/Ğ/' => 'G', '/Ġ/' => 'G',
'/Ģ/' => 'G', '/Ĥ/' => 'H', '/Ħ/' => 'H', '/Ì/' => 'I', '/Í/' => 'I',
'/Î/' => 'I', '/Ï/' => 'I', '/Ī/' => 'I', '/Ĩ/' => 'I', '/Ĭ/' => 'I',
'/Į/' => 'I', '/İ/' => 'I', '/Ĳ/' => 'Ij', '/Ĵ/' => 'J', '/Ķ/' => 'K',
'/Ł/' => 'L', '/Ľ/' => 'L', '/Ĺ/' => 'L', '/Ļ/' => 'L', '/Ŀ/' => 'L',
'/Ñ/' => 'N', '/Ń/' => 'N', '/Ň/' => 'N', '/Ņ/' => 'N', '/Ŋ/' => 'N',
'/Ò/' => 'O', '/Ó/' => 'O', '/Ô/' => 'O', '/Õ/' => 'O', '/Ö/' => 'Oe',
'/Ø/' => 'O', '/Ō/' => 'O', '/Ő/' => 'O', '/Ŏ/' => 'O', '/Œ/' => 'Oe',
'/Ŕ/' => 'R', '/Ř/' => 'R', '/Ŗ/' => 'R', '/Ś/' => 'S', '/Š/' => 'S',
'/Ş/' => 'S', '/Ŝ/' => 'S', '/Ș/' => 'S', '/Ť/' => 'T', '/Ţ/' => 'T',
'/Ŧ/' => 'T', '/Ț/' => 'T', '/Ù/' => 'U', '/Ú/' => 'U', '/Û/' => 'U',
'/Ü/' => 'Ue', '/Ū/' => 'U', '/Ů/' => 'U', '/Ű/' => 'U', '/Ŭ/' => 'U',
'/Ũ/' => 'U', '/Ų/' => 'U', '/Ŵ/' => 'W', '/Ý/' => 'Y', '/Ŷ/' => 'Y',
'/Ÿ/' => 'Y', '/Y/' => 'Y', '/Ź/' => 'Z', '/Ž/' => 'Z', '/Ż/' => 'Z',
'/Þ/' => 'T',
'/à/' => 'a', '/á/' => 'a', '/â/' => 'a', '/ã/' => 'a', '/ä/' => 'ae',
'/å/' => 'a', '/ā/' => 'a', '/ą/' => 'a', '/ă/' => 'a', '/æ/' => 'ae',
'/ç/' => 'c', '/ć/' => 'c', '/č/' => 'c', '/ĉ/' => 'c', '/ċ/' => 'c',
'/ď/' => 'd', '/đ/' => 'd', '/ð/' => 'd', '/è/' => 'e', '/é/' => 'e',
'/ê/' => 'e', '/ë/' => 'e', '/ē/' => 'e', '/ę/' => 'e', '/ě/' => 'e',
'/ĕ/' => 'e', '/ė/' => 'e', '/ĝ/' => 'g', '/ğ/' => 'g', '/ġ/' => 'g',
'/ģ/' => 'g', '/ĥ/' => 'h', '/ħ/' => 'h', '/ì/' => 'i', '/í/' => 'i',
'/î/' => 'i', '/ï/' => 'i', '/ī/' => 'i', '/ĩ/' => 'i', '/ĭ/' => 'i',
'/į/' => 'i', '/ı/' => 'i', '/ĳ/' => 'ij', '/ĵ/' => 'j', '/ķ/' => 'k',
'/ł/' => 'l', '/ľ/' => 'l', '/ĺ/' => 'l', '/ļ/' => 'l', '/ŀ/' => 'l',
'/ñ/' => 'n', '/ń/' => 'n', '/ň/' => 'n', '/ņ/' => 'n', '/ŋ/' => 'n',
'/ò/' => 'o', '/ó/' => 'o', '/ô/' => 'o', '/õ/' => 'o', '/ö/' => 'oe',
'/ø/' => 'o', '/ō/' => 'o', '/ő/' => 'o', '/ŏ/' => 'o', '/œ/' => 'oe',
'/ŕ/' => 'r', '/ř/' => 'r', '/ŗ/' => 'r', '/ś/' => 's', '/š/' => 's',
'/ş/' => 's', '/ŝ/' => 's', '/ș/' => 's', '/ť/' => 't', '/ţ/' => 't',
'/ŧ/' => 't', '/ț/' => 't', '/ù/' => 'u', '/ú/' => 'u', '/û/' => 'u',
'/ü/' => 'ue', '/ū/' => 'u', '/ů/' => 'u', '/ű/' => 'u', '/ŭ/' => 'u',
'/ũ/' => 'u', '/ų/' => 'u', '/ŵ/' => 'w', '/ý/' => 'y', '/ŷ/' => 'y',
'/ÿ/' => 'y', '/y/' => 'y', '/ź/' => 'z', '/ž/' => 'z', '/ż/' => 'z',
'/þ/' => 't', '/ß/' => 'ss', '/ſ/' => 'ss', '/ƒ/' => 'f', '/ĸ/' => 'k',
'/ŉ/' => 'n',
// Symbolic
'/\(/' => null, '/\)/' => null, '/,/' => null,
'/–/' => '-', '/-/' => '-', '/„/' => '"',
'/“/' => '"', '/”/' => '"', '/—/' => '-',
'/¿/' => null, '/‽/' => null, '/¡/' => null,
// Ampersands
'/©/' => 'c',
'/^&(?!&)$/' => 'and',
'/^&(?!&)/' => 'and-',
'/&(?!&)&/' => '-and',
'/&(?!&)/' => '-and-',
);
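The answer does not show how the table is applied. One plausible way is to pass the keys and values to preg_replace() as parallel arrays. The sketch below assumes that approach; the helper name transliterate() and the shortened $sample table are hypothetical and only serve the demonstration. It also assumes the source file and the input are both UTF-8.

```php
<?php
// Hypothetical helper (not part of the original answer): applies a
// pattern => replacement table such as $transliterations.
function transliterate(string $text, array $table): string
{
    // preg_replace() accepts parallel arrays of patterns and replacements.
    // Casting with strval turns the table's null entries into empty strings,
    // so those matches are simply removed.
    $patterns = array_keys($table);
    $replacements = array_map('strval', array_values($table));

    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($patterns, $replacements, $text) ?? $text;
}

// A short excerpt of the full table, enough for a demonstration.
$sample = array(
    '/à/' => 'a', '/á/' => 'a', '/â/' => 'a', '/ã/' => 'a',
    '/ō/' => 'o', '/Ü/' => 'Ue',
    '/\(/' => null, '/\)/' => null,
);

echo transliterate('Tōkyō à á â ã (Übung)', $sample), "\n";
// prints "Tokyo a a a a Uebung"
```

Characters that have no entry in the table pass through unchanged, so a final step that strips anything outside a-z A-Z would still be needed if those must be dropped.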
You can also use iconv, but it isn't flawless: Ü, for example, gets returned as "U, while it should be returned as Ue.
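For comparison, a minimal sketch of the iconv route. The target charset string 'ASCII//TRANSLIT//IGNORE' is a commonly used choice and an assumption here, since the answer does not say which arguments it had in mind; the helper name toAsciiWithIconv() is hypothetical. The exact output depends on the iconv implementation and the current locale, so no expected output is given.

```php
<?php
// Hypothetical helper: tries iconv's built-in transliteration and
// reports failure as null instead of false.
function toAsciiWithIconv(string $text): ?string
{
    // //TRANSLIT asks iconv to approximate characters it cannot represent;
    // //IGNORE asks it to drop whatever is still unconvertible.
    // The @ suppresses the notice iconv raises for such characters.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);

    return $result === false ? null : $result;
}

// Inspect what the local iconv produces for the question's sample data.
var_dump(toAsciiWithIconv('Tōkyō à á â ã Ü'));
```

Because the result varies between systems, checking it against a few known inputs on the target machine is worth doing before relying on it.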