For a web application, we need to link to some user-generated content. A user types in a title for, e.g., a product, and we generate an SEO-friendly URL for that product, like this:
title: a nice product
www.user.com/product/a-nice-product
title: أبجد هوز
www.user.com/product/أبجد هوز
The problem is that those foreign-language URLs aren't supported and a browser refuses to open those links. I've seen WordPress setups support that kind of URL, so I guess it's possible to do this.
Does anyone know how we should support this in PHP?
Wikipedia handles this just fine: http://ar.wikipedia.org
4 solutions
#1
6
Although the URL itself only allows US-ASCII characters, you can use Unicode characters in the URI path if you encode them as UTF-8 and then convert the resulting bytes into US-ASCII characters using percent-encoding:
A system that internally provides identifiers in the form of a different character encoding, such as EBCDIC, will generally perform character translation of textual identifiers to UTF-8 [STD63] (or some other superset of the US-ASCII character encoding) at an internal interface, thereby providing more meaningful identifiers than those resulting from simply percent-encoding the original octets.
So you can do something like this (assuming UTF-8):
$title = 'أبجد هوز';
$path = '/product/'.rawurlencode($title);
echo $path; // "/product/%D8%A3%D8%A8%D8%AC%D8%AF%20%D9%87%D9%88%D8%B2"
Although the URI path is actually percent-encoded, most modern browsers will decode the sequence and display the Unicode characters it represents when the bytes are UTF-8.
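For completeness, here is a minimal sketch of the round trip: building the path when the link is generated, and recovering the title when the request comes in. The helper names `productPath` and `titleFromPath` are made up for illustration, and the sketch assumes the title string is already UTF-8 and that your script sees the raw, still-encoded path.

```php
<?php
// Hypothetical helper: build the product path from a user-supplied UTF-8 title.
function productPath(string $title): string
{
    // rawurlencode() percent-encodes every byte outside the unreserved set,
    // so each byte of a non-ASCII UTF-8 character becomes one %XX triplet.
    return '/product/' . rawurlencode($title);
}

// Hypothetical helper: recover the title from a raw (still encoded) request path.
function titleFromPath(string $path): ?string
{
    $prefix = '/product/';
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return null; // not a product URL
    }
    // rawurldecode() reverses the percent-encoding and leaves "+" untouched.
    return rawurldecode(substr($path, strlen($prefix)));
}

$path = productPath('أبجد هوز');
echo $path, "\n";                // the percent-encoded, ASCII-only path
echo titleFromPath($path), "\n"; // the original title again
```

Many setups already hand you a decoded path, so whether you need the decoding step at all depends on where you read the path from.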
#2
1
You're in trouble, I'm afraid. The encoding of the URL is at the discretion of the browser. I've encountered the same problem when trying to support URLs with Norwegian special characters, and it's simply not consistently possible.
You may be able to redirect a browser to the UTF-8 URL, but it might reply to you in ISO. It gets even worse in some cases, where browsers (Firefox, for instance) will mix ISO and UTF-8 encoding in the same URL (this happens particularly with GET parameters).
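If you decide to accept such URLs anyway, one defensive approach is to decode each path segment or parameter separately and repair anything that is not valid UTF-8. The following is only a sketch: `decodeSegment` is a hypothetical helper, it needs the mbstring extension, and it assumes that "ISO" here means ISO-8859-1, which the behaviour described above does not guarantee.

```php
<?php
// Hypothetical helper: decode one percent-encoded segment and return it as UTF-8.
// Assumption: a segment that is not valid UTF-8 was sent as ISO-8859-1.
function decodeSegment(string $encoded): string
{
    $bytes = rawurldecode($encoded);
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, keep as-is
    }
    // Fall back to reinterpreting the bytes as ISO-8859-1.
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

echo decodeSegment('bl%C3%A5b%C3%A6r'), "\n"; // segment sent as UTF-8
echo decodeSegment('bl%E5b%E6r'), "\n";       // same word sent as ISO-8859-1
```

This is a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so it cannot tell the two apart in every case.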
My suggestion is simple: don't do it. Use either English (better SEO too!) or spell it phonetically.
#3
0
You might need to use IDNA encoding on the non-ASCII portion of the URL.
http://en.wikipedia.org/wiki/Internationalized_domain_name
#4
0
You should urlencode the Arabic or other Unicode text:
urlencode('كلام-عربي')
It's also very important to add the charset declaration to the head tag of the page, otherwise the link will not work:
<meta charset="utf-8">
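One detail worth knowing when picking the function: `urlencode()` uses form-style encoding and turns a space into `+`, while `rawurlencode()` turns it into `%20`, which is the safer choice inside a path. Here is a small sketch of the difference, using a title with a space instead of the hyphen above; the `/product/` link markup is just an illustrative assumption.

```php
<?php
$title = 'كلام عربي';

// urlencode() follows form encoding: the space becomes "+".
$formStyle = urlencode($title);
// rawurlencode() follows RFC 3986: the space becomes "%20".
$pathStyle = rawurlencode($title);

// Hypothetical link markup: the href is pure ASCII after encoding,
// and htmlspecialchars() escapes the visible title for HTML output.
$link = '<a href="/product/' . $pathStyle . '">'
      . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</a>';

echo $formStyle, "\n", $pathStyle, "\n", $link, "\n";
```

Because the encoded href contains only ASCII, it should not depend on the page charset by itself; the charset declaration matters for the visible Arabic text and for any link that is written into the page without encoding.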