Are square brackets allowed in URLs?

Time: 2022-01-12 21:42:30

I noticed that Apache Commons HttpClient (3.0.1) throws an IOException, whereas wget and Firefox accept square brackets.

URL example:

http://example.com/path/to/file[3].html

My HTTP client encounters such URLs, but I'm not sure whether to patch the code to accept them or to throw an exception (which is arguably the correct behavior).

10 answers

#1


33  

RFC 3986 states:

A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.

So in theory you should not see such URIs in the wild: the brackets should arrive percent-encoded.
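
To see the rule in practice, here is a minimal sketch using the JDK's java.net.URI parser. Note the assumption: java.net.URI follows the older RFC 2396 as amended by RFC 2732 rather than RFC 3986 itself, but it treats square brackets the same way. The class and method names (BracketCheck, isValidUri) are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BracketCheck {
    /** Returns true if java.net.URI accepts the string as syntactically valid. */
    static boolean isValidUri(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Raw brackets in the path are rejected by the parser.
        System.out.println(isValidUri("http://example.com/path/to/file[3].html"));
        // The percent-encoded form of the same path is accepted.
        System.out.println(isValidUri("http://example.com/path/to/file%5B3%5D.html"));
        // Brackets around an IPv6 literal host are the one permitted use.
        System.out.println(isValidUri("http://[2001:db8::1]/index.html"));
    }
}
```

A strict client would take the first case as grounds for an exception; a lenient one would encode the brackets first and retry.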

#2


8  

I know this question is a bit old, but I just wanted to note that PHP uses brackets to pass arrays in a URL.

http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3

In this case $_GET['bar'] will contain array('1', '2', '3'); query-string values always arrive as strings.
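
For a non-PHP client or server that has to cope with this convention, a rough sketch of the grouping step might look like the following. It only loosely imitates PHP: it handles a trailing "[]" (raw or encoded as uppercase %5B%5D), and it does not decode values or support nested keys such as bar[a][b]. The names PhpStyleQuery and parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PhpStyleQuery {
    /** Collects the values of each key into a list, stripping a trailing "[]" from the key. */
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Accept both the raw and the percent-encoded spelling of the brackets.
            key = key.replace("%5B", "[").replace("%5D", "]");
            if (key.endsWith("[]")) {
                key = key.substring(0, key.length() - 2);
            }
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("bar[]=1&bar[]=2&bar[]=3"));
        System.out.println(parse("bar%5B%5D=1&bar%5B%5D=2"));
    }
}
```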

#3


5  

Any browser or web-enabled software that accepts URLs and does not throw an exception when special characters are introduced is almost certainly encoding those characters behind the scenes. Curly brackets, square brackets, spaces, etc. all have percent-encoded representations so that they do not produce conflicts. As the previous answers say, the safest way to deal with them is to URL-encode them before handing the URL to anything that will try to resolve it.

#4


3  

Pretty much the only characters not allowed in pathnames are # and ? as they signify the end of the path.

The URI RFC has the definitive answer:

http://www.ietf.org/rfc/rfc1738.txt

Unsafe:

Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.

The answer is that they should be hex-encoded, but given Postel's law, most software will accept them verbatim.
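
As an illustration of the hex encoding described in the quoted passage, here is a small sketch that rewrites the unsafe characters as %XX. It assumes the input is an otherwise well-formed URL, so "#" and "%" are deliberately left alone (they may already be acting as a fragment delimiter or an escape prefix), and it would also mangle the brackets of an IPv6 literal host. The class and method names are invented for this example.

```java
public class UnsafeCharEncoder {
    // Unsafe characters named in the RFC 1738 passage, minus '#' and '%',
    // which may already be doing their job in a well-formed URL.
    private static final String UNSAFE = " <>\"{}|\\^~[]`";

    /** Replaces every character in UNSAFE with its %XX hex escape. */
    static String encodeUnsafe(String url) {
        StringBuilder out = new StringBuilder(url.length());
        for (char c : url.toCharArray()) {
            if (UNSAFE.indexOf(c) >= 0) {
                // All characters in UNSAFE are ASCII, so one byte is enough.
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeUnsafe("http://example.com/path/to/file[3].html"));
    }
}
```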

#5


2  

When using Commons HttpClient, look at the org.apache.commons.httpclient.util.URIUtil class, specifically the encode() method. Use it to URI-encode the URL before trying to fetch it.
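
If pulling in URIUtil is not an option, a similar effect can be sketched with the JDK alone. This is not the URIUtil approach from the answer but a swapped-in alternative: the multi-argument java.net.URI constructor quotes characters that are illegal in the path component. The helper name buildEncoded is hypothetical, and the sketch assumes the URL has already been split into scheme, host and raw path.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathQuoting {
    /** Builds a URI string from its parts, letting java.net.URI quote illegal path characters. */
    static String buildEncoded(String scheme, String host, String rawPath)
            throws URISyntaxException {
        // Unlike the single-string constructor, this one escapes instead of rejecting.
        return new URI(scheme, host, rawPath, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(buildEncoded("http", "example.com", "/path/to/file[3].html"));
    }
}
```

The resulting string can then be handed to the HTTP client in place of the raw URL.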

#6


2  

* seems to not encode them:

https://*.com/search?q=square+brackets+[url]

#7


1  

Best to URL encode those, as they are clearly not supported in all web servers. Sometimes, even when there is a standard, not everyone follows it.

#8


1  

According to the URL specification, the square brackets are not valid URL characters.

Here are the relevant snippets:

The "national" and "punctuation" characters do not appear in any productions and therefore may not appear in URLs.
national { | } | vline | [ | ] | \ | ^ | ~
punctuation < | >

#9


1  

Square brackets [ and ] in URLs are not often supported.

Replace them with %5B and %5D:

  • On the command line; the following example uses bash and sed:

    url='http://example.com?day=[0-3][0-9]'
    encoded_url="$( sed 's/\[/%5B/g;s/]/%5D/g' <<< "$url")"
    
  • Using Java URLEncoder.encode(String s, String enc)

  • Using PHP rawurlencode() or urlencode()

    <?php
    echo '<a href="http://example.com/day/',
        rawurlencode('[0-3][0-9]'), '">';
    ?>
    

    output:

    <a href="http://example.com/day/%5B0-3%5D%5B0-9%5D">
    

    or:

    <?php
    $query_string = 'day=' . urlencode('[0-3][0-9]') .
                    '&month=' . urlencode('[0-1][0-9]');
    echo '<a href="http://example.com?',
          htmlentities($query_string), '">';
    ?>
    
  • Using your favorite programming language... Please extend this answer by posting a comment or by editing it directly to add the function you use in your programming language ;-)

For more details, see RFC 3986, which specifies the URI syntax. Appendix A collects the ABNF grammar: square brackets belong to "gen-delims" and are not among the characters allowed in the query string, so they must be %-encoded there.
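
The Java option in the list above has no example, so here is a minimal sketch of it. URLEncoder performs HTML form encoding (a space becomes "+"), so it should be applied to individual query names and values only, never to a whole URL. The helper name buildUrl and the base URL are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryValueEncoding {
    /** Appends a single name=value pair to a base URL, encoding both parts. */
    static String buildUrl(String base, String name, String value)
            throws UnsupportedEncodingException {
        return base + "?" + URLEncoder.encode(name, "UTF-8")
                + "=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The brackets in the value come out as %5B and %5D.
        System.out.println(buildUrl("http://example.com/", "day", "[0-3][0-9]"));
    }
}
```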

#10


0  

Square brackets are considered unsafe, but the majority of browsers will parse them correctly. That said, it is better to replace square brackets with some other characters.
