Parsing a URI with curly braces: URI::InvalidURIError: bad URI(is not URI?)

Time: 2022-06-12 20:40:51

I'm using Ruby 1.9.2-p290 and ran into a problem when trying to parse a URI like the following:

require 'uri'
my_uri = "http://www.anyserver.com/getdata?anyparameter={330C-B5A2}"
the_uri = URI.parse(my_uri)

This raises the following error:

URI::InvalidURIError: bad URI(is not URI?)

I require a different solution than encoding the curly braces every time like this:

new_uri = URI.encode("http://www.anyserver.com/getdata?anyparameter={330C-B5A2}")
=> "http://www.anyserver.com/getdata?anyparameter=%7B330C-B5A2%7D"

Now I can parse new_uri as usual, but I have to do this every time I need it. What is the simplest way to get the same result without encoding on every call?

I'm posting my own solution, since I hadn't seen it done exactly this way.

# Accepts URIs that contain curly braces.
# This overrides URI.parse to use a parser whose UNRESERVED set also includes '{' and '}'.
module URI
  def self.parse(uri)
    URI::Parser.new(:UNRESERVED => URI::REGEXP::PATTERN::UNRESERVED + "\{\}").parse(uri)
  end
end

Now I can use URI.parse(uri) with uri containing curly braces and no error is thrown.

2 solutions

#1


5  

# Must not fail when the URI contains curly braces.
# This replaces DEFAULT_PARSER with one whose UNRESERVED set also includes '{' and '}'.
# DEFAULT_PARSER is used everywhere, so it's better to override it once.
module URI
  remove_const :DEFAULT_PARSER
  unreserved = REGEXP::PATTERN::UNRESERVED
  DEFAULT_PARSER = Parser.new(:UNRESERVED => unreserved + "\{\}")
end

Following up on the same issue: since DEFAULT_PARSER is used everywhere, it's better to replace it completely instead of overriding only the URI.parse method. This also avoids allocating a new Parser object on every call.
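
If replacing a global constant feels too invasive, the same idea can be kept local: build one brace-tolerant parser, store it in a constant of your own, and use it only where braces are expected. This is a minimal sketch rather than the approach above. BRACE_TOLERANT_PARSER and parse_lenient are hypothetical names, and the class lookup assumes that newer Rubies expose the RFC 2396 parser as URI::RFC2396_Parser while 1.9.x calls it URI::Parser.

```ruby
require 'uri'

# Pick whichever names this Ruby provides for the RFC 2396 parser
# (assumption: URI::RFC2396_Parser on newer Rubies, URI::Parser on 1.9.x).
parser_class = defined?(URI::RFC2396_Parser) ? URI::RFC2396_Parser : URI::Parser
patterns     = defined?(URI::RFC2396_REGEXP) ? URI::RFC2396_REGEXP : URI::REGEXP

# Hypothetical constant: built once, so no parser is allocated per call.
BRACE_TOLERANT_PARSER =
  parser_class.new(:UNRESERVED => patterns::PATTERN::UNRESERVED + "{}")

# Hypothetical helper: leaves URI.parse and DEFAULT_PARSER untouched.
def parse_lenient(uri)
  BRACE_TOLERANT_PARSER.parse(uri)
end

the_uri = parse_lenient("http://www.anyserver.com/getdata?anyparameter={330C-B5A2}")
puts the_uri.query
```

The trade-off is that only code calling parse_lenient accepts braces. Libraries that call URI.parse internally still reject them.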

#2


2  

RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html) means that you do have to encode the braces:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
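
If you follow this reading and keep the braces encoded, the encoding step can at least be wrapped once so that callers do not repeat it. The helper below is only a sketch under that assumption. parse_escaping_braces is a hypothetical name, and it escapes nothing but '{' and '}', leaving the rest of the string exactly as given.

```ruby
require 'uri'

# Hypothetical helper: percent-encode only curly braces, then parse.
def parse_escaping_braces(raw)
  # '{' becomes %7B and '}' becomes %7D; all other characters pass through.
  escaped = raw.gsub(/[{}]/) { |ch| format('%%%02X', ch.ord) }
  URI.parse(escaped)
end

uri = parse_escaping_braces("http://www.anyserver.com/getdata?anyparameter={330C-B5A2}")
puts uri.to_s
```

Unlike the parser override, this keeps the resulting URI within what the RFC allows, at the cost of the query no longer containing literal braces.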
