How to parse an encoded URI in Ruby?

Date: 2022-08-05 19:59:11

I'm trying to parse a URI that contains square brackets, [ and ]. Parsing it directly with URI.parse fails because of the brackets. I then tried encoding the URI with CGI::escape, which takes care of the brackets, but URI.parse no longer recognises the encoded string as a URI and puts the whole thing into the path component.

To demonstrate in an irb session:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> require 'cgi'
=> true
irb(main):003:0> name = "http://www.website.com/dir1/dir[2]/file.txt"
=> "http://www.website.com/dir1/dir[2]/file.txt"
irb(main):004:0> encoded_name = CGI::escape(name)
=> "http%3A%2F%2Fwww.website.com%2Fdir1%2Fdir%5B2%5D%2Ffile.txt"
irb(main):005:0> parsed_name = URI.parse(encoded_name)
=> #<URI::Generic:0x00000001e8f520 URL:http%3A%2F%2Fwww.website.com%2Fdir1%2Fdir%5B2%5D%2Ffile.txt>
irb(main):006:0> parsed_name.scheme
=> nil
irb(main):007:0> parsed_name.host
=> nil
irb(main):008:0> parsed_name.path
=> "http%3A%2F%2Fwww.website.com%2Fdir1%2Fdir%5B2%5D%2Ffile.txt"
irb(main):009:0> URI.split(encoded_name)
=> [nil, nil, nil, nil, nil, "http%3A%2F%2Fwww.website.com%2Fdir1%2Fdir%5B2%5D%2Ffile.txt", nil, nil, nil]

Anyway, my workaround at the moment is the following ugly, but effective, hack:

encoded_name = name.gsub(/\[/,"%5B").gsub(/\]/,"%5D")

Parsing this with URI.parse produces the desired result but won't cope if other strange characters find their way into my URIs. So my question is, is there a solid way of doing this that won't fall down?

2 solutions

#1


5  

The problem lies in applying CGI::escape to the whole URI. That also encodes the : and / characters, so the scheme and host at the front of the URI are no longer recognisable and the parser treats the entire string as a path. You may want to try something based on mtyaka's answer:

irb(main):015:0> encoded_name = URI.encode(name, '[]')
=> "http://www.website.com/dir1/dir%5B2%5D/file.txt"
irb(main):016:0> parsed_name = URI.parse(encoded_name)
=> #<URI::HTTP:0xb76ff358 URL:http://www.website.com/dir1/dir%5B2%5D/file.txt>
irb(main):017:0> parsed_name.scheme
=> "http"
irb(main):018:0> parsed_name.host
=> "www.website.com"
irb(main):019:0> parsed_name.path
=> "/dir1/dir%5B2%5D/file.txt"

To get the original path back, pass whatever you get from parsed_name.path to URI.decode.
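URI.encode and URI.decode were removed in Ruby 3.0, so the session above only runs on older versions. As a rough sketch of the same round trip on a newer Ruby, assuming the escape and unescape methods on URI::RFC2396_Parser are an acceptable substitute (the variable names parser and original_path are just for illustration):

```ruby
require 'uri'

# URI.encode / URI.decode are gone in Ruby 3.0+; the RFC 2396 parser class
# still offers escape / unescape, which this sketch uses instead.
parser = URI::RFC2396_Parser.new

name = "http://www.website.com/dir1/dir[2]/file.txt"

# Percent-encode only the characters listed in the second argument.
encoded_name = parser.escape(name, '[]')
parsed_name = URI.parse(encoded_name)

# Undo the percent-encoding on the path component only.
original_path = parser.unescape(parsed_name.path)

puts encoded_name
puts parsed_name.host
puts original_path
```

Only the path is unescaped here, so the scheme and host are read from the parsed object while still in their encoded form.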

#2


3  

You could use URI.encode:

encoded_name = URI.encode(name, '[]')
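Listing '[]' explicitly handles the brackets but not the "other strange characters" the question worries about. One possible way to generalise it, sketched here with a hypothetical helper named escape_unsafe and an allowed-character set that is an assumption rather than anything from the answers, is to percent-encode every character outside that set before parsing:

```ruby
require 'uri'

# Hypothetical helper (not part of the URI library): percent-encode every
# character that is not in an assumed "safe" set. '%' is left alone so that
# sequences which are already encoded are not encoded a second time.
def escape_unsafe(uri_string)
  unsafe = /[^A-Za-z0-9\-._~:\/?\#@!$&'()*+,;=%]/
  uri_string.gsub(unsafe) do |char|
    # Encode each byte so multi-byte UTF-8 characters are handled too.
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

name = "http://www.website.com/dir1/dir[2]/my file{1}.txt"
parsed = URI.parse(escape_unsafe(name))

puts parsed.host
puts parsed.path
```

Because '%' is treated as safe, a stray literal percent sign in the input would still trip the parser, so this is a starting point rather than a complete answer.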
