Ruby regular expression to match URLs [duplicate]

Date: 2021-01-03 15:22:00

Possible Duplicates:
Regex to match URL
regex to remove the webpage part of a url in ruby

I am looking for a regular expression that finds all the URLs in a file.
I tried many of the regular expressions I found by googling, but each one fails in one case or another. My idea is to write one that checks for http or https at the beginning and then matches everything until it sees a blank space.
Any ideas?
NOTE: I don't need to parse the URLs, just erase all of them from the file, or at least make them unreadable.

2 answers

#1 (score: 18)

You can try this:

/https?:\/\/[\S]+/

\S matches any non-whitespace character, so the match runs from http:// or https:// up to the next space, tab, or newline.

(Rubular)
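The question's real goal is to erase or obscure the URLs in a file. Below is a minimal sketch of doing that with the regex above, assuming the whole file fits in memory; the helper names strip_urls and strip_urls_in_file are hypothetical. The pattern is written as %r{https?://\S+}, which is equivalent to /https?:\/\/[\S]+/ but needs no escaped slashes.

```ruby
# Same pattern as the answer: "http" or "https", then "://", then every
# character up to (but not including) the next whitespace character.
URL_PATTERN = %r{https?://\S+}

# Hypothetical helper: replace each URL in text with replacement.
# The default "" erases URLs; a marker such as "[link]" obscures them.
def strip_urls(text, replacement = "")
  text.gsub(URL_PATTERN, replacement)
end

# Hypothetical helper: rewrite a file in place with its URLs replaced.
# It reads the whole file into memory, so it suits small and medium files.
def strip_urls_in_file(path, replacement = "")
  File.write(path, strip_urls(File.read(path), replacement))
end

puts strip_urls("see http://example.com/a?b=1 and https://ruby-lang.org now", "[link]")
```

Because \S+ stops only at whitespace, punctuation attached to a URL (a trailing period or closing parenthesis) becomes part of the match. When the aim is simply to erase URLs, that is usually harmless.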

#2 (score: 52)

Ruby's standard URI library provides URI.regexp, which returns a regular expression that matches URI strings.

 require 'uri'
 # URI.regexp has capture groups; in the block, $& is each whole URL
 urls = []
 string.scan(URI.regexp) { urls << $& }

http://ruby-doc.org/stdlib/libdoc/uri/rdoc/index.html
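Here is a sketch of the URI.regexp approach aimed at the question's goal. URI.regexp accepts an optional array of schemes, which is used here to limit matches to http and https; the helper names extract_urls and erase_urls are hypothetical. Recent Ruby versions mark URI.regexp as obsolete and print a warning when warnings are enabled (ruby -w), although it still returns a usable pattern.

```ruby
require 'uri'

# URI.regexp takes an optional list of schemes; restrict it to http/https.
WEB_URI = URI.regexp(%w[http https])

# The pattern contains capture groups, so plain scan would return arrays of
# groups. In block form, $& holds the full text of each match instead.
def extract_urls(text)
  urls = []
  text.scan(WEB_URI) { urls << $& }
  urls
end

# Erasing is simpler: gsub replaces the whole match regardless of groups.
def erase_urls(text, replacement = "")
  text.gsub(WEB_URI, replacement)
end

sample = "docs at http://ruby-doc.org/stdlib and https://example.com/x?y=1 here"
p extract_urls(sample)
puts erase_urls(sample, "[link]")
```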