Possible Duplicates:
Regex to match URL
regex to remove the webpage part of a url in ruby
I am looking for a regular expression to match all the URLs in a file.
I tried many of the regular expressions I found by googling, but each one fails in some case or other. My idea is to write one that checks for http or https at the beginning and then matches everything until it sees a blank space.
Any ideas?
NOTE: I don't need to parse the URLs, just erase all of them from the file or at least make them unreadable.
2 Solutions
#1
18
You can try this:
/https?:\/\/[\S]+/
The \S means any non-whitespace character.
(Rubular)
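Since the question is about erasing URLs rather than parsing them, the pattern above can be passed straight to gsub. The following is only a minimal sketch: the names URL_PATTERN and strip_urls and the "[removed]" placeholder are made up for illustration and are not part of the answer.

```ruby
# Matches "http://" or "https://" followed by a run of non-whitespace
# characters. Same pattern as above, written with %r{} to avoid escaping "/".
URL_PATTERN = %r{https?://\S+}

# Hypothetical helper: returns a copy of text with every match replaced
# by placeholder (an empty string erases the URL entirely).
def strip_urls(text, placeholder = "")
  text.gsub(URL_PATTERN, placeholder)
end

sample = "Docs at http://example.com/a?b=1 and https://example.org/x end"
puts strip_urls(sample, "[removed]")
```

Note that \S+ also swallows punctuation that directly follows a URL, such as a closing parenthesis or a sentence-ending period. For erasing URLs this is usually acceptable, but it is worth checking against the actual file.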
#2
52
The standard URI library provides URI.regexp, which is a regular expression for URL strings.
require 'uri'
string.scan(URI.regexp)
http://ruby-doc.org/stdlib/libdoc/uri/rdoc/index.html
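One thing to watch for with this approach: the pattern contains capture groups, so scan returns an array of group arrays rather than whole URLs, and without a scheme list it accepts any "scheme:..." string, not only http and https. Newer Ruby versions also mark URI.regexp as obsolete. Here is a hedged sketch that works around these points using URI::DEFAULT_PARSER.make_regexp; the names HTTP_URL, find_urls and erase_urls are hypothetical, and restricting to http/https is an assumption taken from the question.

```ruby
require 'uri'

# Assumption: only http and https URLs matter. Without a scheme list the
# pattern accepts any "scheme:..." string.
HTTP_URL = URI::DEFAULT_PARSER.make_regexp(%w[http https])

# Collect whole matches. The pattern contains capture groups, so a plain
# scan(HTTP_URL) would return arrays of groups instead of full URLs.
def find_urls(text)
  urls = []
  text.scan(HTTP_URL) { urls << Regexp.last_match(0) }
  urls
end

# Erase every http/https URL from the text.
def erase_urls(text)
  text.gsub(HTTP_URL, "")
end

sample = "see http://example.com/a and https://example.org/b now"
p find_urls(sample)
puts erase_urls(sample)
```

Compared with the \S+ pattern, this one stops at characters that are not valid in a URI, so it tends to be stricter about where a URL ends.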