I have a huge list of URLs in the format:
- http://www.example.com/dest/uk/bath/
- http://www.example.com/dest/aus/sydney/
- http://www.example.com/dest/aus/
- http://www.example.com/dest/uk/
- http://www.example.com/dest/nor/
What regex could I use to match the last three URLs but skip the first two, so that every URL without a city attached is accepted and the ones with cities are rejected?
Note: I am using Google Analytics, so I need to use regexes to monitor my URLs with its advanced features. As of right now, Google is rejecting each regular expression.
4 solutions
#1
tj111's current solution doesn't work - it matches all your URLs.
Here's one that works (I checked it against your values). It matches whether or not there is a trailing slash:
http:\/\/.*dest\/\w+/?$
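A quick way to sanity-check this pattern against the five URLs from the question is a short Python sketch. Python's `re` module is only a stand-in here, since Google Analytics may apply its own regex rules, and the names `PATTERN`, `URLS` and `matched` are made up for illustration.

```python
import re

# Pattern from the answer above; the "\/" escapes are harmless in Python.
PATTERN = re.compile(r"http:\/\/.*dest\/\w+/?$")

URLS = [
    "http://www.example.com/dest/uk/bath/",
    "http://www.example.com/dest/aus/sydney/",
    "http://www.example.com/dest/aus/",
    "http://www.example.com/dest/uk/",
    "http://www.example.com/dest/nor/",
]

# Keep only the URLs the pattern accepts (the country-only ones).
matched = [url for url in URLS if PATTERN.search(url)]
for url in matched:
    print(url)
```

Under Python's regex rules this prints only the last three URLs, because a city segment leaves extra text between the country and the end of the string.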
#2
Generally, the best suggestion I can make for parsing URLs with a regex is: don't.
Your time is much, much better spent finding a library for your language that is dedicated to processing URLs.
It will have worked out all the edge cases, be fully RFC compliant, be bug free, secure, and have a great user interface so you can just pull out the bits you really want.
In your case, the suggested way to process it would be to use your URL library to extract the elements and then work explicitly on them.
That way, at most you'll have to deal with the path on its own, and not have to worry so much whether it's
http://site.com/
https://site.com/
http://site.com:80/
http://www.site.com/
Unless you really want to.
For the path, you might even wish to use a splitter (or a dedicated path parser) to tokenise the path into elements first, just to be sure.
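If you are processing the list yourself rather than inside Google Analytics, the library-plus-splitter idea could look roughly like this in Python, using the standard `urllib.parse` module. The helper name `is_country_page` is hypothetical, and the rule "exactly two path segments, the first being dest" is an assumption drawn from the example URLs.

```python
from urllib.parse import urlsplit


def is_country_page(url: str) -> bool:
    """Return True when the path is exactly /dest/<country>, slash optional."""
    # urlsplit separates scheme, host, port, and query from the path for us.
    path = urlsplit(url).path
    # Drop empty pieces produced by leading, trailing, or doubled slashes.
    segments = [part for part in path.split("/") if part]
    return len(segments) == 2 and segments[0] == "dest"


print(is_country_page("http://www.example.com/dest/uk/"))
print(is_country_page("http://www.example.com/dest/uk/bath/"))
```

Because the scheme, host, and port are parsed away first, the check behaves the same for `https://`, an explicit `:80`, or a missing `www.`.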
#3
/http:\/\/www\.site\.com\/dest\/\w+\/?$/i
matches if they're all on the same site with the "dest" there. You could also do this:
/\w+:\/\/[^/]+\/dest\/\w+\/?$/i
which will match any site with any protocol (http, ftp), as long as the URL ends in /dest/country, with an optional trailing /
Note that this will only work with a subset of what the URLs could legitimately be.
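To make the "subset" caveat concrete, here is a small Python sketch that runs the second pattern against a few URLs. The URLs (`files.example.org`, the `?page=2` query string, `new-zealand`) and the helper `matches` are invented for illustration, and Python's `re` module stands in for whatever engine you actually use.

```python
import re

# Second pattern from the answer above, without the /.../i delimiters.
PATTERN = re.compile(r"\w+:\/\/[^/]+\/dest\/\w+\/?$", re.IGNORECASE)


def matches(url: str) -> bool:
    """Return True when the pattern is found in the URL."""
    return PATTERN.search(url) is not None


print(matches("ftp://files.example.org/dest/nor"))          # True: any scheme and host
print(matches("http://www.example.com/dest/uk/?page=2"))    # False: query string after the path
print(matches("http://www.example.com/dest/new-zealand/"))  # False: "-" is not matched by \w
```

Whether the last two cases matter depends on what your real URLs look like.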
#4
Try this regular expression:
^http://www\.example\.com/dest/[^/]+/$
This would only match the last three URLs.
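One thing worth knowing about this pattern is that it requires the trailing slash. The Python sketch below compares it with a hypothetical relaxed variant (`RELAXED`, not part of the answer) that makes the slash optional. The variable names are made up, and Python's `re` is used only as a convenient test bed.

```python
import re

# Pattern from the answer above: the trailing slash is mandatory.
STRICT = re.compile(r"^http://www\.example\.com/dest/[^/]+/$")
# Hypothetical variant: "/?" makes the trailing slash optional.
RELAXED = re.compile(r"^http://www\.example\.com/dest/[^/]+/?$")

with_slash = "http://www.example.com/dest/uk/"
without_slash = "http://www.example.com/dest/uk"
city = "http://www.example.com/dest/uk/bath/"

for url in (with_slash, without_slash, city):
    print(url, bool(STRICT.search(url)), bool(RELAXED.search(url)))
```

Both versions reject the city URL, since `[^/]+` cannot span more than one path segment.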