import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
str2=re.match("[a-zA-Z]*//([a-zA-Z]*)",str)
print str2.group()
Current result => error
Expected => wwwqqqzzz
I want to extract the string wwwqqqzzz. How do I do that?
The string may contain many dots, for example:
"whatever..s#$@.d.:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid"
In this case, I basically want the text bounded by // and /. How do I achieve that?
One additional question:
import re
str="xxx.yyy.xxx:80"
m = re.search(r"([^:]*)", str)
str2=m.group(0)
print str2
str2=m.group(1)
print str2
It seems that m.group(0) and m.group(1) are the same.
4 solutions
#1
36
match only matches at the start of the string, and your string does not start with letters followed by //. Use search instead, which scans the whole string. The following pattern then meets your requirements:
m = re.search(r"//([^/]*)", str)
print m.group(1)
Basically, we look for //, then consume as many non-slash characters as possible. Those non-slash characters are captured in group number 1.
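To make the difference concrete, here is a minimal sketch. The variable names (url, no_match, whole, host, m2) are illustrative and not from the original post. It also touches on the additional question: group(0) is always the entire match, so when the whole pattern is a single group, group(0) and group(1) coincide.

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# re.match only tries at position 0. The letters-only prefix stops at the
# digit "8", so "//" cannot follow and the result is None.
no_match = re.match(r"[a-zA-Z]*//([a-zA-Z]*)", url)

# re.search scans forward until the pattern fits somewhere in the string.
m = re.search(r"//([^/]*)", url)
whole = m.group(0)  # entire match, including the slashes: "//www.qqq.zzz"
host = m.group(1)   # only the captured part: "www.qqq.zzz"

# Additional question: the pattern consists of nothing but one group,
# so the entire match and group 1 are the same text.
m2 = re.search(r"([^:]*)", "xxx.yyy.xxx:80")
same = m2.group(0) == m2.group(1)  # True, both are "xxx.yyy.xxx"
```

Calling .group() on the None returned by re.match is what raises the error reported in the question.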
In fact, there is a slightly more advanced technique that does the same without a capturing group (capturing generally adds some overhead). It uses a so-called lookbehind:
m = re.search(r"(?<=//)[^/]*", str)
print m.group()
Lookarounds are not included in the actual match, hence the desired result.
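As a quick check of the lookbehind approach against the second, dot-heavy example from the question, here is a small sketch. The names url and host are illustrative, and the None guard is an added precaution, not part of the original answer.

```python
import re

url = "whatever..s#$@.d.:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid"

# (?<=//) asserts that "//" sits immediately before the match
# without making it part of the match itself.
m = re.search(r"(?<=//)[^/]*", url)

# Guard against inputs that contain no "//" at all.
host = m.group() if m else None
# host is "wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever"
```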
This (or any other reasonable regex solution) will not remove the dots (.) in the same step. But this can easily be done in a second step:
m = re.search(r"(?<=//)[^/]*", str)
host = m.group()
cleanedHost = host.replace(".", "")
That does not even require regular expressions.
Of course, if you want to remove everything except letters and digits (e.g. to turn www.regular-expressions.info into wwwregularexpressionsinfo), then you are better off using the regex version of replace, re.sub:
cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
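Putting the two steps together, a minimal sketch might look like the following. The function name extract_host_letters is hypothetical, and returning None when no // is present is an assumption about the desired behavior rather than something the question specifies.

```python
import re

def extract_host_letters(text):
    """Hypothetical helper: return the text between "//" and the next "/",
    stripped of everything except ASCII letters and digits."""
    m = re.search(r"//([^/]*)", text)
    if m is None:
        # Assumed behavior: signal "no host found" with None.
        return None
    # Second step: drop dots and any other non-alphanumeric characters.
    return re.sub(r"[^a-zA-Z0-9]+", "", m.group(1))

result = extract_host_letters("x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi")
# result is "wwwqqqzzz"
```

Keeping extraction and cleaning as separate steps leaves each pattern simple; a single regex cannot skip the dots inside one contiguous match.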
#3
2
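# Note: the greedy .* runs up to the last "/" on the line, so this assumes only one "/" follows the host
output = re.findall(r"(?<=//)\w+.*(?=/)", str)
final = re.sub(r"[^a-zA-Z0-9]+", "", output[0])
print final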
#4
-1
import re
str="x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
re.findall('//([a-z.]*)', str)