Extracting a string with Python re.match

Time: 2021-11-01 22:38:14
import re
str = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
str2 = re.match("[a-zA-Z]*//([a-zA-Z]*)", str)
print(str2.group())

current result => error
expected => wwwqqqzzz

I want to extract the string wwwqqqzzz. How do I do that?

Maybe there are a lot of dots, such as:

"whatever..s#$@.d.:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid"

In this case, I basically want the stuff bounded by // and /. How do I achieve that?

One additional question:

import re
str = "xxx.yyy.xxx:80"
m = re.search(r"([^:]*)", str)
str2 = m.group(0)
print(str2)
str2 = m.group(1)
print(str2)

Seems that m.group(0) and m.group(1) are the same.
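
For reference, a minimal sketch of why the two calls agree here: group(0) is always the whole match, and group(1) is the first parenthesised group. Because the parentheses wrap the entire pattern, both cover the same text. The second pattern (with the port number) is an illustrative assumption, not something from the question.

```python
import re

text = "xxx.yyy.xxx:80"

# The parentheses wrap the entire pattern, so group 1 equals the whole match.
m = re.search(r"([^:]*)", text)
print(m.group(0))  # xxx.yyy.xxx
print(m.group(1))  # xxx.yyy.xxx

# Once part of the pattern sits outside the parentheses, the groups differ.
m2 = re.search(r"([^:]*):(\d+)", text)
print(m2.group(0))  # xxx.yyy.xxx:80
print(m2.group(1))  # xxx.yyy.xxx
print(m2.group(2))  # 80
```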

4 answers

#1


36  

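match only looks for a match at the very beginning of the string, so it returns None here and calling .group() on that raises an error. Use search instead, which scans the whole string. The following pattern would then match your requirements: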

m = re.search(r"//([^/]*)", str)
print(m.group(1))

Basically, we look for //, then consume as many non-slash characters as possible. Those non-slash characters are captured in group number 1.
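
As a rough illustration of the difference, the sketch below runs the same pattern through both functions on the string from the question. The variable name url is just a label chosen for this example.

```python
import re

url = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"

# re.match only tries position 0, and "//" does not occur there.
print(re.match(r"//([^/]*)", url))  # None

# re.search scans forward until the pattern fits somewhere.
m = re.search(r"//([^/]*)", url)
print(m.group(0))  # //www.qqq.zzz  (whole match, slashes included)
print(m.group(1))  # www.qqq.zzz    (captured part only)
```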

In fact, there is a slightly more advanced technique that does the same without a capturing group (capturing adds a small amount of overhead). It uses a so-called lookbehind:

m = re.search(r"(?<=//)[^/]*", str)
print(m.group())

Lookarounds are not included in the actual match, hence the desired result.
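
A small sketch of that behaviour, using the longer example string from the question: the lookbehind only checks that // comes directly before the match, so the match itself starts after the slashes.

```python
import re

url = "whatever..s#$@.d.:af//wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever/123.dfiid"

m = re.search(r"(?<=//)[^/]*", url)
print(m.group())  # wwww.xxx.yn.zsdfsd.asfds.f.ds.fsd.whatever

# The two characters just before the match are the slashes the lookbehind saw.
print(url[m.start() - 2:m.start()])  # //
```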

This (or any other reasonable regex solution) will not remove the dots in the same step. But this can easily be done in a second step:

m = re.search(r"(?<=//)[^/]*", str)
host = m.group()
cleanedHost = host.replace(".", "")

That does not even require regular expressions.

Of course, if you want to remove everything except for letters and digits (e.g. to turn www.regular-expressions.info into wwwregularexpressionsinfo) then you are better off using the regex version of replace:

cleanedHost = re.sub(r"[^a-zA-Z0-9]+", "", host)
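
Putting the two steps together, one possible sketch looks like this. The function name extract_host, its keep_only_alnum flag, and the http://www.regular-expressions.info/ input are hypothetical choices for this example; returning None when there is no // is also an assumption about what the caller wants.

```python
import re

def extract_host(text, keep_only_alnum=False):
    """Return the text between '//' and the next '/', cleaned up.

    Returns None when the text contains no '//'.
    """
    m = re.search(r"//([^/]*)", text)
    if m is None:
        return None
    host = m.group(1)
    if keep_only_alnum:
        # Drop everything that is not a letter or digit.
        return re.sub(r"[^a-zA-Z0-9]+", "", host)
    # Plain string replacement is enough when only dots must go.
    return host.replace(".", "")

print(extract_host("x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"))
# wwwqqqzzz
print(extract_host("http://www.regular-expressions.info/", keep_only_alnum=True))
# wwwregularexpressionsinfo
print(extract_host("no slashes here"))
# None
```

Checking for None before calling .group() avoids the error from the original question when the input has no // at all.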

#2


3  

print(re.sub(r"[.]", "", re.search(r"(?<=//).*?(?=/)", str).group(0)))

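
The lazy quantifier .*? matters in this pattern. A short sketch, using a made-up input with several slashes in the path (an assumption, since the question's strings have only one), shows how a greedy .* would run on to the last slash. It also shows that the lookahead needs a slash after the host, so an input without one gives no match.

```python
import re

text = "s://www.qqq.zzz/a/b/c.html"  # hypothetical input with a deeper path

lazy = re.search(r"(?<=//).*?(?=/)", text).group(0)
greedy = re.search(r"(?<=//).*(?=/)", text).group(0)
print(lazy)    # www.qqq.zzz
print(greedy)  # www.qqq.zzz/a/b

# Without a slash after the host, the lookahead can never succeed.
no_path = re.search(r"(?<=//).*?(?=/)", "s://host")
print(no_path)  # None
```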

#3


2  

# Raw string for \w; lazy .*? stops at the first slash after the host
output = re.findall(r"(?<=//)\w+.*?(?=/)", str)
final = re.sub(r"[^a-zA-Z0-9]+", "", output[0])
print(final)

#4


-1  

import re
str = "x8f8dL:s://www.qqq.zzz/iziv8ds8f8.dafidsao.dsfsi"
# findall returns a list of the captured groups; the dots are still present
print(re.findall('//([a-z.]*)', str))
