Python regex: getting the value after a string

Time: 2023-02-05 20:26:06

I am trying to parse a comma-separated string whose entries have the form keyword://pass@ip:port. The password can contain any character, including a comma, so I cannot simply split the string on commas.

I tried to use a regex to get the string after "myserver://", so that I could then split the rest of the information (pass@ip:port/key1) with ordinary string operations, but I could not get it to work: I cannot fetch the text that follows the keyword.

myserver:// is a hardcoded string, and I need to get whatever follows each myserver:// as a list (i.e. pass@ip:port/key1, pass2@ip2:port2/key2, etc.).

This is the closest I can get:

import re  
my_servers="myserver://password,123@ip:port/key1,myserver://pass2@ip2:port2/key2"
result = re.search(r'myserver:\/\/(.*)[,(.*)|\s]', my_servers)

Using search, I try to find an occurrence of the "myserver://" keyword followed by any characters and ending with either a comma (meaning another entry follows, as in myserver://zzz,myserver://qqq) or a space (in case of a single myserver:// element; I do not know a better end indicator than a space). However, this does not come out right. How can I do this better with a regex?

1 Solution

#1

You may consider the following splitting approach if you do not need to keep myserver:// in the results:

list(filter(None, re.split(r'\s*,?\s*myserver://', s)))

The \s*,?\s*myserver:// pattern matches an optional comma surrounded by zero or more whitespace characters, followed by the myserver:// substring. Note that empty entries must be removed: when a match is found at the start of the string, re.split adds an empty string at the beginning of the resulting list. In Python 3, filter returns a lazy iterator, so wrap it in list() if you need an actual list.
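
To make the empty leading entry visible, here is a minimal sketch that prints the raw re.split result before and after filtering. The variable names raw_parts and entries are illustrative only, and the list comprehension is just one way of dropping the empty strings.

```python
import re

s = "myserver://password,123@ip:port/key1,myserver://pass2@ip2:port2/key2"

# The delimiter also matches at index 0, so re.split emits an empty first item.
raw_parts = re.split(r'\s*,?\s*myserver://', s)
print(raw_parts)  # ['', 'password,123@ip:port/key1', 'pass2@ip2:port2/key2']

# Drop the empty strings; a list comprehension gives a real list in Python 2 and 3.
entries = [part for part in raw_parts if part]
print(entries)  # ['password,123@ip:port/key1', 'pass2@ip2:port2/key2']
```

The comma inside the password survives because a comma only counts as a delimiter when myserver:// follows it.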

Alternatively, you can use re.findall with a pattern that combines lazy dot matching with a lookahead:

rx = r"myserver://(.*?)(?=\s*,\s*myserver://|$)"

Details:

  • myserver:// - a literal substring
  • (.*?) - capturing group 1, whose contents re.findall returns; it matches any 0+ chars other than line break chars, as few as possible, up to (but excluding) the first position where the lookahead that follows succeeds
  • (?=\s*,\s*myserver://|$) - a positive lookahead that requires either of the 2 alternatives at the current position without consuming any text:
    • \s*,\s*myserver:// - a comma surrounded by 0+ whitespaces, followed by a literal myserver:// substring
    • | - or
    • $ - end of string.
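
The pieces above can be sketched as a small helper. The names SERVER_RX and extract_entries are hypothetical, chosen here only for illustration. The sample inputs, which add whitespace around the separating comma and a single-entry string, are assumptions made to exercise both lookahead alternatives.

```python
import re

# Hypothetical names; the pattern itself is the one discussed above.
SERVER_RX = re.compile(r"myserver://(.*?)(?=\s*,\s*myserver://|$)")

def extract_entries(text):
    """Return whatever follows each myserver:// prefix, as a list."""
    return SERVER_RX.findall(text)

# Whitespace around the separating comma is absorbed by the lookahead.
spaced = extract_entries(
    "myserver://password,123@ip:port/key1 , myserver://pass2@ip2:port2/key2"
)
print(spaced)  # ['password,123@ip:port/key1', 'pass2@ip2:port2/key2']

# A single entry ends at the end of the string (the $ alternative).
single = extract_entries("myserver://only,one@ip:port/key")
print(single)  # ['only,one@ip:port/key']
```

Two caveats apply to this sketch: the dot does not match line breaks unless re.DOTALL is passed, and a password that itself contains ",myserver://" would still be split at that point.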

Here is a Python demo of both approaches:

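import re

s = "myserver://password,123@ip:port/key1,myserver://pass2@ip2:port2/key2"

# Approach 1: split on the delimiter and drop the empty leading entry
rx1 = r'\s*,?\s*myserver://'
res1 = list(filter(None, re.split(rx1, s)))  # list() is needed in Python 3, where filter is lazy
print(res1)

# Approach 2: capture everything up to the next entry or the end of the string
rx2 = r"myserver://(.*?)(?=\s*,\s*myserver://|$)"
res2 = re.findall(rx2, s)
print(res2)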

Both will print ['password,123@ip:port/key1', 'pass2@ip2:port2/key2'].
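
The question mentions splitting each entry further with string operations. A rough sketch of that step follows. It rests on assumptions that the thread does not state: every entry looks like pass@ip:port/key, the last "@" separates the password from the address, and the ip, port and key parts contain no "@", ":" or "/" (so an IPv6 address would not work). The function name parse_entry is hypothetical.

```python
def parse_entry(entry):
    """Split 'pass@ip:port/key' into its fields (assumed layout, see above)."""
    # Assumption: the LAST '@' ends the password, so earlier '@' chars stay in it.
    password, _, address = entry.rpartition("@")
    # Assumption: the first '/' after the address separates host:port from the key.
    host_port, _, key = address.partition("/")
    ip, _, port = host_port.partition(":")
    return {"password": password, "ip": ip, "port": port, "key": key}

entries = ['password,123@ip:port/key1', 'pass2@ip2:port2/key2']
parsed = [parse_entry(entry) for entry in entries]
for fields in parsed:
    print(fields)
```

Using rpartition for the password keeps commas and "@" characters inside it, which matches the requirement that the password may contain any character.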
