Python regex to extract a specific part of a string

Time: 2022-05-22 00:11:41

I have the following string:

SOURCEFILE:  file_name.dc     : 1 : log: the logging area

I am trying to store whatever lies between the third and the fourth colon in a variable and discard the rest.

I've tried to write a regular expression to grab this, but so far I only have the following, which is wrong:

([^:]:[^:]*)

I would appreciate some help with this, along with an explanation of the valid regex, so I can learn from my mistake.

3 solutions

#1


0  

The following assumptions have been made.

  • The number of colons is always the same.
  • The colon (:) is always the delimiter.
  • The input string is in the variable input.

Then it is just a single line:

result = input.split(':')[3].strip()

If you need to preserve spaces:

result = input.split(':')[3]
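
As a minimal sketch of the split approach above, the same idea can be wrapped in a small helper that also guards against lines with too few colons. The name `field_after_colon` and its parameters are hypothetical, chosen only for illustration.

```python
def field_after_colon(line, n=3, strip=True):
    """Return the text between the n-th and (n+1)-th colon of line.

    Hypothetical helper: returns None when line has fewer than n + 1 colons.
    """
    parts = line.split(":")
    # n + 1 colons produce at least n + 2 parts; parts[n] sits between
    # the n-th and the (n+1)-th colon.
    if len(parts) < n + 2:
        return None
    field = parts[n]
    return field.strip() if strip else field


line = "SOURCEFILE:  file_name.dc     : 1 : log: the logging area"
print(field_after_colon(line))  # log
```

Passing `strip=False` keeps the surrounding spaces, which mirrors the second one-liner above.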

#2


1  

>>> import re
>>> s = "SOURCEFILE:  file_name.dc     : 1 : log: the logging area"
>>> s1 = re.sub(r"[^:]*:[^:]*:[^:]*:([^:]*):.*", r"\1", s)
>>> print(s1.strip())
log
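
The question also asked why the original attempt fails. As a hedged sketch: `([^:]:[^:]*)` matches one non-colon character, a colon, and then a run of non-colon characters, so its first match straddles the first colon rather than landing after the third. One way to say "skip three colon-terminated fields, then capture the next one" is a repeated non-capturing group. The variable names below are illustrative only.

```python
import re

s = "SOURCEFILE:  file_name.dc     : 1 : log: the logging area"

# The attempt from the question: its first match begins at the 'E'
# just before the FIRST colon and runs up to the second colon.
attempt = re.search(r"([^:]:[^:]*)", s)

# (?:[^:]*:){3}  skips three fields, each ending in a colon
# ([^:]*)        captures everything up to the next colon
# :              requires that a fourth colon actually follows
pattern = re.compile(r"(?:[^:]*:){3}([^:]*):")
match = pattern.match(s)
field = match.group(1).strip() if match else None
print(field)  # log
```

Using `match` with a capture group avoids rewriting the whole string with `re.sub`, and the `if match` check covers lines that contain fewer than four colons.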

#3


0  

import re

s = "SOURCEFILE:  file_name.dc     : 1 : log: the logging area"
# find the indices of all colons in the string
indices = [x.start() for x in re.finditer(":", s)]
result = s[:indices[2]] + s[indices[3]+1:]  # drops the 3rd and 4th colons as well
#result = s[:indices[2]+1] + s[indices[3]:]  # keeps the 3rd and 4th colons

Find the indices of the 3rd and 4th colons, then concatenate the head and tail parts.
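
Note that the snippet above keeps the head and tail and drops the middle. If the goal is instead to keep only the text between the 3rd and 4th colons, as the question describes, the same indices can be sliced the other way. A minimal sketch, with `extracted` as an illustrative name:

```python
import re

s = "SOURCEFILE:  file_name.dc     : 1 : log: the logging area"
# Positions of every colon in the string.
indices = [m.start() for m in re.finditer(":", s)]
# Slice from just after the 3rd colon up to (not including) the 4th.
extracted = s[indices[2] + 1:indices[3]]
print(extracted.strip())  # log
```

This assumes the string contains at least four colons; with fewer, `indices[3]` raises an IndexError.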
