How can I make this regular expression more specific?

Time: 2021-10-12 03:58:15

I'm using Python and trying to separate the following string into two strings:

'"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'

stringA = "99233 (I21.4,I50.23),"
stringB = "93010 (I21.4,I50.23)"

I'm using the following expression in Python:

pattern = re.compile('\d{5}.*[),|"|\n]')

My reasoning is as follows:

  1. there are always 5 digits, so \d{5}
  2. followed by (...alphanumerics...), so .*
  3. then there is a closing parenthesis and a comma followed by another set, OR there is a new line

But my RE keeps matching the whole line. Any suggestions?
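
A minimal sketch of why this happens, using the string from the question. The `.*` is greedy, and `[),|"|\n]` is a character class that matches any single one of `)`, `,`, `|`, `"` or a newline rather than an alternation of sequences. The tighter pattern `specific` is an illustrative assumption, not something from the original post:

```python
import re

line = '"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'

# Original pattern: the greedy .* runs to the end of the line and then
# backtracks only far enough for the character class to match one character.
greedy = re.compile(r'\d{5}.*[),|"|\n]')
greedy_matches = greedy.findall(line)

# Hypothetical tighter variant: five digits, a space, then a parenthesised
# group that cannot contain a closing parenthesis.
specific = re.compile(r'\d{5} \([^)]*\)')
specific_matches = specific.findall(line)

print(greedy_matches)    # one match spanning both entries
print(specific_matches)  # one match per entry
```

The negated class `[^)]*` keeps each match inside its own pair of parentheses, so the second entry is found as a separate match.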

3 Answers

#1


1  

You could come up with:

import re

string = '99233 (I21.4,I50.23), 93010 (I21.4,I50.23)'
parts = re.split(r'(?<=\)),\ ', string)
print(parts)
# ['99233 (I21.4,I50.23)', '93010 (I21.4,I50.23)']

This uses a positive lookbehind and splits on the comma and space that follow a closing parenthesis.
See a demo on ideone.com.
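
As a small variation on the same idea, the sketch below starts from the quoted string in the question and tolerates any amount of whitespace after the comma. The `strip('"')` step and the `\s*` are assumptions added for illustration:

```python
import re

raw = '"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'

# Drop the surrounding double quotes before splitting.
cleaned = raw.strip('"')

# Split on a comma that directly follows ")" plus any trailing whitespace.
# Commas inside the parentheses are not preceded by ")" and are left alone.
parts = re.split(r'(?<=\)),\s*', cleaned)
print(parts)
```

The lookbehind is fixed-width (a single `)`), which is what Python's `re` module requires.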

#2


1  

import re

data = '"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'
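# Lazy .*? keeps each match within one entry; print() works on Python 3
print(re.findall(r'\d{5}.*?\(.*?\)', data))
# ['99233 (I21.4,I50.23)', '93010 (I21.4,I50.23)']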

#3


0  

You can use a positive lookahead:

\d{5}.*(?=\))
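
A rough sketch of how this pattern behaves on the full line from the question. With the greedy `.*` the match runs up to the last `)`; the lazy `.*?` variant is an assumption added here to show how to stop at the first one:

```python
import re

data = '"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'

# Pattern as written: the greedy .* extends to the final ")" on the line.
greedy = re.findall(r'\d{5}.*(?=\))', data)

# Assumed lazy variant: stops before the first ")" after each 5-digit code.
lazy = re.findall(r'\d{5}.*?(?=\))', data)

print(greedy)
print(lazy)
```

Because the lookahead does not consume the `)`, the closing parenthesis is not part of either result.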

Additionally you could make this:

(\d{5})(.*(?=\())(.*)(?=\))

Then you could grab the 5-digit string with back-reference 1 and the inner string with back-reference 3.
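
A minimal sketch of reading those groups in Python, assuming the pattern is applied to a single entry rather than the whole line. Note that group 3 starts with the opening parenthesis, because the lookahead `(?=\()` does not consume it, so the sketch strips it off:

```python
import re

# Pattern copied from above; the single-entry sample string is an assumption.
pattern = re.compile(r'(\d{5})(.*(?=\())(.*)(?=\))')
match = pattern.search('99233 (I21.4,I50.23)')

code = match.group(1)               # the 5-digit string
inner = match.group(3).lstrip('(')  # inner string without the leading "("

print(code)
print(inner)
```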

Or you could take it one step further:

(\d{5})(.*(?=\())(\((\s{1,}\b|\b))(.*?(?=(\s{1,},|,)))(\s{1,},|,)(\s{1,}\b|\b)(.+)(?=\s{1,}\)|\))

Then you could get the following:

5 digit string: Back-reference 1

Left-hand inner value: Back-reference 5

Right-hand inner value: Back-reference 9
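
The group numbering above can be checked with a short sketch. It assumes the long pattern is applied to one entry at a time; the sample string is taken from the question, and the variable names are illustrative:

```python
import re

# Long pattern copied from above, compiled as a raw string.
pattern = re.compile(
    r'(\d{5})(.*(?=\())(\((\s{1,}\b|\b))(.*?(?=(\s{1,},|,)))'
    r'(\s{1,},|,)(\s{1,}\b|\b)(.+)(?=\s{1,}\)|\))'
)

match = pattern.search('99233 (I21.4,I50.23)')

code = match.group(1)   # 5-digit string
left = match.group(5)   # left-hand inner value
right = match.group(9)  # right-hand inner value

print(code, left, right)
```

Groups 4, 6, 7 and 8 exist only to absorb optional whitespace and the comma, which is why the useful values land in groups 1, 5 and 9.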

Observe

EDIT: spotted a bug, thus removed the link. Here's the new one:

Regex with test strings
