I'm using Python and trying to split the following string into two strings:
'"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'
stringA = "99233 (I21.4,I50.23),"
stringB = "93010 (I21.4,I50.23)"
I'm using the following expression in Python:
pattern = re.compile('\d{5}.*[),|"|\n]')
My reasoning is as follows:
- there are always 5 digits, so \d{5}
- followed by (...alphanumerics...), so .*
- then there is a closing parenthesis and a comma followed by another set, OR there is a newline
But my regex keeps matching the whole line. Any suggestions?
3 solutions
#1
1
You could split on the separator instead:
import re
string = '99233 (I21.4,I50.23), 93010 (I21.4,I50.23)'
parts = re.split(r'(?<=\)),\ ', string)
print(parts)
# ['99233 (I21.4,I50.23)', '93010 (I21.4,I50.23)']
This uses a positive lookbehind: it splits on the comma and space, but only where they directly follow a closing parenthesis, so the commas inside the parentheses are left alone.
See a demo on ideone.com.
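For context, here is a small check of why the pattern in the question over-matches. It is only a sketch: the variable names are made up for illustration, and the lazy pattern at the end is one assumed fix, not something the question or this answer proposed.

```python
import re

line = '"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'

# The pattern from the question, written as a raw string.
# [),|"|\n] is a character class: it matches ONE character out of ) , | " or newline,
# not the sequence "),".
greedy = re.compile(r'\d{5}.*[),|"|\n]')
# .* is greedy, so it runs to the end of the line and backs off only as far as needed.
print(greedy.findall(line))
# ['99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"']

# Assumed fix for illustration: a lazy .*? that stops at the first closing parenthesis.
lazy = re.compile(r'\d{5}.*?\)')
print(lazy.findall(line))
# ['99233 (I21.4,I50.23)', '93010 (I21.4,I50.23)']
```

The lazy version relies on each entry containing exactly one pair of parentheses, which holds for the sample string but is an assumption about the real data.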
#2
1
import re
data = '"99233 (I21.4,I50.23), 93010 (I21.4,I50.23)"'
# Lazy .*? keeps each match from running on to the last parenthesised group
print(re.findall(r'\d{5}.*?\(.*?\)', data))
#3
0
You can use a positive lookahead:
\d{5}.*(?=\))
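A quick way to see what the lookahead does is to run it on the sample string. This is a rough sketch: `text` is a made-up variable name, and the lazy variant is an assumption added for comparison, not part of the answer. Note that a lookahead only asserts the closing parenthesis, so it is not included in the match.

```python
import re

text = '99233 (I21.4,I50.23), 93010 (I21.4,I50.23)'

# Pattern as posted: the greedy .* backs off only as far as the LAST ')'.
print(re.findall(r'\d{5}.*(?=\))', text))
# ['99233 (I21.4,I50.23), 93010 (I21.4,I50.23']

# Lazy variant (an assumption, not from the answer): stops before the FIRST ')'.
print(re.findall(r'\d{5}.*?(?=\))', text))
# ['99233 (I21.4,I50.23', '93010 (I21.4,I50.23']
```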
Additionally, you could add capture groups:
(\d{5})(.*(?=\())(.*)(?=\))
Then you can read the 5-digit string from capture group 1 and the inner string from capture group 3. Note that group 3 as written still starts with the opening parenthesis, because the lookahead in group 2 does not consume it.
Or you could take it one step further:
(\d{5})(.*(?=\())(\((\s{1,}\b|\b))(.*?(?=(\s{1,},|,)))(\s{1,},|,)(\s{1,}\b|\b)(.+)(?=\s{1,}\)|\))
Then you could get the following:
5-digit string: capture group 1
Left-hand inner value: capture group 5
Right-hand inner value: capture group 9
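The same three values can also be pulled out with a shorter pattern and named groups. This is a minimal sketch rather than the pattern above: the group names `code`, `left`, and `right` are hypothetical, and it assumes every entry has exactly two comma-separated values inside one pair of parentheses.

```python
import re

text = '99233 (I21.4,I50.23), 93010 (I21.4,I50.23)'

# Hypothetical group names; [^,)] and [^)] keep each piece inside one pair of parentheses,
# so a match cannot run on into the next entry.
entry = re.compile(
    r'(?P<code>\d{5})\s*\(\s*(?P<left>[^,)]+?)\s*,\s*(?P<right>[^)]+?)\s*\)'
)

for m in entry.finditer(text):
    print(m.group('code'), m.group('left'), m.group('right'))
# 99233 I21.4 I50.23
# 93010 I21.4 I50.23
```

Using negated character classes instead of `.*` avoids the greedy over-matching across entries, and named groups avoid counting nested group numbers.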
Observe
EDIT: I spotted a bug, so I removed the old link. Here's the new one:
Regex test string