I want to search a file's contents using regular expressions. A sample of the file is:
3 General 24
3.1 CR IOT133 (ID: 194) 24
3.1.1 Issue 24
4 Integration 25
4.11 CR IOT025 (ID: 125) 25
10.27 CR IOT111 (ID: 176) 77
And I want to extract the IOTxxx part (so lines 2, 5 and 6 in this example).
My script is:
import re
fhandle = open("CR_headers.txt")
inp = fhandle.read()
crnumlist = re.findall('^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', inp)
print crnumlist
The last statement prints an empty list. I tried running it from the console as well but the result is the same.
If I use Sublime Text's Find with the input ^\d{1}\.\d{1} CR (IOT\d{3}).*$, I get the matching lines.
I'm using Python 2.7.10 and Sublime Text 2 on a Windows 7 box.
Any ideas on what I'm doing wrong will be greatly appreciated. Thanks!
1 solution
You just need to add the multiline modifier and define your regex as a raw string. You must use the multiline modifier when both of these conditions are met:
- Whenever the anchors ^ and $ are used in your regex, and
- when the input string contains more than one line.
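A quick way to see both conditions at work is to run the same pattern against a one-line string and a two-line string. This is a minimal sketch; the variable names single and multi are just for illustration, and re.M is the flag form of the inline (?m) modifier.

```python
import re

pattern = r'^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$'

# Only one line: ^ and $ already sit at the edges of that line, so no flag is needed.
single = '3.1 CR IOT133 (ID: 194) 24'
print(re.findall(pattern, single))        # ['IOT133']

# Two lines: without the flag, ^ only matches at the very start of the string,
# and the first line ("3 General 24") does not fit the pattern.
multi = '3 General 24\n3.1 CR IOT133 (ID: 194) 24'
print(re.findall(pattern, multi))         # []

# With re.M, ^ and $ also match at every line boundary.
print(re.findall(pattern, multi, re.M))   # ['IOT133']
```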
crnumlist = re.findall(r'(?m)^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', inp)
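On the raw-string point: the pattern in the question happens to survive as an ordinary string literal, because \d and \. are not recognized string escapes and Python leaves those backslashes in place (newer Python 3 versions warn about this). So the empty list comes from the missing multiline modifier. Raw strings still matter as soon as a pattern uses an escape that Python itself interprets, such as \b. A minimal sketch with a made-up sample string:

```python
import re

# In an ordinary string literal '\b' becomes a backspace character (chr(8)),
# so the regex engine looks for literal backspaces and finds nothing.
plain = re.findall('\bIOT133\b', 'CR IOT133')

# In a raw string the backslash is kept, so r'\b' reaches the engine as a word boundary.
raw = re.findall(r'\bIOT133\b', 'CR IOT133')

print(plain)  # []
print(raw)    # ['IOT133']
```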
Example:
>>> s = '''3 General 24
3.1 CR IOT133 (ID: 194) 24
3.1.1 Issue 24
4 Integration 25
4.11 CR IOT025 (ID: 125) 25
10.27 CR IOT111 (ID: 176) 77'''
>>> re.findall(r'(?m)^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', s)
['IOT133', 'IOT025', 'IOT111']
>>> re.findall(r'^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', s)
[]
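Applied to the original script, a minimal sketch could look like this, assuming the file is small enough to read in one go. It passes re.MULTILINE to re.compile, which is equivalent to the inline (?m) modifier; the helper names extract_cr_numbers and extract_from_file are hypothetical, and only the file name CR_headers.txt comes from the question.

```python
import re

# The answer's pattern, compiled once. re.MULTILINE is the flag form of (?m):
# ^ and $ match at the start and end of every line, not just of the whole string.
CR_PATTERN = re.compile(r'^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', re.MULTILINE)


def extract_cr_numbers(text):
    """Return the IOTxxx ids from lines like '3.1 CR IOT133 (ID: 194) 24'."""
    return CR_PATTERN.findall(text)


def extract_from_file(path):
    """Read the whole file and extract the ids (hypothetical helper)."""
    # The with-block closes the file even if reading raises an error.
    with open(path) as fhandle:
        return extract_cr_numbers(fhandle.read())
```

With the sample file from the question, print(extract_from_file('CR_headers.txt')) should then print the same three IDs as the console example above. The print(...) call form works in both Python 2.7 and Python 3.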