Regexp works with Sublime Text search but not with a Python script

Time: 2022-08-22 18:10:25

I want to search a file's contents using regular expressions. A sample of the file is:

3 General 24
3.1 CR IOT133 (ID: 194) 24
3.1.1 Issue 24
4 Integration 25
4.11 CR IOT025 (ID: 125) 25
10.27 CR IOT111 (ID: 176) 77

I want to extract the IOTxxx part (lines 2, 5 and 6 in this example).

My script is:

import re
fhandle = open("CR_headers.txt")
inp = fhandle.read()
crnumlist = re.findall('^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', inp)
print crnumlist

The last statement prints an empty list. I tried running it from the console as well, but the result is the same.

If I use Sublime Text's Find with the input ^\d{1}\.\d{1} CR (IOT\d{3}).*$, I get the matching lines.
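
Note that the Sublime Text pattern is not quite the one in the script: \d{1} matches exactly one digit, so it can only match single-digit section numbers such as 3.1. The sketch below assumes Python 3 and assumes (as the result above suggests) that Sublime Text's Find treats ^ and $ as line anchors; it approximates that by passing re.MULTILINE and runs both patterns over the sample:

```python
import re

# Sample lines from the question's file.
SAMPLE = """3 General 24
3.1 CR IOT133 (ID: 194) 24
3.1.1 Issue 24
4 Integration 25
4.11 CR IOT025 (ID: 125) 25
10.27 CR IOT111 (ID: 176) 77"""

# Pattern as typed into Sublime Text: \d{1} matches exactly one digit.
sublime_pattern = r'^\d{1}\.\d{1} CR (IOT\d{3}).*$'
# Pattern from the script: \d{1,2} allows one or two digits.
script_pattern = r'^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$'

# re.MULTILINE makes ^ and $ match at every line boundary
# (used here as an approximation of Sublime Text's Find, which is an assumption).
print(re.findall(sublime_pattern, SAMPLE, re.MULTILINE))  # ['IOT133']
print(re.findall(script_pattern, SAMPLE, re.MULTILINE))   # ['IOT133', 'IOT025', 'IOT111']
```

The 4.11 and 10.27 lines need the \d{1,2} quantifiers that the script already uses.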

I'm using Python 2.7.10 and Sublime Text 2 on a Windows 7 box.

Any ideas on what I'm doing wrong would be greatly appreciated. Thanks.

1 Answer

#1


You just need to include the multiline modifier and define your regex as a raw string. You must use the multiline modifier when both of these conditions are met:

  • Your regex uses the anchors ^ and $.
  • The input string contains more than one line.

    crnumlist = re.findall(r'(?m)^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', inp)
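
The inline (?m) is equivalent to passing re.MULTILINE (alias re.M) as the flags argument. A minimal sketch of the same fix written that way, assuming Python 3 and the question's file name CR_headers.txt; the helper names extract_cr_numbers and extract_from_file are hypothetical:

```python
import re

# Same pattern as the answer, with the multiline flag passed explicitly instead of (?m).
CR_PATTERN = re.compile(r'^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', re.MULTILINE)


def extract_cr_numbers(text):
    """Return every IOTxxx id found on a 'N.N CR IOTxxx ...' line (hypothetical helper)."""
    return CR_PATTERN.findall(text)


def extract_from_file(path="CR_headers.txt"):
    # 'with' closes the file even if reading fails; the file name comes from the question.
    with open(path) as fhandle:
        return extract_cr_numbers(fhandle.read())


if __name__ == "__main__":
    print(extract_cr_numbers("3 General 24\n3.1 CR IOT133 (ID: 194) 24\n"))  # ['IOT133']
```

Compiling the pattern once is a stylistic choice; re.findall(pattern, text, re.MULTILINE) gives the same result.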
    

Example:

>>> import re
>>> s = '''3 General 24
3.1 CR IOT133 (ID: 194) 24
3.1.1 Issue 24
4 Integration 25
4.11 CR IOT025 (ID: 125) 25
10.27 CR IOT111 (ID: 176) 77'''
>>> re.findall(r'(?m)^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', s)
['IOT133', 'IOT025', 'IOT111']
>>> re.findall(r'^\d{1,2}\.\d{1,2} CR (IOT\d{3}).*$', s)
[]
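
The raw string is a safeguard rather than the cause of the empty list here: \d and \. are not string escapes, so Python passes them through to re unchanged (newer Python 3 releases warn about such unrecognised escapes). It matters for escapes that an ordinary string literal does interpret, such as \b. A minimal sketch, assuming Python 3:

```python
import re

line = "3.1 CR IOT133 (ID: 194) 24"

# Ordinary string literal: Python turns '\b' into a backspace character (chr(8))
# before the regex engine sees it, so the pattern looks for a literal backspace.
plain_pattern = '\bIOT[0-9]{3}'
# Raw string literal: the backslash is kept, so re reads \b as a word boundary.
raw_pattern = r'\bIOT[0-9]{3}'

print(plain_pattern[0] == chr(8))       # True
print(re.findall(plain_pattern, line))  # []
print(re.findall(raw_pattern, line))    # ['IOT133']
```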
