Python 3: Searching a large text file with REGEX

Date: 2021-07-26 18:15:33

I want to search a large text file with a regex and have set up the following code:

import re

regex = input("REGEX: ")
SearchFunction = re.compile(regex)

f = open('data', 'r', encoding='utf-8')
result = re.search(SearchFunction, f)
print(result.groups())
f.close()

Of course, this doesn't work because the second argument for re.search should be a string or buffer. However, I cannot insert all of my text file into a string as it is too long (meaning that it would take forever). What is the alternative?

2 Solutions

#1


6  

You can check whether the pattern matches on each line. This won't load the entire file into memory:

for line in f:
    result = SearchFunction.search(line)  # search one line at a time with the compiled pattern
    if result:
        print(result.groups())
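For reference, the line-by-line approach could be wrapped up roughly as follows. This is only a sketch: the function name `search_lines` and its parameters are made up for illustration and are not from the question. Note that a pattern meant to span several lines will not match with this approach, since each line is searched separately.

```python
import re


def search_lines(path, pattern, encoding="utf-8"):
    """Yield (line_number, match) for each line of the file that matches.

    `search_lines` is a hypothetical helper name, not part of the original post.
    """
    compiled = re.compile(pattern)
    # Iterating over the file object reads one line at a time,
    # so the whole file is never held in memory at once.
    with open(path, "r", encoding=encoding) as f:
        for line_number, line in enumerate(f, start=1):
            match = compiled.search(line)
            if match:
                yield line_number, match
```

It could then be used as `for n, m in search_lines('data', regex): print(n, m.groups())`, which also avoids calling `.groups()` on `None` when a line does not match.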

#2


5  

You can use a memory-mapped file with the mmap module. Think of it as a file pretending to be a string (or the opposite of a StringIO). You can find an example in this Python Module of the Week article about mmap by Doug Hellmann.
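A minimal sketch of the mmap approach, assuming Python 3, where a memory-mapped file behaves like a bytes object, so the pattern has to be a bytes pattern as well. The helper name `search_mmap` is hypothetical, and encoding a str pattern to bytes like this is only safe when the pattern is plain ASCII or the non-ASCII parts are literal text.

```python
import mmap
import re


def search_mmap(path, pattern, encoding="utf-8"):
    """Return the groups of the first match in the file, or None.

    `search_mmap` is a hypothetical helper name, not part of the original post.
    """
    # The mapped file is bytes-like, so compile a bytes pattern.
    compiled = re.compile(pattern.encode(encoding))
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = compiled.search(mm)
            if match is None:
                return None
            # Decode the groups while the mapping is still open.
            return tuple(
                g.decode(encoding) if g is not None else None
                for g in match.groups()
            )
```

Unlike the line-by-line loop, this lets a single pattern match across line boundaries, and the operating system pages the file in as needed rather than reading it all up front.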

