How can I capture not only a match, but the line number on which it appears?
I have the following script:
re.findall(pattern, a_file.read(), re.MULTILINE)
Note: I have a lot of files to parse, and would prefer not to read (or reread) the file line by line.
2 answers
#1
3
Try iterating through each line (with a count) to determine which line number a match can be found on. It could look something like this:
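import re

with open('somefile.txt', 'r') as a_file:
    # enumerate(..., start=1) supplies the 1-based line number
    for linecount, line in enumerate(a_file, start=1):
        result = re.findall(pattern, line)  # pattern: the regex from the question
        ...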
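If reading line by line is not an option, as the question says, line numbers can also be recovered from a single read of the file. The following is only a sketch under assumptions: `find_with_line_numbers` and `sample` are hypothetical names, and it uses `re.finditer` rather than `re.findall` because match objects expose the start offset. It reports the whole match (`group(0)`), which differs from `findall` when the pattern contains groups.

```python
import re


def find_with_line_numbers(pattern, text, flags=re.MULTILINE):
    """Return (line_number, matched_text) pairs for every match in text."""
    results = []
    line_no = 1    # 1-based line number of the most recent match start
    last_pos = 0   # offset up to which newlines have already been counted
    for match in re.finditer(pattern, text, flags):
        # Count only the newlines between the previous match and this one,
        # so the text is scanned for newlines once in total.
        line_no += text.count("\n", last_pos, match.start())
        last_pos = match.start()
        results.append((line_no, match.group(0)))
    return results


# Hypothetical sample text standing in for a_file.read()
sample = "alpha\nbeta 42\ngamma\ndelta 7 and 8\n"
print(find_with_line_numbers(r"\d+", sample))
# → [(2, '42'), (4, '7'), (4, '8')]
```

The line number is that of the line where the match starts, so a match spanning several lines is attributed to its first line.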
#2
0
If you are parsing a lot of files, you should consider a shell-script-based version of this code. I absolutely love Python, but knowing your way around UNIX tools sometimes makes the job much easier. Use the right tool for the right job.
If you have access to a Linux machine, or a command line emulator like cmder, you can do the following:
find . -name "*.java" -exec grep -n -E "LOGGER\.\w+\(" {} \;
The -n option prints the line number of each match, and -E tells grep to interpret the pattern as an extended regular expression, which is what makes + work as a quantifier here. This example finds all *.java files under the current directory (the current folder and all subfolders) and searches each one for strings like "LOGGER.info(", "LOGGER.debug(", and "LOGGER.error(". That shows me all the logging statements in my Java code, but not the statements where the LOGGER is initialized.
You will generally find that this approach is also much faster than running a single Python script over and over, or even looping through the files within a Python script.
Edit: One side note - if you are using a Windows console emulator, the final "\;" is changed to a simple ";".
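For comparison, the `grep -n` behaviour described in this answer can be approximated in Python. This is a rough sketch, not a drop-in replacement: `grep_tree` and its parameters are hypothetical names, the files are assumed to be UTF-8 text, and unlike the `find ... -exec grep` command it also reports the path of the file each match came from.

```python
import re
from pathlib import Path


def grep_tree(root, regex, glob="*.java"):
    """Yield (path, line_number, line) for each matching line under root."""
    compiled = re.compile(regex)
    # rglob walks root and all of its subfolders, like `find . -name`.
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on files with odd bytes.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if compiled.search(line):
                    yield path, line_no, line.rstrip("\n")


# Example usage (scans the current directory tree):
# for path, line_no, line in grep_tree(".", r"LOGGER\.\w+\("):
#     print(f"{path}:{line_no}:{line}")
```

Because `grep_tree` is a generator, results can be printed as they are found instead of being collected in memory first.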