I am using Python to search through a text log file line by line, and I want to save a certain part of each line as a variable. I am using a regex, but I don't think I am using it correctly because I always get None for my variable string_I_want. I looked at other regex questions on here and saw people adding .group() to the end of their re.search call, but that gives me an error. I am not very familiar with regex and can't figure out where I am going wrong.
Sample log file:
2016-03-08 11:23:25 test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165
My script:
def get_data(log_file):
    # Read file line by line
    with open(log_file) as f:
        f = f.readlines()
    for line in f:
        date = line[0:10]
        time = line[11:19]
        string_I_want = re.search(r'/m=\w*/g', line)
        print date, time, string_I_want
3 answers
#1
2
You need to remove the /.../ regex delimiters together with the global flag (Python's re module does not use that syntax), and use a capturing group:
mObj = re.search(r'm=(\w+)', line)
if mObj:
    string_I_want = mObj.group(1)
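To see why the original pattern always returned None: Python's re module has no /.../ delimiter syntax, so the slashes and the trailing g are treated as literal characters that must appear in the text. A minimal sketch using the sample line from the question (re.findall is shown here as one common stand-in for a global flag; the variable names are only illustrative):

```python
import re

line = ("2016-03-08 11:23:25 test_data:0317: m=string_I_want "
        "max_count: 17655, avg_size: 320, avg_rate: 165")

# The slashes and the "g" are matched literally, so nothing is found.
broken = re.search(r'/m=\w*/g', line)

# Without the delimiters the pattern matches; group(1) holds the captured value.
fixed = re.search(r'm=(\w+)', line)

# re.findall returns every non-overlapping match, which covers the "global" use case.
all_values = re.findall(r'm=(\w+)', line)

print(broken)          # None
print(fixed.group(1))  # string_I_want
print(all_values)      # ['string_I_want']
```

Calling .group() directly on the broken result is what produced the error mentioned in the question, since None has no such method.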
See this regex demo and the Python demo:
import re
p = r'm=(\w+)'  # Init the regex with a raw string literal (so, no need to use \\w, just \w is enough)
s = "2016-03-08 11:23:25 test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165"
mObj = re.search(p, s)  # Execute a regex-based search
if mObj:  # Check if we got a match
    print(mObj.group(1))  # DEMO: Print the Group 1 value
Pattern details:
- m= - matches the literal character sequence m= (add a space or \b before it if a whole word must be matched)
- (\w+) - Group 1, capturing one or more alphanumeric or underscore characters. This value can be accessed with the .group(1) method.
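The whole-word note matters when another key on the line happens to end in m. A small sketch, assuming a hypothetical log line that also contains an item= field (that line is made up for illustration and is not from the question):

```python
import re

# Hypothetical line: "item=" also ends with the characters "m=".
line = "2016-03-08 11:23:25 item=foo m=bar"

# Without a boundary, the first "m=" found is the tail of "item=".
loose = re.search(r'm=(\w+)', line)

# \b requires a word boundary before "m", so only the standalone key matches.
strict = re.search(r'\bm=(\w+)', line)

print(loose.group(1))   # foo
print(strict.group(1))  # bar
```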
#2
0
Use a lookbehind, so that only the value after m= is matched:
(?<=\sm=)\S+
Example:
In [135]: s = '2016-03-08 11:23:25 test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165'
In [136]: re.search(r'(?<=\sm=)\S+', s).group()
Out[136]: 'string_I_want'
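One caveat with chaining .group() onto re.search: when a line has no match, re.search returns None and the call raises AttributeError. A hedged sketch of a guard around the same lookbehind pattern (the helper name extract_value is hypothetical):

```python
import re

# Same lookbehind pattern as above, compiled once for reuse.
PATTERN = re.compile(r'(?<=\sm=)\S+')

def extract_value(line):
    """Return the text after ' m=' up to the next whitespace, or None if absent."""
    match = PATTERN.search(line)
    return match.group() if match else None

print(extract_value("test_data:0317: m=string_I_want max_count: 17655"))  # string_I_want
print(extract_value("no marker on this line"))                            # None
```

Note that \S+ takes every non-whitespace character, so punctuation attached to the value would be included, whereas \w+ stops at the first character that is not a letter, digit, or underscore.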
#3
0
Here is what you need:
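import re

def get_data(logfile):
    # The with block closes the file even if an error occurs
    with open(logfile, "r") as f:
        for line in f:
            # re.search returns None when a line has no m= field,
            # so check the match before calling .group()
            match = re.search(r'(?<=\sm=)\S+', line)
            if match:
                print(match.group())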
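Putting the pieces together with the date and time slices from the question, one possible shape for the whole function is sketched below. The names VALUE_RE and parse_lines are assumptions introduced for illustration, and the second sample line is made up to show a line without the field:

```python
import re

# Compiled once; \b keeps keys such as "item=" from matching (an added assumption).
VALUE_RE = re.compile(r'\bm=(\w+)')

def parse_lines(lines):
    """Yield (date, time, value) for every line that contains an m= field."""
    for line in lines:
        match = VALUE_RE.search(line)
        if match is None:
            continue  # skip lines without the field instead of failing
        yield line[0:10], line[11:19], match.group(1)

def get_data(log_file):
    """Read a log file and return the parsed tuples as a list."""
    with open(log_file) as f:
        return list(parse_lines(f))

sample = [
    "2016-03-08 11:23:25 test_data:0317: m=string_I_want max_count: 17655, avg_size: 320, avg_rate: 165\n",
    "2016-03-08 11:23:26 a line without the field\n",
]
for date, time, value in parse_lines(sample):
    print(date, time, value)  # 2016-03-08 11:23:25 string_I_want
```

Separating the parsing from the file handling makes it possible to try the pattern on a list of strings without touching a real log file.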