I want to parse some text in Python 2.7 and export the results to an array. For example:
Record # 3741: 2018 Feb 16 13:26:27.632 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 468 H S16.7:Output [0] 0 [1] 0 [2] 0 [3] 0
Record # 3742: 2018 Feb 16 13:26:27.632 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 474 H S16.7:Output [4] 0 [5] 0 [6] 0 [7] 0 [8] 0
Record # 3795: 2018 Feb 16 13:26:27.633 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 468 H S16.7:Output [0] 16861 [1] 16867 [2] 16873 [3] 16878
Record # 3800: 2018 Feb 16 13:26:27.633 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 474 H S16.7:Output [4] 16873 [5] 16861 [6] 0 [7] 0 [8] 0
Record # 3931: 2018 Feb 16 13:26:27.634 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 468 H S16.7:Output [0] 16873 [1] 16867 [2] 128 [3] 128
Record # 3932: 2018 Feb 16 13:26:27.634 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 474 H S16.7:Output [4] 16878 [5] 16873 [6] 16855 [7] 16867 [8] 16873
Record # 3971: 2018 Feb 16 13:26:27.635 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 468 H S16.7:Output [0] 16855 [1] 16849 [2] 129 [3] 129
Record # 3974: 2018 Feb 16 13:26:27.635 [00] abc adbada adba adab adbadba bdadba
=> abc_cdf_ghk_adha 474 H S16.7:Output [4] 16867 [5] 16867 [6] 16861 [7] 16861 [8] 16867
From these lines I want to parse the even-numbered lines (the ones starting with =>) and save the numbers in arrays. I should end up with 9 arrays (A0, A1, A2, A3, ... A8) that keep being updated in a loop.
In this case, A0 should hold the values that follow [0].
Variables:
A0 --> [0] numbers after [0]
A1 --> [1] numbers after [1]
A2 --> [2] numbers after [2]
A3 --> [3] numbers after [3]
A4 --> [4] numbers after [4]
A5 --> [5] numbers after [5]
A6 --> [6] numbers after [6]
A7 --> [7] numbers after [7]
A8 --> [8] numbers after [8]
At the end of this loop I will save the variables to a CSV file.
My approach in Python looks like this:
import re
from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt

result = defaultdict(list)
with open("C:\\Users\\ibrah\\Documents\\Python\\Test\\Input.txt","r") as f:
    for c, l in enumerate(f.readlines()): # enumerate() keeps track of the line index
        if (c+1) % 2 == 0: # Even lines (enumerate() starts counting at 0)
            for m in re.findall('(?:\[(\d+)\])\s(\d+)', l): # 1st regex group is the number inside the []s, 2nd group is the number after it
                result[int(m[0])].append(int(m[1]))

RS0 = 10*(np.log10(result[0]/128))
RS1 = 10*(np.log10(result[1]/128))
RS2 = 10*(np.log10(result[2]/128))
RS3 = 10*(np.log10(result[3]/128))
RS4 = 10*(np.log10(result[4]/128))
RS5 = 10*(np.log10(result[5]/128))
RS6 = 10*(np.log10(result[6]/128))
RS7 = 10*(np.log10(result[7]/128))
RS8 = 10*(np.log10(result[8]/128))
plt.plot([RS0,RS1,RS2,RS3,RS4,RS5,RS6,RS7,RS8])
#plt.plot([RS0])
plt.ylabel('SNR')
plt.show()
Here I am trying to plot the variables (RS0 ... RS8) and also export the values to a CSV file. Could you please help me finish my code so that it performs these operations?
2 Answers
#1
2
You could use a better regex to grab all the occurrences of [X] Y. Combine that with a with clause, enumerate(), and a defaultdict(list) to keep your code cleaner, and you get something like this:
import re
from collections import defaultdict

result = defaultdict(list)
# Raw strings keep the backslashes in the Windows path and in the regex literal.
with open(r'C:\Users\xxxx\Contents.txt', 'r') as f:
    for c, l in enumerate(f.readlines()): # enumerate() keeps track of the line index
        if (c+1) % 2 == 0: # Even lines (enumerate() starts counting at 0)
            for m in re.findall(r'(?:\[(\d+)\])\s(\d+)', l): # 1st regex group is the number inside the []s, 2nd group is the number after it
                result[int(m[0])].append(int(m[1]))
A0 = result[0]
A1 = result[1]
# ...
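To see what the regex returns, here is a small sketch that runs it on one of the sample lines from the question. The names PAIR, line and pairs are only for illustration, and I used \s+ rather than \s so that extra spaces would not break the match.

```python
import re

# One capture group for the index inside the brackets, one for the value after it.
PAIR = re.compile(r'\[(\d+)\]\s+(\d+)')

line = "=> abc_cdf_ghk_adha 468 H S16.7:Output [0] 16861 [1] 16867 [2] 16873 [3] 16878"

# findall() returns a list of (index, value) string tuples; convert both to int.
pairs = [(int(i), int(v)) for i, v in PAIR.findall(line)]
print(pairs)
```

Each tuple is what the loop above sees as m, so m[0] picks the array and m[1] is the value appended to it.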
If you want to plot the resulting arrays afterwards, first convert the Python lists to numpy arrays via np.array(result[0]). Otherwise, you'd get:
TypeError: unsupported operand type(s) for /: 'list' and 'int'
Then, make sure you handle the 0s in the array somehow before you take their logarithm, because log(0) is undefined. You could try adding 1 to the values and then taking their log.
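As an alternative to adding 1, one possible way to deal with the zeros is to turn them into NaN. This is only a sketch in plain Python; to_db is a made-up helper name, and the reference value 128 is taken from the code in the question.

```python
import math

def to_db(values, ref=128.0):
    """Return 10*log10(v/ref) for each value; non-positive values become NaN."""
    out = []
    for v in values:
        if v > 0:
            # ref is a float, so this is true division in Python 2.7 as well.
            out.append(10 * math.log10(v / ref))
        else:
            # log10(0) is undefined, so mark the sample as missing instead.
            out.append(float("nan"))
    return out

print(to_db([128, 1280, 0]))
```

As far as I know, matplotlib leaves a gap in the line where a value is NaN, so the zero samples simply would not be drawn. Whether that or the add-1 approach is better depends on what the zeros mean in your data.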
Finally, you don't need to create all of those variables; a list comprehension works much more elegantly:
plt.plot([10*np.log10(np.array(result[i])/128.0) for i in sorted(result.keys())])  # 128.0 avoids integer division in Python 2.7
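The question also asks about exporting to CSV. Here is a minimal sketch using the standard csv module, assuming you want one column per index (A0, A1, ...) and one row per sample. write_columns is a hypothetical helper, and it writes to any file-like object so it can be tried without touching the disk.

```python
import csv
import io

def write_columns(result, out):
    """Write a header row A0, A1, ... followed by the values, one column per index."""
    keys = sorted(result)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["A%d" % k for k in keys])
    n_rows = max(len(result[k]) for k in keys) if keys else 0
    for r in range(n_rows):
        # Pad shorter columns with an empty cell so the rows stay aligned.
        writer.writerow([result[k][r] if r < len(result[k]) else "" for k in keys])

# Small in-memory example with made-up data shaped like the parsed result.
buf = io.StringIO()
write_columns({0: [0, 16861], 1: [0, 16867], 4: [0]}, buf)
print(buf.getvalue())
```

To write a real file in Python 2.7, open it with open(path, 'wb') and pass that in; in Python 3 use open(path, 'w', newline='') instead. The same function should work for the RS values if you pass a dict of those lists.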
#2
-1
If, for each line beginning with =>, you want the numbers after Output as a list of int, you could do something like this:
import re

with open('file.txt') as f:
    lines = [l for l in f.readlines() if l.startswith('=>')]

parsed_lines = []
for line in lines:
    numbers = re.findall(r'\d+', line.split('Output')[1])
    parsed_lines.append([int(e) for e in numbers])
print(parsed_lines)
So, for your given file you would have:
[
[0, 0, 1, 0, 2, 0, 3, 0],
[4, 0, 5, 0, 6, 0, 7, 0, 8, 0],
[0, 16861, 1, 16867, 2, 16873, 3, 16878],
[4, 16873, 5, 16861, 6, 0, 7, 0, 8, 0],
[0, 16873, 1, 16867, 2, 128, 3, 128],
[4, 16878, 5, 16873, 6, 16855, 7, 16867, 8, 16873],
[0, 16855, 1, 16849, 2, 129, 3, 129],
[4, 16867, 5, 16867, 6, 16861, 7, 16861, 8, 16867]
]
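Note that each inner list above interleaves the bracket indices with the values. If you want one list per index, as the question describes, the pairs could be regrouped along these lines (group_by_index is a made-up name, and this assumes every index is followed by exactly one value):

```python
from collections import defaultdict

def group_by_index(parsed_lines):
    """Turn rows of [idx, value, idx, value, ...] into {idx: [values...]}."""
    result = defaultdict(list)
    for row in parsed_lines:
        # Even positions hold the bracket indices, odd positions hold the values.
        for idx, value in zip(row[0::2], row[1::2]):
            result[idx].append(value)
    return dict(result)

grouped = group_by_index([
    [0, 0, 1, 0],
    [4, 16873, 5, 16861],
    [0, 16861, 1, 16867],
])
print(grouped[0])
```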