Parse a text file with Python, save the numbers in arrays, and export them to CSV

Time: 2022-09-06 00:25:32

I want to parse some text in Python 2.7 and export the results to arrays. For example:

Record # 3741: 2018 Feb 16  13:26:27.632  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 0 [1] 0 [2] 0 [3] 0 
   Record # 3742: 2018 Feb 16  13:26:27.632  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 0 [5] 0 [6] 0 [7] 0 [8] 0 
   Record # 3795: 2018 Feb 16  13:26:27.633  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 16861 [1] 16867 [2] 16873 [3] 16878 
   Record # 3800: 2018 Feb 16  13:26:27.633  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 16873 [5] 16861 [6] 0 [7] 0 [8] 0 
   Record # 3931: 2018 Feb 16  13:26:27.634  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 16873 [1] 16867 [2] 128 [3] 128 
   Record # 3932: 2018 Feb 16  13:26:27.634  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 16878 [5] 16873 [6] 16855 [7] 16867 [8] 16873 
   Record # 3971: 2018 Feb 16  13:26:27.635  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 16855 [1] 16849 [2] 129 [3] 129 
   Record # 3974: 2018 Feb 16  13:26:27.635  [00] abc adbada adba adab adbadba bdadba 
 =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 16867 [5] 16867 [6] 16861 [7] 16861 [8] 16867

From these lines, I want to parse the even-numbered lines (the ones starting with =>) and save the numbers in arrays. I should end up with 9 arrays (A0, A1, A2, A3, ... A8) that keep being updated in a loop.

In the case above, A0 should hold the values after [0].

Variables: 
A0 --> [0] numbers after [0]
A1 --> [1] numbers after [1]
A2 --> [2] numbers after [2]
A3 --> [3] numbers after [3]
A4 --> [4] numbers after [4]
A5 --> [5] numbers after [5]
A6 --> [6] numbers after [6]
A7 --> [7] numbers after [7]
A8 --> [8] numbers after [8]

At the end of the loop I will save the variables to a CSV file.

My approach in Python looks like this:

import re
from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt

result = defaultdict(list)
with open("C:\\Users\\ibrah\\Documents\\Python\\Test\\Input.txt","r") as f:
    for c, l in enumerate(f.readlines()): # Get enumerate to keep track of line numbers
        if (c+1) % 2 == 0: # Even lines (line numbers start at 0)
            for m in re.findall('(?:\[(\d+)\])\s(\d+)', l): # 1st regex group is the number in the []s and 2nd group is the number after
                result[int(m[0])].append(int(m[1]))
RS0 = 10*(np.log10(result[0]/128))
RS1 = 10*(np.log10(result[1]/128))
RS2 = 10*(np.log10(result[2]/128))
RS3 = 10*(np.log10(result[3]/128))
RS4 = 10*(np.log10(result[4]/128))
RS5 = 10*(np.log10(result[5]/128))
RS6 = 10*(np.log10(result[6]/128))
RS7 = 10*(np.log10(result[7]/128))
RS8 = 10*(np.log10(result[8]/128))
plt.plot([RS0,RS1,RS2,RS3,RS4,RS5,RS6,RS7,RS8])
#plt.plot([RS0])
plt.ylabel('SNR')
plt.show()

Here I am trying to plot the variables (RS0...RS8) and also export the values to a CSV file. Could you please help me finalize my code to perform these operations?

2 Solutions

#1


2  

You could use a better regex to grab all occurrences of [X] Y. Combining that with a with statement, enumerate(), and a defaultdict(list) keeps the code cleaner; you get something like this:

import re
from collections import defaultdict

result = defaultdict(list)
with open(r'C:\Users\xxxx\Contents.txt', 'r') as f:  # raw string so the backslashes are not read as escapes
    for c, l in enumerate(f):  # enumerate() yields 0-based line indices
        if (c + 1) % 2 == 0:  # even lines, counting from 1
            # Group 1 is the number inside the []s; group 2 is the number after it
            for m in re.findall(r'\[(\d+)\]\s(\d+)', l):
                result[int(m[0])].append(int(m[1]))

A0 = result[0]
A1 = result[1]
# ...
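
Selecting lines by parity breaks if the file ever gains a blank or header line. As an alternative sketch, the pairs can be taken only from the text after the word Output, which appears on every data line in the sample. The names PAIR and parse_lines are hypothetical, and the sample strings are copied from the question:

```python
import re
from collections import defaultdict

# Matches "[index] value" pairs such as "[3] 16878"
PAIR = re.compile(r'\[(\d+)\]\s+(\d+)')

def parse_lines(lines):
    """Collect the values after each [index] into result[index]."""
    result = defaultdict(list)
    for line in lines:
        # Only the part after 'Output' holds the pairs; lines without it
        # (the "Record # ..." lines) are skipped entirely.
        _, sep, tail = line.partition('Output')
        if not sep:
            continue
        for index, value in PAIR.findall(tail):
            result[int(index)].append(int(value))
    return result

sample = [
    "Record # 3795: 2018 Feb 16  13:26:27.633  [00] abc adbada adba adab adbadba bdadba",
    " =>  abc_cdf_ghk_adha 468 H S16.7:Output [0] 16861 [1] 16867 [2] 16873 [3] 16878",
    " =>  abc_cdf_ghk_adha 474 H S16.7:Output [4] 16873 [5] 16861 [6] 0 [7] 0 [8] 0",
]
result = parse_lines(sample)
print(result[0], result[4])
```

With a real file, the open file object can be passed directly as lines, since iterating over it yields one line at a time.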

If you want to plot the resulting arrays afterwards, first convert the Python lists to numpy arrays via np.array(result[0]). Otherwise, you'd get:

TypeError: unsupported operand type(s) for /: 'list' and 'int'


Then make sure you handle the 0s in the array before taking the logarithm, because log(0) is undefined (np.log10 returns -inf and emits a RuntimeWarning). One option is to add 1 to the values before taking the log.
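
Another way to handle the zeros, as a minimal sketch: skip the logarithm for non-positive samples and record NaN instead, so the gap stays visible rather than being shifted by an offset. The helper name to_db is hypothetical, the reference value of 128 is taken from the question's formula, and only the standard math module is used:

```python
import math

def to_db(values, reference=128.0):
    """Return 10*log10(value/reference) per sample; non-positive samples become NaN."""
    out = []
    for v in values:
        if v > 0:
            out.append(10 * math.log10(v / reference))
        else:
            # log10(0) is undefined, so mark the sample as missing
            out.append(float('nan'))
    return out

db = to_db([0, 128, 1280])
print(db)
```

Whether NaN, an added offset, or dropping the sample is the right choice depends on what the zeros mean in the data, which the question does not say.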

Finally, you don't need to create all of those variables; a list comprehension does the same job more compactly:

plt.plot([10*np.log10(np.array(result[i])/128.0) for i in sorted(result.keys())])  # 128.0 forces float division on Python 2.7
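
The question also asks for CSV export, which the code above does not cover. A minimal sketch with the standard csv module, assuming result maps each index to a list of values as built earlier. The function name write_columns and the A0..A8 column headers are hypothetical, and the sketch is written for Python 3 (on Python 2.7, zip_longest is itertools.izip_longest). Columns of unequal length are padded with empty cells:

```python
import csv
import io
from itertools import zip_longest

def write_columns(result, fileobj):
    """Write one column per index (A0, A1, ...) and one row per sample."""
    keys = sorted(result)
    writer = csv.writer(fileobj, lineterminator='\n')
    writer.writerow(['A%d' % k for k in keys])
    # zip_longest pads shorter columns with '' so rows stay aligned
    for row in zip_longest(*(result[k] for k in keys), fillvalue=''):
        writer.writerow(row)

# Sample values in the shape produced by the parsing step
sample = {0: [0, 16861], 1: [0, 16867], 4: [0, 16873, 16878]}
buf = io.StringIO()
write_columns(sample, buf)
print(buf.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, 'w', newline='') on Python 3, or open(path, 'wb') on Python 2.7.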

#2


-1  

If, for each line beginning with =>, you want the numbers after Output as a list of ints, you could do something like this:

import re

with open('file.txt') as f:
    lines = [l for l in f if l.lstrip().startswith('=>')]  # lstrip() because the sample lines are indented

parsed_lines = []
for line in lines:
    numbers = re.findall(r'\d+', line.split('Output')[1])
    parsed_lines.append([int(e) for e in numbers])

print(parsed_lines)

So, for the given file you would get:

[
    [0, 0, 1, 0, 2, 0, 3, 0],
    [4, 0, 5, 0, 6, 0, 7, 0, 8, 0],
    [0, 16861, 1, 16867, 2, 16873, 3, 16878],
    [4, 16873, 5, 16861, 6, 0, 7, 0, 8, 0],
    [0, 16873, 1, 16867, 2, 128, 3, 128],
    [4, 16878, 5, 16873, 6, 16855, 7, 16867, 8, 16873],
    [0, 16855, 1, 16849, 2, 129, 3, 129],
    [4, 16867, 5, 16867, 6, 16861, 7, 16861, 8, 16867]
]
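
Each inner list above alternates index and value. To get back to one list per index (A0 to A8), the pairs can be regrouped. A minimal sketch, where the helper name group_by_index is hypothetical and the input is the first three rows of the output above:

```python
from collections import defaultdict

def group_by_index(parsed_lines):
    """Turn flat [index, value, index, value, ...] lists into {index: [values]}."""
    grouped = defaultdict(list)
    for numbers in parsed_lines:
        # Even positions hold the bracketed index, odd positions the value
        for index, value in zip(numbers[0::2], numbers[1::2]):
            grouped[index].append(value)
    return dict(grouped)

parsed_lines = [
    [0, 0, 1, 0, 2, 0, 3, 0],
    [4, 0, 5, 0, 6, 0, 7, 0, 8, 0],
    [0, 16861, 1, 16867, 2, 16873, 3, 16878],
]
arrays = group_by_index(parsed_lines)
print(arrays[0])  # [0, 16861]
```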
