Parse data from a file and store it in an array

Time: 2021-11-29 21:17:03

I am trying to parse data from a file that contains two sets of data. The first 40 lines of the file are header information, followed by 1000 lines of two-column data. A second file with the same format has been appended to it; that is, lines 1041 through 1080 hold the second file's header information, followed by 1000 more lines of two-column data. The first column is the same in both data sections. Therefore, I want to parse the file to remove the header sections and save the data to a 3x1000 array.


The file is organized as:

```
Line 1: //Header information
Line 2: //Header information
...
Line 40: 1.000e3 -4.000e-3
Line 41: 1.001e3 -4.324e-3
...
Line 1000: 10.000e3 -78.678e-3
Line 1001: //Header information
Line 1002: //Header information
...
Line 1041: 1.000e3 -16.000e-3
Line 1042: 1.001e3 -14.324e-3
...
Line 2000: 10.000e3 -22.178e-3
```
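Taking the prose description literally (each section is exactly 40 header lines followed by 1000 data lines), a small helper can compute the 1-based line range of every block, which is a useful sanity check before writing the parser. This is a sketch under that assumption; `section_ranges`, `HEADER_LINES` and `DATA_LINES` are hypothetical names, not part of the question.

```python
HEADER_LINES = 40   # assumption: header lines per section, taken from the description
DATA_LINES = 1000   # assumption: two-column data lines per section


def section_ranges(n_sections, header=HEADER_LINES, data=DATA_LINES):
    """Return 1-based, inclusive (header_range, data_range) pairs for each section."""
    ranges = []
    start = 1
    for _ in range(n_sections):
        header_range = (start, start + header - 1)
        data_range = (start + header, start + header + data - 1)
        ranges.append((header_range, data_range))
        start += header + data  # the next section begins right after this one
    return ranges


for header_range, data_range in section_ranges(2):
    print("header lines %d-%d, data lines %d-%d" % (header_range + data_range))
```

With two sections this prints header lines 1-40 and data lines 41-1040 for the first section, and header lines 1041-1080 and data lines 1081-2080 for the second.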

I want to parse the columnar data and output it to an array with the format:

```
[1.000e3, -4.000e-3, -16.000e-3]
[1.001e3, -4.324e-3, -14.324e-3]
...
[10.000e3, -78.678e-3, -22.178e-3]
```
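Building one of those rows from the two sections amounts to pairing the lines up, converting the text to floats, and keeping the shared first column once. The sketch below assumes both sections have already been read into lists of `"x y"` strings of equal length; `merge_sections` is a hypothetical helper, and it raises an error if the first columns ever disagree instead of silently trusting the question's claim that they match.

```python
def merge_sections(first_lines, second_lines):
    """Combine two sections of 'x y' lines into [x, y1, y2] rows of floats.

    Assumes both sections have the same length and the same x values.
    Note that zip() stops at the shorter list if the lengths differ.
    """
    rows = []
    for line1, line2 in zip(first_lines, second_lines):
        x1, y1 = (float(v) for v in line1.split())
        x2, y2 = (float(v) for v in line2.split())
        if x1 != x2:
            raise ValueError("first columns differ: %r vs %r" % (x1, x2))
        rows.append([x1, y1, y2])
    return rows


rows = merge_sections(["1.000e3 -4.000e-3", "1.001e3 -4.324e-3"],
                      ["1.000e3 -16.000e-3", "1.001e3 -14.324e-3"])
print(rows)
```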

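I have tried the following:

```python
import os

HEADER_LINES = 40   # header lines before each data section
DATA_LINES = 1000   # two-column data lines in each section

# One list per output column: x, y from the first section, y from the second
DATA = [[], [], []]

# Assumes every file in the current directory has the two-section layout above
for fileName in sorted(os.listdir('.')):
    with open(fileName) as dataFile:
        # Skip the first header block
        for lines in range(HEADER_LINES):
            dataFile.readline()

        # First data section: keep both columns
        for lines in range(DATA_LINES):
            dataLine = dataFile.readline().split()
            DATA[0].append(float(dataLine[0]))
            DATA[1].append(float(dataLine[1]))

        # Skip the second header block
        for lines in range(HEADER_LINES):
            dataFile.readline()

        # Second data section: its first column repeats DATA[0], so keep only the second
        for lines in range(DATA_LINES):
            dataLine = dataFile.readline().split()
            DATA[2].append(float(dataLine[1]))
```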

Thanks for your help in advance.


1 solution

#1


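```python
from itertools import islice

def get_headers_and_columns(fhandle):
    # Consume the next 40 lines as the header block
    headers = list(islice(fhandle, 0, 40))
    # Split the following 1000 lines and transpose rows into columns;
    # list() is needed on Python 3, where zip() returns an iterator
    columns = list(zip(*map(str.split, islice(fhandle, 0, 1000))))
    return headers, columns

with open("input.txt") as f_in, open("output.txt", "w") as f_out:
    headers, columns = get_headers_and_columns(f_in)
    headers2, columns2 = get_headers_and_columns(f_in)
    # The first column is shared, so only add the second section's y column
    columns.append(columns2[-1])
    f_out.write("\n".join(map(" ".join, zip(*columns))))
```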

This is one way you could accomplish it (at least, I think it will work). It writes the merged rows to output.txt as space-separated strings.
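To try the `islice` approach without the real file, the sketch below makes the header and data counts parameters so that a tiny in-memory sample (`io.StringIO`) can stand in for the 40/1000-line file. It also converts the strings to floats, since the question asks for an array of numbers rather than an output file. `read_section` and `parse_two_sections` are hypothetical helper names, and the sample text is made up from the values shown in the question.

```python
import io
from itertools import islice


def read_section(fhandle, n_header, n_data):
    """Consume one header block, then return (headers, columns) for the next n_data lines."""
    headers = list(islice(fhandle, n_header))
    # zip(*...) unpacks the generator immediately, so exactly n_data lines are consumed here
    columns = list(zip(*(line.split() for line in islice(fhandle, n_data))))
    return headers, columns


def parse_two_sections(fhandle, n_header=40, n_data=1000):
    """Return rows of [x, y1, y2] as floats from a file holding two sections."""
    _, cols1 = read_section(fhandle, n_header, n_data)
    _, cols2 = read_section(fhandle, n_header, n_data)
    x, y1 = cols1
    y2 = cols2[1]  # the second section's first column duplicates x
    return [[float(a), float(b), float(c)] for a, b, c in zip(x, y1, y2)]


# Tiny stand-in for the real file: 2 header lines and 3 data lines per section
sample = io.StringIO(
    "//header\n//header\n"
    "1.000e3 -4.000e-3\n1.001e3 -4.324e-3\n10.000e3 -78.678e-3\n"
    "//header\n//header\n"
    "1.000e3 -16.000e-3\n1.001e3 -14.324e-3\n10.000e3 -22.178e-3\n"
)
rows = parse_two_sections(sample, n_header=2, n_data=3)
print(rows)
```

For the real file, `with open("input.txt") as f: rows = parse_two_sections(f)` would use the default 40 header and 1000 data lines.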
