Python: Reading a text file with in-line column headings into a new CSV / Excel file

Time: 2021-01-24 18:14:22

I have a text file that I want to output into a new .csv file. The column headings are in-line with the data, and I cannot figure out how to process the file. I am a Python newbie.

The input file format is:
{"column 1 name":"column 1 value", "column 2 name":"column 2 value", "column 3 name":"column 3 value", "column 4 name":"column 4 value", "column 5 name":"column 5 value"}

The output file format I want is:
column headers in line 1
comma separated values in lines 2 and beyond

There are also times when a value may be blank, so I need to account for that so the values don't shift to the wrong column header.

Thanks in advance!

1 solution

#1



Your input file format isn't 100% clear. It looks like JSON, and I am assuming that there is one JSON object per line. I'd further assume that there are no line breaks within a single entry.

Your question may best be split into two parts.

1. Reading Input File - JSON lines

Assumed data test.jl (jl for JSON lines):

{"header1": "value1.1", "header2": "value1.2"}
{"header1": "value2.1", "header2": "value2.2"}

Then you could read that file line by line and parse each line as JSON:

import json

# Parse one JSON object per line, skipping blank lines
with open('test.jl') as input_f:
    data = [json.loads(line) for line in input_f if line.strip()]

print(data)

Here, data will be a list of dicts. Output:

[{'header1': 'value1.1', 'header2': 'value1.2'}, {'header1': 'value2.1', 'header2': 'value2.2'}]
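
If some lines might be blank or malformed, a slightly more defensive reader can help. The following is only a sketch and not part of the original answer: the helper name read_json_lines is hypothetical, and it assumes each non-blank line holds exactly one JSON object.

```python
import json


def read_json_lines(lines):
    """Parse an iterable of text lines, one JSON object per line (hypothetical helper)."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on them
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report which line failed so the input file is easy to fix
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
    return records


# Sample input standing in for an open file; the second record has a blank value
sample = [
    '{"header1": "value1.1", "header2": "value1.2"}\n',
    '\n',
    '{"header1": "value2.1", "header2": ""}\n',
]
records = read_json_lines(sample)
print(records)
```

Because an open file object is itself an iterable of lines, the same function could be called as read_json_lines(input_f) inside a with block.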

2. Writing Output File from a list of dicts

2a. Determining the list of fields

Unless you already have a fixed list of fields, you may need to determine that list first.

You could go through every dict, get its keys and build a list of the unique keys, like so:

from functools import reduce

all_keys = sorted(reduce(lambda acc, item: acc | set(item.keys()), data, set()))

print(all_keys)

Here we start with an empty set() (the last argument to reduce), which becomes the first acc; every dict in data in turn becomes item. We add the keys() to acc using the | (set union) operator, and the return value becomes the next round's acc (or the final return value). Since we are using sets, there won't be duplicates. The sorted call just gives the columns a predictable order and is optional.

Output:

['header1', 'header2']
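
If you would rather keep the columns in the order in which they first appear in the file instead of sorting them, one possible alternative is dict.fromkeys. This is a sketch, not the approach used above; it relies on dicts preserving insertion order (Python 3.7+), and the sample rows and the name ordered_keys are made up for illustration.

```python
# Illustrative rows; the second row introduces a key the first one lacks
data = [
    {"header1": "value1.1", "header2": "value1.2"},
    {"header3": "value2.3", "header1": "value2.1"},
]

# dict.fromkeys keeps only the first occurrence of each key, in insertion order
ordered_keys = list(dict.fromkeys(key for row in data for key in row))

print(ordered_keys)  # ['header1', 'header2', 'header3']
```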

2b. Writing the CSV

Well, there is DictWriter which seems to fit the bill.

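from csv import DictWriter

# newline='' is recommended by the csv docs; it avoids extra blank lines on Windows
with open('test.csv', 'w', newline='') as csv_f:
    # Keys missing from a row are written as empty fields (restval defaults to '')
    csv_writer = DictWriter(csv_f, fieldnames=all_keys)
    csv_writer.writeheader()
    csv_writer.writerows(data)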

Output in test.csv:

header1,header2
value1.1,value1.2
value2.1,value2.2
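
On the concern about blank values shifting into the wrong column: DictWriter places each value by its key, so a missing or empty value simply produces an empty field in the right position. The sketch below illustrates this with made-up rows and writes to an in-memory buffer (io.StringIO) rather than a real file so that the result can be inspected directly.

```python
import io
from csv import DictWriter

rows = [
    {"header1": "value1.1", "header2": "value1.2"},
    {"header2": "value2.2"},                  # header1 is missing entirely
    {"header1": "", "header2": "value3.2"},   # header1 is present but blank
]
fieldnames = ["header1", "header2"]

buffer = io.StringIO()
# restval is the filler for keys missing from a row; '' is also the default
writer = DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Both the second and third data rows come out as ",value2.2" and ",value3.2", so the values stay under header2.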
