I am trying to load and parse a JSON file in Python. But I'm stuck trying to load the file:
import json
json_data = open('file')
data = json.load(json_data)
Yields:
ValueError: Extra data: line 2 column 1 - line 225116 column 1 (char 232 - 160128774)
I looked at "18.2. json - JSON encoder and decoder" in the Python documentation, but it's pretty discouraging to read through this horrible-looking documentation.
3 Solutions
#1
152
You have a JSON Lines format text file. You need to parse your file line by line:
import json
data = []
with open('file') as f:
    for line in f:
        data.append(json.loads(line))
Each line contains valid JSON, but as a whole, it is not a valid JSON value as there is no top-level list or object definition.
Note that because the file contains JSON per line, you are saved the headaches of trying to parse it all in one go or to figure out a streaming JSON parser. You can now opt to process each line separately before moving on to the next, saving memory in the process. You probably don't want to append each result to one list and then process everything if your file is really big.
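To illustrate processing one line at a time instead of building a big list, here is a minimal sketch using a generator. The function name `iter_json_lines` and the UTF-8 encoding are assumptions made for this example, not something from the question.

```python
import json


def iter_json_lines(path):
    """Yield one parsed JSON value per non-blank line of the file at `path`."""
    # Assumption: the file is UTF-8 encoded text.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing empty line
                yield json.loads(line)
```

Looping with `for record in iter_json_lines('file'):` keeps only the current record in memory, so each record can be handled and discarded before the next line is read.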
If you have a file containing individual JSON objects with delimiters in between, see "How do I use the 'json' module to read in one JSON object at a time?" to parse out individual objects using a buffered method.
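One possible way to pull individual objects out of such input is the standard library's `json.JSONDecoder.raw_decode`, which parses one value and reports where it ended. This is only a sketch, not necessarily the method used in the linked question. It assumes the delimiter is plain whitespace and that the whole text fits in memory; the name `iter_concatenated_json` is made up for this example.

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON value found in `text`, where values are separated by whitespace."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

For example, `list(iter_concatenated_json('{"a": 1} {"b": 2}'))` gives two dictionaries. A truly buffered version would read the file in chunks and retry when a value is cut off at a chunk boundary.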
#2
5
That is ill-formatted. You have one JSON object per line, but they are not contained in a larger data structure (i.e. an array). You'll either need to reformat it so that it begins with [ and ends with ], with a comma at the end of each line except the last, or parse it line by line as separate dictionaries.
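The reformatting option can be sketched as follows: join the lines with commas and wrap them in brackets so the result is a single JSON array. The helper name `json_lines_to_array_text` is hypothetical, and the sketch assumes every non-blank line is already a valid JSON value.

```python
import json


def json_lines_to_array_text(lines):
    """Join one-JSON-value-per-line input into the text of a single JSON array."""
    # Drop blank lines and surrounding whitespace, including newlines.
    records = [line.strip() for line in lines if line.strip()]
    # A comma goes between records only; JSON does not allow a trailing comma.
    return "[" + ",\n".join(records) + "]"


array_text = json_lines_to_array_text(['{"a": 1}\n', '{"b": 2}\n'])
data = json.loads(array_text)  # now parses as one top-level list
```

For a large file this builds the whole text in memory, so parsing line by line is usually the simpler choice.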
#3
2
For those stumbling upon this question: the Python jsonlines library (much younger than this question) elegantly handles files with one JSON document per line. See https://jsonlines.readthedocs.io/