I have a directory into which a set of text files is frequently dumped. The files look like this:
- FILEA.json
- FILEB.json
- FILEC.json
Each file contains a JSON array that looks like this:
[
    {
        "id" : "blah",
        "type" : "thingy",
        "ip" : "10.0.0.1",
        "extended" : {
            "var1" : "blah"
        }
    },
    {
        "id" : "blah2",
        "type" : "thingy",
        "ip" : "10.0.0.2",
        "extended" : {
            "var1" : "blah"
        }
    }
]
I would like to know the most efficient way to read these files and store each individual JSON object in an array for further processing. I have looked at json.load(), but it seems to read a single JSON string rather than an 'array' of strings.
One possible solution might be to strip the outer brackets and split out each JSON object with a regex?
EDIT: Adding some example code:
json_array = []
for filename in sorted(os.listdir(CONFIG.DATA_DIR)):
    m = re.match('^.*\.JSON$', filename)
    if m:
        data = json.load(open(CONFIG.DATA_DIR+filename))
        for item in data:
            json_array.append(item)
return json_array
1 Solution
Your JSON is not valid. JSON does not allow stray (trailing) commas, so the example below removes them with a regular expression before parsing.
Example
from json import dumps, loads
from os import listdir
from os.path import isfile, join
from re import sub


def build_json_array(directory):
    '''Transform a directory of JSON array files into a single
    one-dimensional list of all of their items.
    '''
    result = []
    for f in listdir(directory):
        if isfile(join(directory, f)):
            with open(join(directory, f)) as json_data:
                # Repair the data: drop a trailing comma before a closing bracket.
                json_text = sub(r',\s*\]', ']', json_data.read())
                result.extend(loads(json_text))
    return result


if __name__ == '__main__':
    # CONFIG.DATA_DIR is the directory setting used in the question's code.
    print(dumps(build_json_array(CONFIG.DATA_DIR), indent=4))
Tip: You can run your JSON data through a linter before loading and manipulating it.
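If no linter is at hand, the standard json module can serve as a rough stand-in, since it reports where parsing failed. The following is only a minimal sketch; the helper name check_json is hypothetical and not part of the original answer, and the exact error wording depends on the Python version.

```python
import json


def check_json(text):
    """Return None if text parses as JSON, else a short description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based and point at where the parser gave up.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


print(check_json('[{"id": "blah"}]'))     # well-formed input
print(check_json('[{"id": "blah"},\n]'))  # trailing comma before the closing bracket
```

A file that fails this check is a candidate for the repair step shown in the example above.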
If your JSON is well-formed, you can simply load the data directly from the file.
from json import load
result.extend(load(json_data))
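Put together, the direct-load variant might look like the sketch below. It assumes every file holds a top-level JSON array and is UTF-8 encoded; the function name load_json_arrays and the case-insensitive extension check are illustrative choices, not part of the original answer.

```python
import json
import os


def load_json_arrays(directory):
    """Collect the items of every *.json array file in directory into one list."""
    items = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Compare the extension case-insensitively so .json and .JSON both match.
        if not (os.path.isfile(path) and name.lower().endswith(".json")):
            continue
        with open(path, encoding="utf-8") as handle:
            data = json.load(handle)  # a top-level JSON array becomes a Python list
        items.extend(data)  # each element is a dict, ready for further processing
    return items
```

Note that json.load() already returns the whole array as a list of dicts, so no bracket stripping or regex splitting is needed for valid input.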
Code golf
A one-liner that uses a generator expression to load each file and reduce() to flatten the resulting arrays into a single list.
from functools import reduce
from json import load
from os import listdir
from os.path import isfile, join


def build_json_array(directory):
    # The initial [] keeps reduce() from raising TypeError on an empty directory.
    return reduce(lambda x, y: x + y, (load(open(join(directory, f))) for f in listdir(directory) if isfile(join(directory, f))), [])
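For larger directories, a slightly longer variant may be preferable: concatenating lists with + inside reduce() copies the accumulated list at every step, and the one-liner never explicitly closes the files it opens. The sketch below swaps in itertools.chain.from_iterable for the flattening step; the names read_array and build_json_array_chained are hypothetical, and UTF-8 input is assumed.

```python
import json
import os
from itertools import chain


def read_array(path):
    """Parse one file and close it before returning its list of items."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_json_array_chained(directory):
    """Flatten all per-file arrays into one list in a single pass."""
    paths = (os.path.join(directory, name) for name in sorted(os.listdir(directory)))
    return list(chain.from_iterable(read_array(p) for p in paths if os.path.isfile(p)))
```

It returns an empty list for an empty directory without needing an explicit initial value.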