I have 300+ files in a folder each containing 3000+ dictionaries of the form below:
{"vol":0.625,"view100":7732,"view50":7732,"view0":7732,"mView100":7732,"mView50":7732,"mView0":7732,"posTop":0,"posBottom":768,"posRight":1024,"posLeft":0,"audio":7732,"inView":1.0,"dur":15070,"full":true,"play":7732,"platform":"ias_i2","timestamp":1519693191,"gmMeasure":true,"gmm":4,"gdr":1,"impId":1861913361,"advId":13505389,"campId":2214346458,"grpId":4532473096,"creativeId":138222749951,"skip":false,"event":"fully_viewable_audible_half_duration_impression","auc":"r","pos":2,"ua":"com.google.ios.youtube/13.06.9 (iPad6,11; U; CPU iOS 11_2_1 like Mac OS X;en_US)","ip":"96.3.52.188","time":1519693200574,"sourceId":2,"channel":"tab","appServerName":"pm01.dal.303net.pvt","doNotTrack":false,"s2s":0}
{"vol":1.0,"view100":8055,"view50":8055,"view0":8055,"mView100":8055,"mView50":8055,"mView0":8055,"posTop":0,"posBottom":360,"posRight":640,"posLeft":0,"audio":8055,"inView":1.0,"dur":15000,"full":false,"play":8055,"platform":"ias_a2","timestamp":1519693191282,"gmMeasure":true,"gmm":4,"gdr":1,"impId":1087849849,"advId":13505389,"campId":2214346458,"grpId":4532473093,"creativeId":138222749951,"skip":false,"event":"fully_viewable_audible_half_duration_impression","auc":"r","pos":1,"ua":"com.google.android.youtube/13.05.52(Linux; U; Android 7.1.1; en_US; SM-J320V Build/NMF26X) gzip","ip":"50.80.2.228","time":1519693200589,"sourceId":2,"channel":"mob","appServerName":"pm01.dal.303net.pvt","doNotTrack":false,"s2s":0}
I need to find a specific key-value pair and store every dictionary that contains that pair in a txt file, using Python.
Here is what I have tried:
people = [{'name': "Tom", 'age': 10}, {'name': "Mark", 'age': 5}, {'name': "Pam", 'age': 7}]

def search(name):
    for p in people:
        if p['name'] == name:
            return p

search("Pam")
Is there a simple way to do it?
1 solution
#1
There are 2 distinct problems here:
- how to process 300+ text files containing JSON strings (not dictionaries)
- how to identify the dictionary (or dictionaries) containing a specific key-value pair
The fileinput module could solve the first part, the json module could convert each line into a Python dict, and you already have code to search for a key-value pair in a dict.
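As a small illustration of the json step, here is how one line of text becomes a dict. The sample line is a shortened version of a record from the question, and the helper name parse_record is made up for this sketch.

```python
import json

def parse_record(line):
    """Parse one JSON-encoded line into a Python dict (hypothetical helper)."""
    return json.loads(line)

# Shortened version of one record from the question
sample = '{"vol":0.625,"full":true,"platform":"ias_i2","pos":2}'
record = parse_record(sample)

# JSON true/false become Python True/False, and keys are plain strings
print(type(record).__name__, record["platform"], record["full"])
```

Once a line is parsed this way, the dict lookup from the question (p['name'] == name) works on it unchanged.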
So assuming that filelist contains the paths of the relevant files (the glob module could help in building it...):
import fileinput
import json

# filelist, key and value are assumed to be defined beforehand
for line in fileinput.input(filelist):
    if line.strip():  # skip any empty lines
        cur = json.loads(line)
        if key in cur and cur[key] == value:  # a missing key is not a match
            # line contains the text for the dictionary, cur contains the dictionary itself
            # fileinput.filename() contains the name of the file
            # fileinput.filelineno() is the current line number within that file
            # for example
            print("Found", key, "->", value, "in", fileinput.filename(),
                  "at line", fileinput.filelineno(), ":\n", line)
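To cover the "store it in a txt file" part of the question, the loop above could be wrapped as follows. This is only a sketch: the function name save_matches, the glob pattern and the output path are assumptions for illustration, and it presumes one JSON object per line, as in the sample data.

```python
import fileinput
import glob
import json

def save_matches(pattern, key, value, out_path):
    """Write every JSON line whose dict has cur[key] == value to out_path.

    Returns the number of lines written. (Hypothetical helper.)
    """
    filelist = sorted(glob.glob(pattern))
    if not filelist:
        return 0  # an empty list would make fileinput read from stdin
    count = 0
    with open(out_path, "w") as out, fileinput.input(filelist) as lines:
        for line in lines:
            if not line.strip():
                continue  # skip empty lines
            cur = json.loads(line)
            if key in cur and cur[key] == value:
                # keep the original text, one record per line
                out.write(line if line.endswith("\n") else line + "\n")
                count += 1
    return count
```

A call might look like save_matches("logs/*.txt", "platform", "ias_i2", "matches.txt"), where both paths are placeholders. The output file should not itself match the pattern, otherwise it would be read as input on the next run.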