Extracting key-value pairs from multiple dictionaries in multiple text files using Python

Date: 2023-01-25 18:17:44

I have 300+ files in a folder each containing 3000+ dictionaries of the form below:

{"vol":0.625,"view100":7732,"view50":7732,"view0":7732,"mView100":7732,"mView50":7732,"mView0":7732,"posTop":0,"posBottom":768,"posRight":1024,"posLeft":0,"audio":7732,"inView":1.0,"dur":15070,"full":true,"play":7732,"platform":"ias_i2","timestamp":1519693191,"gmMeasure":true,"gmm":4,"gdr":1,"impId":1861913361,"advId":13505389,"campId":2214346458,"grpId":4532473096,"creativeId":138222749951,"skip":false,"event":"fully_viewable_audible_half_duration_impression","auc":"r","pos":2,"ua":"com.google.ios.youtube/13.06.9 (iPad6,11; U; CPU iOS 11_2_1 like Mac OS X;en_US)","ip":"96.3.52.188","time":1519693200574,"sourceId":2,"channel":"tab","appServerName":"pm01.dal.303net.pvt","doNotTrack":false,"s2s":0}

{"vol":1.0,"view100":8055,"view50":8055,"view0":8055,"mView100":8055,"mView50":8055,"mView0":8055,"posTop":0,"posBottom":360,"posRight":640,"posLeft":0,"audio":8055,"inView":1.0,"dur":15000,"full":false,"play":8055,"platform":"ias_a2","timestamp":1519693191282,"gmMeasure":true,"gmm":4,"gdr":1,"impId":1087849849,"advId":13505389,"campId":2214346458,"grpId":4532473093,"creativeId":138222749951,"skip":false,"event":"fully_viewable_audible_half_duration_impression","auc":"r","pos":1,"ua":"com.google.android.youtube/13.05.52(Linux; U; Android 7.1.1; en_US; SM-J320V Build/NMF26X) gzip","ip":"50.80.2.228","time":1519693200589,"sourceId":2,"channel":"mob","appServerName":"pm01.dal.303net.pvt","doNotTrack":false,"s2s":0}

I need to find a specific key-value pair and store every whole dictionary that contains that pair in a txt file, using Python.

Here is what I have tried:

people = [{'name': "Tom", 'age': 10}, {'name': "Mark", 'age': 5}, {'name': "Pam", 'age': 7}]

def search(name):
    for p in people:
        if p['name'] == name:
            return p

search("Pam")

Is there a simple way to do it?

1 solution

#1


There are two distinct problems here:

  • how to process 300+ text files containing JSON strings (not Python dictionaries)
  • how to identify the dictionary (or dictionaries) containing a specific key-value pair

The fileinput module can handle the first part, the json module can parse each line into a Python dict, and you already have the code to check a key-value pair in a dict.
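
To make the parsing step concrete, here is a minimal sketch of turning one line of text into a dict and testing it for a pair. The sample line is a shortened, made-up record in the same shape as the ones in the question, and has_pair is a hypothetical helper name, not something from the original post.

```python
import json

# Shortened, hypothetical record shaped like the lines in the question.
line = '{"vol": 0.625, "platform": "ias_i2", "full": true, "pos": 2}\n'

# json.loads turns the JSON text into a Python dict (true becomes True).
record = json.loads(line)

def has_pair(record, key, value):
    """Return True if record contains key and its value equals value."""
    return key in record and record[key] == value

print(has_pair(record, "platform", "ias_i2"))  # key present, value matches
print(has_pair(record, "channel", "tab"))      # key missing in this record
```

Checking `key in record` first avoids a KeyError when some lines lack the key being searched for.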

So, assuming that filelist contains the paths of the relevant files (the glob module can help build it):

import fileinput
import json

# key and value are the pair to look for; filelist holds the file paths
for line in fileinput.input(filelist):
    if not line.strip():             # skip any empty lines
        continue
    cur = json.loads(line)
    if key in cur and cur[key] == value:
        # line contains the text for the dictionary, cur contains the dictionary itself
        # fileinput.filename() is the name of the current file
        # fileinput.filelineno() is the line number within that file
        # for example
        print("Found", key, "->", value, "in", fileinput.filename(),
              "at line", fileinput.filelineno(), ":\n", line)
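
Since the question asks for the matching dictionaries to be stored in a txt file, the same idea could be extended roughly as follows. This is only a sketch under stated assumptions: extract_matching, the "*.txt" pattern, and the output name matches.out are all hypothetical, malformed lines are silently skipped, and the demo runs on throwaway files rather than the real data.

```python
import glob
import json
import os
import tempfile

def extract_matching(pattern, key, value, out_path):
    """Copy every JSON line whose dict has record[key] == value to out_path.

    Returns the number of matching lines written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        continue  # assumption: ignore lines that are not valid JSON
                    if isinstance(record, dict) and key in record and record[key] == value:
                        out.write(line + "\n")  # keep the whole dictionary as text
                        count += 1
    return count

# Demo on two small temporary files (made-up data).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "w", encoding="utf-8") as f:
    f.write('{"platform": "ias_i2", "pos": 2}\n\n{"platform": "ias_a2", "pos": 1}\n')
with open(os.path.join(demo_dir, "b.txt"), "w", encoding="utf-8") as f:
    f.write('{"platform": "ias_a2", "pos": 3}\n')

# The output file does not match "*.txt", so it is never read back as input.
out_file = os.path.join(demo_dir, "matches.out")
found = extract_matching(os.path.join(demo_dir, "*.txt"), "platform", "ias_a2", out_file)
print(found)
```

Reading line by line keeps memory use flat, which matters with 300+ files of 3000+ records each.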
