I wrote a script that reads a 3-column Excel file, exported as a .txt file, into 3 lists (one list per column). The .txt file has 22 lines, including the header. From these 3 lists, I'm trying to build a nested dictionary in which the first column is the key, the second column is a key within the value, and the third column is the value within the value (i.e. {Tag1:{Tag2:Tag3}, ...}) for however many items there are in the lists.
When I zip these lists into a nested dictionary, the result is truncated: only 19 items end up in the dictionary, not 22. Could someone troubleshoot my code and see what the dictionary is doing to my lists?
Here's the .txt file for reference:
Here's my script:
import glob
source_file = glob.glob('file_path/test.txt')[0]
time = []
code = []
identifier = []
data_set = {}
for line in open(source_file, 'r'):
    line_split = line.split('\t')
    tag_3 = line_split[-1].replace('\n', '')
    tag_2 = line_split[1]
    tag_1 = line_split[0]
    time.append(tag_3)
    code.append(tag_2)
    identifier.append(tag_1)
data_set = {a: {b: c} for a, b, c in zip(identifier, code, time)}
EDIT: here's a link to a downloadable version of the file: https://drive.google.com/file/d/0B2s43FKt5BZgQldULXVOR0RBeTg/view?usp=sharing
EDIT 2: This should be the desired output:
data_set = {
'Tag1':{'Tag2':'Tag3'},
'0.1M':{'20':'10'},
'0.1MCD':{'2':'1'},
'0.25M':{'17':'1'},
'0.25MC':{'18':'1'},
'0.5MCN':{'16':'1'},
'0.MCD8':{'15':'1'},
'10':{'36':'5'},
'1029':{'75':'17'},
'1029A':{'22':'15'},
'1029B':{'49':'18'},
'1029BCD':{'23':'15'},
'1029BCDA':{'27':'18'},
'109B8N':{'63':'10'},
'1193D4M':{'51':'16'},
'1193D4N':{'2':'11'},
'1193D8M':{'17':'16'},
'11938N':{'25':'12'},
'1193CD4M':{'53':'16'},
'1193CD4N':{'83':'13'},
'118M':{'20':'16'},
'1193BCN':{'16':'7'},
}
EDIT 3: It turns out the dictionary drops entries when the lists contain duplicate values. Is there any way to avoid this?
3 Solutions
#1
You can try this:
s = [b for b in [i.strip('\n').split() for i in open('file.txt')] if b]
final_data = {a:{b:c} for a, b, c in s}
Output:
{'1029BCDA': {'27': '18'}, '1029BCD': {'23': '15'}, '0.25M': {'17': '1'}, '118M': {'20': '16'}, '0.1M': {'20': '10'}, '11934D4N': {'83': '13'}, '0.5MCD8': {'15': '1'}, '1193D8M': {'17': '16'}, '1193CD4M': {'53': '16'}, '109B8N': {'63': '10'}, '10': {'36': '5'}, '1193D4M': {'51': '16'}, '1193D4N': {'2': '11'}, '0.1MCD': {'2': '1'}, '1193BCN': {'16': '7'}, '0.25MC': {'18': '1'}, '11938N': {'25': '12'}, '0.5MCN': {'16': '1'}, 'Tag1': {'Tag2': 'Tag3'}, '1029': {'75': '17'}, '1029A': {'22': '15'}, '1029B': {'49': '18'}}
Edit: use a collections.defaultdict to handle duplicate values correctly:
from collections import defaultdict
d = defaultdict(list)
for a, b, c in s:
    d[a].append({b: c})
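A self-contained sketch of the same defaultdict idea, with a few inline rows standing in for the file. The names rows, grouped and result are chosen here for illustration, and the second '0.1M' row is invented to show a repeated identifier; it is not taken from the original file.

```python
from collections import defaultdict

# Hypothetical sample rows (identifier, code, time). The second '0.1M' row
# is made up for this example to show how a repeated identifier is handled.
rows = [
    ("Tag1", "Tag2", "Tag3"),
    ("0.1M", "20", "10"),
    ("0.1MCD", "2", "1"),
    ("0.1M", "17", "1"),
]

grouped = defaultdict(list)
for identifier, code, time in rows:
    # Each identifier collects every {code: time} pair seen for it,
    # instead of keeping only the last one.
    grouped[identifier].append({code: time})

# Convert back to a plain dict for printing or further use.
result = dict(grouped)
print(result)
```

With this layout every value is a list, even for identifiers that occur only once, so code that consumes the dictionary can treat all entries the same way.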
#2
A dictionary can't have duplicate keys. What you can do is have the dictionary hold a list of values for each key. In your case, change {Tag1:{Tag2:Tag3}} to {Tag1:[{Tag2:Tag3}]}.
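A minimal sketch of why entries go missing and how the list-valued layout avoids it. The three short lists are made-up stand-ins for the question's identifier, code and time lists, and dict.setdefault is used here as one possible way to build the lists.

```python
# Made-up stand-ins for the three column lists; "A" appears twice.
identifier = ["A", "B", "A"]
code = ["1", "2", "3"]
time = ["x", "y", "z"]

# A dict comprehension assigns each key in turn, so the second "A"
# replaces the first and the result has 2 entries instead of 3.
overwritten = {a: {b: c} for a, b, c in zip(identifier, code, time)}
print(len(overwritten), overwritten)

# Holding a list per key keeps every pair, as in {Tag1: [{Tag2: Tag3}]}.
as_lists = {}
for a, b, c in zip(identifier, code, time):
    as_lists.setdefault(a, []).append({b: c})
print(as_lists)
```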
#3
What you want is called a “bag”. Try the collections.Counter class.
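For context, collections.Counter records how many times each element occurs; it does not store the values associated with each element, so on its own it may not produce the nested structure asked for. A small sketch of what it can do here, namely finding which identifiers repeat (the sample list is hypothetical):

```python
from collections import Counter

# Hypothetical identifier column; '0.1M' is repeated on purpose.
identifier = ["0.1M", "0.1MCD", "0.1M", "10"]

# Count occurrences of each identifier.
counts = Counter(identifier)

# Keep only the identifiers that occur more than once.
duplicates = [name for name, n in counts.items() if n > 1]
print(duplicates)
```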