What is the best way to handle dictionaries in Python?

Time: 2022-02-22 23:13:51

I need to run a script on a lot of files. I'm trying to build a library of results so I won't have to redo the computations. Right now I'm using json.dump to write the results for each file to a txt file containing a dictionary, as follows:

{"ARG": [98.1704330444336, 41.769107818603516, 73.10748291015625, 45.386558532714844, 66.13928985595703, 170.6997833251953, 181.3068084716797, 163.4752960205078, 105.4854507446289], "LEU": [28.727693557739258, 37.46043014526367, 13.47089672088623, 53.70556640625, 4.947306156158447, 0.17834201455116272], "ASP": [], "THR": [82.61577606201172, 66.58378601074219], "ILE": [114.99510192871094, 0.0, 41.7198600769043], "CYS": [], "LYS": [132.67730712890625, 34.025794982910156, 116.17617797851562, 95.01632690429688], "PHE": [2.027207136154175, 14.673666000366211, 33.46115493774414], "VAL": [], "SER": [87.324462890625, 100.39542388916016, 20.75590705871582, 49.42512893676758], "ASN": [115.7877197265625, 68.15550994873047, 79.04554748535156, 62.12760543823242], "MET": [], "TRP": [5.433267593383789], "GLN": [103.35163879394531, 12.17470932006836, 83.19425201416016, 81.73150634765625, 31.622051239013672], "PRO": [116.5839614868164], "TYR": [143.76821899414062], "GLU": [32.767948150634766, 112.40697479248047, 151.73361206054688, 53.77445602416992, 137.96853637695312, 137.53512573242188], "ALA": [81.7466812133789, 59.530941009521484, 30.13962173461914, 88.2237319946289], "GLY": [68.45809936523438], "HIS": []}

I can reload the dictionary with json.load. I want to know the best way to handle my data, given that I will be joining all these txt files into one huge dictionary. The keys are the same in every dictionary, and I plan to append all the list values into one big list per key. After that I will do some mathematical operations (addition, division), draw histograms, run clustering, etc.
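The merge described above could be sketched roughly as follows. This is only a minimal illustration, assuming each txt file holds a single JSON object that maps keys to lists of numbers; the function name merge_result_files is made up for this example.

```python
import json
from collections import defaultdict


def merge_result_files(paths):
    """Merge per-file {key: [numbers]} dictionaries into one dictionary.

    merge_result_files is a hypothetical helper name. It assumes every
    file contains one JSON object whose values are lists.
    """
    merged = defaultdict(list)
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            per_file = json.load(handle)
        for key, values in per_file.items():
            # extend() appends the values in place, so each file is read
            # once and no intermediate copies of the big lists are made.
            merged[key].extend(values)
    return dict(merged)
```

Called with a list of file paths, this returns one dictionary with a single combined list per key. Only the merged lists are kept in memory, since each per-file dictionary is discarded after it is read.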

I want to know how you would do it, and whether what I described above will be inefficient or computationally expensive, given that the data will be huge.

1 solution

#1


As always, it depends. If you are sure that there will be a lot of data, you can consider using the pandas library for Python (http://pandas.pydata.org/).

It is a very powerful data analysis library that lets you do additions, divisions, histograms, etc. directly on its data types. I found it very helpful and easy to use when solving problems similar (I believe) to yours.

If you go with this solution, you can use pandas' DataFrame objects (instead of Python's dict) to store the data and perform all the operations you mentioned on them.
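As a rough sketch of what that might look like: the lists in the question have different lengths, so passing the dict straight to the DataFrame constructor would fail. One option (an assumption on my part, not the only layout) is a "long" table with one row per value. The column names "key" and "value" and the helper to_long_frame are invented for this example, and the numbers are rounded samples from the question.

```python
import pandas as pd


def to_long_frame(merged):
    """Turn {key: [numbers]} into a two-column table, one row per number.

    to_long_frame and the column names are hypothetical choices made
    for this illustration.
    """
    rows = [(key, value) for key, values in merged.items() for value in values]
    return pd.DataFrame(rows, columns=["key", "value"])


# Rounded sample values taken from the dictionary in the question.
merged = {"THR": [82.6, 66.6], "TRP": [5.4], "ASP": []}
frame = to_long_frame(merged)

# Per-key statistics; keys with empty lists (ASP here) have no rows.
summary = frame.groupby("key")["value"].agg(["count", "mean"])

# Element-wise arithmetic works on whole columns at once.
scaled = frame["value"] / frame["value"].max()
```

For histograms, frame["value"].hist() or frame.hist(by="key") should work once matplotlib is installed. The long layout also tends to be convenient as input for clustering libraries, though that depends on which features you want to cluster on.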

Pandas data types also have a nice interface for writing to and reading from files (e.g. DataFrame.to_json(...)).
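A minimal round trip might look like the sketch below. The file name merged.json, the temporary directory, and the sample values are placeholders for this example, and orient="records" is just one of several layouts that to_json supports.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small illustrative table; the values are rounded samples.
frame = pd.DataFrame({"key": ["THR", "THR", "TRP"], "value": [82.6, 66.6, 5.4]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "merged.json"  # hypothetical file name
    # Write one JSON object per row, then read the file back.
    frame.to_json(path, orient="records")
    restored = pd.read_json(path, orient="records")
```

For very large data, a binary format such as the one behind DataFrame.to_pickle is likely to be faster to load than JSON, but JSON keeps the files readable by other tools.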
