How to add values from a file to a dictionary using pandas

I have a text file containing integers, e.g.

123
456
678

I want to read them and put them in a dict, so that later I can easily check whether an integer was present, e.g.

{456: True, 123: True, 678: True}

What is the most efficient way to achieve this? I am open to not using a dict if there is some other way to look up values quickly.

At the moment I am using pandas like this:

    import pandas as pd

    df = pd.read_csv(filename, header=None, compression='zip')

    mydict = {}
    for index, row in df.iterrows():
        mydict[row[0]] = True

This works, but since the file contains 20 million integers, it takes a while to load them into the dictionary.

3 solutions

#1

Well, this is not a CSV file, so I don't see why you would parse it as one.

You can use a dictionary comprehension here:

with open(filename) as f:
    mydict = {int(l): True for l in f}
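
If every value is simply True, dict.fromkeys should give the same result as the comprehension above. A minimal sketch, assuming one integer per line; the temporary file and the name demo_path are stand-ins for the real file and are not part of the original question:

```python
import os
import tempfile

# Hypothetical stand-in for the real file: one integer per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("123\n456\n678\n")
demo_path = tmp.name

with open(demo_path) as f:
    # int() ignores the trailing newline, so no explicit strip() is needed.
    mydict = dict.fromkeys(map(int, f), True)

os.remove(demo_path)

print(mydict)         # {123: True, 456: True, 678: True}
print(456 in mydict)  # True
```

Membership tests such as 456 in mydict check the keys only, so the True values carry no extra information.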

#2

A set might be the most convenient data type here:

with open(filename) as f:
    myset = {int(line) for line in f}

Then test whether an integer was in the file using the in operator:

>>> 123 in myset
True
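
A real file may contain blank lines (for example, a trailing empty line), which would make int() raise ValueError. The following is a small sketch of a loader that skips them, assuming one integer per line; load_int_set and the sample file are hypothetical names used only for illustration:

```python
import os
import tempfile


def load_int_set(path):
    """Return the integers in `path` (one per line) as a set, skipping blank lines."""
    with open(path) as f:
        return {int(line) for line in f if line.strip()}


# Hypothetical sample file; the trailing blank line would make a bare int(line) fail.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("123\n456\n678\n\n")

seen = load_int_set(tmp.name)
os.remove(tmp.name)

print(123 in seen)  # True
print(999 in seen)  # False
```

A set stores only the keys, and its membership test takes constant time on average, just like a dict lookup.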

#3

Option 1

You can add a column to the dataframe that is True in every row, then use zip to build the dictionary as follows:

import pandas as pd

df = pd.read_csv(filename, header=None, compression='zip')
df[1] = True  # constant column: every key maps to True
d = dict(zip(df[0], df[1]))

Option 2

Since you are open to suggestions other than a dictionary: if you already have the dataframe loaded, you can use it directly to check whether an integer is present, as follows:

>>> import pandas as pd
>>> df = pd.DataFrame([123,456,678])
>>> df
     0
0  123
1  456
2  678
>>> df.values == 123
array([[ True],
       [False],
       [False]], dtype=bool)
>>> (df.values == 123).any()
True

Then, in your conditional logic, you can do something like the following:

if (df.values == 123).any():  # if 123 is in the dataframe
    pass  # do something
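
Note that the comparison above scans the whole array on every lookup, so it suits an occasional check better than many repeated ones. A hedged sketch of two alternatives when the dataframe is already loaded: Series.isin tests several values in one pass, and a one-off conversion to a set serves repeated single lookups. The names queries, found and lookup are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([123, 456, 678])

# Test several candidate values against column 0 in one vectorized pass.
queries = pd.Series([123, 999, 678])
found = queries.isin(df[0])
print(found.tolist())  # [True, False, True]

# For many single lookups, pay the conversion cost once and use a set.
lookup = set(df[0].tolist())
print(456 in lookup)  # True
```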
