I have a text file containing integers, e.g.
123
456
678
I want to read them and put them in a dict so that I can later easily check whether an integer was present, e.g.
{456: True, 123: True, 678: True}
What is the most efficient way to achieve this? I am open to not using a dict if there is some other way to look up values quickly.
At the moment I am using pandas like this:
import pandas as pd

df = pd.read_csv(filename, header=None, compression='zip')
mydict = {}
for index, row in df.iterrows():
    mydict[row[0]] = True
which works, but since the file contains 20 million integers, it takes a while to load them into the dictionary.
3 Answers
#1
3
Well, this is not a CSV file, so I don't see why you want to parse it as one.
You can use a dictionary comprehension here:
with open(filename) as f:
    mydict = {int(line): True for line in f}
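For illustration, here is a minimal, self-contained sketch of the same idea. The helper name load_int_dict and the temporary demo file are made up for this example. It assumes a plain (uncompressed) text file with one integer per line and no blank lines: int() tolerates the trailing newline but raises ValueError on an empty line. It uses dict.fromkeys, which should build the same mapping as the comprehension.

```python
import os
import tempfile


def load_int_dict(path):
    """Map every integer found in `path` (one per line) to True.

    Hypothetical helper for illustration; assumes no blank lines.
    """
    with open(path) as f:
        # int() ignores surrounding whitespace, so "123\n" parses as 123.
        return dict.fromkeys(map(int, f), True)


# Demo on a throwaway file holding the sample values from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("123\n456\n678\n")
    demo_path = tmp.name

mydict = load_int_dict(demo_path)
os.remove(demo_path)

print(mydict)         # {123: True, 456: True, 678: True}
print(456 in mydict)  # True
```

Since every value is True anyway, the membership test 456 in mydict carries all the information, which is why a set is a natural alternative.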
#2
3
A set might be the most convenient data type here:
with open(filename) as f:
    myset = {int(line) for line in f}
And test whether an integer was in the file using the in operator:
>>> 123 in myset
True
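Note that the question reads the file with compression='zip', so a plain open(filename) would see the archive bytes rather than the text. A possible sketch using the standard zipfile module follows. It assumes the archive holds a single text member with one integer per line; load_int_set and the demo archive are hypothetical names used only for this example.

```python
import io
import os
import tempfile
import zipfile


def load_int_set(zip_path):
    """Return the set of integers stored one per line in a zip archive.

    Hypothetical helper; assumes the first archive member is the data file.
    """
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            # Wrap the binary stream so lines are decoded to str.
            text = io.TextIOWrapper(raw, encoding="ascii")
            # Skip blank lines; int() ignores the trailing newline.
            return {int(line) for line in text if line.strip()}


# Demo: build a small archive with the sample values, then load it.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_zip = os.path.join(demo_dir, "numbers.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("numbers.txt", "123\n456\n678\n")
    myset = load_int_set(demo_zip)

print(123 in myset)  # True
print(999 in myset)  # False
```

Lookups in the resulting set are hash-based, so each in test takes roughly constant time regardless of how many integers were loaded.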
#3
1
Option 1
You can add a column to the dataframe that has True in all rows, then use zip to generate a dictionary as follows:
df = pd.read_csv(filename, header=None, compression='zip')
df[1] = True
d = {k: v for k,v in zip(df[0], df[1])}
Option 2
As you are open to suggestions other than using a dictionary, if you already have the dataframe loaded, you can use it to check if an integer is there as follows:
>>> df = pd.DataFrame([123,456,678])
>>> df
     0
0  123
1  456
2  678
>>> df.values == 123
array([[ True],
       [False],
       [False]], dtype=bool)
>>> (df.values == 123).any()
True
Then in your conditional logic, you can do something like the following:
if (df.values == 123).any():  # if 123 is in the dataframe
    pass  # do something