I have a table of 1.6M IP ranges with organization names. The IP addresses are converted to integers. The table is in the form of:
I have a list of 2000 unique IP addresses (e.g. 321223, 531223, ...) that need to be translated to an organization name.
I loaded the translation table as a MySQL table with an index on IP_from and IP_to. I looped through the 2000 IP addresses, running one query per IP address, and after 15 minutes the report was still running. The query I'm using is
select organization from iptable where ip_addr BETWEEN ip_start AND ip_end
Is there a more efficient way to do this batch look-up? I'll use my fingers if it's a good solution. And in case someone has a Ruby-specific solution, I want to mention that I'm using Ruby.
2 solutions
#1
5
Given that you already have an index on ip_start, this is how to use it best, assuming that you want to make one access per IP (1234 in this example):
select organization from (
    select ip_end, organization
    from iptable
    where ip_start <= 1234
    order by ip_start desc
    limit 1
) subqry where 1234 <= ip_end
This will use your index to start a scan which stops immediately because of the limit 1. The cost should only be marginally higher than that of a simple indexed access. Of course, this technique relies on the fact that the ranges defined by ip_start and ip_end never overlap.
The problem with your original approach is that MySQL, being unaware of this constraint, can only use the index to determine where to start or stop the scan that (it thinks) it needs in order to find all matches for your query.
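Since the question mentions Ruby, the same logic (take the greatest ip_start that is not above the address, then check ip_end) can also be sketched in memory, with a binary search standing in for the index scan. This is only a minimal sketch, not the SQL above: IpRange, lookup_organization and the sample RANGES are hypothetical names and data, and it assumes the ranges are sorted by ip_start and never overlap.

```ruby
# Hypothetical in-memory stand-in for iptable. Rows must be sorted by
# ip_start, and the ranges are assumed never to overlap.
IpRange = Struct.new(:ip_start, :ip_end, :organization)

# Returns the organization whose range contains ip, or nil if ip falls
# outside every range.
def lookup_organization(ranges, ip)
  # Index of the first range that starts after ip; if no range does,
  # bsearch_index returns nil and we fall back to the array length.
  first_after = ranges.bsearch_index { |r| r.ip_start > ip } || ranges.length
  return nil if first_after.zero? # ip is below the first ip_start

  # The range just before it has the greatest ip_start <= ip
  # (the "order by ip_start desc limit 1" step).
  candidate = ranges[first_after - 1]

  # The outer "where ip <= ip_end" check.
  ip <= candidate.ip_end ? candidate.organization : nil
end

# Made-up sample data for illustration only.
RANGES = [
  IpRange.new(100, 199, "Org A"),
  IpRange.new(300, 399, "Org B"),
  IpRange.new(1000, 1999, "Org C")
].freeze
```

With this sample data, lookup_organization(RANGES, 1234) returns "Org C", while an address that falls in a gap between ranges, such as 250, returns nil.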
#2
0
Possibly the most efficient way of doing a lookup of this kind is loading the list of addresses you want to look up into a temporary table in the database and finding the intersection with an SQL join, rather than checking each address with a separate SQL statement.
In any case you'll need to have an index on (IP_from, IP_to).
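The temporary-table idea could look roughly like the following. This is a hedged sketch that only builds the SQL statements as strings and does not execute them; how they are sent to MySQL depends on the client library in use. The table name lookup_ips and the helper batch_lookup_sql are hypothetical, and the column names ip_start and ip_end are taken from the query in the question (the question's prose calls them IP_from and IP_to).

```ruby
# Builds (but does not run) the SQL for a temp-table batch lookup.
# lookup_ips is a hypothetical table name; iptable, ip_start, ip_end and
# organization come from the question.
def batch_lookup_sql(ips)
  # Integer() raises ArgumentError on anything that is not an integer,
  # so only numeric literals end up interpolated into the SQL.
  values = ips.map { |ip| "(#{Integer(ip)})" }.join(", ")
  [
    # One row per address to look up; the list is unique, so ip can be the key.
    "CREATE TEMPORARY TABLE lookup_ips (ip INT UNSIGNED NOT NULL PRIMARY KEY)",
    "INSERT INTO lookup_ips (ip) VALUES #{values}",
    # A single join replaces the 2000 separate queries.
    "SELECT l.ip, t.organization FROM lookup_ips l " \
      "JOIN iptable t ON l.ip BETWEEN t.ip_start AND t.ip_end"
  ]
end

# The two sample addresses from the question.
STATEMENTS = batch_lookup_sql([321223, 531223])
```

Whether the join actually beats the per-address query depends on the plan MySQL picks for the range condition, so it is worth checking with EXPLAIN on the real data.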