Python is slow when iterating over a large list

Date: 2021-02-26 22:24:28

I am currently selecting a large number of rows from a database using pyodbc. The result is copied into a large list, and I am then trying to iterate over that list. Before I abandon Python and try to build this in C#, I wanted to know whether I was doing something wrong.

count = 0
clientItemsCursor.execute("Select ids from largetable where year = ?", year)
allIDRows = clientItemsCursor.fetchall()  # takes maybe 8 seconds

for clientItemRow in allIDRows:
    aID = str(clientItemRow[0])
    # Do something with aID -- removed because I was trying to determine what was slow
    count = count + 1

Some more information:

  • The for loop currently runs at about 5 iterations per second, which seems insanely slow to me.
  • The total number of rows selected is ~489,000.
  • The machine it's running on has plenty of RAM and CPU. The script seems to use only one or two cores, and RAM usage is 1.72 GB of 4 GB.

Can anyone tell me what's wrong? Do scripts just run this slowly?

Thanks

5 Solutions

#1


17  

This should not be slow with native Python lists, but maybe the ODBC driver is returning a "lazy" object that tries to be smart and just ends up slow. Try just doing

allIDRows = list(clientItemsCursor.fetchall())

in your code and post further benchmarks.

(Python lists can get slow if you start inserting items in the middle, but just iterating over a large list should be fast.)
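To produce the kind of benchmark asked for above, one option is to time the loop on its own, separate from the database call. The sketch below is not the asker's code: `time_iteration` is a hypothetical helper, and the list of one-element tuples is stand-in data of roughly the same size and shape as the real result set.

```python
import time


def time_iteration(rows):
    """Make one pass over rows and return (row_count, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for row in rows:
        aID = str(row[0])  # same per-row work as in the question
        count += 1
    return count, time.perf_counter() - start


# Stand-in for the fetchall() result: 489,000 one-element tuples (assumed shape).
rows = [(i,) for i in range(489_000)]
row_count, elapsed = time_iteration(rows)
print(f"{row_count} rows in {elapsed:.3f} s")
```

If the same function is much slower when handed the real fetchall() result than when handed this plain list, that would point at the row objects or the driver rather than at Python's list iteration.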

#2


1  

It's probably slow because you load all the results into memory first and then iterate over a list. Try iterating over the cursor instead.

And no, scripts shouldn't be that slow.

count = 0
clientItemsCursor.execute("Select ids from largetable where year = ?", year)
for clientItemRow in clientItemsCursor:
    aID = str(clientItemRow[0])
    count = count + 1
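For anyone who wants to try the cursor-iteration pattern without an ODBC data source, here is a minimal runnable sketch. It assumes the standard-library `sqlite3` module as a stand-in for pyodbc; the table contents are made up for illustration. One difference worth noting: `sqlite3` expects the query parameters as a sequence, hence `(2001,)`.

```python
import sqlite3

# In-memory database standing in for the real one (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO largetable (ids, year) VALUES (?, ?)",
    ((i, 2000 + i % 3) for i in range(3000)),
)

cursor = conn.cursor()
cursor.execute("SELECT ids FROM largetable WHERE year = ?", (2001,))

# Iterate over the cursor directly instead of building a list with fetchall().
streamed_count = 0
for row in cursor:
    aID = str(row[0])
    streamed_count += 1

print(streamed_count)
conn.close()
```

Rows are handed to the loop as the cursor produces them, so no list holding the full result set is built in Python.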

#3


1  

More investigation is needed here... consider the following script:

bigList = range(500000)
doSomething = ""
count = 0
arrayList = [[x] for x in bigList]  # takes a few seconds
for x in arrayList:
    doSomething += str(x[0])
    count += 1

This is pretty much the same as your script, minus the database stuff, and it takes a few seconds to run on my not-terribly-fast machine.

#4


0  

When you connect to your database directly (I mean when you get an SQL prompt), how many seconds does this query take to run?

When the query ends, you get a message like this:

NNNNN rows in set (0.01 sec)

So, if that time is large and your query is slow even when run "natively", you may have to create an index on that table.
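As a rough illustration of what an index changes, the sketch below asks the database for its query plan before and after creating one. It assumes SQLite via the standard-library `sqlite3` module, which may differ from the asker's database; `query_plan` and the index name `idx_largetable_year` are hypothetical names chosen for this example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params):
    """Return SQLite's plan for sql as one string (detail column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
sql = "SELECT ids FROM largetable WHERE year = ?"

# Without an index, the only option is to scan the whole table.
plan_before = query_plan(conn, sql, (2021,))

# Hypothetical index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_largetable_year ON largetable (year)")
plan_after = query_plan(conn, sql, (2021,))

print("before:", plan_before)
print("after: ", plan_after)
conn.close()
```

Other databases expose the same idea through their own EXPLAIN-style commands, so the equivalent check would need to be adapted to the server actually in use.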

#5


0  

This is slow because you are

  1. Getting all the results
  2. Allocating memory and assigning the values to that memory to create the list allIDRows
  3. Iterating over that list and counting.

If execute gives you back a cursor, then use the cursor to your advantage: start counting as the rows come back, and save time on the memory allocation.

count = 0
clientItemsCursor.execute("Select ids from largetable where year = ?", year)
for clientItemRow in clientItemsCursor:
    count += 1

Other hints:

  • Create an index on year.
  • Use 'select count(*) from ...' to get the count for the year; this will probably be optimised on the db.
  • Remove the aID line if it is not needed; it converts the first item of each row to a string even though the result is not used.
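The count(*) hint can be sketched as follows, again assuming the standard-library `sqlite3` module as a stand-in for pyodbc and a small made-up table. The database does the counting and returns a single row, so nothing needs to be iterated in Python.

```python
import sqlite3

# In-memory database standing in for the real one (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE largetable (ids INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO largetable (ids, year) VALUES (?, ?)",
    ((i, 2020 + i % 2) for i in range(1000)),
)

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM largetable WHERE year = ?", (2021,))
db_count = cursor.fetchone()[0]  # the single result row holds the count

print(db_count)
conn.close()
```

This only helps when the count is all that is needed; if each id still has to be processed, iterating over the cursor remains the relevant approach.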
