Looping through a large data array in PHP

Time: 2021-02-17 21:38:07

I have an array holding the personal info (ID, name, email, etc.) of 100,000 users. I need to loop through each row of the array and insert a MySQL record into a table based on the row data. My problem is that I run out of memory after about 70,000 rows.

My code:

if (!empty($users)) {
    $c = 0;
    foreach ($users as $user) {
        $message = // Some code to create custom email
        queue_mail_to_send($user->user_email, $subject, $message, $db_options, $mail_options, $mail_queue);
    }
}

Background:

I am building an email system which sends out an email to the users of my site. The code above loops through the array of users and executes the function 'queue_mail_to_send', which inserts a MySQL row into an email queue table. (I am using a PEAR library to stagger the email sending.)

Question:

I know that I am simply exhausting the memory here by trying to do too much in one execution. So does anybody know a better approach than trying to execute everything in one big loop?

Thanks

5 Answers

#1 (score: 3)

I think reducing the payload of the script will be cumbersome and will not give you a satisfying result. If you have any possibility to do so, I would advise you to log which rows you have already processed, and have a script run the next x rows. If you can use a cronjob, you can stage the mailing and let the cronjob add mails to the queue every 5 minutes, until all users are processed.

The easiest way would be to store the highest user id you have processed somewhere. I would not advise storing the number of users processed, because between batches a user can be added or removed, resulting in users not receiving the e-mail. But if you order by user id (assuming you use an auto-incrementing column for the id!), you can be sure every user gets processed.

So your user query would be something like:

SELECT * FROM users WHERE user_id > [highest_processed_user_id] ORDER BY user_id LIMIT 1000

Then process your loop, and store the last user id:

if(!empty($users)) {
    $last_processed_id = null;
    foreach($users as $user) {
        $message = // Message creation magic
        queue_mail_to_send( /** parameters **/ );
        $last_processed_id = $user->id;
    }

    // batch done! store processed user id
    $query = 'UPDATE mail_table SET last_processed_user_id = '. $last_processed_id; // please use parameterized statements here
    // execute the query
}
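
A minimal driver sketch for one batch run (assuming a PDO connection in $pdo and a single-row mail_table holding the checkpoint; the table and column names are illustrative):

// Read the checkpoint: the highest user id processed so far (0 on the first run).
$lastId = (int) $pdo->query('SELECT last_processed_user_id FROM mail_table')->fetchColumn();

// Fetch the next batch, ordered by id so no user is skipped or repeated.
$stmt = $pdo->prepare('SELECT * FROM users WHERE user_id > ? ORDER BY user_id LIMIT 1000');
$stmt->execute([$lastId]);
$users = $stmt->fetchAll(PDO::FETCH_OBJ);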

And on the next execution, do it again until all users have received the mail.

#2 (score: 1)

I had exactly the same problem as you. The answer from @giorgio is the best solution.

But like Java or Python, we have "yield" in PHP. See [here](http://php.net/manual/en/language.generators.syntax.php).

Here is my sample code. My case was 50,000 records, and I also tested successfully with 370,000 records, but it takes time.

// Note: `yield` only works inside a function, which PHP then treats as a
// generator; given `new self()`, this is assumed to live inside a class method.
$items = CustomerService::findAll();
foreach ($items as $item) {
    yield (new self())->loadFromResource($item);
}
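
Applied to the question's use case, a minimal sketch might look like this (streamUsers and build_message are hypothetical names, queue_mail_to_send is the question's own function, and a PDO connection in $pdo is assumed):

// Hypothetical generator: yields one user row at a time, so PHP never
// materializes the whole result set as an array.
function streamUsers(PDO $pdo): Generator
{
    // Note: PDO's MySQL driver buffers results by default; for very large
    // result sets you may also want to disable buffering:
    // $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
    $stmt = $pdo->query('SELECT user_id, user_email FROM users ORDER BY user_id');
    while ($row = $stmt->fetch(PDO::FETCH_OBJ)) {
        yield $row;
    }
}

foreach (streamUsers($pdo) as $user) {
    $message = build_message($user); // placeholder for the custom email code
    queue_mail_to_send($user->user_email, $subject, $message, $db_options, $mail_options, $mail_queue);
}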

#3 (score: 0)

You may split that operation into multiple operations, separated in time. For instance, only allow your routine to process 40 emails per minute, or use an array of arrays to create "pages" of records (using the SQL LIMIT clause), as sketched below. Then set each page array to null and unset() it when you no longer need that information.
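
A rough sketch of that paging idea (assuming a PDO connection in $pdo; the page size and query are illustrative):

$pageSize = 1000;
for ($offset = 0; ; $offset += $pageSize) {
    // Fetch one "page" of users instead of the whole table at once;
    // sprintf with %d keeps LIMIT/OFFSET as plain integers.
    $page = $pdo->query(sprintf(
        'SELECT * FROM users ORDER BY user_id LIMIT %d OFFSET %d',
        $pageSize,
        $offset
    ))->fetchAll(PDO::FETCH_OBJ);

    if (empty($page)) {
        break; // no rows left
    }

    foreach ($page as $user) {
        // ... queue the mail for $user here ...
    }

    unset($page); // free the page before fetching the next one
}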

#4 (score: -1)

I think you can use the MySQL IN clause rather than doing a foreach for every user.

Like: $user_ids = array(1, 2, 3, 4); // Do something WHERE user_id IN ($user_ids);

And for sending the mails, you can use the PHPMailer class by supplying comma-separated email addresses in $to.
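
For example, a small sketch of building the IN clause safely with placeholders (assuming a PDO connection in $pdo):

$user_ids = array(1, 2, 3, 4);
// One '?' placeholder per id, e.g. "?,?,?,?".
$placeholders = implode(',', array_fill(0, count($user_ids), '?'));
$stmt = $pdo->prepare("SELECT user_email FROM users WHERE user_id IN ($placeholders)");
$stmt->execute($user_ids);
$emails = $stmt->fetchAll(PDO::FETCH_COLUMN);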

#5 (score: -1)

Use just one query, like:

INSERT INTO table_name (COL1, Col2,...) SELECT COL1, COL2 FROM other_table;
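
Applied to the question, that could look something like this (assuming a PDO connection in $pdo; the mail_queue columns are an assumption, so adjust them to your own schema):

// Hypothetical: let MySQL do the looping in a single statement,
// so PHP never holds the user list in memory at all.
$pdo->exec(
    "INSERT INTO mail_queue (user_id, recipient, subject, body)
     SELECT user_id, user_email, 'My subject', 'My message body'
     FROM users"
);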
