加快这段代码 - PHP / SQL - 短代码,但目前需要很长时间

时间:2021-07-22 02:43:59

Right, this code goes through a rather large multidimensional array (has about 28,000 rows and 16 parts).

对,这段代码通过一个相当大的多维数组(有大约28,000行和16个部分)。

Order of events:

活动顺序:

  1. Check if the data exists in the database
  2. 检查数据库中是否存在数据

  3. if it exists - Update it with the new data
  4. 如果存在 - 使用新数据更新它

  5. if it doesn't exist - Insert it
  6. 如果它不存在 - 插入它

Simple.

But right now to go through this it has taken over 30min i think and Still going.

但是现在要经历这个过程,它已经花费了30多分钟,我认为并且还在继续。

$counter = 0;
$events = sizeof($feed_array['1'])-1;
while($counter <= $events ) {

    $num_rows = mysql_num_rows(mysql_query("SELECT * FROM it_raw WHERE perfID = '".addslashes($feed_array['16'][$counter])."'")); 
    if($num_rows) {
        $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
        $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);     
        $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);

        mysql_query("UPDATE `it_raw` SET  
                    `eventtime` =       '".$eventUnixTime."',
                    `eventname` =       '".addslashes($feed_array['3'][$counter])."',
                    `venuename` =       '".addslashes($feed_array['4'][$counter])."',
                    `venueregion` =     '".addslashes($feed_array['5'][$counter])."',
                    `venuepostcode` =   '".addslashes($feed_array['6'][$counter])."',
                    `country` =         '".addslashes($feed_array['7'][$counter])."',
                    `minprice` =        '".addslashes($feed_array['8'][$counter])."',
                    `available` =       '".addslashes($feed_array['9'][$counter])."',
                    `link` =            '".addslashes($feed_array['10'][$counter])."',
                    `eventtype` =       '".addslashes($feed_array['11'][$counter])."',
                    `seaOnSaleDate` =   '".addslashes($feed_array['12'][$counter])."',
                    `perOnSaleDate` =   '".addslashes($feed_array['13'][$counter])."',
                    `soldOut` =         '".addslashes($feed_array['14'][$counter])."',
                    `eventImageURL` =   '".addslashes($feed_array['15'][$counter])."',
                    `perfID`=           '".addslashes($feed_array['16'][$counter])."'
                    WHERE  `perfID` = ".$feed_array['16'][$counter]." LIMIT 1 ;");
        echo "UPDATE ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
    } else {
        $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
        $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);     
        $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);
        $sql = "INSERT INTO  `dante_tickets`.`it_raw` (
                `id` ,
                `eventtime` ,
                `eventname` ,
                `venuename` ,
                `venueregion` ,
                `venuepostcode` ,
                `country` ,
                `minprice` ,
                `available` ,
                `link` ,
                `eventtype` ,
                `seaOnSaleDate` ,
                `perOnSaleDate` ,
                `soldOut` ,
                `eventImageURL` ,
                `perfID`
                )
                VALUES (
                    NULL ,  
                    '".$eventUnixTime."',  
                    '".addslashes($feed_array['3'][$counter])."',  
                    '".addslashes($feed_array['4'][$counter])."',  
                    '".addslashes($feed_array['5'][$counter])."',  
                    '".addslashes($feed_array['6'][$counter])."',  
                    '".addslashes($feed_array['7'][$counter])."',  
                    '".addslashes($feed_array['8'][$counter])."',  
                    '".addslashes($feed_array['9'][$counter])."',  
                    '".addslashes($feed_array['10'][$counter])."',  
                    '".addslashes($feed_array['11'][$counter])."',  
                    '".addslashes($feed_array['12'][$counter])."',  
                    '".addslashes($feed_array['13'][$counter])."',  
                    '".addslashes($feed_array['14'][$counter])."',  
                    '".addslashes($feed_array['15'][$counter])."',  
                    '".addslashes($feed_array['16'][$counter])."'
                );";

        mysql_query($sql) or die(mysql_error().":".$sql);
        echo "Inserted ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
    }
    unset($sql);
    $counter++;     
}

UPDATE

I just carried out profiling one one of the rows:

我刚刚对其中一行进行了剖析:

mysql> EXPLAIN SELECT * FROM it_raw WHERE perfID = 210968;
+----+-------------+--------+------+---------------+--------+---------+-------+------+-------+
| id | select_type | table  | type | possible_keys | key    | key_len | ref   | rows | Extra |
+----+-------------+--------+------+---------------+--------+---------+-------+------+-------+
|  1 | SIMPLE      | it_raw | ref  | perfID        | perfID | 4       | const |    1 |       | 
+----+-------------+--------+------+---------------+--------+---------+-------+------+-------+
1 row in set (0.07 sec)

UPDATE 2

To try and "speed" things up, instead of carrying out the UPDATE and INSERT statements straight away, i've now placed them in a variable (so only the initial select runs - to check a duplicate - then stores the action [insert or update]). At the end of the loop it executes all the statments.

为了尝试“加速”事情,而不是立即执行UPDATE和INSERT语句,我现在将它们放在一个变量中(所以只有初始选择运行 - 检查重复 - 然后存储操作[插入或更新])。在循环结束时,它执行所有的状态。

Except now, it's coming up with MySQL error that the syntax is incorrect. (when initially there was nothing wrong).

除了现在,它出现MySQL错误,语法不正确。 (最初没有什么不对的)。

I've simply replaced the mysql_query with:

我只是将mysql_query替换为:

$sql_exec .= "SELECT.... ;";

$ sql_exec。=“SELECT ....;”;

is there something i'm missing here for the formatting?

我在这里缺少格式化的东西吗?

UPDATE 3 OK finally fixed it Lessons Learned:

更新3确定最后修正了经验教训:

  1. Do logic search first on database
  2. 首先在数据库上进行逻辑搜索

  3. Carry out insert/updates in bulk.
  4. 批量执行插入/更新。

Here is the final code which now takes about 60 seconds to run (from over 30min+)

这是最终的代码,现在需要大约60秒才能运行(超过30分钟+)

while($counter <= $events ) {

        $num_rows = mysql_num_rows(mysql_query("SELECT * FROM it_raw WHERE perfID = '".addslashes($feed_array['16'][$counter])."'")); 
        if($num_rows) {
            $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
            $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);     
            $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);

            $sql_exec[] =   "UPDATE `it_raw` SET `eventtime` = '".$eventUnixTime."',`eventname` = '".addslashes($feed_array['3'][$counter])."',`venuename` = '".addslashes($feed_array['4'][$counter])."',`venueregion` = '".addslashes($feed_array['5'][$counter])."',`venuepostcode` = '".addslashes($feed_array['6'][$counter])."',`country` = '".addslashes($feed_array['7'][$counter])."',`minprice` = '".addslashes($feed_array['8'][$counter])."',`available` = '".addslashes($feed_array['9'][$counter])."',`link` = '".addslashes($feed_array['10'][$counter])."',`eventtype` = '".addslashes($feed_array['11'][$counter])."',`seaOnSaleDate` = '".addslashes($feed_array['12'][$counter])."',`perOnSaleDate` = '".addslashes($feed_array['13'][$counter])."',`soldOut` =  '".addslashes($feed_array['14'][$counter])."',`eventImageURL` =   '".addslashes($feed_array['15'][$counter])."',`perfID`='".addslashes($feed_array['16'][$counter])."' WHERE `perfID` = ".$feed_array['16'][$counter]." LIMIT 1;";
            echo "UPDATE ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
        } else {
            $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
            $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);     
            $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);
            $sql_exec[] = "INSERT INTO  `it_raw` (`id` ,`eventtime` ,`eventname` ,`venuename` ,`venueregion` ,`venuepostcode` ,`country` ,`minprice` ,`available` ,`link` ,`eventtype` ,`seaOnSaleDate` ,
                    `perOnSaleDate` ,`soldOut` ,`eventImageURL` ,`perfID`) VALUES ( NULL ,'".$eventUnixTime."','".addslashes($feed_array['3'][$counter])."','".addslashes($feed_array['4'][$counter])."','".addslashes($feed_array['5'][$counter])."','".addslashes($feed_array['6'][$counter])."','".addslashes($feed_array['7'][$counter])."','".addslashes($feed_array['8'][$counter])."','".addslashes($feed_array['9'][$counter])."','".addslashes($feed_array['10'][$counter])."','".addslashes($feed_array['11'][$counter])."','".addslashes($feed_array['12'][$counter])."','".addslashes($feed_array['13'][$counter])."','".addslashes($feed_array['14'][$counter])."','".addslashes($feed_array['15'][$counter])."','".addslashes($feed_array['16'][$counter])."');";

            //mysql_query($sql) or die(mysql_error().":".$sql);
            echo "Inserted ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
        }
        unset($sql);
        $counter++;     
    }

    foreach($sql_exec as $value) {
        mysql_query($value) or die (mysql_error().": ".$value); 
    }

4 个解决方案

#1


You could try grouping inserts and updates into groups so the code runs less queries.

您可以尝试将插入和更新分组到组中,以便代码运行较少的查询。

For example, you could group all of the inserts into one very large insert, or maybe group every 100 inserts.

例如,您可以将所有插入分组到一个非常大的插入中,或者可以将每100个插入分组。

Also using prepared statements as gradbot suggested may help.

同样使用预备语句作为gradbot建议可能会有所帮助。

Other than that, it's not very easy to say which part of it is the major contributor to slowness. You should use a profiler to determine that, for example by using a smaller dataset so the profiled script runs in a reasonable time.

除此之外,说它的哪一部分是造成缓慢的主要原因并不容易。您应该使用分析器来确定,例如通过使用较小的数据集,以便配置文件脚本在合理的时间内运行。

#2


You can do a number of things.

你可以做很多事情。

Try This. I'm unable to test it but the syntax should be right.

试试这个。我无法测试它,但语法应该是正确的。

$counter = 0;
$events = sizeof($feed_array['1']) - 1;
while($counter <= $events )
{
  $eventDate = explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
  $eventTime = explode(":", $feed_array['2'][$counter]); //print_r($eventTime);             
  $eventUnixTime = mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);

  $data = array(
    'eventtime'     => $eventUnixTime,
    'eventname'     => addslashes($feed_array['3'][$counter]),
    'venuename'     => addslashes($feed_array['4'][$counter]),
    'venueregion'   => addslashes($feed_array['5'][$counter]),
    'venuepostcode' => addslashes($feed_array['6'][$counter]),
    'country'       => addslashes($feed_array['7'][$counter]),
    'minprice'      => addslashes($feed_array['8'][$counter]),
    'available'     => addslashes($feed_array['9'][$counter]),
    'link'          => addslashes($feed_array['10'][$counter]),
    'eventtype'     => addslashes($feed_array['11'][$counter]),
    'seaOnSaleDate' => addslashes($feed_array['12'][$counter]),
    'perOnSaleDate' => addslashes($feed_array['13'][$counter]),
    'soldOut'       => addslashes($feed_array['14'][$counter]),
    'eventImageURL' => addslashes($feed_array['15'][$counter]),
    'perfID'        => addslashes($feed_array['16'][$counter]),
  );

  $update = array();
  foreach ($data as $key => $value)
    $update[] = "`$key` = '$value'";

  $sql = "INSERT INTO `dante_tickets`.`it_raw`" .
    '(`id`, `'. implode ('`,`', array_keys($data)) . '`) VALUES ' .
    '(NULL, ' . implode (',', $data) . ') ON DUPLICATE KEY UPDATE ' . 
    implode (',', $update);

  mysql_query($sql) or die(mysql_error().":".$sql);
  echo "Inserted or Updated".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";

  unset($sql);
  $counter++;     
}

I forgot to mention this requires that perfID is a unique key.

我忘了提到这要求perfID是一个唯一的密钥。

#3


  1. Have you profiled this query?

    你有没有想过这个查询?

    "SELECT * FROM it_raw WHERE perfID = '".addslashes($feed_array['16'][$counter])."'"

    “SELECT * FROM it_raw WHERE perfID ='”。adddslashes($ feed_array ['16'] [$ counter])。“'”

    Because you run it 28000 times.. so unless it is REALLY quick, it will cause you a head ache. Use the EXPLAIN syntax and add an appropriate index if needed.

    因为你跑28000次..所以除非它真的很快,否则会让你头痛。使用EXPLAIN语法并根据需要添加适当的索引。

    EDIT: With profile, I mean that you should try to use EXPLAIN on the mysql-prompt to see what execution plan the MySQL Query Optimizer suggests for this query. I.e, on the prompt, run:

    编辑:使用配置文件,我的意思是你应该尝试在mysql提示符上使用EXPLAIN来查看MySQL查询优化器为此查询建议的执行计划。即,在提示符下运行:

    EXPLAIN SELECT * FROM it_raw WHERE perfID = 426; 
    # Change this id to something existing and valid
    

    What you want to see is that it is using an index, and ONLY an index. If you don't understand the output, copy and paste it here so can I go through it with you.

    你想看到的是它正在使用一个索引,而且只是一个索引。如果您不理解输出,请将其复制并粘贴到此处,以便我可以随身携带。

    UPDATE: As you can see, it takes 0.07 seconds for EVERY row in your data array, plus the time to actually query the database, transfer the result etc. This is roughly 28000 * 0.07 = 1960 seconds, or 32 minutes, just to check if the data exists or not. You need to come up with another way of checking if the data already exists... One, very simple optimization might be:

    更新:正如您所看到的,数据数组中的每一行需要0.07秒,加上实际查询数据库,传输结果等的时间。这大约是28000 * 0.07 = 1960秒,或者32分钟,只是为了检查如果数据存在与否。你需要提出另一种方法来检查数据是否已经存在......一个非常简单的优化可能是:

    EXPLAIN SELECT perfId FROM it_raw WHERE perfID = 210968;
    

    This way, you can use the index on perfId and don't need to visit the table

    这样,您可以在perfId上使用索引,而不需要访问该表

  2. If it is possible, try to avoid quering the database for each run in the loop. Perhaps it is possible to fetch the ids from the database into a big array of ids that will fit in PHP memory? This will be MUCH quicker than quering the database for every row in your big data array.

    如果可能,请尝试避免在循环中为每次运行查询数据库。也许有可能将数据库中的id提取到适合PHP内存的大量id中?这比为大数据阵列中的每一行查询数据库要快得多。

#4


This is exactly the scenario for which prepared statements were made:

这正是准备好的陈述的情景:

$prepared_statement =
    $DB->prepare('INSERT INTO table(column, column) VALUES(?, ?)');
loop {
    $prepared_statement->execute(array('value1', 'value2');
}

It's implemented in the MySQLi and PDO wrappers. It only compiles the query once and automagically sanitizes the given data, saving time (both developmental and executional) and headache.

它是在MySQLi和PDO包装器中实现的。它只编译一次查询并自动清理给定的数据,节省时间(包括开发和执行)和头痛。

#1


You could try grouping inserts and updates into groups so the code runs less queries.

您可以尝试将插入和更新分组到组中,以便代码运行较少的查询。

For example, you could group all of the inserts into one very large insert, or maybe group every 100 inserts.

例如,您可以将所有插入分组到一个非常大的插入中,或者可以将每100个插入分组。

Also using prepared statements as gradbot suggested may help.

同样使用预备语句作为gradbot建议可能会有所帮助。

Other than that, it's not very easy to say which part of it is the major contributor to slowness. You should use a profiler to determine that, for example by using a smaller dataset so the profiled script runs in a reasonable time.

除此之外,说它的哪一部分是造成缓慢的主要原因并不容易。您应该使用分析器来确定,例如通过使用较小的数据集,以便配置文件脚本在合理的时间内运行。

#2


You can do a number of things.

你可以做很多事情。

Try This. I'm unable to test it but the syntax should be right.

试试这个。我无法测试它,但语法应该是正确的。

$counter = 0;
$events = sizeof($feed_array['1']) - 1;
while($counter <= $events )
{
  $eventDate = explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
  $eventTime = explode(":", $feed_array['2'][$counter]); //print_r($eventTime);             
  $eventUnixTime = mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);

  $data = array(
    'eventtime'     => $eventUnixTime,
    'eventname'     => addslashes($feed_array['3'][$counter]),
    'venuename'     => addslashes($feed_array['4'][$counter]),
    'venueregion'   => addslashes($feed_array['5'][$counter]),
    'venuepostcode' => addslashes($feed_array['6'][$counter]),
    'country'       => addslashes($feed_array['7'][$counter]),
    'minprice'      => addslashes($feed_array['8'][$counter]),
    'available'     => addslashes($feed_array['9'][$counter]),
    'link'          => addslashes($feed_array['10'][$counter]),
    'eventtype'     => addslashes($feed_array['11'][$counter]),
    'seaOnSaleDate' => addslashes($feed_array['12'][$counter]),
    'perOnSaleDate' => addslashes($feed_array['13'][$counter]),
    'soldOut'       => addslashes($feed_array['14'][$counter]),
    'eventImageURL' => addslashes($feed_array['15'][$counter]),
    'perfID'        => addslashes($feed_array['16'][$counter]),
  );

  $update = array();
  foreach ($data as $key => $value)
    $update[] = "`$key` = '$value'";

  $sql = "INSERT INTO `dante_tickets`.`it_raw`" .
    '(`id`, `'. implode ('`,`', array_keys($data)) . '`) VALUES ' .
    '(NULL, ' . implode (',', $data) . ') ON DUPLICATE KEY UPDATE ' . 
    implode (',', $update);

  mysql_query($sql) or die(mysql_error().":".$sql);
  echo "Inserted or Updated".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";

  unset($sql);
  $counter++;     
}

I forgot to mention this requires that perfID is a unique key.

我忘了提到这要求perfID是一个唯一的密钥。

#3


  1. Have you profiled this query?

    你有没有想过这个查询?

    "SELECT * FROM it_raw WHERE perfID = '".addslashes($feed_array['16'][$counter])."'"

    “SELECT * FROM it_raw WHERE perfID ='”。adddslashes($ feed_array ['16'] [$ counter])。“'”

    Because you run it 28000 times.. so unless it is REALLY quick, it will cause you a head ache. Use the EXPLAIN syntax and add an appropriate index if needed.

    因为你跑28000次..所以除非它真的很快,否则会让你头痛。使用EXPLAIN语法并根据需要添加适当的索引。

    EDIT: With profile, I mean that you should try to use EXPLAIN on the mysql-prompt to see what execution plan the MySQL Query Optimizer suggests for this query. I.e, on the prompt, run:

    编辑:使用配置文件,我的意思是你应该尝试在mysql提示符上使用EXPLAIN来查看MySQL查询优化器为此查询建议的执行计划。即,在提示符下运行:

    EXPLAIN SELECT * FROM it_raw WHERE perfID = 426; 
    # Change this id to something existing and valid
    

    What you want to see is that it is using an index, and ONLY an index. If you don't understand the output, copy and paste it here so can I go through it with you.

    你想看到的是它正在使用一个索引,而且只是一个索引。如果您不理解输出,请将其复制并粘贴到此处,以便我可以随身携带。

    UPDATE: As you can see, it takes 0.07 seconds for EVERY row in your data array, plus the time to actually query the database, transfer the result etc. This is roughly 28000 * 0.07 = 1960 seconds, or 32 minutes, just to check if the data exists or not. You need to come up with another way of checking if the data already exists... One, very simple optimization might be:

    更新:正如您所看到的,数据数组中的每一行需要0.07秒,加上实际查询数据库,传输结果等的时间。这大约是28000 * 0.07 = 1960秒,或者32分钟,只是为了检查如果数据存在与否。你需要提出另一种方法来检查数据是否已经存在......一个非常简单的优化可能是:

    EXPLAIN SELECT perfId FROM it_raw WHERE perfID = 210968;
    

    This way, you can use the index on perfId and don't need to visit the table

    这样,您可以在perfId上使用索引,而不需要访问该表

  2. If it is possible, try to avoid quering the database for each run in the loop. Perhaps it is possible to fetch the ids from the database into a big array of ids that will fit in PHP memory? This will be MUCH quicker than quering the database for every row in your big data array.

    如果可能,请尝试避免在循环中为每次运行查询数据库。也许有可能将数据库中的id提取到适合PHP内存的大量id中?这比为大数据阵列中的每一行查询数据库要快得多。

#4


This is exactly the scenario for which prepared statements were made:

这正是准备好的陈述的情景:

$prepared_statement =
    $DB->prepare('INSERT INTO table(column, column) VALUES(?, ?)');
loop {
    $prepared_statement->execute(array('value1', 'value2');
}

It's implemented in the MySQLi and PDO wrappers. It only compiles the query once and automagically sanitizes the given data, saving time (both developmental and executional) and headache.

它是在MySQLi和PDO包装器中实现的。它只编译一次查询并自动清理给定的数据,节省时间(包括开发和执行)和头痛。