How can I optimize this DB operation?

Time: 2023-01-19 09:17:30

I'm not very good with databases. I can't get this working with joins, and I'm not even sure that would be faster...

DELETE FROM atable 
WHERE  btable_id IN (SELECT id 
                     FROM   btable 
                     WHERE  param > 2) 
       AND ctable_id IN (SELECT id 
                         FROM   ctable 
                         WHERE  ( someblob LIKE '%_ID1_%' 
                                  OR someblob LIKE '%_ID2_%' )) 

atable contains ~19M rows, and this would delete ~3M of them. At the moment, I can only run the query with LIMIT 100000, and I don't want to sit here with phpMyAdmin all day, because each deletion (of 100,000 rows) runs for about 1.5 minutes.

Is there any way to speed this up or automate it?

MySQL 5.5

(Do you think it's already bad DB design if a table contains 20M rows?)

4 solutions

#1


2  

Use EXISTS or JOIN instead of IN; this often improves performance.

Using EXISTS:

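-- A single-table DELETE in MySQL 5.5 does not accept a table alias, so the outer table is referenced by name.
DELETE FROM atable
WHERE EXISTS (SELECT 1 FROM btable b WHERE atable.btable_id = b.id AND b.param > 2) AND
      EXISTS (SELECT 1 FROM ctable c WHERE atable.ctable_id = c.id AND (c.someblob LIKE '%_ID1_%' OR c.someblob LIKE '%_ID2_%'))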

Using JOIN:

DELETE a
FROM atable a
INNER JOIN btable b ON a.btable_id = b.id AND b.param > 2
INNER JOIN ctable c ON a.ctable_id = c.id AND (c.someblob LIKE '%_ID1_%' OR c.someblob LIKE '%_ID2_%')
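
Before running either variant against 19M rows, it may be worth checking how MySQL plans to execute it. MySQL 5.5 only supports EXPLAIN for SELECT statements, so this sketch rewrites the JOIN variant as a SELECT over the same joins. Using COUNT(*) is my choice here, made to avoid assuming any column beyond those shown in the question.

```sql
-- Sketch: preview the execution plan of the JOIN variant.
-- The joins and conditions are the same as in the DELETE; only the verb differs.
EXPLAIN
SELECT COUNT(*)
FROM atable a
INNER JOIN btable b ON a.btable_id = b.id AND b.param > 2
INNER JOIN ctable c ON a.ctable_id = c.id
    AND (c.someblob LIKE '%_ID1_%' OR c.someblob LIKE '%_ID2_%');
```

In the output, type = ALL on the atable row with a row estimate near 19M suggests a full scan of the big table. Running EXPLAIN on a SELECT version of the original IN query and seeing DEPENDENT SUBQUERY in select_type would suggest the subqueries are being re-evaluated per outer row.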

#2


1  

First, try EXISTS instead of IN. It is faster in many cases.

Then you could try an INNER JOIN instead of IN or EXISTS.

Example:

delete a 
from a 
inner join b on b.id = a.tablebid

Finally, if possible (I don't know whether you also have ID3 and so on), replace the OR with something else. Sometimes an odd-looking, more complicated rewrite helps the optimizer: CASE WHEN, a subquery...
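
One possible rewrite along these lines, offered as a sketch rather than a tested fix: evaluate the OR'd LIKE conditions once, store the matching ctable ids in a temporary table, and join against that. The name tmp_ctable_ids is made up for illustration.

```sql
-- Sketch only; tmp_ctable_ids is a hypothetical name.
-- Step 1: scan ctable once and keep the matching ids, indexed by a primary key.
CREATE TEMPORARY TABLE tmp_ctable_ids (PRIMARY KEY (id))
SELECT id
FROM ctable
WHERE someblob LIKE '%_ID1_%' OR someblob LIKE '%_ID2_%';

-- Step 2: the DELETE now joins on ids only and no longer touches someblob.
DELETE a
FROM atable a
INNER JOIN btable b ON a.btable_id = b.id AND b.param > 2
INNER JOIN tmp_ctable_ids t ON a.ctable_id = t.id;

-- Step 3: clean up.
DROP TEMPORARY TABLE tmp_ctable_ids;
```

A temporary table is visible only to the connection that created it, so all three statements need to run in the same session. Also note that in a LIKE pattern an unescaped _ matches any single character; if a literal underscore is intended, it would need to be written as \_.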

#3


1  

I don't see where a simple index would help much. I'd do:

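-- MySQL cannot select from the table being deleted from in a direct subquery (error 1093),
-- so the matching ids are wrapped in a derived table.
delete from atable where id in (
    select id from (
        select
            a.id
        from
            atable a
            join btable b on a.btable_id = b.id
            join ctable c on a.ctable_id = c.id
        where
            b.param > 2
            and (
                c.someblob LIKE '%_ID1_%'
                OR c.someblob LIKE '%_ID2_%'
            )
    ) as matched
)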

Correction: I'm assuming you have indexes on the id columns of btable and ctable (likely, if they're primary keys) and on b.param (if it's numeric).

#4


1  

Besides optimizing the query, you could also look at making good use of indexes, since they might prevent a full table scan.

For btable, for example, create an index on id and param.

To explain why this helps: if the database has to look up the id and param values in the unsorted table, it has to read ALL rows. If it reads the index, which is SORTED, it can look up id and param at a lower cost.
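
As a sketch of this indexing advice, the statements below create the suggested index and check whether MySQL would use it. All index names are made up for illustration. The index led by param and the two indexes on atable are my assumptions about what might help; the question does not say which indexes already exist.

```sql
-- Sketch only; index names are hypothetical.

-- See which indexes already exist before adding new ones.
SHOW INDEX FROM atable;
SHOW INDEX FROM btable;

-- The index on id and param suggested in this answer.
CREATE INDEX idx_btable_id_param ON btable (id, param);

-- Assumption: an index led by param lets the range condition param > 2 seek into the index.
CREATE INDEX idx_btable_param ON btable (param);

-- Assumption: atable has no index on its two lookup columns yet.
CREATE INDEX idx_atable_btable_id ON atable (btable_id);
CREATE INDEX idx_atable_ctable_id ON atable (ctable_id);

-- The key column of the output names the index MySQL picked (NULL means none).
EXPLAIN SELECT id FROM btable WHERE param > 2;
```

Building an index on a 19M-row table takes time, and every extra index on atable adds work to each deleted row, so it may be worth creating only those that EXPLAIN shows being used. No B-tree index helps with the someblob conditions: a LIKE pattern that starts with % cannot be resolved by an index range lookup.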
