I need to solve the following task: I have a quite large array of IDs in a PHP script, and I need to select from a MySQL DB all rows whose IDs are NOT IN this array.
There are several similar questions (How to find all records which are NOT in this array? (MySql)), and the most popular answer is to use a NOT IN () construction with implode(',', $array) inside the brackets.
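For reference, a minimal sketch of that implode-based approach (assuming a mysqli connection in $mysqli and integer IDs in $array; these variable names are mine, not from the linked question):

$ids = implode(',', array_map('intval', $array)); // e.g. "1,2,3,..."
$result = $mysqli->query("SELECT * FROM yourtable WHERE id NOT IN ($ids)");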
And this worked... until my array grew to 2007 IDs and about 20 kB (in my case), at which point I got a "MySQL server has gone away" error. As far as I understand, this is because of the length of the query.
There are also some solutions to this problem, such as:
SET GLOBAL max_allowed_packet=1073741824;
(just taken from this question).
I could probably do it this way; however, I now doubt that the NOT IN (implode) approach is a good one for big arrays (I expect that in my case the array could grow to 8000 IDs and 100 kB).
Is there any better solution for big arrays?
Thanks!
EDIT 1
As a solution it is recommended to insert all IDs from the array into a temporary table and then use a JOIN to solve the initial task. This is clear. However, I have never used temporary tables, and therefore I have an additional question (probably worth being a separate question, but I decided to leave it here):
If I need to run this routine several times during one MySQL session, which approach is better:
- Each time I need to SELECT IDs NOT IN the PHP array, I will create a NEW temporary table (all those tables will be deleted when the MySQL connection terminates, i.e. when my script finishes).
- I will create a temporary table and DROP it after I have made the needed SELECT.
- I will TRUNCATE the temporary table afterwards and reuse it.
Which is better? Or have I missed something?
2 Answers
#1
3
In such cases it is usually better to create a temporary table and perform the query against it instead. It'd be something along the lines of:
CREATE TEMPORARY TABLE t1 (a int);
INSERT INTO t1 VALUES (1),(2),(3);
SELECT * FROM yourtable
LEFT JOIN t1 on (yourtable.id=t1.a)
WHERE t1.a IS NULL;
Of course, the INSERT statement should be constructed so that you insert all the values from your array into the temporary table.
Edit: Inserting all values in a single INSERT statement would most probably lead to the same problem you already faced. Hence I'd suggest that you use a prepared statement that is executed to insert the data into the temporary table while you iterate through the PHP array.
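A rough PHP sketch of that suggestion, assuming a PDO connection in $pdo and the IDs in $ids (the variable names and fetch style are illustrative, not part of the answer):

$pdo->exec('CREATE TEMPORARY TABLE t1 (a INT)');
$insert = $pdo->prepare('INSERT INTO t1 (a) VALUES (?)');
foreach ($ids as $id) {
    $insert->execute([(int)$id]); // one small statement per ID instead of one huge query
}
$rows = $pdo->query(
    'SELECT yourtable.* FROM yourtable
     LEFT JOIN t1 ON yourtable.id = t1.a
     WHERE t1.a IS NULL'
)->fetchAll();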
#2
1
I once had to tackle this problem, but with an IN(id) WHERE clause containing approx. 20,000-30,000 identifiers (indexes).
The way I got around this, with a SELECT query, was that I reduced the number of identifiers filtered per query and increased the number of times I sent the same query, in order to extract the same data.
You could use PHP's array_chunk and divide 20,000 by 15, which would give you 15 separate SQL calls, each filtering records by roughly 1,500 identifiers (you can split into more than 15 chunks to reduce the number of identifiers per call further). But in your case, if you just divide 2007 identifiers by 10, it would reduce the number of identifiers you're pushing to the database to about 200 per SQL request. There are other ways to optimize this further with temporary tables and so forth.
Splitting up the number of identifiers you're filtering by speeds up each individual query, so the whole job runs faster than if you were to send every identifier to the database in a single dump.
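A hedged sketch of that chunking idea, assuming a PDO connection in $pdo and the full list of IDs in $allIds (names are mine). Note it mirrors the IN() case described in this answer; results of chunked NOT IN queries could not simply be merged the same way:

$chunks = array_chunk($allIds, 1500); // ~1,500 identifiers per query
$rows = [];
foreach ($chunks as $chunk) {
    $placeholders = implode(',', array_fill(0, count($chunk), '?'));
    $stmt = $pdo->prepare("SELECT * FROM yourtable WHERE id IN ($placeholders)");
    $stmt->execute(array_map('intval', $chunk));
    $rows = array_merge($rows, $stmt->fetchAll());
}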