For each identifier, how can I return the quantity when the received country is not equal to any of the delivered countries? I need an efficient query for the steps below since my table is huge.
对于每个标识符,当接收国不等于任何一个发送国时,如何返回数量?由于我的表很大,因此需要对下面的步骤进行有效的查询。
These are the steps I would think could do this, of course you don't need to follow them :)
这些步骤我认为可以做到,当然你不需要遵循它们:
- Create a group of 'delivered' countries for each identifier.
- 为每个标识符创建一组“交付”的国家。
- See if 'received' is any of these countries for each identifier. If there is no match, return this result.
- 看看每个标识符的“received”是否是这些国家中的任何一个。如果没有匹配,返回此结果。
Starting Table:
从表:
identifier delivered received quantity
------------- ------------ ----------- ------------
1 USA France 432
1 France USA 450
1 Ireland Russia 100
2 Germany Germany 1,034
3 USA France 50
3 USA USA 120
Result:
结果:
identifier delivered received quantity
------------- ------------ ----------- ------------
1 Ireland Russia 100
The starting table is about 30,000,000 rows, so self-joins will be impossible unfortunately. I am using something similar to MySQL.
起始表大约有30,000,000行,因此很不幸,不可能进行自连接。我使用的是类似于MySQL的东西。
2 个解决方案
#1
6
I think LEFT JOIN
query should work for you:
我认为左连接查询应该适合您:
SELECT a.*
FROM starting a
LEFT JOIN starting b
ON a.id = b.id
AND a.delivered = b.received
WHERE b.received IS NULL;
Example: SQLFiddle
For optimizing above query, adding following composite index should give you better performance:
为了优化上面的查询,添加以下复合索引应该会给您更好的性能:
ALTER TABLE starting ADD KEY ix1(id, delivered, received);
#2
0
You could use a not exists
subquery:
可以使用不存在子查询:
SELECT a.*
FROM starting a
WHERE NOT EXISTS
(
SELECT *
FROM starting b
WHERE a.id = b.id
AND a.delivered = b.received
)
This is not a self-join, but the query optimizer is free to execute it as one (and usually does.)
这不是一个自连接,但是查询优化器可以*地执行它(通常是这样)。
#1
6
I think LEFT JOIN
query should work for you:
我认为左连接查询应该适合您:
SELECT a.*
FROM starting a
LEFT JOIN starting b
ON a.id = b.id
AND a.delivered = b.received
WHERE b.received IS NULL;
Example: SQLFiddle
For optimizing above query, adding following composite index should give you better performance:
为了优化上面的查询,添加以下复合索引应该会给您更好的性能:
ALTER TABLE starting ADD KEY ix1(id, delivered, received);
#2
0
You could use a not exists
subquery:
可以使用不存在子查询:
SELECT a.*
FROM starting a
WHERE NOT EXISTS
(
SELECT *
FROM starting b
WHERE a.id = b.id
AND a.delivered = b.received
)
This is not a self-join, but the query optimizer is free to execute it as one (and usually does.)
这不是一个自连接,但是查询优化器可以*地执行它(通常是这样)。