I would like to know the following:
- How do I get data from multiple tables in my database?
- What types of methods are there to do this?
- What are joins and unions, and how do they differ from one another?
- When should I use each one compared to the others?
I am planning to use this in my (for example, PHP) application, but I don't want to run multiple queries against the database. What options do I have to get data from multiple tables in a single query?
Note: I am writing this because I would like to be able to link to a well-written guide covering the numerous questions that I constantly come across in the PHP queue, so that I can link to it for further detail when I post an answer.
The answers cover the following:
- Part 1 - Joins and Unions
- Part 2 - Subqueries
- Part 3 - Tricks and Efficient Code
- Part 4 - Subqueries in the From Clause
- Part 5 - Mixed Bag of John's Tricks
6 Answers
#1
398
Part 1 - Joins and Unions
This answer covers:
- Part 1
  - Joining two or more tables using an inner join (see the Wikipedia entry for additional info)
  - How to use a union query
  - Left and right outer joins (this answer is excellent at describing the types of joins)
  - Intersect queries (and how to reproduce them if your database doesn't support them) - this is a feature of SQL Server (see info) and part of the reason I wrote this whole thing in the first place.
- Part 2
  - Subqueries - what they are, where they can be used and what to watch out for
  - Cartesian joins, AKA "Oh, the misery!"
There are a number of ways to retrieve data from multiple tables in a database. In this answer, I will be using ANSI-92 join syntax. This may differ from a number of other tutorials out there that use the older ANSI-89 syntax, and if you are used to 89 it may seem much less intuitive at first, but all I can say is try it: it is much easier to understand once queries start getting more complex. Why use it? Is there a performance gain? The short answer is no, but it is easier to read once you get used to it, and it makes queries written by other folks using this syntax easier to read too.
I am also going to use the concept of a small caryard which has a database to keep track of what cars it has available. The owner has hired you as his IT guy and expects you to be able to hand him the data he asks for at the drop of a hat.
I have made a number of lookup tables that will be used by the final table. This will give us a reasonable model to work from. To start off, I will be running my queries against an example database that has the following structure. I will try to think of common mistakes that are made when starting out and explain what goes wrong with them, as well as, of course, showing how to correct them.
The first table is simply a color listing so that we know what colors we have in the car yard.
mysql> create table colors(id int(3) not null auto_increment primary key,
-> color varchar(15), paint varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | varchar(15) | YES | | NULL | |
| paint | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
mysql> insert into colors (color, paint) values ('Red', 'Metallic'),
-> ('Green', 'Gloss'), ('Blue', 'Metallic'),
-> ('White', 'Gloss'), ('Black', 'Gloss');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from colors;
+----+-------+----------+
| id | color | paint |
+----+-------+----------+
| 1 | Red | Metallic |
| 2 | Green | Gloss |
| 3 | Blue | Metallic |
| 4 | White | Gloss |
| 5 | Black | Gloss |
+----+-------+----------+
5 rows in set (0.00 sec)
The brands table identifies the different brands of cars our caryard could possibly sell.
mysql> create table brands (id int(3) not null auto_increment primary key,
-> brand varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from brands;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| brand | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
mysql> insert into brands (brand) values ('Ford'), ('Toyota'),
-> ('Nissan'), ('Smart'), ('BMW');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from brands;
+----+--------+
| id | brand |
+----+--------+
| 1 | Ford |
| 2 | Toyota |
| 3 | Nissan |
| 4 | Smart |
| 5 | BMW |
+----+--------+
5 rows in set (0.00 sec)
The models table will cover different types of cars. It is simpler for this example to use different car types rather than actual car models.
mysql> create table models (id int(3) not null auto_increment primary key,
-> model varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from models;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| model | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from models;
+----+--------+
| id | model |
+----+--------+
| 1 | Sports |
| 2 | Sedan |
| 3 | 4WD |
| 4 | Luxury |
+----+--------+
4 rows in set (0.00 sec)
And finally, the table that ties all of these other tables together. The ID field is actually the unique lot number used to identify cars.
mysql> create table cars (id int(3) not null auto_increment primary key,
-> color int(3), brand int(3), model int(3));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from cars;
+-------+--------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | int(3) | YES | | NULL | |
| brand | int(3) | YES | | NULL | |
| model | int(3) | YES | | NULL | |
+-------+--------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
-> (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> select * from cars;
+----+-------+-------+-------+
| id | color | brand | model |
+----+-------+-------+-------+
| 1 | 1 | 2 | 1 |
| 2 | 3 | 1 | 2 |
| 3 | 5 | 3 | 1 |
| 4 | 4 | 4 | 2 |
| 5 | 2 | 2 | 3 |
| 6 | 3 | 5 | 4 |
| 7 | 4 | 1 | 3 |
| 8 | 2 | 2 | 1 |
| 9 | 5 | 2 | 3 |
| 10 | 4 | 5 | 1 |
+----+-------+-------+-------+
10 rows in set (0.00 sec)
This should give us enough data (I hope) to cover the examples below of different types of joins, and enough to make them worthwhile.
So, getting into the grit of it, the boss wants to know the IDs of all the sports cars he has.
This is a simple two-table join. We have a table that identifies the model and the table with the available stock in it. As you can see, the data in the model column of the cars table relates to the ID column of the models table. Now, we know that the models table has an ID of 1 for Sports, so let's write the join.
select
ID,
model
from
cars
join models
on model=ID
So this query looks good, right? We have identified the two tables that contain the information we need, and used a join that correctly identifies which columns to join on.
ERROR 1052 (23000): Column 'ID' in field list is ambiguous
Oh noes! An error in our first query! Yes, and it is a plum. You see, the query has indeed got the right columns, but some of them exist in both tables, so the database gets confused about which column we actually mean and where. There are two ways to solve this. The first is nice and simple: we can use tableName.columnName to tell the database exactly what we mean, like this:
select
cars.ID,
models.model
from
cars
join models
on cars.model=models.ID
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
| 2 | Sedan |
| 4 | Sedan |
| 5 | 4WD |
| 7 | 4WD |
| 9 | 4WD |
| 6 | Luxury |
+----+--------+
10 rows in set (0.00 sec)
The other is probably more often used and is called table aliasing. The tables in this example have nice, short, simple names, but typing out something like KPI_DAILY_SALES_BY_DEPARTMENT would probably get old quickly, so a simple way is to nickname the table like this:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
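To see the ambiguity error and both fixes side by side, here is a minimal sketch. It assumes SQLite through Python's standard sqlite3 module rather than MySQL, so the column types are simplified and the error wording differs from MySQL's ERROR 1052; the helper name try_query is made up for illustration.

```python
import sqlite3

# Minimal copy of the cars and models tables (SQLite types, not MySQL's).
db = sqlite3.connect(":memory:")
db.executescript("""
    create table models (id integer primary key, model text);
    create table cars (id integer primary key, color int, brand int, model int);
    insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
    insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
        (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
""")


def try_query(sql):
    """Return (rows, None) on success or (None, error message) on failure."""
    try:
        return db.execute(sql).fetchall(), None
    except sqlite3.OperationalError as exc:
        return None, str(exc)


# Unqualified names: ID and model exist in both tables, so this is rejected.
_, ambiguous_error = try_query(
    "select ID, model from cars join models on model = ID")

# Fix 1: spell out tableName.columnName.
qualified, _ = try_query("""
    select cars.ID, models.model
    from cars join models on cars.model = models.ID
    order by cars.ID""")

# Fix 2: give each table a short alias and qualify with that instead.
aliased, _ = try_query("""
    select a.ID, b.model
    from cars a join models b on a.model = b.ID
    order by a.ID""")

print(ambiguous_error)
print(qualified == aliased, len(aliased))
```

Both fixes produce the same rows; the alias form simply saves typing once table names get long.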
Now, back to the request. As you can see, we have the information we need, but we also have information that wasn't asked for, so we need to include a where clause in the statement to get only the Sports cars as was asked. As I prefer the table alias method rather than using the table names over and over, I will stick to it from this point onwards.
We can identify Sports cars either by ID=1 or by model='Sports'. As the ID is indexed and is the primary key (and it happens to be less typing), let's use that in our query.
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Bingo! The boss is happy. Of course, being a boss and never being happy with what he asked for, he looks at the information, then says: "I want the colors as well."
Okay, so we have a good part of our query already written, but we need to use a third table, which is colors. Now, our main information table, cars, stores the car color ID, and this links back to the ID column of the colors table. So, in a similar manner to the original, we can join a third table:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Damn, although the table was correctly joined and the related columns were linked, we forgot to pull in the actual information from the new table that we just linked.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
+----+--------+-------+
4 rows in set (0.00 sec)
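Since the question was about getting all of this from application code in a single query, here is a rough sketch of the three-table join run as one round trip. It assumes Python's standard sqlite3 module and an in-memory SQLite database in place of PHP and MySQL; make_db and cars_of_model are hypothetical helper names, not part of any library.

```python
import sqlite3


def make_db():
    """Build an in-memory copy of the caryard tables used in this answer."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        create table colors (id integer primary key, color text, paint text);
        create table models (id integer primary key, model text);
        create table cars (id integer primary key, color int, brand int, model int);
        insert into colors (color, paint) values ('Red','Metallic'),
            ('Green','Gloss'), ('Blue','Metallic'), ('White','Gloss'),
            ('Black','Gloss');
        insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
        insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
            (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
    """)
    return db


def cars_of_model(db, model_id):
    """One round trip: car ID, model name and color name for one model ID."""
    sql = """
        select a.ID, b.model, c.color
        from cars a
            join models b on a.model = b.ID
            join colors c on a.color = c.ID
        where b.ID = ?
        order by a.ID"""
    # The ? placeholder keeps the model ID out of the SQL string itself.
    return db.execute(sql, (model_id,)).fetchall()


db = make_db()
for row in cars_of_model(db, 1):
    print(row)
```

The explicit order by is only there to make the output deterministic; without it the database is free to return the rows in any order, as the result table above shows.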
Right, that's the boss off our back for a moment. Now, to explain some of this in a little more detail. As you can see, the from clause in our statement starts with our main table (I often use a table that contains information rather than a lookup or dimension table). The query would work just as well with the tables all switched around, but it would make less sense when we come back to read it in a few months' time, so it is often best to try to write a query that will be nice and easy to understand: lay it out intuitively and use nice indenting so that everything is as clear as it can be. If you go on to teach others, try to instill these characteristics in their queries, especially if you will be troubleshooting them.
It is entirely possible to keep linking more and more tables in this manner.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
I forgot to include a table where we might want to join on more than one column in the join statement, so here is a hypothetical example. If the models table had brand-specific models, and therefore also had a column called brand which linked back to the brands table on the ID field, it could be done like this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
and b.brand=d.ID
where
b.ID=1
As you can see, the query above not only links the joined tables to the main cars table, but also specifies joins between the already joined tables. If the join conditions are left out altogether, the result is called a cartesian join, which is DBA speak for bad. A cartesian join is one where the query doesn't tell the database how to match the rows of one table to the rows of the other, so it returns every possible combination of rows from the two tables.
So, to give an example of a cartesian join, let's run the following query:
select
a.ID,
b.model
from
cars a
join models b
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 1 | Sedan |
| 1 | 4WD |
| 1 | Luxury |
| 2 | Sports |
| 2 | Sedan |
| 2 | 4WD |
| 2 | Luxury |
| 3 | Sports |
| 3 | Sedan |
| 3 | 4WD |
| 3 | Luxury |
| 4 | Sports |
| 4 | Sedan |
| 4 | 4WD |
| 4 | Luxury |
| 5 | Sports |
| 5 | Sedan |
| 5 | 4WD |
| 5 | Luxury |
| 6 | Sports |
| 6 | Sedan |
| 6 | 4WD |
| 6 | Luxury |
| 7 | Sports |
| 7 | Sedan |
| 7 | 4WD |
| 7 | Luxury |
| 8 | Sports |
| 8 | Sedan |
| 8 | 4WD |
| 8 | Luxury |
| 9 | Sports |
| 9 | Sedan |
| 9 | 4WD |
| 9 | Luxury |
| 10 | Sports |
| 10 | Sedan |
| 10 | 4WD |
| 10 | Luxury |
+----+--------+
40 rows in set (0.00 sec)
Good god, that's ugly. However, as far as the database is concerned, it is exactly what was asked for. In the query, we asked for the ID from cars and the model from models. However, because we didn't specify how to join the tables, the database has matched every row from the first table with every row from the second table: 10 cars times 4 models gives 40 rows.
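The row arithmetic of a cartesian join is easy to check for yourself. The following is a small sketch, assuming SQLite via Python's sqlite3 module (which, like MySQL, accepts a join with no on clause); the helper name row_count is made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table models (id integer primary key, model text);
    create table cars (id integer primary key, color int, brand int, model int);
    insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
    insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
        (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
""")


def row_count(sql):
    """Number of rows a query returns."""
    return len(db.execute(sql).fetchall())


cars_total = row_count("select * from cars")
models_total = row_count("select * from models")

# No join condition: every car is paired with every model.
without_on = row_count("select a.ID, b.model from cars a join models b")

# With the join condition: each car is paired only with its own model.
with_on = row_count(
    "select a.ID, b.model from cars a join models b on a.model = b.ID")

print(cars_total, models_total, without_on, with_on)
```

On real tables with thousands of rows on each side, the same multiplication is what makes an accidental cartesian join so painful.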
Okay, so the boss is back, and he wants more information again: "I want the same list, but also include 4WDs in it."
This, however, gives us a great excuse to look at two different ways to accomplish this. We could add another condition to the where clause like this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
or b.ID=3
While the above will work perfectly well, let's look at it differently; this is a great excuse to show how a union query works.
We know that the following will return all the Sports cars:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
And the following would return all the 4WDs:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
So by adding a union all clause between them, the results of the second query will be appended to the results of the first query.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
union all
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
| 5 | 4WD | Green |
| 7 | 4WD | White |
| 9 | 4WD | Black |
+----+--------+-------+
7 rows in set (0.00 sec)
As you can see, the results of the first query are returned first, followed by the results of the second query.
In this example, it would of course have been much easier to simply use the first query, but union queries can be great for specific cases. They are a great way to return specific results from tables that aren't easily joined together, or for that matter from completely unrelated tables. There are a few rules to follow, however.
- The column types from the first query must match the column types from every other query below.
- The names of the columns from the first query will be used to identify the entire set of results.
- The number of columns in each query must be the same.
Now, you might be wondering what the difference is between using union and union all. A union query will remove duplicates, while a union all will not. This does mean that there is a small performance hit when using union over union all, but the results may be worth it. I won't speculate on that sort of thing here, though.
On that note, a couple of additional points are worth making here.
- If we want to order the results, we can use an order by, but we can't use the table alias anymore. In the query above, appending an order by a.ID would result in an error: as far as the results are concerned, the column is called ID rather than a.ID, even though the same alias has been used in both queries.
- We can only have one order by clause, and it must be the last clause of the statement.
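The difference between union and union all, and the single trailing order by, can be sketched as follows. This assumes SQLite via Python's sqlite3 module rather than MySQL, and it names the output column with an explicit alias (color_name, chosen here for illustration) so that the order by refers to the result column rather than to any table alias. The two queries filter on the model IDs directly (1 for Sports, 3 for 4WD) to keep the example short.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table colors (id integer primary key, color text, paint text);
    create table cars (id integer primary key, color int, brand int, model int);
    insert into colors (color, paint) values ('Red','Metallic'),
        ('Green','Gloss'), ('Blue','Metallic'), ('White','Gloss'),
        ('Black','Gloss');
    insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
        (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
""")

# Colors of the Sports cars (model 1) and of the 4WDs (model 3).
SPORTS = """
    select c.color as color_name
    from cars a join colors c on a.color = c.ID
    where a.model = 1"""
FOUR_WD = """
    select c.color as color_name
    from cars a join colors c on a.color = c.ID
    where a.model = 3"""


def run(sql):
    return [row[0] for row in db.execute(sql)]


# union all appends the second result to the first, duplicates included.
# The one order by comes last and sorts the combined result.
with_duplicates = run(SPORTS + " union all " + FOUR_WD + " order by color_name")

# union removes the duplicate colors.
distinct_only = run(SPORTS + " union " + FOUR_WD + " order by color_name")

print(with_duplicates)
print(distinct_only)
```

Green, White and Black each appear in both halves, which is why union all returns seven rows here and union only four.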
For the next examples, I am adding a few extra rows to our tables.
I have added Holden to the brands table. I have also added a row into cars that has a color value of 12, which has no matching entry in the colors table.
Okay, the boss is back again, barking requests out: "I want a count of each brand we carry and the number of cars in it!" Typical, we just get to an interesting section of our discussion and the boss wants more work.
Rightyo, so the first thing we need to do is get a complete listing of possible brands.
select
a.brand
from
brands a
+--------+
| brand |
+--------+
| Ford |
| Toyota |
| Nissan |
| Smart |
| BMW |
| Holden |
+--------+
6 rows in set (0.00 sec)
Now, when we join this to our cars table we get the following result:
select
a.brand
from
brands a
join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Nissan |
| Smart |
| Toyota |
+--------+
5 rows in set (0.00 sec)
Which is of course a problem: we aren't seeing any mention of the lovely Holden brand I added.
This is because a join looks for matching rows in both tables. As there is no row in cars with the Holden brand, it isn't returned. This is where we can use an outer join. This will return all the rows from one table whether they are matched in the other table or not:
select
a.brand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Holden |
| Nissan |
| Smart |
| Toyota |
+--------+
6 rows in set (0.00 sec)
Now that we have that, we can add a lovely aggregate function to get a count and get the boss off our backs for a moment.
select
a.brand,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+--------------+
| brand | countOfBrand |
+--------+--------------+
| BMW | 2 |
| Ford | 2 |
| Holden | 0 |
| Nissan | 1 |
| Smart | 1 |
| Toyota | 5 |
+--------+--------------+
6 rows in set (0.00 sec)
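Here is a small sketch comparing the inner join and the left outer join counts, assuming SQLite via Python's sqlite3 module instead of MySQL. The extra car row is an assumption reconstructed from the results above: the answer only says its color is 12, the Toyota brand (2) follows from the count of 5, and the model value 1 is a guess that does not affect this query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table brands (id integer primary key, brand text);
    create table cars (id integer primary key, color int, brand int, model int);
    insert into brands (brand) values ('Ford'), ('Toyota'), ('Nissan'),
        ('Smart'), ('BMW'), ('Holden');
    insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
        (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1),
        (12,2,1);  -- assumed extra row: color 12 has no entry in colors
""")

COUNT_SQL = """
    select a.brand, count(b.id) as countOfBrand
    from brands a
        {join} cars b on a.ID = b.brand
    group by a.brand
    order by a.brand"""

# Inner join: brands without any cars disappear from the result.
inner_counts = db.execute(COUNT_SQL.format(join="join")).fetchall()

# Left outer join: every brand is kept. count(b.id) skips the NULL that
# fills in for the missing car, so Holden shows 0 rather than 1.
outer_counts = db.execute(COUNT_SQL.format(join="left outer join")).fetchall()

print(inner_counts)
print(outer_counts)
```

Counting b.id rather than using count(*) is what makes the Holden row come out as zero.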
And with that, away the boss skulks.
Now, to explain this in some more detail, outer joins can be of the left or right type. The left or right defines which table is fully included. A left outer join will include all the rows from the table on the left, while (you guessed it) a right outer join brings all the rows from the table on the right into the results.
Some databases will allow a full outer join, which will bring back results (whether matched or not) from both tables, but this isn't supported in all databases.
Now, I figure that at this point you are probably wondering whether or not you can mix join types in a query, and the answer is yes, you absolutely can.
select
b.brand,
c.color,
count(a.id) as countOfBrand
from
cars a
right outer join brands b
on b.ID=a.brand
join colors c
on a.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| Ford | Blue | 1 |
| Ford | White | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| BMW | Blue | 1 |
| BMW | White | 1 |
+--------+-------+--------------+
9 rows in set (0.00 sec)
So, why are those not the results that were expected? It is because although we have selected the outer join from cars to brands, it wasn't specified in the join to colors, so that particular join will only bring back rows that match in both tables.
Here is the query that would work to get the results that we expected:
select
a.brand,
c.color,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
left outer join colors c
on b.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| BMW | Blue | 1 |
| BMW | White | 1 |
| Ford | Blue | 1 |
| Ford | White | 1 |
| Holden | NULL | 0 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| Toyota | NULL | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
+--------+-------+--------------+
11 rows in set (0.00 sec)
As we can see, we have two outer joins in the query and the results are coming through as expected.
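The effect of mixing an outer join with an inner join can be reproduced with the sketch below. It assumes SQLite via Python's sqlite3 module and writes both variants starting from brands with a left outer join, since older SQLite versions do not accept right outer join. As before, the extra car row (color 12, Toyota, model 1) is partly an assumption: only the color value is stated in the answer.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table colors (id integer primary key, color text, paint text);
    create table brands (id integer primary key, brand text);
    create table cars (id integer primary key, color int, brand int, model int);
    insert into colors (color, paint) values ('Red','Metallic'),
        ('Green','Gloss'), ('Blue','Metallic'), ('White','Gloss'),
        ('Black','Gloss');
    insert into brands (brand) values ('Ford'), ('Toyota'), ('Nissan'),
        ('Smart'), ('BMW'), ('Holden');
    insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
        (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1),
        (12,2,1);  -- assumed extra row: color 12 has no entry in colors
""")

SQL = """
    select a.brand, c.color, count(b.id) as countOfBrand
    from brands a
        left outer join cars b on a.ID = b.brand
        {color_join} colors c on b.color = c.ID
    group by a.brand, c.color
    order by a.brand, c.color"""

# Outer join to cars, then a plain join to colors: the plain join throws
# away every row that has no matching color, undoing the outer join.
mixed = db.execute(SQL.format(color_join="join")).fetchall()

# Outer joins all the way down keep the unmatched rows, with NULL colors.
all_outer = db.execute(SQL.format(color_join="left outer join")).fetchall()

print(len(mixed), len(all_outer))
print(sorted(set(all_outer) - set(mixed), key=lambda row: row[0]))
```

The two rows that only the second query returns are Holden (no cars at all) and the Toyota whose color ID has no entry in colors.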
Now, how about those other types of joins, you ask? What about intersections?
Well, not all databases support intersect, but pretty much all databases will allow you to create an intersection through a join (or a well-structured where clause at the least).
An intersection is a type of query somewhat similar to a union as described above, but the difference is that it only returns rows of data that are identical (and I do mean identical) between the individual queries it combines. Only rows that are identical in every regard will be returned.
A simple example would be as such:
select
*
from
colors
where
ID>2
intersect
select
*
from
colors
where
id<4
While a normal union query would return all the rows of the table (the first query returning anything with ID>2 and the second anything with ID<4), which would result in a full set, an intersect query would only return the row with ID=3, as it is the only one that meets both criteria.
Now, if your database doesn't support an intersect query, the above can easily be accomplished with the following query:
select
a.ID,
a.color,
a.paint
from
colors a
join colors b
on a.ID=b.ID
where
a.ID>2
and b.ID<4
+----+-------+----------+
| ID | color | paint |
+----+-------+----------+
| 3 | Blue | Metallic |
+----+-------+----------+
1 row in set (0.00 sec)
If you wish to perform an intersection across two different tables using a database that doesn't inherently support an intersect query, you will need to create a join on every column of the tables.
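To check that the join really does reproduce the intersect, here is a brief sketch that runs both forms, plus the union for contrast. It assumes SQLite via Python's sqlite3 module, which happens to support intersect directly, so the two results can be compared.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table colors (id integer primary key, color text, paint text);
    insert into colors (color, paint) values ('Red','Metallic'),
        ('Green','Gloss'), ('Blue','Metallic'), ('White','Gloss'),
        ('Black','Gloss');
""")


def run(sql):
    return db.execute(sql).fetchall()


# Native intersect: only rows returned by both queries survive.
via_intersect = run("""
    select * from colors where ID > 2
    intersect
    select * from colors where ID < 4""")

# The same result through a self join, for databases without intersect.
via_join = run("""
    select a.ID, a.color, a.paint
    from colors a
        join colors b on a.ID = b.ID
    where a.ID > 2 and b.ID < 4""")

# For contrast, union of the same two queries covers the whole table.
via_union = run("""
    select * from colors where ID > 2
    union
    select * from colors where ID < 4
    order by 1""")

print(via_intersect, via_join, len(via_union))
```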
#2
93
Ok, I found this post very interesting and I would like to share some of my knowledge on creating queries. Thanks for this, Fluffeh. Others who read this and feel that I'm wrong are 101% free to edit and criticise my answer. (Honestly, I would be very thankful to have my mistakes corrected.)
I'll be posting some of the frequently asked questions in the MySQL tag.
Trick No. 1 (rows that match multiple conditions)
Given this schema:
CREATE TABLE MovieList
(
ID INT,
MovieName VARCHAR(25),
CONSTRAINT ml_pk PRIMARY KEY (ID),
CONSTRAINT ml_uq UNIQUE (MovieName)
);
INSERT INTO MovieList VALUES (1, 'American Pie');
INSERT INTO MovieList VALUES (2, 'The Notebook');
INSERT INTO MovieList VALUES (3, 'Discovery Channel: Africa');
INSERT INTO MovieList VALUES (4, 'Mr. Bean');
INSERT INTO MovieList VALUES (5, 'Expendables 2');
CREATE TABLE CategoryList
(
MovieID INT,
CategoryName VARCHAR(25),
CONSTRAINT cl_uq UNIQUE(MovieID, CategoryName),
CONSTRAINT cl_fk FOREIGN KEY (MovieID) REFERENCES MovieList(ID)
);
INSERT INTO CategoryList VALUES (1, 'Comedy');
INSERT INTO CategoryList VALUES (1, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Drama');
INSERT INTO CategoryList VALUES (3, 'Documentary');
INSERT INTO CategoryList VALUES (4, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Action');
QUESTION
Find all movies that belong to at least both the Comedy and Romance categories.
Solution
This question can be very tricky sometimes. It may seem that a query like this will be the answer:
SELECT DISTINCT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName = 'Comedy' AND
b.CategoryName = 'Romance'
This is definitely wrong, because it produces no result. The explanation is that each row holds only one value of CategoryName. For instance, if the first condition is true for a row, the second condition is always false for that row. Since the AND operator requires both conditions to be true, the combined condition is false for every row. Another attempt looks like this:
SELECT DISTINCT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName IN ('Comedy','Romance')
and the result is still incorrect, because it matches any record that has at least one of the given values in CategoryName. The real solution is to count the number of matching records per movie. That count should equal the total number of values supplied in the condition. (This relies on the unique constraint on MovieID and CategoryName, so a category cannot be counted twice for the same movie.)
SELECT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName IN ('Comedy','Romance')
GROUP BY a.MovieName
HAVING COUNT(*) = 2
- SQL of Relational Division
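The counting trick generalises to any list of categories: the value compared in the having clause is simply the number of distinct categories asked for. Below is a rough sketch of that idea, assuming SQLite via Python's sqlite3 module in place of MySQL; movies_in_all is a hypothetical helper name, and the schema is condensed from the one above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table MovieList (ID int primary key, MovieName varchar(25) unique);
    create table CategoryList (
        MovieID int references MovieList(ID),
        CategoryName varchar(25),
        unique (MovieID, CategoryName)
    );
    insert into MovieList values (1, 'American Pie'), (2, 'The Notebook'),
        (3, 'Discovery Channel: Africa'), (4, 'Mr. Bean'), (5, 'Expendables 2');
    insert into CategoryList values (1, 'Comedy'), (1, 'Romance'),
        (2, 'Romance'), (2, 'Drama'), (3, 'Documentary'), (4, 'Comedy'),
        (5, 'Comedy'), (5, 'Action');
""")


def movies_in_all(categories):
    """Movies tagged with every category in the list."""
    wanted = sorted(set(categories))
    if not wanted:
        return []
    marks = ", ".join("?" for _ in wanted)
    sql = f"""
        select a.MovieName
        from MovieList a
            join CategoryList b on a.ID = b.MovieID
        where b.CategoryName in ({marks})
        group by a.MovieName
        having count(*) = ?
        order by a.MovieName"""
    # One parameter per category, then the number of categories to match.
    return [row[0] for row in db.execute(sql, (*wanted, len(wanted)))]


print(movies_in_all(["Comedy", "Romance"]))
print(movies_in_all(["Comedy"]))
```

Only the placeholders are built dynamically; the category values themselves are always passed as parameters.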
Trick No. 2 (maximum record for each entry)
Given this schema:
CREATE TABLE Software
(
ID INT,
SoftwareName VARCHAR(25),
Descriptions VARCHAR(150),
CONSTRAINT sw_pk PRIMARY KEY (ID),
CONSTRAINT sw_uq UNIQUE (SoftwareName)
);
INSERT INTO Software VALUES (1,'PaintMe','used for photo editing');
INSERT INTO Software VALUES (2,'World Map','contains map of different places of the world');
INSERT INTO Software VALUES (3,'Dictionary','contains description, synonym, antonym of the words');
CREATE TABLE VersionList
(
SoftwareID INT,
VersionNo INT,
DateReleased DATE,
CONSTRAINT sw_uq UNIQUE (SoftwareID, VersionNo),
CONSTRAINT sw_fk FOREIGN KEY (SOftwareID) REFERENCES Software(ID)
);
INSERT INTO VersionList VALUES (3, 2, '2009-12-01');
INSERT INTO VersionList VALUES (3, 1, '2009-11-01');
INSERT INTO VersionList VALUES (3, 3, '2010-01-01');
INSERT INTO VersionList VALUES (2, 2, '2010-12-01');
INSERT INTO VersionList VALUES (2, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 3, '2011-12-01');
INSERT INTO VersionList VALUES (1, 2, '2010-12-01');
INSERT INTO VersionList VALUES (1, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 4, '2012-12-01');
QUESTION
Find the latest version of each software. Display the following columns: SoftwareName, Descriptions, LatestVersion (from the VersionNo column), DateReleased.
Solution
Some SQL developers mistakenly reach for the MAX() aggregate function alone. They tend to write something like this:
SELECT a.SoftwareName, a.Descriptions,
MAX(b.VersionNo) AS LatestVersion, b.DateReleased
FROM Software a
INNER JOIN VersionList b
ON a.ID = b.SoftwareID
GROUP BY a.ID
ORDER BY a.ID
SQLFiddle Demo
(Most RDBMSs raise an error on this query, because some of the non-aggregated columns in the select list are missing from the group by clause.) The result shows the correct LatestVersion for each software, but the DateReleased values are clearly wrong. MySQL versions before 8.0 do not support window functions or common table expressions, as some other RDBMSs already do. The workaround is to write a subquery that gets the maximum VersionNo for each software and then join it back to the other tables.
SELECT a.SoftwareName, a.Descriptions,
b.LatestVersion, c.DateReleased
FROM Software a
INNER JOIN
(
SELECT SoftwareID, MAX(VersionNO) LatestVersion
FROM VersionList
GROUP BY SoftwareID
) b ON a.ID = b.SoftwareID
INNER JOIN VersionList c
ON c.SoftwareID = b.SoftwareID AND
c.VersionNO = b.LatestVersion
GROUP BY a.ID
ORDER BY a.ID
SQLFiddle Demo (the answer)
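For readers on MySQL 8.0 or later, where window functions are available, the same result can be sketched with ROW_NUMBER() instead of joining back on the maximum. This is an alternative, not the approach used above, and the alias rn is just an illustrative name:

```sql
SELECT  s.SoftwareName, s.Descriptions,
        v.VersionNo AS LatestVersion, v.DateReleased
FROM    Software s
        INNER JOIN
        (
            -- number each software's versions from newest (1) to oldest
            SELECT  SoftwareID, VersionNo, DateReleased,
                    ROW_NUMBER() OVER (PARTITION BY SoftwareID
                                       ORDER BY VersionNo DESC) AS rn
            FROM    VersionList
        ) v ON s.ID = v.SoftwareID
WHERE   v.rn = 1
ORDER BY s.ID
```

Because the version row is picked whole, VersionNo and DateReleased always come from the same record.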
So that was it. I'll post another one as soon as I recall any other FAQ under the MySQL tag. Thank you for reading this little article. I hope you have gained at least a little knowledge from it.
UPDATE 1
Trick No. 3 (Finding the latest record between two IDs)
Given Schema
CREATE TABLE userList
(
ID INT,
NAME VARCHAR(20),
CONSTRAINT us_pk PRIMARY KEY (ID),
CONSTRAINT us_uq UNIQUE (NAME)
);
INSERT INTO userList VALUES (1, 'Fluffeh');
INSERT INTO userList VALUES (2, 'John Woo');
INSERT INTO userList VALUES (3, 'hims056');
CREATE TABLE CONVERSATION
(
ID INT,
FROM_ID INT,
TO_ID INT,
MESSAGE VARCHAR(250),
DeliveryDate DATE
);
INSERT INTO CONVERSATION VALUES (1, 1, 2, 'hi john', '2012-01-01');
INSERT INTO CONVERSATION VALUES (2, 2, 1, 'hello fluff', '2012-01-02');
INSERT INTO CONVERSATION VALUES (3, 1, 3, 'hey hims', '2012-01-03');
INSERT INTO CONVERSATION VALUES (4, 1, 3, 'please reply', '2012-01-04');
INSERT INTO CONVERSATION VALUES (5, 3, 1, 'how are you?', '2012-01-05');
INSERT INTO CONVERSATION VALUES (6, 3, 2, 'sample message!', '2012-01-05');
QUESTION
Find the latest conversation between two users.
Solution
SELECT b.Name SenderName,
c.Name RecipientName,
a.Message,
a.DeliveryDate
FROM CONVERSATION a
INNER JOIN userList b
ON a.From_ID = b.ID
INNER JOIN userList c
ON a.To_ID = c.ID
WHERE (LEAST(a.FROM_ID, a.TO_ID), GREATEST(a.FROM_ID, a.TO_ID), DeliveryDate)
IN
(
SELECT LEAST(FROM_ID, TO_ID) minFROM,
GREATEST(FROM_ID, TO_ID) maxTo,
MAX(DeliveryDate) maxDate
FROM CONVERSATION
GROUP BY minFROM, maxTo
)
SQLFiddle Demo
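The key to this query is that LEAST() and GREATEST() put each pair of user IDs into a fixed order, so a message from 1 to 2 and a reply from 2 to 1 fall into the same group. As a small sketch against the CONVERSATION table above, the grouping can be run on its own to see the normalised pairs (the aliases userLow and userHigh are illustrative names, not part of the original answer):

```sql
SELECT  LEAST(FROM_ID, TO_ID)    AS userLow,   -- smaller ID of the pair
        GREATEST(FROM_ID, TO_ID) AS userHigh,  -- larger ID of the pair
        MAX(DeliveryDate)        AS maxDate    -- latest message in either direction
FROM    CONVERSATION
GROUP BY LEAST(FROM_ID, TO_ID), GREATEST(FROM_ID, TO_ID)
```

With the sample rows this gives one row per pair of users: (1, 2), (1, 3) and (2, 3).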
#3
58
Part 2 - Subqueries
Okay, now the boss has burst in again - I want a list of all of our cars with the brand and a total of how many of that brand we have!
This is a great opportunity to use the next trick in our bag of SQL goodies - the subquery. If you are unfamiliar with the term, a subquery is a query that runs inside another query. There are many different ways to use them.
For our request, let's first put a simple query together that will list each car and the brand:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
Now, if we wanted to simply get a count of cars sorted by brand, we could of course write this:
select
b.brand,
count(a.ID) as countCars
from
cars a
join brands b
on a.brand=b.ID
group by
b.brand
+--------+-----------+
| brand | countCars |
+--------+-----------+
| BMW | 2 |
| Ford | 2 |
| Nissan | 1 |
| Smart | 1 |
| Toyota | 5 |
+--------+-----------+
So, we should be able to simply add in the count function to our original query right?
select
a.ID,
b.brand,
count(a.ID) as countCars
from
cars a
join brands b
on a.brand=b.ID
group by
a.ID,
b.brand
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 1 | Toyota | 1 |
| 2 | Ford | 1 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 5 | Toyota | 1 |
| 6 | BMW | 1 |
| 7 | Ford | 1 |
| 8 | Toyota | 1 |
| 9 | Toyota | 1 |
| 10 | BMW | 1 |
| 11 | Toyota | 1 |
+----+--------+-----------+
11 rows in set (0.00 sec)
Sadly, no, we can't do that. The reason is that when we add in the car ID (column a.ID) we have to add it to the group by as well, so now each group contains exactly one car and the count function returns 1 for every row.
This is where we can use a subquery, however. In fact, we can use two completely different types of subquery that will both return the results we need. The first is to simply put the subquery in the select clause. This means each time we get a row of data, the subquery will run off, get a column of data and then pop it into our row of data.
select
a.ID,
b.brand,
(
select
count(c.ID)
from
cars c
where
a.brand=c.brand
) as countCars
from
cars a
join brands b
on a.brand=b.ID
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 2 | Ford | 2 |
| 7 | Ford | 2 |
| 1 | Toyota | 5 |
| 5 | Toyota | 5 |
| 8 | Toyota | 5 |
| 9 | Toyota | 5 |
| 11 | Toyota | 5 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 6 | BMW | 2 |
| 10 | BMW | 2 |
+----+--------+-----------+
11 rows in set (0.00 sec)
And bam, this does the job. If you noticed, though, this subquery has to run for each and every row of data we return. Even in this little example, we only have five different brands of car, but the subquery ran eleven times because we are returning eleven rows of data. So, in this case, it doesn't seem like the most efficient way to write the query.
For a different approach, let's run a subquery and pretend it is a table:
select
a.ID,
b.brand,
d.countCars
from
cars a
join brands b
on a.brand=b.ID
join
(
select
c.brand,
count(c.ID) as countCars
from
cars c
group by
c.brand
) d
on a.brand=d.brand
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 1 | Toyota | 5 |
| 2 | Ford | 2 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 5 | Toyota | 5 |
| 6 | BMW | 2 |
| 7 | Ford | 2 |
| 8 | Toyota | 5 |
| 9 | Toyota | 5 |
| 10 | BMW | 2 |
| 11 | Toyota | 5 |
+----+--------+-----------+
11 rows in set (0.00 sec)
Okay, so we have the same results, ordered slightly differently. Without an order by clause the database is free to return rows in any order, and this time it happened to return them ordered by the first column we picked. The numbers are the same, and they are right.
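Row order is only guaranteed when the query asks for it. As a small sketch, assuming the same cars and brands tables, adding an explicit order by to the second query pins the order down regardless of how the database chooses to execute it:

```sql
select
    a.ID,
    b.brand,
    d.countCars
from
    cars a
        join brands b
            on a.brand=b.ID
        join
        (
            select
                c.brand,
                count(c.ID) as countCars
            from
                cars c
            group by
                c.brand
        ) d
            on a.brand=d.brand
order by
    a.ID  -- without this line the row order is up to the database
```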
So, what's the difference between the two, and when should we use each type of subquery? First, let's make sure we understand how that second query works. We selected two tables in the from clause of our query, then wrote a query and told the database that it was in fact a table instead, which the database is perfectly happy with. There can be some benefits to using this method (as well as some limitations). Foremost is that this subquery ran once. If our database contained a large volume of data, there could well be a massive improvement over the first method. However, as we are using this as a table, the subquery has to bring back extra data (here the brand column) so that its rows can actually be joined back to our rows of data. We also have to be sure that it returns a row for everything we want to keep if we are going to use a simple join like in the query above. If you recall, the join will only pull back rows that have matching data on both sides of the join. If we aren't careful, this could result in valid data not being returned from our cars table if there wasn't a matching row in this subquery.
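To make the missing-row risk concrete, here is a sketch that lists every brand with its car count. It assumes the brands table contains a brand with no cars (Holden, in this guide's data). A plain join to the counting subquery would silently drop that brand, while a left join keeps it and coalesce() turns the missing count into 0:

```sql
select
    b.ID,
    b.brand,
    coalesce(d.countCars, 0) as countCars  -- 0 when the subquery has no row for this brand
from
    brands b
        left join
        (
            select
                c.brand,
                count(c.ID) as countCars
            from
                cars c
            group by
                c.brand
        ) d
            on b.ID=d.brand
```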
Now, looking back at the first subquery, there are some limitations as well. Because we are pulling data back into a single row, the subquery can ONLY return a single value (one row with one column). Subqueries used in the select clause of a query very often use an aggregate function such as sum, count, max or another similar aggregate function. They don't have to, but that is often how they are written.
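To see the limitation in practice, here is a sketch of a select-clause subquery with no aggregate. Assuming the cars table used above, any brand with more than one car makes the subquery return several rows, and MySQL rejects the query (the alias otherCar is an illustrative name):

```sql
select
    a.ID,
    (
        -- no aggregate here: a brand with several cars returns several rows
        select
            c.ID
        from
            cars c
        where
            c.brand=a.brand
    ) as otherCar
from
    cars a

-- ERROR 1242 (21000): Subquery returns more than 1 row
```

Wrapping the column in an aggregate such as max(c.ID) or count(c.ID) brings the subquery back to a single value per row.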
So, before we move on, let's have a quick look at where else we can use a subquery. We can use it in the where clause. This example is a little contrived, as there are better ways of getting the following data from our database, but seeing as it is only an example, let's have a look:
select
ID,
brand
from
brands
where
brand like '%o%'
+----+--------+
| ID | brand |
+----+--------+
| 1 | Ford |
| 2 | Toyota |
| 6 | Holden |
+----+--------+
3 rows in set (0.00 sec)
This returns a list of brand IDs and brand names (the second column is only added to show us the brands) that contain the letter o in the name.
Now, we could use the results of this query in a where clause like this:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
where
a.brand in
(
select
ID
from
brands
where
brand like '%o%'
)
+----+--------+
| ID | brand |
+----+--------+
| 2 | Ford |
| 7 | Ford |
| 1 | Toyota |
| 5 | Toyota |
| 8 | Toyota |
| 9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)
As you can see, even though the subquery was returning the three brand IDs, our cars table only had entries for two of them.
In this case, for further detail, the subquery is working as if we wrote the following code:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
where
a.brand in (1,2,6)
+----+--------+
| ID | brand |
+----+--------+
| 1 | Toyota |
| 2 | Ford |
| 5 | Toyota |
| 7 | Ford |
| 8 | Toyota |
| 9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)
Again, you can see how using a subquery instead of manual inputs has changed the order of the rows returned from the database.
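The same filter can also be written with exists instead of in. As a sketch, assuming the same cars and brands tables (the alias x is an illustrative name), the correlated subquery below is checked once per car row and should return the same cars:

```sql
select
    a.ID,
    b.brand
from
    cars a
        join brands b
            on a.brand=b.ID
where
    exists
    (
        -- correlated: looks only at the brand of the current car row
        select
            1
        from
            brands x
        where
            x.ID=a.brand
            and x.brand like '%o%'
    )
```

Which form runs faster depends on the database and the data, so it is worth comparing the two on your own tables.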
While we are discussing subqueries, let's see what else we can do with a subquery:
- You can place a subquery within another subquery, and so on and so on. There is a limit which depends on your database, but short of recursive functions of some insane and maniacal programmer, most folks will never hit that limit.
- You can place a number of subqueries into a single query, a few in the select clause, some in the from clause and a couple more in the where clause - just remember that each one you put in is making your query more complex and likely to take longer to execute.
If you need to write some efficient code, it can be beneficial to write the query a number of ways and see (either by timing it or by using an explain plan) which is the optimal query to get your results. The first way that works may not always be the best way of doing it.
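In MySQL, the explain plan mentioned above is obtained by putting the explain keyword in front of the select. As a sketch using the where-clause subquery from earlier, the statement below reports how the database intends to run the query (join order, indexes considered, estimated rows) rather than returning the cars themselves:

```sql
explain
select
    a.ID,
    b.brand
from
    cars a
        join brands b
            on a.brand=b.ID
where
    a.brand in
    (
        select
            ID
        from
            brands
        where
            brand like '%o%'
    )
```

Running explain on two differently written versions of the same query is a quick way to compare them before timing anything.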
#4
55
Part 3 - Tricks and Efficient Code
MySQL in() efficiency
I thought I would add some extra bits, for tips and tricks that have come up.
One question I see come up a fair bit is "How do I get non-matching rows from two tables?", and the answer I see most commonly accepted is something like the following (based on our cars and brands tables, where Holden is listed as a brand but does not appear in the cars table):
select
a.ID,
a.brand
from
brands a
where
a.ID not in(select brand from cars)
And yes it will work.
+----+--------+
| ID | brand |
+----+--------+
| 6 | Holden |
+----+--------+
1 row in set (0.00 sec)
However, it is not efficient in some databases. Here is a link to a Stack Overflow question asking about it, and here is an excellent in-depth article if you want to get into the nitty-gritty.
The short answer is, if the optimiser doesn't handle it efficiently, it may be much better to use a query like the following to get non matched rows:
select
a.brand
from
brands a
left join cars b
on a.id=b.brand
where
b.brand is null
+--------+
| brand |
+--------+
| Holden |
+--------+
1 row in set (0.00 sec)
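A third option, not shown above, is not exists. As a sketch against the same brands and cars tables, it returns the unmatched brands and, unlike not in, keeps working if cars.brand ever contains a NULL (with a NULL in the subquery's result, not in matches nothing at all):

```sql
select
    a.ID,
    a.brand
from
    brands a
where
    not exists
    (
        -- true only when no car row references this brand
        select
            1
        from
            cars b
        where
            b.brand=a.ID
    )
```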
Update Table with same table in subquery
Ahhh, another oldie but goodie - the old "You can't specify target table 'brands' for update in FROM clause".
MySQL will not allow you to run an update... query with a subselect on the same table. Now, you might be thinking, why not just slap it into the where clause, right? But what if you want to update only the row with the max() date among a bunch of other rows? You can't exactly do that in a where clause.
update
brands
set
brand='Holden'
where
id=
(select
id
from
brands
where
id=6);
ERROR 1093 (HY000): You can't specify target table 'brands'
for update in FROM clause
So, we can't do that eh? Well, not exactly. There is a sneaky workaround that a surprisingly large number of users don't know about - though it does include some hackery that you will need to pay attention to.
You can stick the subquery within another subquery, which puts enough of a gap between the two queries so that it will work. However, note that it might be safest to stick the query within a transaction - this will prevent any other changes being made to the tables while the query is running.
update
brands
set
brand='Holden'
where id=
(select
id
from
(select
id
from
brands
where
id=6
)
as updateTable);
Query OK, 0 rows affected (0.02 sec)
Rows matched: 1 Changed: 0 Warnings: 0
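For the "row with the max()" case mentioned earlier, another commonly used form is a multi-table update that joins to a derived table. This is a sketch rather than the method shown above; it assumes the brands table from this answer, and the alias m and the column maxId are made-up names for illustration:

```sql
update
    brands b
        join
        (
            -- evaluated first into its own temporary result, so the
            -- update no longer reads brands directly in a subselect
            select
                max(id) as maxId
            from
                brands
        ) m
            on b.id=m.maxId
set
    b.brand='Holden'
```

As with the nested-subquery trick, it is worth running this inside a transaction if other sessions may be changing the table at the same time.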
#5
16
You can use multiple subqueries in the FROM clause. Let me show you one example:
SELECT DISTINCT e.id,e.name,d.name,lap.lappy LAPTOP_MAKE,c_loc.cnty COUNTY
FROM (
SELECT c.id cnty,l.name
FROM county c, location l
WHERE c.id=l.county_id AND l.end_Date IS NOT NULL
) c_loc, emp e
INNER JOIN dept d ON e.deptno =d.id
LEFT JOIN
(
SELECT l.id lappy, c.name cmpy
FROM laptop l, company c
WHERE l.make = c.name
) lap ON e.cmpy_id=lap.cmpy
You can use as many tables as you want. Use outer joins and unions wherever necessary, even inside table subqueries.
That's a very easy way to involve as many tables and fields as you need.
#6
3
Hope this helps you find the tables as you're reading through the thing:
jsfiddle
mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | varchar(15) | YES | | NULL | |
| paint | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
#1
398
Part 1 - Joins and Unions
This answer covers:
- Part 1
- Joining two or more tables using an inner join (see the Wikipedia entry for additional info)
- How to use a union query
- Left and Right Outer Joins (this answer is excellent at describing types of joins)
- Intersect queries (and how to reproduce them if your database doesn't support them) - this is a function of SQL-Server (see info) and part of the reason I wrote this whole thing in the first place.
- Part 2
- Subqueries - what they are, where they can be used and what to watch out for
- Cartesian joins AKA - Oh, the misery!
There are a number of ways to retrieve data from multiple tables in a database. In this answer, I will be using ANSI-92 join syntax. This may be different to a number of other tutorials out there which use the older ANSI-89 syntax. If you are used to 89, the newer syntax may seem much less intuitive at first, but all I can say is to try it, as it is much easier to understand when the queries start getting more complex. Why use it? Is there a performance gain? The short answer is no, but it is easier to read once you get used to it, and it makes queries written by other folks easier to read as well.
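For readers who have only seen one of the two styles, here is a side-by-side sketch of the same query written both ways. It assumes the cars and models tables that this answer defines for its examples, and both statements should return the same rows:

```sql
-- ANSI-89 style: tables listed with commas, join condition mixed into the where clause
select
    a.ID,
    b.model
from
    cars a,
    models b
where
    a.model=b.ID
    and b.ID=1;

-- ANSI-92 style: join condition stays with the join, filter stays in the where clause
select
    a.ID,
    b.model
from
    cars a
        join models b
            on a.model=b.ID
where
    b.ID=1;
```

With two tables the difference is cosmetic; with five or six tables, keeping each join condition next to its table makes a forgotten condition much easier to spot.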
I am also going to use the concept of a small caryard which has a database to keep track of what cars it has available. The owner has hired you as his IT Computer guy and expects you to be able to drop him the data that he asks for at the drop of a hat.
I have made a number of lookup tables that will be used by the final table. This will give us a reasonable model to work from. To start off, I will be running my queries against an example database that has the following structure. I will try to think of common mistakes that are made when starting out and explain what goes wrong with them - as well as of course showing how to correct them.
The first table is simply a color listing so that we know what colors we have in the car yard.
mysql> create table colors(id int(3) not null auto_increment primary key,
-> color varchar(15), paint varchar(10));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | varchar(15) | YES | | NULL | |
| paint | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.01 sec)
mysql> insert into colors (color, paint) values ('Red', 'Metallic'),
-> ('Green', 'Gloss'), ('Blue', 'Metallic'),
-> ('White', 'Gloss'), ('Black', 'Gloss');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from colors;
+----+-------+----------+
| id | color | paint |
+----+-------+----------+
| 1 | Red | Metallic |
| 2 | Green | Gloss |
| 3 | Blue | Metallic |
| 4 | White | Gloss |
| 5 | Black | Gloss |
+----+-------+----------+
5 rows in set (0.00 sec)
The brands table identifies the different brands of cars our caryard could possibly sell.
mysql> create table brands (id int(3) not null auto_increment primary key,
-> brand varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from brands;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| brand | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
mysql> insert into brands (brand) values ('Ford'), ('Toyota'),
-> ('Nissan'), ('Smart'), ('BMW');
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> select * from brands;
+----+--------+
| id | brand |
+----+--------+
| 1 | Ford |
| 2 | Toyota |
| 3 | Nissan |
| 4 | Smart |
| 5 | BMW |
+----+--------+
5 rows in set (0.00 sec)
The models table will cover off different types of cars. For this guide it is simpler to use different car types rather than actual car models.
mysql> create table models (id int(3) not null auto_increment primary key,
-> model varchar(15));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from models;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| model | varchar(15) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> insert into models (model) values ('Sports'), ('Sedan'), ('4WD'), ('Luxury');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from models;
+----+--------+
| id | model |
+----+--------+
| 1 | Sports |
| 2 | Sedan |
| 3 | 4WD |
| 4 | Luxury |
+----+--------+
4 rows in set (0.00 sec)
And finally, the table that ties all these other tables together. The ID field is actually the unique lot number used to identify cars.
mysql> create table cars (id int(3) not null auto_increment primary key,
-> color int(3), brand int(3), model int(3));
Query OK, 0 rows affected (0.01 sec)
mysql> show columns from cars;
+-------+--------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | int(3) | YES | | NULL | |
| brand | int(3) | YES | | NULL | |
| model | int(3) | YES | | NULL | |
+-------+--------+------+-----+---------+----------------+
4 rows in set (0.00 sec)
mysql> insert into cars (color, brand, model) values (1,2,1), (3,1,2), (5,3,1),
-> (4,4,2), (2,2,3), (3,5,4), (4,1,3), (2,2,1), (5,2,3), (4,5,1);
Query OK, 10 rows affected (0.00 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> select * from cars;
+----+-------+-------+-------+
| id | color | brand | model |
+----+-------+-------+-------+
| 1 | 1 | 2 | 1 |
| 2 | 3 | 1 | 2 |
| 3 | 5 | 3 | 1 |
| 4 | 4 | 4 | 2 |
| 5 | 2 | 2 | 3 |
| 6 | 3 | 5 | 4 |
| 7 | 4 | 1 | 3 |
| 8 | 2 | 2 | 1 |
| 9 | 5 | 2 | 3 |
| 10 | 4 | 5 | 1 |
+----+-------+-------+-------+
10 rows in set (0.00 sec)
This will give us enough data (I hope) to cover off the examples below of different types of joins and also give enough data to make them worthwhile.
So getting into the grit of it, the boss wants to know the IDs of all the sports cars he has.
This is a simple two table join. We have a table that identifies the model and the table with the available stock in it. As you can see, the data in the model column of the cars table relates to the ID column of the models table. Now, we know that the models table has an ID of 1 for Sports, so let's write the join.
select
ID,
model
from
cars
join models
on model=ID
So this query looks good, right? We have identified the two tables that contain the information we need and used a join that correctly identifies what columns to join on.
ERROR 1052 (23000): Column 'ID' in field list is ambiguous
Oh noes! An error in our first query! Yes, and it is a plum. You see, the query has indeed got the right columns, but some of them exist in both tables, so the database gets confused about what actual column we mean and where. There are two solutions to solve this. The first is nice and simple: we can use tableName.columnName to tell the database exactly what we mean, like this:
select
cars.ID,
models.model
from
cars
join models
on cars.model=models.ID
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
| 2 | Sedan |
| 4 | Sedan |
| 5 | 4WD |
| 7 | 4WD |
| 9 | 4WD |
| 6 | Luxury |
+----+--------+
10 rows in set (0.00 sec)
The other is probably more often used and is called table aliasing. The tables in this example have nice and short simple names, but typing out something like KPI_DAILY_SALES_BY_DEPARTMENT would probably get old quickly, so a simple way is to nickname the table like this:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
Now, back to the request. As you can see we have the information we need, but we also have information that wasn't asked for, so we need to include a where clause in the statement to only get the Sports cars as was asked. As I prefer the table alias method rather than using the table names over and over, I will stick to it from this point onwards.
Clearly, we need to add a where clause to our query. We can identify Sports cars either by ID=1 or model='Sports'. As the ID is indexed and the primary key (and it happens to be less typing), let's use that in our query.
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Bingo! The boss is happy. Of course, being a boss and never being happy with what he asked for, he looks at the information, then says I want the colors as well.
Okay, so we have a good part of our query already written, but we need to use a third table, which is colors. Now, our main information table cars stores the car color ID, and this links back to the colors ID column. So, in a similar manner to the original, we can join a third table:
select
a.ID,
b.model
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 3 | Sports |
| 8 | Sports |
| 10 | Sports |
+----+--------+
4 rows in set (0.00 sec)
Damn, although the table was correctly joined and the related columns were linked, we forgot to pull in the actual information from the new table that we just linked.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
where
b.ID=1
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
+----+--------+-------+
4 rows in set (0.00 sec)
Right, that's the boss off our back for a moment. Now, to explain some of this in a little more detail. As you can see, the from clause in our statement starts with our main table (I often use a table that contains information rather than a lookup or dimension table). The query would work just as well with the tables all switched around, but it would make less sense when we come back to read it in a few months' time, so it is often best to try to write a query that will be nice and easy to understand: lay it out intuitively and use nice indenting so that everything is as clear as it can be. If you go on to teach others, try to instill these characteristics in their queries, especially if you will be troubleshooting them.
It is entirely possible to keep linking more and more tables in this manner.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
While I forgot to include a table where we might want to join on more than one column in the join statement, here is an example. If the models table had brand-specific models and therefore also had a column called brand which linked back to the brands table on the ID field, it could be done like this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
and b.brand=d.ID
where
b.ID=1
You can see that the query above not only links the joined tables to the main cars table, but also specifies joins between the already joined tables. If this wasn't done, the result is called a cartesian join - which is dba speak for bad. A cartesian join is one where rows are returned because the query doesn't tell the database how to limit the results, so the query returns every combination of rows that fits the criteria.
So, to give an example of a cartesian join, lets run the following query:
select
a.ID,
b.model
from
cars a
join models b
+----+--------+
| ID | model |
+----+--------+
| 1 | Sports |
| 1 | Sedan |
| 1 | 4WD |
| 1 | Luxury |
| 2 | Sports |
| 2 | Sedan |
| 2 | 4WD |
| 2 | Luxury |
| 3 | Sports |
| 3 | Sedan |
| 3 | 4WD |
| 3 | Luxury |
| 4 | Sports |
| 4 | Sedan |
| 4 | 4WD |
| 4 | Luxury |
| 5 | Sports |
| 5 | Sedan |
| 5 | 4WD |
| 5 | Luxury |
| 6 | Sports |
| 6 | Sedan |
| 6 | 4WD |
| 6 | Luxury |
| 7 | Sports |
| 7 | Sedan |
| 7 | 4WD |
| 7 | Luxury |
| 8 | Sports |
| 8 | Sedan |
| 8 | 4WD |
| 8 | Luxury |
| 9 | Sports |
| 9 | Sedan |
| 9 | 4WD |
| 9 | Luxury |
| 10 | Sports |
| 10 | Sedan |
| 10 | 4WD |
| 10 | Luxury |
+----+--------+
40 rows in set (0.00 sec)
Good god, that's ugly. However, as far as the database is concerned, it is exactly what was asked for. In the query, we asked for the ID from cars and the model from models. However, because we didn't specify how to join the tables, the database has matched every row from the first table with every row from the second table.
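The size of the result is easy to predict: 10 rows in cars multiplied by 4 rows in models gives the 40 rows above. As a sketch, assuming the same two tables, the query below asks for the same combinations with an explicit cross join, which at least makes the intent visible to the next reader, and counts the rows instead of listing them (rowsReturned is an illustrative alias):

```sql
select
    count(*) as rowsReturned  -- rows in cars multiplied by rows in models
from
    cars a
        cross join models b
```

On real tables with thousands of rows on each side, that multiplication is what turns a forgotten join condition into a query that never seems to finish.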
Okay, so the boss is back, and he wants more information again. I want the same list, but also include 4WDs in it.
This, however, gives us a great excuse to look at two different ways to accomplish this. We could add another condition to the where clause like this:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
or b.ID=3
While the above will work perfectly well, let's look at it differently, as this is a great excuse to show how a union query works.
We know that the following will return all the Sports cars:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
And the following would return all the 4WDs:
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
So by adding a union all clause between them, the results of the second query will be appended to the results of the first query.
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=1
union all
select
a.ID,
b.model,
c.color
from
cars a
join models b
on a.model=b.ID
join colors c
on a.color=c.ID
join brands d
on a.brand=d.ID
where
b.ID=3
+----+--------+-------+
| ID | model | color |
+----+--------+-------+
| 1 | Sports | Red |
| 8 | Sports | Green |
| 10 | Sports | White |
| 3 | Sports | Black |
| 5 | 4WD | Green |
| 7 | 4WD | White |
| 9 | 4WD | Black |
+----+--------+-------+
7 rows in set (0.00 sec)
As you can see, the results of the first query are returned first, followed by the results of the second query.
In this example, it would of course have been much easier to simply use the first query, but union queries can be great for specific cases. They are a great way to return specific results from tables that aren't easily joined together - or for that matter completely unrelated tables. There are a few rules to follow, however.
- The column types from the first query must match the column types from every other query below.
- The names of the columns from the first query will be used to identify the entire set of results.
- The number of columns in each query must be the same.
Now, you might be wondering what the difference is between using union and union all. A union query will remove duplicates, while a union all will not. This does mean that there is a small performance hit when using union over union all, but the results may be worth it - I won't speculate on that sort of thing here though.
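The difference is easiest to see with two queries whose results overlap. The following is a small sketch using Python's sqlite3 module (assumed here instead of MySQL so it can run anywhere) and a hypothetical four-row colors table.

```python
import sqlite3

# Hypothetical cut-down colors table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table colors (ID integer primary key, color text);
    insert into colors values (1, 'Red'), (2, 'Green'), (3, 'Blue'), (4, 'White');
""")

# Two overlapping queries: the row with ID 3 satisfies both conditions.
first = "select ID, color from colors where ID > 2"
second = "select ID, color from colors where ID < 4"

# union all keeps the duplicate row, union removes it.
with_duplicates = conn.execute(
    f"{first} union all {second} order by ID"
).fetchall()
without_duplicates = conn.execute(
    f"{first} union {second} order by ID"
).fetchall()

print(with_duplicates)
print(without_duplicates)
```

Note that the single order by sits after the last query and refers to the result column name (ID), not to a table alias.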
On that note, a few additional points are worth making here.
- If we want to order the results, we can use an order by, but we can't use the table alias anymore. In the query above, appending an order by a.ID would result in an error - as far as the results are concerned, the column is called ID rather than a.ID - even though the same alias has been used in both queries.
- We can only have one order by clause, and it must be the last clause.
For the next examples, I am adding a few extra rows to our tables.
I have added Holden to the brands table. I have also added a row into cars that has the color value of 12 - which has no reference in the colors table.
Okay, the boss is back again, barking requests out - "I want a count of each brand we carry and the number of cars in it!" - Typical, we just get to an interesting section of our discussion and the boss wants more work.
Rightyo, so the first thing we need to do is get a complete listing of possible brands.
select
a.brand
from
brands a
+--------+
| brand |
+--------+
| Ford |
| Toyota |
| Nissan |
| Smart |
| BMW |
| Holden |
+--------+
6 rows in set (0.00 sec)
Now, when we join this to our cars table we get the following result:
select
a.brand
from
brands a
join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Nissan |
| Smart |
| Toyota |
+--------+
5 rows in set (0.00 sec)
Which is of course a problem - we aren't seeing any mention of the lovely Holden brand I added.
This is because a join looks for matching rows in both tables. As there is no row in cars with the brand Holden, it isn't returned. This is where we can use an outer join. This will return all the results from one table whether they are matched in the other table or not:
select
a.brand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+
| brand |
+--------+
| BMW |
| Ford |
| Holden |
| Nissan |
| Smart |
| Toyota |
+--------+
6 rows in set (0.00 sec)
Now that we have that, we can add a lovely aggregate function to get a count and get the boss off our backs for a moment.
select
a.brand,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
group by
a.brand
+--------+--------------+
| brand | countOfBrand |
+--------+--------------+
| BMW | 2 |
| Ford | 2 |
| Holden | 0 |
| Nissan | 1 |
| Smart | 1 |
| Toyota | 5 |
+--------+--------------+
6 rows in set (0.00 sec)
And with that, away the boss skulks.
Now, to explain this in some more detail, outer joins can be of the left or right type. The left or right defines which table is fully included. A left outer join will include all the rows from the table on the left, while (you guessed it) a right outer join brings all the rows from the table on the right into the results.
Some databases will allow a full outer join, which will bring back results (whether matched or not) from both tables, but this isn't supported in all databases.
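As a compact illustration of inner versus left outer joins, here is a sketch that runs the same counting query with both join types. It uses Python's sqlite3 module and hypothetical three-row tables (both are assumptions for the sake of a runnable example, not the MySQL data used above).

```python
import sqlite3

# Hypothetical brands and cars tables; Holden has no cars.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table brands (ID integer primary key, brand text);
    create table cars (ID integer primary key, brand integer);
    insert into brands values (1, 'Ford'), (2, 'Toyota'), (6, 'Holden');
    insert into cars values (1, 2), (2, 1), (3, 2);
""")

# Same query, only the join type changes.
query = """
    select a.brand, count(b.ID) as countOfBrand
    from brands a
    {join_type} join cars b on a.ID = b.brand
    group by a.brand
    order by a.brand
"""
inner = conn.execute(query.format(join_type="inner")).fetchall()
left = conn.execute(query.format(join_type="left outer")).fetchall()

print(inner)  # brands without cars are missing
print(left)   # every brand appears, unmatched ones with a count of 0
```

Counting b.ID rather than * matters here: count only counts non-null values, so the unmatched brand gets 0 instead of 1.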
Now, I figure that at this point you are probably wondering whether or not you can mix join types in a query - and the answer is yes, you absolutely can.
select
b.brand,
c.color,
count(a.id) as countOfBrand
from
cars a
right outer join brands b
on b.ID=a.brand
join colors c
on a.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| Ford | Blue | 1 |
| Ford | White | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| BMW | Blue | 1 |
| BMW | White | 1 |
+--------+-------+--------------+
9 rows in set (0.00 sec)
So, why are those not the results that were expected? It is because although we have selected the outer join from cars to brands, it wasn't specified in the join to colors - so that particular join will only bring back results that match in both tables.
Here is the query that would work to get the results that we expected:
select
a.brand,
c.color,
count(b.id) as countOfBrand
from
brands a
left outer join cars b
on a.ID=b.brand
left outer join colors c
on b.color=c.ID
group by
a.brand,
c.color
+--------+-------+--------------+
| brand | color | countOfBrand |
+--------+-------+--------------+
| BMW | Blue | 1 |
| BMW | White | 1 |
| Ford | Blue | 1 |
| Ford | White | 1 |
| Holden | NULL | 0 |
| Nissan | Black | 1 |
| Smart | White | 1 |
| Toyota | NULL | 1 |
| Toyota | Black | 1 |
| Toyota | Green | 2 |
| Toyota | Red | 1 |
+--------+-------+--------------+
11 rows in set (0.00 sec)
As we can see, we have two outer joins in the query and the results are coming through as expected.
Now, how about those other types of joins you ask? What about Intersections?
Well, not all databases support the intersection, but pretty much all databases will allow you to create an intersection through a join (or a well structured where clause at the least).
An intersection is a type of query somewhat similar to a union as described above - but the difference is that it only returns rows of data that are identical (and I do mean identical) between the various individual queries joined by the intersect. Only rows that are identical in every regard will be returned.
A simple example would be as such:
select
*
from
colors
where
ID>2
intersect
select
*
from
colors
where
id<4
While a normal union query would return all the rows of the table (the first query returning anything with ID>2 and the second anything with ID<4), which would result in a full set, an intersect query would only return the row matching ID=3, as it is the only one that meets both criteria.
Now, if your database doesn't support an intersect query, the above can be easily accomplished with the following query:
select
a.ID,
a.color,
a.paint
from
colors a
join colors b
on a.ID=b.ID
where
a.ID>2
and b.ID<4
+----+-------+----------+
| ID | color | paint |
+----+-------+----------+
| 3 | Blue | Metallic |
+----+-------+----------+
1 row in set (0.00 sec)
If you wish to perform an intersection across two different tables using a database that doesn't inherently support an intersection query, you will need to create a join on every column of the tables.
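As a rough sketch of that every-column join, the snippet below compares a real intersect with the join-based emulation. It assumes SQLite (via Python's sqlite3 module), which happens to support intersect, so the two results can be checked against each other; the colors rows are hypothetical.

```python
import sqlite3

# Hypothetical colors table with the same three columns as in this answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table colors (ID integer primary key, color text, paint text);
    insert into colors values
        (1, 'Red', 'Metallic'), (2, 'Green', 'Gloss'),
        (3, 'Blue', 'Metallic'), (4, 'White', 'Gloss');
""")

with_intersect = conn.execute("""
    select ID, color, paint from colors where ID > 2
    intersect
    select ID, color, paint from colors where ID < 4
""").fetchall()

# Emulation: join the two filtered sets on every column being compared.
with_join = conn.execute("""
    select distinct a.ID, a.color, a.paint
    from (select * from colors where ID > 2) a
    join (select * from colors where ID < 4) b
        on a.ID = b.ID and a.color = b.color and a.paint = b.paint
""").fetchall()

print(with_intersect)
print(with_join)
```

One caveat with the emulation: intersect treats two nulls as matching, while the = comparison in a join does not, so nullable columns need extra handling.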
#2
93
Ok, I found this post very interesting and I would like to share some of my knowledge on creating queries. Thanks for this, Fluffeh. Others who read this and feel that I'm wrong are 101% free to edit and criticise my answer. (Honestly, I would be very thankful to anyone who corrects my mistakes.)
I'll be posting some of the frequently asked questions from the MySQL tag.
Trick No. 1 (rows that match multiple conditions)
Given this schema
CREATE TABLE MovieList
(
ID INT,
MovieName VARCHAR(25),
CONSTRAINT ml_pk PRIMARY KEY (ID),
CONSTRAINT ml_uq UNIQUE (MovieName)
);
INSERT INTO MovieList VALUES (1, 'American Pie');
INSERT INTO MovieList VALUES (2, 'The Notebook');
INSERT INTO MovieList VALUES (3, 'Discovery Channel: Africa');
INSERT INTO MovieList VALUES (4, 'Mr. Bean');
INSERT INTO MovieList VALUES (5, 'Expendables 2');
CREATE TABLE CategoryList
(
MovieID INT,
CategoryName VARCHAR(25),
CONSTRAINT cl_uq UNIQUE(MovieID, CategoryName),
CONSTRAINT cl_fk FOREIGN KEY (MovieID) REFERENCES MovieList(ID)
);
INSERT INTO CategoryList VALUES (1, 'Comedy');
INSERT INTO CategoryList VALUES (1, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Romance');
INSERT INTO CategoryList VALUES (2, 'Drama');
INSERT INTO CategoryList VALUES (3, 'Documentary');
INSERT INTO CategoryList VALUES (4, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Comedy');
INSERT INTO CategoryList VALUES (5, 'Action');
QUESTION
Find all movies that belong to at least both the Comedy and Romance categories.
Solution
This question can be very tricky sometimes. It may seem that a query like this will be the answer:
SELECT DISTINCT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName = 'Comedy' AND
b.CategoryName = 'Romance'
SQLFiddle Demo
which is definitely very wrong because it produces no result. The explanation is that each row has only one value of CategoryName. For instance, on a row where the first condition is true, the second condition is always false. With the AND operator, both conditions must be true for the row to be returned, so no row ever qualifies. Another query is like this,
SELECT DISTINCT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName IN ('Comedy','Romance')
SQLFiddle Demo
and the result is still incorrect because it matches any record that has at least one match on CategoryName. The real solution is to count the number of record instances per movie. The number of instances should match the total number of values supplied in the condition.
SELECT a.MovieName
FROM MovieList a
INNER JOIN CategoryList b
ON a.ID = b.MovieID
WHERE b.CategoryName IN ('Comedy','Romance')
GROUP BY a.MovieName
HAVING COUNT(*) = 2
SQLFiddle Demo (the answer)
- SQL of Relational Division
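The counting trick generalises to any number of categories, as long as the number compared in HAVING is kept in step with the length of the IN list. Below is a minimal sketch of that idea using Python's sqlite3 module (an assumption, chosen so it runs without a MySQL server) and a subset of the rows above; the helper name movies_in_all is hypothetical.

```python
import sqlite3

# A subset of the MovieList / CategoryList rows from the schema above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table MovieList (ID integer primary key, MovieName text unique);
    create table CategoryList (MovieID integer, CategoryName text,
                               unique (MovieID, CategoryName));
    insert into MovieList values (1, 'American Pie'), (2, 'The Notebook'),
                                 (4, 'Mr. Bean');
    insert into CategoryList values (1, 'Comedy'), (1, 'Romance'),
                                    (2, 'Romance'), (2, 'Drama'), (4, 'Comedy');
""")

def movies_in_all(categories):
    """Return the movies that belong to every category in the given list."""
    wanted = sorted(set(categories))  # duplicates would break the count check
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        select a.MovieName
        from MovieList a
        join CategoryList b on a.ID = b.MovieID
        where b.CategoryName in ({placeholders})
        group by a.MovieName
        having count(*) = ?
        order by a.MovieName
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

print(movies_in_all(["Comedy", "Romance"]))
```

This relies on the unique constraint on (MovieID, CategoryName); without it, a movie listed twice under Comedy would be counted twice, and count(distinct b.CategoryName) would be the safer comparison.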
Trick No. 2 (maximum record for each entry)
Given schema,
CREATE TABLE Software
(
ID INT,
SoftwareName VARCHAR(25),
Descriptions VARCHAR(150),
CONSTRAINT sw_pk PRIMARY KEY (ID),
CONSTRAINT sw_uq UNIQUE (SoftwareName)
);
INSERT INTO Software VALUES (1,'PaintMe','used for photo editing');
INSERT INTO Software VALUES (2,'World Map','contains map of different places of the world');
INSERT INTO Software VALUES (3,'Dictionary','contains description, synonym, antonym of the words');
CREATE TABLE VersionList
(
SoftwareID INT,
VersionNo INT,
DateReleased DATE,
CONSTRAINT sw_uq UNIQUE (SoftwareID, VersionNo),
CONSTRAINT sw_fk FOREIGN KEY (SoftwareID) REFERENCES Software(ID)
);
INSERT INTO VersionList VALUES (3, 2, '2009-12-01');
INSERT INTO VersionList VALUES (3, 1, '2009-11-01');
INSERT INTO VersionList VALUES (3, 3, '2010-01-01');
INSERT INTO VersionList VALUES (2, 2, '2010-12-01');
INSERT INTO VersionList VALUES (2, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 3, '2011-12-01');
INSERT INTO VersionList VALUES (1, 2, '2010-12-01');
INSERT INTO VersionList VALUES (1, 1, '2009-12-01');
INSERT INTO VersionList VALUES (1, 4, '2012-12-01');
QUESTION
Find the latest version of each software. Display the following columns: SoftwareName, Descriptions, LatestVersion (from the VersionNo column), DateReleased.
Solution
Some SQL developers mistakenly use the MAX() aggregate function on its own. They tend to write something like this,
SELECT a.SoftwareName, a.Descriptions,
MAX(b.VersionNo) AS LatestVersion, b.DateReleased
FROM Software a
INNER JOIN VersionList b
ON a.ID = b.SoftwareID
GROUP BY a.ID
ORDER BY a.ID
SQLFiddle Demo
(most RDBMSs generate a syntax error on this because some of the non-aggregated columns are not specified in the group by clause) the result produces the correct LatestVersion for each software, but obviously the DateReleased values are incorrect. At the time of writing, MySQL didn't support window functions and common table expressions as some RDBMSs already did (both were added later, in MySQL 8.0). The workaround for this problem is to create a subquery which gets the maximum VersionNo for each software and is then joined to the other tables.
SELECT a.SoftwareName, a.Descriptions,
b.LatestVersion, c.DateReleased
FROM Software a
INNER JOIN
(
SELECT SoftwareID, MAX(VersionNO) LatestVersion
FROM VersionList
GROUP BY SoftwareID
) b ON a.ID = b.SoftwareID
INNER JOIN VersionList c
ON c.SoftwareID = b.SoftwareID AND
c.VersionNO = b.LatestVersion
GROUP BY a.ID
ORDER BY a.ID
SQLFiddle Demo (the answer)
So that was it. I'll be posting another as soon as I recall any other FAQ from the MySQL tag. Thank you for reading this little article. I hope that you have gained at least a little knowledge from it.
UPDATE 1
Trick No. 3 (Finding the latest record between two IDs)
Given Schema
CREATE TABLE userList
(
ID INT,
NAME VARCHAR(20),
CONSTRAINT us_pk PRIMARY KEY (ID),
CONSTRAINT us_uq UNIQUE (NAME)
);
INSERT INTO userList VALUES (1, 'Fluffeh');
INSERT INTO userList VALUES (2, 'John Woo');
INSERT INTO userList VALUES (3, 'hims056');
CREATE TABLE CONVERSATION
(
ID INT,
FROM_ID INT,
TO_ID INT,
MESSAGE VARCHAR(250),
DeliveryDate DATE
);
INSERT INTO CONVERSATION VALUES (1, 1, 2, 'hi john', '2012-01-01');
INSERT INTO CONVERSATION VALUES (2, 2, 1, 'hello fluff', '2012-01-02');
INSERT INTO CONVERSATION VALUES (3, 1, 3, 'hey hims', '2012-01-03');
INSERT INTO CONVERSATION VALUES (4, 1, 3, 'please reply', '2012-01-04');
INSERT INTO CONVERSATION VALUES (5, 3, 1, 'how are you?', '2012-01-05');
INSERT INTO CONVERSATION VALUES (6, 3, 2, 'sample message!', '2012-01-05');
QUESTION
Find the latest conversation between two users.
Solution
SELECT b.Name SenderName,
c.Name RecipientName,
a.Message,
a.DeliveryDate
FROM Conversation a
INNER JOIN userList b
ON a.From_ID = b.ID
INNER JOIN userList c
ON a.To_ID = c.ID
WHERE (LEAST(a.FROM_ID, a.TO_ID), GREATEST(a.FROM_ID, a.TO_ID), DeliveryDate)
IN
(
SELECT LEAST(FROM_ID, TO_ID) minFROM,
GREATEST(FROM_ID, TO_ID) maxTo,
MAX(DeliveryDate) maxDate
FROM Conversation
GROUP BY minFROM, maxTo
)
SQLFiddle Demo
#3
58
Part 2 - Subqueries
Okay, now the boss has burst in again - "I want a list of all of our cars with the brand and a total of how many of that brand we have!"
This is a great opportunity to use the next trick in our bag of SQL goodies - the subquery. If you are unfamiliar with the term, a subquery is a query that runs inside another query. There are many different ways to use them.
For our request, let's first put a simple query together that will list each car and the brand:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
Now, if we wanted to simply get a count of cars grouped by brand, we could of course write this:
select
b.brand,
count(a.ID) as countCars
from
cars a
join brands b
on a.brand=b.ID
group by
b.brand
+--------+-----------+
| brand | countCars |
+--------+-----------+
| BMW | 2 |
| Ford | 2 |
| Nissan | 1 |
| Smart | 1 |
| Toyota | 5 |
+--------+-----------+
So, we should be able to simply add the count function to our original query, right?
select
a.ID,
b.brand,
count(a.ID) as countCars
from
cars a
join brands b
on a.brand=b.ID
group by
a.ID,
b.brand
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 1 | Toyota | 1 |
| 2 | Ford | 1 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 5 | Toyota | 1 |
| 6 | BMW | 1 |
| 7 | Ford | 1 |
| 8 | Toyota | 1 |
| 9 | Toyota | 1 |
| 10 | BMW | 1 |
| 11 | Toyota | 1 |
+----+--------+-----------+
11 rows in set (0.00 sec)
Sadly, no, we can't do that. The reason is that when we add in the car ID (column a.ID) we have to add it to the group by - so now, when the count function runs, each group contains only a single car, and every count comes out as 1.
This is where we can, however, use a subquery - in fact we can use two completely different types of subquery that will return the results we need. The first is to simply put the subquery in the select clause. This means each time we get a row of data, the subquery will run off, get a single column value and then pop it into our row of data.
select
a.ID,
b.brand,
(
select
count(c.ID)
from
cars c
where
a.brand=c.brand
) as countCars
from
cars a
join brands b
on a.brand=b.ID
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 2 | Ford | 2 |
| 7 | Ford | 2 |
| 1 | Toyota | 5 |
| 5 | Toyota | 5 |
| 8 | Toyota | 5 |
| 9 | Toyota | 5 |
| 11 | Toyota | 5 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 6 | BMW | 2 |
| 10 | BMW | 2 |
+----+--------+-----------+
11 rows in set (0.00 sec)
And bam! This would do us. If you noticed though, this subquery will have to run for each and every single row of data we return. Even in this little example, we only have five different brands of car, but the subquery ran eleven times as we have eleven rows of data that we are returning. So, in this case, it doesn't seem like the most efficient way to write code.
For a different approach, let's run a subquery and pretend it is a table:
select
a.ID,
b.brand,
d.countCars
from
cars a
join brands b
on a.brand=b.ID
join
(
select
c.brand,
count(c.ID) as countCars
from
cars c
group by
c.brand
) d
on a.brand=d.brand
+----+--------+-----------+
| ID | brand | countCars |
+----+--------+-----------+
| 1 | Toyota | 5 |
| 2 | Ford | 2 |
| 3 | Nissan | 1 |
| 4 | Smart | 1 |
| 5 | Toyota | 5 |
| 6 | BMW | 2 |
| 7 | Ford | 2 |
| 8 | Toyota | 5 |
| 9 | Toyota | 5 |
| 10 | BMW | 2 |
| 11 | Toyota | 5 |
+----+--------+-----------+
11 rows in set (0.00 sec)
Okay, so we have the same results (ordered slightly differently - it seems the database wanted to return results ordered by the first column we picked this time) - but the same right numbers.
So, what's the difference between the two - and when should we use each type of subquery? First, let's make sure we understand how that second query works. We selected two tables in the from clause of our query, and then wrote a query and told the database that it was in fact a table instead - which the database is perfectly happy with. There can be some benefits to using this method (as well as some limitations). Foremost is that this subquery ran once. If our database contained a large volume of data, there could well be a massive improvement over the first method. However, as we are using this as a table, we have to bring in extra columns of data - so that the subquery can actually be joined back to our rows of data. We also have to be sure that the subquery returns a row for every row we want to keep if we are going to use a simple join like in the query above. If you recall, the join will only pull back rows that have matching data on both sides of the join. If we aren't careful, this could result in valid data not being returned from our cars table if there wasn't a matching row in this subquery.
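To check that the two styles really are interchangeable here, the following sketch runs both against the same data and compares the output. It assumes SQLite through Python's sqlite3 module and uses hypothetical, much smaller cars and brands tables than the ones in this answer.

```python
import sqlite3

# Hypothetical cut-down cars and brands tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table brands (ID integer primary key, brand text);
    create table cars (ID integer primary key, brand integer);
    insert into brands values (1, 'Ford'), (2, 'Toyota'), (3, 'Nissan');
    insert into cars values (1, 2), (2, 1), (3, 3), (4, 2), (5, 1), (6, 2);
""")

# Style 1: subquery in the select clause, evaluated per returned row.
in_select = conn.execute("""
    select a.ID, b.brand,
           (select count(c.ID) from cars c where c.brand = a.brand) as countCars
    from cars a
    join brands b on a.brand = b.ID
    order by a.ID
""").fetchall()

# Style 2: subquery in the from clause, treated as a table and joined.
in_from = conn.execute("""
    select a.ID, b.brand, d.countCars
    from cars a
    join brands b on a.brand = b.ID
    join (select c.brand, count(c.ID) as countCars
          from cars c group by c.brand) d on a.brand = d.brand
    order by a.ID
""").fetchall()

print(in_select == in_from)
for row in in_from:
    print(row)
```

An explicit order by is added to both queries so the comparison doesn't depend on whatever order the database happens to return rows in.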
Now, looking back at the first subquery, there are some limitations as well. Because we are pulling data back into a single row, we can ONLY pull back one row of data (and only one column of it). Subqueries used in the select clause of a query very often use only an aggregate function such as sum, count, max or another similar aggregate function. They don't have to, but that is often how they are written.
So, before we move on, let's have a quick look at where else we can use a subquery. We can use it in the where clause - now, this example is a little contrived, as in our database there are better ways of getting the following data, but seeing as it is only an example, let's have a look:
select
ID,
brand
from
brands
where
brand like '%o%'
+----+--------+
| ID | brand |
+----+--------+
| 1 | Ford |
| 2 | Toyota |
| 6 | Holden |
+----+--------+
3 rows in set (0.00 sec)
This returns a list of brand IDs and brand names (the second column is only added to show us the brands) that contain the letter o in the name.
Now, we could use the results of this query in a where clause like this:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
where
a.brand in
(
select
ID
from
brands
where
brand like '%o%'
)
+----+--------+
| ID | brand |
+----+--------+
| 2 | Ford |
| 7 | Ford |
| 1 | Toyota |
| 5 | Toyota |
| 8 | Toyota |
| 9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)
As you can see, even though the subquery was returning the three brand IDs, our cars table only had entries for two of them.
In this case, for further detail, the subquery works as if we had written the following code:
select
a.ID,
b.brand
from
cars a
join brands b
on a.brand=b.ID
where
a.brand in (1,2,6)
+----+--------+
| ID | brand |
+----+--------+
| 1 | Toyota |
| 2 | Ford |
| 5 | Toyota |
| 7 | Ford |
| 8 | Toyota |
| 9 | Toyota |
| 11 | Toyota |
+----+--------+
7 rows in set (0.00 sec)
Again, you can see how a subquery vs manual inputs has changed the order of the rows when returning from the database.
While we are discussing subqueries, let's see what else we can do with a subquery:
- You can place a subquery within another subquery, and so on and so on. There is a limit which depends on your database, but short of the recursive functions of some insane and maniacal programmer, most folks will never hit that limit.
- You can place a number of subqueries into a single query, a few in the select clause, some in the from clause and a couple more in the where clause - just remember that each one you put in makes your query more complex and likely to take longer to execute.
If you need to write some efficient code, it can be beneficial to write the query a number of ways and see (either by timing it or by using an explain plan) which is the optimal query to get your results. The first way that works may not always be the best way of doing it.
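A rough way to do that comparison is sketched below: run each candidate, time it, and print its plan. The sketch assumes SQLite via Python's sqlite3 module, where the statement is spelled explain query plan (MySQL uses explain, with different output); the tables and the two candidate queries are hypothetical examples.

```python
import sqlite3
import time

# Hypothetical cut-down brands and cars tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table brands (ID integer primary key, brand text);
    create table cars (ID integer primary key, brand integer);
    insert into brands values (1, 'Ford'), (2, 'Toyota'), (6, 'Holden');
    insert into cars values (1, 2), (2, 1), (3, 2);
""")

# Two ways of asking the same question.
candidates = {
    "subquery in where": """
        select a.ID from cars a
        where a.brand in (select ID from brands where brand like '%o%')
        order by a.ID
    """,
    "plain join": """
        select a.ID from cars a
        join brands b on a.brand = b.ID
        where b.brand like '%o%'
        order by a.ID
    """,
}

results = {}
for label, sql in candidates.items():
    plan = conn.execute("explain query plan " + sql).fetchall()
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    results[label] = rows
    print(f"{label}: {len(rows)} rows in {elapsed:.6f}s")
    for step in plan:
        print("    " + str(step[-1]))  # last column holds the plan description
```

Timings on a table this small are meaningless; the point is the workflow, which only becomes informative on realistic data volumes.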
#4
55
Part 3 - Tricks and Efficient Code
MySQL in() efficiency
I thought I would add some extra bits, for tips and tricks that have come up.
One question I see come up a fair bit is "How do I get non-matching rows from two tables?", and I see the answer most commonly accepted as something like the following (based on our cars and brands tables - which have Holden listed as a brand that does not appear in the cars table):
select
a.ID,
a.brand
from
brands a
where
a.ID not in(select brand from cars)
And yes it will work.
+----+--------+
| ID | brand |
+----+--------+
| 6 | Holden |
+----+--------+
1 row in set (0.00 sec)
However, it is not efficient in some databases. Here is a link to a Stack Overflow question asking about it, and here is an excellent in-depth article if you want to get into the nitty gritty.
The short answer is, if the optimiser doesn't handle it efficiently, it may be much better to use a query like the following to get non matched rows:
select
a.brand
from
brands a
left join cars b
on a.id=b.brand
where
b.brand is null
+--------+
| brand |
+--------+
| Holden |
+--------+
1 row in set (0.00 sec)
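The two forms can be compared side by side with a short sketch. It uses Python's sqlite3 module and hypothetical three-row tables (assumptions made so the example is self-contained), and it also shows one behavioural difference worth knowing about: a null in the subquery column.

```python
import sqlite3

# Hypothetical brands and cars tables; Holden has no cars.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table brands (ID integer primary key, brand text);
    create table cars (ID integer primary key, brand integer);
    insert into brands values (1, 'Ford'), (2, 'Toyota'), (6, 'Holden');
    insert into cars values (1, 2), (2, 1), (3, 2);
""")

not_in = "select a.brand from brands a where a.ID not in (select brand from cars)"
left_join = """
    select a.brand from brands a
    left join cars b on a.ID = b.brand
    where b.brand is null
"""

# With clean data both forms find the unmatched brand.
unmatched_not_in = conn.execute(not_in).fetchall()
unmatched_left_join = conn.execute(left_join).fetchall()

# A car with no brand recorded changes the picture for NOT IN only:
# comparing against a null is never true, so NOT IN returns nothing.
conn.execute("insert into cars values (4, null)")
not_in_with_null = conn.execute(not_in).fetchall()
left_join_with_null = conn.execute(left_join).fetchall()

print(unmatched_not_in, unmatched_left_join)
print(not_in_with_null, left_join_with_null)
```

So besides any performance difference, the left join form is also the more forgiving one when the joined column is nullable.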
Update Table with same table in subquery
Ahhh, another oldie but goodie - the old "You can't specify target table 'brands' for update in FROM clause".
MySQL will not allow you to run an update... query with a subselect on the same table. Now, you might be thinking, why not just slap it into the where clause, right? But what if you want to update only the row with the max() date among a bunch of other rows? You can't exactly do that in a where clause.
update
brands
set
brand='Holden'
where
id=
(select
id
from
brands
where
id=6);
ERROR 1093 (HY000): You can't specify target table 'brands'
for update in FROM clause
So, we can't do that eh? Well, not exactly. There is a sneaky workaround that a surprisingly large number of users don't know about - though it does include some hackery that you will need to pay attention to.
You can stick the subquery within another subquery, which puts enough of a gap between the two queries so that it will work. However, note that it might be safest to stick the query within a transaction - this will prevent any other changes being made to the tables while the query is running.
update
brands
set
brand='Holden'
where id=
(select
id
from
(select
id
from
brands
where
id=6
)
as updateTable);
Query OK, 0 rows affected (0.02 sec)
Rows matched: 1 Changed: 0 Warnings: 0
#5
16
You can use the concept of multiple queries in the FROM clause. Let me show you one example:
SELECT DISTINCT e.id,e.name,d.name,lap.lappy LAPTOP_MAKE,c_loc.cnty COUNTY
FROM (
SELECT c.id cnty,l.name
FROM county c, location l
WHERE c.id=l.county_id AND l.end_Date IS NOT NULL
) c_loc, emp e
INNER JOIN dept d ON e.deptno =d.id
LEFT JOIN
(
SELECT l.id lappy, c.name cmpy
FROM laptop l, company c
WHERE l.make = c.name
) lap ON e.cmpy_id=lap.cmpy
You can use as many tables as you want to. Use outer joins and unions wherever necessary, even inside table subqueries.
That's a very easy way to involve as many tables and fields as you need.
#6
3
Hope this makes it easier to find the tables as you're reading through the thing:
jsfiddle
mysql> show columns from colors;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(3) | NO | PRI | NULL | auto_increment |
| color | varchar(15) | YES | | NULL | |
| paint | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+