I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question are applicable to other databases.
I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:
- Badges
  - Id
  - UserId
  - Name
  - Date
This works well for badges like [Epic] and [Legendary], which have unique names, but the silver and gold tag-specific badges seem to be mixed together because they have exactly the same name.
Here's an example query I wrote for the [mysql] tag:
SELECT
    UserId AS [User Link],
    Date
FROM
    Badges
WHERE
    Name = 'mysql'
ORDER BY
    Date ASC
The (slightly annotated) output, as seen on Stexdex, is:
User Link         Date
---------------   -------------------   // all for silver except where noted
Bill Karwin       2009-02-20 11:00:25
Quassnoi          2009-06-01 10:00:16
Greg              2009-10-22 10:00:25
Quassnoi          2009-10-31 10:00:24   // for gold
Bill Karwin       2009-11-23 11:00:30   // for gold
cletus            2010-01-01 11:00:23
OMG Ponies        2010-01-03 11:00:48
Pascal MARTIN     2010-02-17 11:00:29
Mark Byers        2010-04-07 10:00:35
Daniel Vassallo   2010-05-14 10:00:38
This is consistent with the current list of silver and gold earners at the time of writing. To put it in more timeless terms: as of the end of May 2010, only 2 users have earned the gold [mysql] tag badge, Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.
So this is the way I understand it:
- The first time an Id appears (in chronological order) is for the silver badge
- The second time is for the gold
Now, the above result mixes the silver and gold entries together. My questions are:
- Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
- In the current design, how would you query the silver and gold badges separately?
  - GROUP BY Id and picking the min/max or first/second by the Date somehow?
  - How can you write a query that lists all the silver badges first then all the gold badges next?
    - Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
    - How would you write it so that it doesn't have too much repetition between the silver and gold subqueries?
  - Is it perhaps more typical to do two totally separate queries instead?
  - What is this idiom called? A row "partitioning" query to put them into "buckets" or something?
Requirement clarification
Originally I wanted the following output, essentially:
User Link         Date
---------------   -------------------
Bill Karwin       2009-02-20 11:00:25   // result of query for silver
Quassnoi          2009-06-01 10:00:16   // :
Greg              2009-10-22 10:00:25   // :
cletus            2010-01-01 11:00:23   // :
OMG Ponies        2010-01-03 11:00:48   // :
Pascal MARTIN     2010-02-17 11:00:29   // :
Mark Byers        2010-04-07 10:00:35   // :
Daniel Vassallo   2010-05-14 10:00:38   // :
------- maybe some sort of row separator here? can SQL do this? -------
Quassnoi          2009-10-31 10:00:24   // result of query for gold
Bill Karwin       2009-11-23 11:00:30   // :
But the answers so far, with a separate column each for silver and gold, are also great, so feel free to pursue that angle as well. I'm still curious how you'd do the above, though.
2 Solutions
#1
4
Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
Sure, you could add a type code to make it more explicit. But when you consider that one cannot get a gold badge before a silver one, using the date stamp to differentiate between them makes a lot of sense.
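As a rough sketch of what "adding a type code" could look like: the Class column and its codes below are hypothetical, invented here for illustration, and are not part of the schema described in the question.

```sql
-- Hypothetical variant of the Badges table with an explicit class column.
-- Class and its codes are illustrative assumptions, not the Stexdex schema.
CREATE TABLE Badges (
    Id     INT          NOT NULL PRIMARY KEY,
    UserId INT          NOT NULL,
    Name   NVARCHAR(50) NOT NULL,
    [Date] DATETIME     NOT NULL,
    Class  TINYINT      NOT NULL  -- assumed codes: 1 = gold, 2 = silver
);

-- With such a column, silver and gold no longer need to be inferred from dates.
SELECT UserId, [Date]
FROM Badges
WHERE Name = 'mysql'
  AND Class = 2   -- silver only (assumed code)
ORDER BY [Date];
```

The trade-off is one extra column to maintain in exchange for queries that state their intent directly.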
In the current design, how would you query the silver and gold badges separately? GROUP BY Id and picking the min/max or first/second by the Date somehow?
Yes - joining onto a derived table (AKA inline view) that lists each user and their minimum date would return the silver badges. Using HAVING COUNT(*) >= 1 would work too. You'd have to use a combination of GROUP BY and HAVING COUNT(*) = 2 to get gold badges - the max date doesn't ensure that there is more than one record for a userid...
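A minimal sketch of both ideas, assuming (as the question does) that a user has at most two rows per tag name and that the earlier one is always silver. The aliases b, s, and FirstDate are made up for this example.

```sql
-- Silver: each user's earliest 'mysql' row, found by joining back
-- to a derived table of (UserId, earliest date).
SELECT b.UserId AS [User Link], b.[Date]
FROM Badges AS b
JOIN (
    SELECT UserId, MIN([Date]) AS FirstDate
    FROM Badges
    WHERE Name = 'mysql'
    GROUP BY UserId
) AS s
    ON s.UserId = b.UserId
   AND s.FirstDate = b.[Date]
WHERE b.Name = 'mysql'
ORDER BY b.[Date];

-- Gold: only users with two 'mysql' rows qualify; the later date is the gold award.
SELECT UserId AS [User Link], MAX([Date]) AS [Date]
FROM Badges
WHERE Name = 'mysql'
GROUP BY UserId
HAVING COUNT(*) = 2
ORDER BY [Date];
```

The HAVING clause is what keeps silver-only users out of the gold list; MAX alone would return their single silver date.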
How can you write a query that lists all the silver badges first then all the gold badges next?
Sorry - by users, or all silvers first and then golds? The former might be done simply by using ORDER BY t.userid, t.date; for the latter I'd likely use analytic functions (e.g. ROW_NUMBER(), RANK())...
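The analytic-function route might look roughly like the sketch below. It assumes at most two rows per user for the tag, with the first being silver; the names Ranked, AwardNo, and Class are invented for this example.

```sql
-- Number each user's 'mysql' rows by date: 1 = first award, 2 = second award.
WITH Ranked AS (
    SELECT
        UserId,
        [Date],
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY [Date]) AS AwardNo
    FROM Badges
    WHERE Name = 'mysql'
)
SELECT
    UserId AS [User Link],
    [Date],
    CASE AwardNo WHEN 1 THEN 'silver' ELSE 'gold' END AS Class
FROM Ranked
ORDER BY AwardNo, [Date];   -- all silvers first, then all golds
```

Because the filter on Name appears only once, a more complicated "real" query would not need to be repeated for the silver and gold halves.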
Is it perhaps more typical to do two totally separate queries instead?
See above about how vague your requirements are, to me anyways...
What is this idiom called? A row "partitioning" query to put them into "buckets" or something?
What you're asking about is referred to by the following synonyms: analytic, windowing, ranking...
#2
3
You'd do something like this and rely only on date or count in an aggregate.
Arguably, it also makes little sense to query silver followed by gold; it makes more sense to get the data side by side.
Unfortunately, you haven't really specified what you want, but a good starting point for aggregates is to express it in plain English.
Example: "Give me the dates of silver and gold badge awards per user for tag mysql", which this does:
SELECT
    UserId AS [User Link],
    MIN(Date) AS [Silver Date],
    -- a user with a single row has only the silver badge, so no gold date
    CASE WHEN COUNT(*) = 1 THEN NULL ELSE MAX(Date) END AS [Gold Date]
FROM
    Badges
WHERE
    Name = 'mysql'
GROUP BY
    UserId
ORDER BY
    CASE WHEN COUNT(*) = 1 THEN NULL ELSE MAX(Date) END DESC, MIN(Date)
Edit, after update:
Your desired output is not really SQL: it's 2 separate recordsets. The separator is a no-go. As a set-based operation, there is no "natural" order, so this introduces one:
-- silver: first award per user
SELECT
    UserId AS [User Link],
    MIN(Date) AS [Date],
    0 AS dummyorder
FROM
    Badges
WHERE
    Name = 'mysql'
GROUP BY
    UserId
UNION ALL
-- gold: second award, only for users with two rows
SELECT
    UserId AS [User Link],
    MAX(Date) AS [Date],
    1 AS dummyorder
FROM
    Badges
WHERE
    Name = 'mysql'
GROUP BY
    UserId
HAVING
    COUNT(*) = 2
ORDER BY
    dummyorder, Date