Select distinct values from two columns + matching values = "unique"

Date: 2021-07-08 11:50:51

I know the title is worded poorly, but I couldn't think of a better way to say it.

I'm learning Ruby and brushing up on MySQL. As a practice data set I'm using a historical list of completed flights, about 100,000 rows. Each flight record includes the origin and destination airports (fields 'origin' and 'dest') and the total flight distance (field 'distance').

As an exercise, I want to show the 10 longest routes sorted by distance in descending order. However, I want to treat each pair of endpoints as a single route, regardless of which airport is the origin and which is the destination. For example, JFK-LAX and LAX-JFK should count as one route.

When I run the query:

SELECT DISTINCT distance, origin, dest FROM flights ORDER BY distance DESC LIMIT 10;

of course I get this:

["2704", "BOS", "SFO"]
["2704", "SFO", "BOS"]
["2689", "BOS", "SJC"]
["2689", "SJC", "BOS"]
["2615", "LAX", "LIH"]
["2615", "LIH", "LAX"]
["2614", "HNL", "SAN"]
["2614", "SAN", "HNL"]
["2611", "BOS", "LAX"]
["2611", "LAX", "BOS"]

which is not what I want. I want to say, "Select the distance and endpoints of the 10 longest routes regardless of whether the airports are origins or destinations."

One thought I had was to sort each pair of endpoints alphabetically and join them to create a unique route key, e.g., LAX and JFK = "JFKLAX". But I don't know how to do that and feed it into my original query, or even whether that's the best approach.
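
As a rough illustration of that idea outside the database, here is a minimal Ruby sketch. It assumes the rows have already been fetched as [distance, origin, dest] string arrays like the output above; `route_key` is a hypothetical helper name, not part of any library.

```ruby
# Hypothetical helper: builds an order-independent route key by sorting
# the two airport codes alphabetically and joining them.
def route_key(origin, dest)
  [origin, dest].sort.join
end

rows = [
  ["2704", "BOS", "SFO"],
  ["2704", "SFO", "BOS"],
  ["2689", "BOS", "SJC"],
  ["2689", "SJC", "BOS"]
]

# Keep only the first row seen for each route key.
unique_routes = rows.uniq { |_distance, origin, dest| route_key(origin, dest) }

puts route_key("LAX", "JFK") # prints JFKLAX
unique_routes.each { |row| p row }
```

Because airport codes here are always three letters, joining without a separator is unambiguous; with variable-length values a separator such as "-" would be safer.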

Can this be done purely in SQL / MySQL?

1 answer

#1

One simple way to approach this is to use GREATEST() and LEAST() to return whichever of the two column values sorts higher or lower according to the columns' collation. Each endpoint then always comes back in the same position, so DISTINCT can deduplicate the two directions of a route.

SELECT DISTINCT
  distance,
  LEAST(origin, dest) AS endpoint1,
  GREATEST(origin, dest) AS endpoint2
FROM flights f
ORDER BY distance DESC LIMIT 10
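
To make the steps of that query concrete, here is a rough in-memory Ruby equivalent run on the sample rows from the question. It is a sketch under assumptions: rows are [distance, origin, dest] string arrays as printed above, `top_routes` is a hypothetical name, and Ruby's `Array#minmax` stands in for LEAST/GREATEST (it compares strings bytewise rather than by MySQL collation).

```ruby
# Rough in-memory equivalent of:
#   SELECT DISTINCT distance, LEAST(origin, dest), GREATEST(origin, dest)
#   FROM flights ORDER BY distance DESC LIMIT 10
# Hypothetical helper; rows are [distance, origin, dest] string arrays.
def top_routes(rows, limit = 10)
  # LEAST/GREATEST: put the two endpoints in a fixed (sorted) order.
  normalized = rows.map { |distance, origin, dest| [distance, *[origin, dest].minmax] }
  # DISTINCT, then ORDER BY distance DESC (compared numerically), then LIMIT.
  normalized.uniq.sort_by { |distance, _a, _b| -Integer(distance) }.first(limit)
end

rows = [
  ["2704", "BOS", "SFO"], ["2704", "SFO", "BOS"],
  ["2689", "BOS", "SJC"], ["2689", "SJC", "BOS"],
  ["2615", "LAX", "LIH"], ["2615", "LIH", "LAX"],
  ["2614", "HNL", "SAN"], ["2614", "SAN", "HNL"],
  ["2611", "BOS", "LAX"], ["2611", "LAX", "BOS"]
]

top_routes(rows).each { |row| p row }
```

Only five rows come back here because the sample holds five routes, each listed in both directions. Converting the distance with Integer() mirrors sorting a numeric column; sorting the raw strings would only work while every distance has the same number of digits.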

Here it is in action on sqlfiddle

For example, LEAST('BOS', 'SFO') always returns 'BOS' and GREATEST('BOS', 'SFO') always returns 'SFO', whichever column each value comes from. So BOS-SFO and SFO-BOS both produce the same row (2704, 'BOS', 'SFO'), and DISTINCT collapses them into one.
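
The order-independence this relies on can be checked directly. This Ruby sketch uses `Array#minmax` as a stand-in for LEAST/GREATEST, which is an assumption worth flagging: Ruby compares strings byte by byte, whereas MySQL compares them using the collation, so the two can disagree on mixed-case values. Under a case-insensitive (`_ci`) collation 'bos' would sort before 'SFO'; bytewise it does not. For uppercase three-letter airport codes they agree. `endpoints` is a hypothetical helper.

```ruby
# Hypothetical stand-in for (LEAST(a, b), GREATEST(a, b)).
# Note: Ruby compares strings bytewise, not by MySQL collation.
def endpoints(a, b)
  [a, b].minmax
end

pairs = [["BOS", "SFO"], ["LAX", "LIH"], ["HNL", "SAN"]]

pairs.each do |a, b|
  # Swapping the arguments should not change the result.
  same = endpoints(a, b) == endpoints(b, a)
  puts "#{a}/#{b}: #{endpoints(a, b).inspect}, order-independent: #{same}"
end

# Bytewise comparison: uppercase letters sort before lowercase ones.
p endpoints("bos", "SFO") # prints ["SFO", "bos"]
```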
