GROUP BY with aggregates and INNER JOIN

Time: 2021-11-17 22:34:44

I tried to narrow the problem down as much as possible, but it is still quite involved. This is the query that doesn't work the way I want it to:

SELECT *, MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
  (SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
  FROM tbl_edges edges1
  INNER JOIN tbl_edges edges2
  ON edges1.nodeB = edges2.nodeA
  GROUP BY edges1.id HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY id_edge

Here is what I get:

|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|2  | 23   | 2    | 23  | 35  | 1       | 9                      |
|4  | 24   | 5    | 24  | 46  | 1       | 9                      |
------------------------------------------------------------------

And this is what I want:

|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|3  | 23   | 9    | 23  | 35  | 1       | 9                      |
|5  | 24   | 9    | 24  | 46  | 1       | 9                      |
------------------------------------------------------------------

But let me elaborate a bit...

I have a graph, for example:

    node1
      |
    node2
   /     \
node3    node4
  |       |
node5    node6

So I have a table, which I call tbl_edges, like this:

| id  | nodeA | nodeB  |
------------------------
| 12  |   1   |    2   |
| 23  |   2   |    3   |
| 24  |   2   |    4   |
| 35  |   3   |    5   |
| 46  |   4   |    6   |
------------------------

Each edge also has "stop_overs" at a certain distance from its nodeA. These are stored in a table tbl_stopover like this:

| id  | edge  |  dist  |
------------------------
|  1  |  12   |    5   |
|  2  |  23   |    2   |
|  3  |  23   |    9   |
|  4  |  24   |    5   |
|  5  |  24   |    9   |
|  6  |  35   |    5   |
|  7  |  46   |    5   |
------------------------
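
To make the data above easy to experiment with, here is a minimal sketch that loads it into an in-memory SQLite database through Python's built-in sqlite3 module and runs the inner tbl_conn subquery on its own. SQLite is only a stand-in for MySQL here because it needs no server. Two assumptions: the "edge" column shown for tbl_stopover is the id_edge column the query joins on, and id2 is wrapped in MAX() so the subquery does not depend on MySQL's loose GROUP BY handling. The helper name build_db is hypothetical.

```python
import sqlite3

# Sample data from the tables above: tbl_edges(id, nodeA, nodeB) and
# tbl_stopover(id, id_edge, dist). Assumption: the "edge" column shown is id_edge.
EDGES = [(12, 1, 2), (23, 2, 3), (24, 2, 4), (35, 3, 5), (46, 4, 6)]
STOPOVERS = [(1, 12, 5), (2, 23, 2), (3, 23, 9), (4, 24, 5),
             (5, 24, 9), (6, 35, 5), (7, 46, 5)]


def build_db() -> sqlite3.Connection:
    """Hypothetical helper: load both sample tables into an in-memory SQLite DB."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tbl_edges (id INTEGER PRIMARY KEY, nodeA INTEGER, nodeB INTEGER)")
    db.execute("CREATE TABLE tbl_stopover (id INTEGER PRIMARY KEY, id_edge INTEGER, dist INTEGER)")
    db.executemany("INSERT INTO tbl_edges VALUES (?, ?, ?)", EDGES)
    db.executemany("INSERT INTO tbl_stopover VALUES (?, ?, ?)", STOPOVERS)
    return db


# The tbl_conn subquery: for each edge, count the edges that start at its nodeB
# and keep only edges with exactly one successor. id2 is wrapped in MAX()
# (assumption, see above); with a count of 1 it is simply that successor's id.
CONN_SQL = """
SELECT edges1.id AS id1, MAX(edges2.id) AS id2, COUNT(edges1.id) AS numConn
FROM tbl_edges edges1
INNER JOIN tbl_edges edges2 ON edges1.nodeB = edges2.nodeA
GROUP BY edges1.id
HAVING COUNT(edges1.id) = 1
ORDER BY id1
"""

db = build_db()
print(db.execute(CONN_SQL).fetchall())  # [(23, 35, 1), (24, 46, 1)]
```

With this data only edges 23 and 24 lead into exactly one other edge. Edge 12 branches into 23 and 24 (numConn = 2), and edges 35 and 46 have no successor, so they drop out of the inner join entirely.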

Why this query?
Suppose I want to calculate the distance between stop_overs. Within one edge that is no problem; across edges it gets more difficult. But if two edges are connected and there is no other connection, I can still calculate the distance. Here is an example, assuming all edges have a length of 10:

edge23 has a stop_over (id=3) at dist=9, and edge35 has a stop_over (id=6) at dist=5. Therefore the distance between these two stop_overs is:

dist = (length - dist_id3) + dist_id6 = (10 - 9) + 5 = 6
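
The same arithmetic works for any pair of directly connected edges, because every dist is measured from its own edge's nodeA: walk the rest of the first edge, then the offset on the second. A small sketch of that formula follows; the function name stopover_distance is made up for illustration, and like the example it assumes the length of the first edge is known.

```python
def stopover_distance(edge_length: float, dist_on_first: float, dist_on_second: float) -> float:
    """Hypothetical helper: distance between a stop_over on one edge and a
    stop_over on the directly connected next edge.

    dist values are measured from each edge's nodeA, so we cover the remainder
    of the first edge (edge_length - dist_on_first), then dist_on_second.
    """
    return (edge_length - dist_on_first) + dist_on_second


# stop_over 3 (edge23, dist=9) to stop_over 6 (edge35, dist=5), edge length 10
print(stopover_distance(10, 9, 5))  # 6
```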

I am not sure I have made myself clear. If something is not understandable, feel free to ask questions and I will do my best to clarify.

2 solutions

#1 (score: 4)

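MySQL allows you to do something silly: select fields in an aggregate query that are neither part of the GROUP BY nor wrapped in an aggregate function like MAX. When you do this, the values you get for those remaining fields are nondeterministic (effectively random). Since MySQL 5.7.5, the ONLY_FULL_GROUP_BY SQL mode is enabled by default and rejects such queries unless the extra columns are functionally dependent on the GROUP BY columns.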

In your query you do this twice: once in the inner query (id2 is neither in the GROUP BY nor aggregated) and once in the outer query (SELECT * pulls in every column, but you group only by id_edge).

Prepare for random results!

To fix it, try something like this:

SELECT tbl_stopover.id,
       tbl_stopover.dist,
       tbl_conn.id1,
       tbl_conn.id2,
       tbl_conn.numConn,
       MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
  (SELECT edges1.id id1, MAX(edges2.id) id2, COUNT(edges1.id) numConn
  FROM tbl_edges edges1
  INNER JOIN tbl_edges edges2
  ON edges1.nodeB = edges2.nodeA
  GROUP BY edges1.id
  HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY tbl_stopover.id,
         tbl_stopover.dist,
         tbl_conn.id1,
         tbl_conn.id2,
         tbl_conn.numConn

The major changes are the explicit field list (note that I removed id_edge, since you are joining on id1 and already have that field), the additional fields in the outer GROUP BY clause, and wrapping id2 in MAX() in the inner query. Because HAVING numConn = 1 leaves exactly one edges2 row per group, MAX(edges2.id) simply returns that row's id; adding edges2.id to the inner GROUP BY instead would make every numConn equal 1 and defeat the filter.

If this gives you more rows than you want, you may need to explain more about your desired result set. Listing every non-aggregated column in the GROUP BY, as above, is the reliable way to get well-defined groupings.
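
One detail worth checking in the inner query is which columns it groups by. numConn is only meaningful when each group is a single edges1 row: grouping by both edges1.id and edges2.id turns every (edge, successor) pair into its own group, so every count is 1 and HAVING numConn = 1 no longer removes branching edges such as edge 12. The sketch below compares the two groupings on the sample tbl_edges data, again using Python's sqlite3 module as a stand-in for MySQL; connected_edges and GROUPINGS are hypothetical names.

```python
import sqlite3

# Sample tbl_edges data from the question: (id, nodeA, nodeB).
EDGES = [(12, 1, 2), (23, 2, 3), (24, 2, 4), (35, 3, 5), (46, 4, 6)]

# The two GROUP BY variants being compared (fixed literals, never user input).
GROUPINGS = {
    "by_edge": "edges1.id",
    "by_pair": "edges1.id, edges2.id",
}


def connected_edges(grouping: str) -> list:
    """Hypothetical helper: run the inner join with the chosen GROUP BY and
    keep the groups whose count is exactly 1."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tbl_edges (id INTEGER PRIMARY KEY, nodeA INTEGER, nodeB INTEGER)")
    db.executemany("INSERT INTO tbl_edges VALUES (?, ?, ?)", EDGES)
    sql = f"""
        SELECT edges1.id AS id1, MAX(edges2.id) AS id2, COUNT(edges1.id) AS numConn
        FROM tbl_edges edges1
        INNER JOIN tbl_edges edges2 ON edges1.nodeB = edges2.nodeA
        GROUP BY {GROUPINGS[grouping]}
        HAVING COUNT(edges1.id) = 1
        ORDER BY id1, id2
    """
    return db.execute(sql).fetchall()


# Grouping per pair: every count is 1, so branching edge 12 slips through twice.
print(connected_edges("by_pair"))  # [(12, 23, 1), (12, 24, 1), (23, 35, 1), (24, 46, 1)]
# Grouping per edge: edge 12 has numConn = 2 and is filtered out.
print(connected_edges("by_edge"))  # [(23, 35, 1), (24, 46, 1)]
```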

#2 (score: 1)

Okay. This seems to be the answer to my question. I will do some further "investigation", though, because I'm not sure whether this is reliable. If anybody has any thoughts on this, please leave a comment.

SELECT tbl.id, tbl.dist, tbl.id1, tbl.id2, MAX(dist) maxDist
FROM
(
  SELECT tbl_stopover.id,
         tbl_stopover.dist,
         tbl_conn.id1,
         tbl_conn.id2,
         tbl_conn.numConn
  FROM tbl_stopover
  INNER JOIN
    (SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
    FROM tbl_edges edges1
    INNER JOIN tbl_edges edges2
    ON edges1.nodeB = edges2.nodeA
    GROUP BY edges1.id
    HAVING numConn = 1) AS tbl_conn
  ON tbl_stopover.id_edge = tbl_conn.id1
  GROUP BY tbl_stopover.dist, tbl_conn.id1
  ORDER BY dist DESC) AS tbl
GROUP BY tbl.id1, tbl.id2

Thanks to JNK (my colleague at work), without whom I wouldn't have gotten this far.
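
On reliability: MySQL documents the value it returns for a non-grouped column as nondeterministic, and an ORDER BY does not influence which row within a group the server picks, so relying on the derived table's ORDER BY dist DESC may work on one server or version and not on another. A common deterministic pattern (often called greatest-n-per-group) instead joins each stop_over back to the maximum dist of its edge. Below is a minimal sketch of that pattern, run through Python's sqlite3 as a stand-in for MySQL; the SQL uses only standard joins and aggregates, the sample rows come from the question, and the id_edge column name is assumed from the query rather than the table listing.

```python
import sqlite3

EDGES = [(12, 1, 2), (23, 2, 3), (24, 2, 4), (35, 3, 5), (46, 4, 6)]
STOPOVERS = [(1, 12, 5), (2, 23, 2), (3, 23, 9), (4, 24, 5),
             (5, 24, 9), (6, 35, 5), (7, 46, 5)]

# m: the maximum dist per edge; joining on it keeps only the farthest stop_over.
# c: edges with exactly one successor (id2 aggregated, so nothing is ungrouped).
LAST_STOPOVER_SQL = """
SELECT s.id, s.id_edge, s.dist, c.id1, c.id2, c.numConn
FROM tbl_stopover s
INNER JOIN (SELECT id_edge, MAX(dist) AS maxDist
            FROM tbl_stopover
            GROUP BY id_edge) m
  ON s.id_edge = m.id_edge AND s.dist = m.maxDist
INNER JOIN (SELECT edges1.id AS id1, MAX(edges2.id) AS id2,
                   COUNT(edges1.id) AS numConn
            FROM tbl_edges edges1
            INNER JOIN tbl_edges edges2 ON edges1.nodeB = edges2.nodeA
            GROUP BY edges1.id
            HAVING COUNT(edges1.id) = 1) c
  ON s.id_edge = c.id1
ORDER BY s.id
"""


def last_stopovers() -> list:
    """Hypothetical helper: load the sample data and run the query above."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tbl_edges (id INTEGER PRIMARY KEY, nodeA INTEGER, nodeB INTEGER)")
    db.execute("CREATE TABLE tbl_stopover (id INTEGER PRIMARY KEY, id_edge INTEGER, dist INTEGER)")
    db.executemany("INSERT INTO tbl_edges VALUES (?, ?, ?)", EDGES)
    db.executemany("INSERT INTO tbl_stopover VALUES (?, ?, ?)", STOPOVERS)
    return db.execute(LAST_STOPOVER_SQL).fetchall()


print(last_stopovers())  # [(3, 23, 9, 23, 35, 1), (5, 24, 9, 24, 46, 1)]
```

These are the rows from the desired output (stop_overs 3 and 5). If two stop_overs on the same edge share the maximum dist, this version returns both, so add a tie-breaker if that can happen in your data.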
