BigQuery row_number for groups of rows based on a specific field

Time: 2021-09-30 13:51:58

I have data like this:

cityName      stateId
cityText1        52
cityText2        52
cityText3        52

cityExp1         72
cityExp2         72
cityExp3         72

city1            21
city2            21

I am using subqueries to get the data. Now, in BigQuery, I want the result to look like this:

cityName      rowNumber
cityText1        1
cityText2        1
cityText3        1

cityExp1         2
cityExp2         2
cityExp3         2

city1            3
city2            3

I have tried ROW_NUMBER(), but it gives a unique number to every row. Is it possible to get one number per stateId group, as shown above?

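To illustrate the difference the question describes, here is a minimal sketch using Python's built-in sqlite3 module as a local stand-in for BigQuery (BigQuery Standard SQL also provides ROW_NUMBER and DENSE_RANK; window functions need SQLite 3.25 or later). The table name `cities` and the `pos` column, which records the original row order, are assumptions for the example: SQL tables have no inherent row order, so some such column is needed to reproduce the order shown above.

```python
import sqlite3

# Sample data from the question. `pos` is a hypothetical column recording the
# original row order, because SQL tables have no inherent order.
ROWS = [
    (1, "cityText1", 52), (2, "cityText2", 52), (3, "cityText3", 52),
    (4, "cityExp1", 72), (5, "cityExp2", 72), (6, "cityExp3", 72),
    (7, "city1", 21), (8, "city2", 21),
]


def compare_numbering():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cities (pos INTEGER, cityName TEXT, stateId INTEGER)")
    con.executemany("INSERT INTO cities VALUES (?, ?, ?)", ROWS)
    # ROW_NUMBER gives every row its own number; DENSE_RANK gives all rows that
    # share the same ORDER BY value (here stateId) the same number, with no gaps.
    sql = """
        SELECT cityName,
               ROW_NUMBER() OVER (ORDER BY stateId, pos) AS rn,
               DENSE_RANK() OVER (ORDER BY stateId)      AS dr
        FROM cities
        ORDER BY pos
    """
    return con.execute(sql).fetchall()


for name, rn, dr in compare_numbering():
    print(f"{name:<10} row_number={rn} dense_rank={dr}")
```

DENSE_RANK already yields one number per stateId, but ordering by stateId numbers the groups by value (state 21 gets 1), not in the order they appear in the question.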
1 solution

#1


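First add a constant (scalar) column so that all rows fall into a single partition, collapse the rows to one per stateId with GROUP BY, and then apply ROW_NUMBER over that partition. Without an ORDER BY in the OVER clause, the order in which the groups are numbered is not guaranteed.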

Update: see the end of this answer for a version that does not need the scalar column.

#legacySQL
-- In legacy SQL, a comma between subqueries in FROM means UNION ALL.
SELECT stateId,
       row_number() over (partition BY scalar) AS INDEX
FROM
  (SELECT stateId,
          1 AS scalar  -- constant column: every row lands in the same partition
   FROM
     (SELECT 'cityText1' AS cityName,
             52 AS stateId),
     (SELECT 'cityText2' AS cityName,
             52 AS stateId),
     (SELECT 'cityText3' AS cityName,
             52 AS stateId),
     (SELECT 'cityExp1' AS cityName,
             72 AS stateId),
     (SELECT 'cityExp2' AS cityName,
             72 AS stateId),
     (SELECT 'cityExp3' AS cityName,
             72 AS stateId),
     (SELECT 'city1' AS cityName,
             21 AS stateId),
     (SELECT 'city2' AS cityName,
             21 AS stateId)
   GROUP BY stateId) d  -- one row per stateId

This returns:

+-----+---------+-------+
| Row | stateId | index |
+-----+---------+-------+
|   1 |      52 |     1 |
|   2 |      72 |     2 |
|   3 |      21 |     3 |
+-----+---------+-------+
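A deterministic variant of this first step is sketched below, again with sqlite3 standing in for BigQuery and the same hypothetical `cities` table and `pos` column. Ordering the window by each state's first position, MIN(pos), reproduces the 52, 72, 21 numbering shown above instead of leaving it to chance.

```python
import sqlite3

# Sample data from the question; `pos` (hypothetical) records the original row order.
ROWS = [
    (1, "cityText1", 52), (2, "cityText2", 52), (3, "cityText3", 52),
    (4, "cityExp1", 72), (5, "cityExp2", 72), (6, "cityExp3", 72),
    (7, "city1", 21), (8, "city2", 21),
]


def number_states():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cities (pos INTEGER, cityName TEXT, stateId INTEGER)")
    con.executemany("INSERT INTO cities VALUES (?, ?, ?)", ROWS)
    # Inner query: one row per stateId (like the answer's GROUP BY), keeping
    # the position where each state first appears.
    # Outer query: number those rows in order of first appearance.
    sql = """
        SELECT stateId,
               ROW_NUMBER() OVER (ORDER BY first_pos) AS idx
        FROM (
            SELECT stateId, MIN(pos) AS first_pos
            FROM cities
            GROUP BY stateId
        )
        ORDER BY idx
    """
    return con.execute(sql).fetchall()


print(number_states())  # [(52, 1), (72, 2), (21, 3)]
```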

Then join this result back to the original table on stateId to get the final output. Because the sample data is inlined as static subqueries, the query is fairly long:

#legacySQL
SELECT t.cityName,
       t.stateId,
       d.index
FROM
  -- t: the original city rows (comma-separated subqueries = UNION ALL in legacy SQL)
  (SELECT *
   FROM
     (SELECT 'cityText1' AS cityName,
             52 AS stateId),
     (SELECT 'cityText2' AS cityName,
             52 AS stateId),
     (SELECT 'cityText3' AS cityName,
             52 AS stateId),
     (SELECT 'cityExp1' AS cityName,
             72 AS stateId),
     (SELECT 'cityExp2' AS cityName,
             72 AS stateId),
     (SELECT 'cityExp3' AS cityName,
             72 AS stateId),
     (SELECT 'city1' AS cityName,
             21 AS stateId),
     (SELECT 'city2' AS cityName,
             21 AS stateId)) t
JOIN
  -- d: one numbered row per stateId (the first query above)
  (SELECT stateId,
          row_number() over (partition BY scalar) AS INDEX
   FROM
     (SELECT stateId,
             1 AS scalar
      FROM
        (SELECT 'cityText1' AS cityName,
                52 AS stateId),
        (SELECT 'cityText2' AS cityName,
                52 AS stateId),
        (SELECT 'cityText3' AS cityName,
                52 AS stateId),
        (SELECT 'cityExp1' AS cityName,
                72 AS stateId),
        (SELECT 'cityExp2' AS cityName,
                72 AS stateId),
        (SELECT 'cityExp3' AS cityName,
                72 AS stateId),
        (SELECT 'city1' AS cityName,
                21 AS stateId),
        (SELECT 'city2' AS cityName,
                21 AS stateId)
      GROUP BY stateId)) d ON d.stateId=t.stateId

This returns the final output:

+-----+------------+-----------+---------+
| Row | t_cityName | t_stateId | d_index |
+-----+------------+-----------+---------+
|   1 | cityText1  |        52 |       1 |
|   2 | cityText2  |        52 |       1 |
|   3 | cityText3  |        52 |       1 |
|   4 | cityExp1   |        72 |       2 |
|   5 | cityExp2   |        72 |       2 |
|   6 | cityExp3   |        72 |       2 |
|   7 | city1      |        21 |       3 |
|   8 | city2      |        21 |       3 |
+-----+------------+-----------+---------+
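The join-back step can be sketched the same way, with sqlite3 as a stand-in for BigQuery and the same hypothetical `cities` table and `pos` column. The derived table `s` holds one numbered row per stateId, and joining it back on stateId copies that number onto every city of the state.

```python
import sqlite3

# Sample data from the question; `pos` (hypothetical) records the original row order.
ROWS = [
    (1, "cityText1", 52), (2, "cityText2", 52), (3, "cityText3", 52),
    (4, "cityExp1", 72), (5, "cityExp2", 72), (6, "cityExp3", 72),
    (7, "city1", 21), (8, "city2", 21),
]


def final_output():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cities (pos INTEGER, cityName TEXT, stateId INTEGER)")
    con.executemany("INSERT INTO cities VALUES (?, ?, ?)", ROWS)
    # s: one row per stateId, numbered by first appearance.
    # The JOIN then attaches that group number to every matching city row.
    sql = """
        SELECT c.cityName, c.stateId, s.idx
        FROM cities AS c
        JOIN (
            SELECT stateId,
                   ROW_NUMBER() OVER (ORDER BY first_pos) AS idx
            FROM (
                SELECT stateId, MIN(pos) AS first_pos
                FROM cities
                GROUP BY stateId
            )
        ) AS s ON s.stateId = c.stateId
        ORDER BY c.pos
    """
    return con.execute(sql).fetchall()


for row in final_output():
    print(row)
```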

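Update: An empty OVER () already treats all rows as a single partition, so the scalar column is not needed and the first query becomes the following. Join its result back to the table as above to get the final output.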

#legacySQL
SELECT stateId,
       row_number() over () AS INDEX
FROM
  (SELECT 'cityText1' AS cityName,
          52 AS stateId),
  (SELECT 'cityText2' AS cityName,
          52 AS stateId),
  (SELECT 'cityText3' AS cityName,
          52 AS stateId),
  (SELECT 'cityExp1' AS cityName,
          72 AS stateId),
  (SELECT 'cityExp2' AS cityName,
          72 AS stateId),
  (SELECT 'cityExp3' AS cityName,
          72 AS stateId),
  (SELECT 'city1' AS cityName,
          21 AS stateId),
  (SELECT 'city2' AS cityName,
          21 AS stateId)
-- GROUP BY runs before the window function, so each stateId is numbered once
GROUP BY stateId
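If the join is unwanted, a single-pass alternative (a different technique from the answer's) is to compute each state's first position with MIN() OVER (PARTITION BY stateId) and then apply DENSE_RANK over that value, so every row of a state gets the same number. BigQuery Standard SQL supports both window functions; this sketch uses sqlite3 (3.25+) with the same hypothetical `cities` table and `pos` column, not BigQuery itself.

```python
import sqlite3

# Sample data from the question; `pos` (hypothetical) records the original row order.
ROWS = [
    (1, "cityText1", 52), (2, "cityText2", 52), (3, "cityText3", 52),
    (4, "cityExp1", 72), (5, "cityExp2", 72), (6, "cityExp3", 72),
    (7, "city1", 21), (8, "city2", 21),
]


def single_pass():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cities (pos INTEGER, cityName TEXT, stateId INTEGER)")
    con.executemany("INSERT INTO cities VALUES (?, ?, ?)", ROWS)
    # Inner query: tag every row with the first position of its state.
    # Outer query: DENSE_RANK over that tag gives one gap-free number per state,
    # in order of first appearance, without a join.
    sql = """
        SELECT cityName, stateId,
               DENSE_RANK() OVER (ORDER BY first_pos) AS idx
        FROM (
            SELECT cityName, stateId, pos,
                   MIN(pos) OVER (PARTITION BY stateId) AS first_pos
            FROM cities
        )
        ORDER BY pos
    """
    return con.execute(sql).fetchall()


for row in single_pass():
    print(row)
```

Because `pos` is unique, no two states share a `first_pos`, so DENSE_RANK cannot merge different states into one number.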
