How to find a value based on two columns for common elements? SQL

Posted: 2021-02-20 04:27:01

I am using IBM's DB2; however, I am hoping this can be done in standard SQL, without the additional functions each database vendor adds.

My Problem:

For every CID (customer ID) I need to find the CLUB from which the most purchases were made.

In the table below, customer #1 ('cid = 1') bought the most books from the club 'Readers Digest'. I can get this part done using the approach from "Find most frequent value in SQL column".

[Image: sample rows from the purchase table (CID, CLUB, QNTY)]

However, CID = 2 is much harder (for me), and I am stuck. For 'cid = 2', the most commonly occurring string under the 'CLUB' column is a tie between 'Oprah' and 'YRB Gold', but when we look at the 'QNTY' column it becomes apparent (to people) that the final answer for 'cid = 2' is 'YRB_gold': 'YRB_GOLD' sold 5 books, whereas 'OPRAH' sold only 4.

[Image: purchase rows for cid = 2]
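A minimal sketch of the cid = 2 situation, using Python's built-in sqlite3 module as a stand-in for DB2. The rows are hypothetical, invented only to match the description (two purchase rows each for 'Oprah' and 'YRB_gold', totalling 4 and 5 books); the real table contents are not reproduced here. Counting rows gives a tie, while summing QNTY separates the two clubs.

```python
import sqlite3

# Hypothetical rows for cid = 2 only (cid, club, qnty), invented to match the
# description: two purchase rows per club, but different total quantities.
ROWS = [
    (2, "Oprah", 2),
    (2, "Oprah", 2),
    (2, "YRB_gold", 3),
    (2, "YRB_gold", 2),
]


def club_counts_and_totals(rows):
    """Return (club, number_of_rows, total_qnty) for cid = 2, ordered by club."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER)")
    con.executemany("INSERT INTO yrb_purchase VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT club, COUNT(*) AS n_rows, SUM(qnty) AS total_qnty
        FROM yrb_purchase
        WHERE cid = 2
        GROUP BY club
        ORDER BY club
        """
    ).fetchall()
    con.close()
    return result


for row in club_counts_and_totals(ROWS):
    print(row)  # ('Oprah', 2, 4) then ('YRB_gold', 2, 5)
```

COUNT(*) is 2 for both clubs, so a plain "most frequent value" query cannot choose; SUM(qnty) gives 4 versus 5.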

I put 'cid=3' here as well:

[Image: purchase rows for cid = 3]

There are 47 customers (cid). My question is: how do I step through every 'CID' and find the most commonly occurring string under 'CLUB', while at the same time keeping track of the 'QNTY' for every club? If you choose to help, could you please offer the most vendor-neutral answer possible? Thank you in advance.
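Since a vendor-neutral answer was requested, one sketch that avoids window functions is: compute the total QNTY per (CID, CLUB), then keep every club whose total equals the largest total for that CID. It uses a common table expression (WITH, part of SQL:1999 and supported by DB2) plus a correlated subquery. The SQL is run through Python's sqlite3 module only so the example is self-contained; the sample rows and the names `top_clubs` and `club_totals` are hypothetical.

```python
import sqlite3

# Hypothetical (cid, club, qnty) rows; not the real yrb_purchase contents.
ROWS = [
    (1, "Readers Digest", 2), (1, "Readers Digest", 1), (1, "Oprah", 1),
    (2, "Oprah", 2), (2, "Oprah", 2), (2, "YRB_gold", 3), (2, "YRB_gold", 2),
    (3, "Oprah", 2), (3, "YRB_gold", 2),
]

# Window-function-free query: total per (cid, club), then keep every club whose
# total equals the largest total for the same cid (tied clubs are all returned).
TOP_CLUB_SQL = """
WITH club_totals AS (
    SELECT cid, club, SUM(qnty) AS total_qnty
    FROM yrb_purchase
    GROUP BY cid, club
)
SELECT ct.cid, ct.club, ct.total_qnty
FROM club_totals ct
WHERE ct.total_qnty = (SELECT MAX(ct2.total_qnty)
                       FROM club_totals ct2
                       WHERE ct2.cid = ct.cid)
ORDER BY ct.cid, ct.club
"""


def top_clubs(rows):
    """Load rows into an in-memory table and return (cid, club, total_qnty)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER)")
    con.executemany("INSERT INTO yrb_purchase VALUES (?, ?, ?)", rows)
    result = con.execute(TOP_CLUB_SQL).fetchall()
    con.close()
    return result


print(top_clubs(ROWS))
```

Because the filter compares against MAX, cid 3 in the made-up data (two clubs with 2 books each) comes back as two rows; an extra rule would be needed if exactly one row per customer is required.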

******************************EDIT #1*****************************************

Thanks for the answer below. Currently I am having an issue making it work.

The table name is 'yrb_purchase'. When I type everything in: SELECT cid, club, qnty AS q1 FROM (SELECT cid, club, sum(qnty) AS q2, row_number() OVER (PARTITION BY cid ORDER BY sum(q2) DESC) AS seqnum FROM yrb_purchase GROUP BY cid, club) cc WHERE seqnum = 1

I get an error: SQL0206N "Q2" is not valid in the context where it is used. SQLSTATE=42703

I am currently trying to figure out how to fix that.
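The SQL0206N error comes from referencing q2 inside the same SELECT that defines it: a column alias is not visible elsewhere in that SELECT list, including an OVER clause, so SUM(q2) is rejected. Two further problems would surface next: SUM(q2) would aggregate an already-aggregated value, and the outer query selects qnty, which the derived table cc does not expose (it exposes q2). Below is a corrected sketch, run through Python's sqlite3 (window functions need SQLite 3.25 or later) on hypothetical rows; adding club as a second sort key is an assumption made so that row_number() is deterministic, and `top_club_per_customer` is a hypothetical helper name.

```python
import sqlite3

# Hypothetical (cid, club, qnty) rows; not the real yrb_purchase contents.
ROWS = [
    (1, "Readers Digest", 2), (1, "Readers Digest", 1), (1, "Oprah", 1),
    (2, "Oprah", 2), (2, "Oprah", 2), (2, "YRB_gold", 3), (2, "YRB_gold", 2),
    (3, "Oprah", 2), (3, "YRB_gold", 2),
]

# Corrected form of the query from the edit:
#  * the window orders by SUM(qnty), because the alias q2 is not visible there;
#  * the outer SELECT uses q2, the column the derived table cc actually exposes;
#  * club is a tie-breaker (an assumption) so seqnum = 1 is always the same row.
FIXED_SQL = """
SELECT cid, club, q2
FROM (SELECT cid, club, SUM(qnty) AS q2,
             ROW_NUMBER() OVER (PARTITION BY cid
                                ORDER BY SUM(qnty) DESC, club) AS seqnum
      FROM yrb_purchase
      GROUP BY cid, club) cc
WHERE seqnum = 1
ORDER BY cid
"""


def top_club_per_customer(rows):
    """Return one (cid, club, q2) row per customer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER)")
    con.executemany("INSERT INTO yrb_purchase VALUES (?, ?, ?)", rows)
    result = con.execute(FIXED_SQL).fetchall()
    con.close()
    return result


print(top_club_per_customer(ROWS))
```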

1 Answer

#1


The value that you are looking for has a name. Statistically, it is called the "mode".
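Strictly, the mode counts occurrences; here each purchase is weighted by QNTY, so what is wanted is a quantity-weighted mode per customer. A minimal plain-Python sketch of that idea, independent of any database (the function name and the sample rows are hypothetical):

```python
from collections import Counter, defaultdict

# Hypothetical (cid, club, qnty) rows; not the real yrb_purchase data.
ROWS = [
    (1, "Readers Digest", 2), (1, "Readers Digest", 1), (1, "Oprah", 1),
    (2, "Oprah", 2), (2, "Oprah", 2), (2, "YRB_gold", 3), (2, "YRB_gold", 2),
    (3, "Oprah", 2), (3, "YRB_gold", 2),
]


def weighted_mode_by_customer(rows):
    """Map each cid to the sorted list of clubs with the largest total qnty."""
    totals = defaultdict(Counter)
    for cid, club, qnty in rows:
        totals[cid][club] += qnty  # weight each purchase by its quantity
    result = {}
    for cid, club_totals in totals.items():
        best = max(club_totals.values())
        # Keep every club that reaches the maximum, so ties stay visible.
        result[cid] = sorted(c for c, t in club_totals.items() if t == best)
    return result


print(weighted_mode_by_customer(ROWS))
# {1: ['Readers Digest'], 2: ['YRB_gold'], 3: ['Oprah', 'YRB_gold']}
```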

If I understand correctly, you want the total quantity for each club, and then the club with the highest total for each cid. You can use row_number() or rank(), depending on how you want to handle ties:

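-- Total quantity per (cid, club), ranked within each cid; seqnum = 1 is the top club.
select cid, club, total_qnty
from (select cid, club, sum(qnty) as total_qnty,
             row_number() over (partition by cid order by sum(qnty) desc) as seqnum
      from yrb_purchase
      group by cid, club
     ) cc
where seqnum = 1;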

If there are ties and you want all of the tied clubs, use rank() instead of row_number(). With row_number(), only one of the tied clubs is returned, and which one is not guaranteed.
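A sketch contrasting the two functions, again using Python's sqlite3 (3.25 or later) as a stand-in and hypothetical rows in which cid 3 has two clubs with the same total. The query template and helper names are illustrative only; the same SQL text is run once with RANK and once with ROW_NUMBER.

```python
import sqlite3

# Hypothetical (cid, club, qnty) rows; cid 3 has a tie (2 books per club).
ROWS = [
    (1, "Readers Digest", 2), (1, "Readers Digest", 1), (1, "Oprah", 1),
    (2, "Oprah", 2), (2, "Oprah", 2), (2, "YRB_gold", 3), (2, "YRB_gold", 2),
    (3, "Oprah", 2), (3, "YRB_gold", 2),
]

# {fn} is replaced by RANK or ROW_NUMBER; everything else is identical.
TEMPLATE = """
SELECT cid, club, total_qnty
FROM (SELECT cid, club, SUM(qnty) AS total_qnty,
             {fn}() OVER (PARTITION BY cid ORDER BY SUM(qnty) DESC) AS pos
      FROM yrb_purchase
      GROUP BY cid, club) cc
WHERE pos = 1
ORDER BY cid, club
"""


def run(sql, rows):
    """Execute sql against an in-memory yrb_purchase table built from rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER)")
    con.executemany("INSERT INTO yrb_purchase VALUES (?, ?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


ranked = run(TEMPLATE.format(fn="RANK"), ROWS)          # both tied clubs for cid 3
numbered = run(TEMPLATE.format(fn="ROW_NUMBER"), ROWS)  # one arbitrary club for cid 3
print(ranked)
print(numbered)
```

rank() gives both tied clubs rank 1, so cid 3 appears twice; row_number() always yields exactly one row per cid.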
