Generate a histogram from the values of a database column

Time: 2023-01-21 14:57:20

Let's say I have a database column 'grade' like this:

|grade|
|    1|
|    2|
|    1|
|    3|
|    4|
|    5|

Is there a non-trivial way in SQL to generate a histogram like this?

|2,1,1,1,1,0|

where 2 means that grade 1 occurs twice, the 1s mean that grades 2 to 5 each occur once, and 0 means that grade 6 does not occur at all.

I don't mind if the histogram is one row per count.

In case it matters, the database is SQL Server, accessed by a Perl CGI script through unixODBC/FreeTDS.

EDIT: Thanks for your quick replies! It is okay if values that don't exist (like grade 6 in the example above) are missing from the output, as long as I can tell which histogram value belongs to which grade.

7 solutions

#1


26  

SELECT grade, COUNT(grade) FROM [table] GROUP BY grade ORDER BY grade

Haven't verified it, but it should work. It will not, however, show a count for grade 6, since that grade is not present in the table at all...
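Here is a minimal runnable sketch of the GROUP BY approach on the sample data from the question. It assumes SQLite, through Python's built-in sqlite3 module, as a stand-in for SQL Server, and uses a hypothetical table name `grades`. Selecting the grade next to its count makes it clear which count belongs to which grade.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server;
# "grades" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (grade INTEGER)")
conn.executemany("INSERT INTO grades (grade) VALUES (?)",
                 [(1,), (2,), (1,), (3,), (4,), (5,)])

# One row per distinct grade: the grade itself and how often it occurs.
rows = conn.execute(
    "SELECT grade, COUNT(grade) FROM grades GROUP BY grade ORDER BY grade"
).fetchall()
print(rows)  # [(1, 2), (2, 1), (3, 1), (4, 1), (5, 1)]
```

Grade 6 is absent from the result because no row contains it.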

#2


6  

Use a temp table to get your missing values:

CREATE TABLE #tmp(num int)
DECLARE @num int
SET @num = 0
WHILE @num < 10
BEGIN
  INSERT INTO #tmp (num) VALUES (@num)  -- fill #tmp with 0..9
  SET @num = @num + 1
END


SELECT t.num as [Grade], count(g.Grade) FROM gradeTable g
RIGHT JOIN #tmp t on g.Grade = t.num
GROUP by t.num
ORDER BY 1
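The same "fill in the missing grades" idea can be sketched without a WHILE loop. This version generates the list of expected grades with a recursive CTE, then outer-joins the data onto it. Assumptions: SQLite via Python's sqlite3 stands in for SQL Server, the table name `grades` is hypothetical, and valid grades run from 1 to 6. The number list sits on the left of a LEFT JOIN, which is equivalent to the answer's RIGHT JOIN.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; "grades" is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (grade INTEGER)")
conn.executemany("INSERT INTO grades (grade) VALUES (?)",
                 [(1,), (2,), (1,), (3,), (4,), (5,)])

query = """
WITH RECURSIVE nums(num) AS (
    SELECT 1                                   -- first valid grade
    UNION ALL
    SELECT num + 1 FROM nums WHERE num < 6     -- assumed last valid grade
)
SELECT n.num AS grade, COUNT(g.grade) AS occurrences
FROM nums AS n
LEFT JOIN grades AS g ON g.grade = n.num       -- keep grades with no data
GROUP BY n.num
ORDER BY n.num
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 2), (2, 1), (3, 1), (4, 1), (5, 1), (6, 0)]
```

Reading the counts in order gives exactly the `2,1,1,1,1,0` shape asked for in the question.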

#3


3  

Gamecat's use of DISTINCT seems a little odd to me; I'll have to try it out when I'm back in the office...

The way I would do it is similar though...

SELECT
    [table].grade        AS [grade],
    COUNT(*)             AS [occurrences]
FROM
    [table]
GROUP BY
    [table].grade
ORDER BY
    [table].grade

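To include grades that occur 0 times, you can LEFT JOIN from a table containing all valid grades. Count a column from the data table rather than using COUNT(*). COUNT(*) counts every row, including the NULL-filled row that the LEFT JOIN produces for a missing grade, so that grade would show 1. COUNT([table].grade) ignores NULLs, so it correctly shows 0.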

DECLARE @grades TABLE (
   val INT
   )  

INSERT INTO @grades VALUES (1)  
INSERT INTO @grades VALUES (2)  
INSERT INTO @grades VALUES (3)  
INSERT INTO @grades VALUES (4)  
INSERT INTO @grades VALUES (5)  
INSERT INTO @grades VALUES (6)  

SELECT
    [grades].val         AS [grade],
    COUNT([table].grade) AS [occurrences]
FROM
    @grades   AS [grades]
LEFT JOIN
    [table]
        ON [table].grade = [grades].val
GROUP BY
    [grades].val
ORDER BY
    [grades].val
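The COUNT(*) versus COUNT(column) difference is easy to see side by side. This minimal sketch assumes SQLite via Python's sqlite3 as a stand-in for SQL Server. The names `grades` and `valid` are hypothetical, with `valid` playing the role of the `@grades` table variable.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (grade INTEGER)")
conn.executemany("INSERT INTO grades (grade) VALUES (?)",
                 [(1,), (2,), (1,), (3,), (4,), (5,)])

query = """
WITH valid(val) AS (VALUES (1), (2), (3), (4), (5), (6))  -- all valid grades
SELECT v.val,
       COUNT(*)       AS count_star,    -- counts the NULL-filled row too
       COUNT(g.grade) AS count_grade    -- skips NULLs from the outer join
FROM valid AS v
LEFT JOIN grades AS g ON g.grade = v.val
GROUP BY v.val
ORDER BY v.val
"""
rows = conn.execute(query).fetchall()
print(rows[-1])  # (6, 1, 0): COUNT(*) reports 1 for the missing grade 6
```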

#4


3  

If there are a lot of data points, you can also group ranges together like this:

SELECT FLOOR(grade/5.00)*5 As Grade, 
       COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(Grade/5.00)*5
ORDER BY 1
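Here is a small runnable illustration of the bucketing arithmetic, in which each value maps to the lower bound of its width-5 bucket. Assumptions: SQLite via Python's sqlite3 stands in for SQL Server, the table `scores` and its sample values are hypothetical, and FLOOR is replaced by `CAST(... AS INTEGER)`. That cast truncates toward zero, so it matches FLOOR only for non-negative values.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; "scores" is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (grade INTEGER)")
conn.executemany("INSERT INTO scores (grade) VALUES (?)",
                 [(3,), (7,), (8,), (12,), (14,), (14,), (21,)])

# CAST(x AS INTEGER) truncates; for non-negative grades it equals FLOOR(x).
rows = conn.execute("""
    SELECT CAST(grade / 5.0 AS INTEGER) * 5 AS bucket, COUNT(*) AS n
    FROM scores
    GROUP BY bucket
    ORDER BY bucket
""").fetchall()
print(rows)  # [(0, 1), (5, 2), (10, 3), (20, 1)]
```

SQLite accepts the `bucket` alias in GROUP BY. SQL Server does not, which is why the answer repeats the expression there.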

Additionally, if you wanted to label the full range, you can get the floor and ceiling ahead of time with a CTE.

With GradeRanges As (
  SELECT FLOOR(grade/5.00)*5     As GradeFloor, 
         FLOOR(grade/5.00)*5 + 4 As GradeCeiling
  FROM TableName
)
SELECT GradeFloor,
       CONCAT(GradeFloor, ' to ', GradeCeiling) AS GradeRange,
       COUNT(*) AS [Grade Count]
FROM GradeRanges
GROUP BY GradeFloor, CONCAT(GradeFloor, ' to ', GradeCeiling)
ORDER BY GradeFloor
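The labeled-range query can be sketched the same way. Assumptions: SQLite via Python's sqlite3 stands in for SQL Server, and the `scores` table and data are hypothetical. SQLite's `||` operator replaces CONCAT, and the query groups by floor and ceiling instead of by the concatenated string. Because the label is built only from grouped columns, it can appear in the SELECT list without repeating it in GROUP BY.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; "scores" is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (grade INTEGER)")
conn.executemany("INSERT INTO scores (grade) VALUES (?)",
                 [(3,), (7,), (8,), (12,), (14,), (14,), (21,)])

rows = conn.execute("""
    WITH grade_ranges AS (
        SELECT CAST(grade / 5.0 AS INTEGER) * 5     AS grade_floor,
               CAST(grade / 5.0 AS INTEGER) * 5 + 4 AS grade_ceiling
        FROM scores
    )
    SELECT grade_floor,
           grade_floor || ' to ' || grade_ceiling AS grade_range,  -- label
           COUNT(*) AS grade_count
    FROM grade_ranges
    GROUP BY grade_floor, grade_ceiling
    ORDER BY grade_floor
""").fetchall()
for floor_, label, count in rows:
    print(label, count)
```

Empty buckets (15 to 19 here) do not appear. Filling them would need a list of all buckets joined in, as in the answers above.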

Note: Some SQL engines let you GROUP BY an ordinal column index, but SQL Server does not. Any non-aggregated expression in the SELECT list must also appear in the GROUP BY clause, which is why the range expression is copied into the GROUP BY as well.

#5


2  

select Grade, count(Grade)
from MyTable
group by Grade

#6


2  

According to Shlomo Priymak's article "How to Quickly Create a Histogram in MySQL", you can use the following query. RPAD is not available in SQL Server, where REPLICATE('*', COUNT(*)) serves the same purpose.

SELECT grade, 
       COUNT(*) AS 'Count',
       RPAD('', COUNT(*), '*') AS 'Bar' 
FROM grades 
GROUP BY grade

Which will produce the following table:

grade   Count   Bar
1       2       **
2       1       *
3       1       *
4       1       *
5       1       *
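If the database has no RPAD, the bar column can also be built on the client from the grouped counts. This minimal sketch assumes SQLite via Python's sqlite3 and a hypothetical `grades` table. It is a client-side alternative, not the article's method.

```python
import sqlite3

# Assumption: SQLite stands in for the database; "grades" is a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (grade INTEGER)")
conn.executemany("INSERT INTO grades (grade) VALUES (?)",
                 [(1,), (2,), (1,), (3,), (4,), (5,)])

rows = conn.execute(
    "SELECT grade, COUNT(*) FROM grades GROUP BY grade ORDER BY grade"
).fetchall()

# Build the bar client-side: one '*' per occurrence.
histogram = [(grade, count, "*" * count) for grade, count in rows]
for grade, count, bar in histogram:
    print(f"{grade}\t{count}\t{bar}")
```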

#7


0  

I am building on what Ilya Volodin did above. This should let you choose the range of grades you want to group together in your result:

DECLARE @cnt INT = 0;

WHILE @cnt < 100 -- set max value
BEGIN
    -- count values within @cnt +/- 0.999 (set tolerance); adjacent windows overlap
    SELECT @cnt, COUNT(fe)
    FROM dbo.GEODATA_CB
    WHERE fe >= @cnt - 0.999 AND fe <= @cnt + 0.999;
    SET @cnt = @cnt + 1; -- set step
END;
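For comparison, here is a set-based variant that returns every bin in one result set, rather than one result set per loop iteration. It uses non-overlapping half-open bins [n, n+1), which is a different binning rule from the answer's ±0.999 windows. Assumptions: SQLite via Python's sqlite3 stands in for SQL Server, and the `geodata` table and its values are hypothetical stand-ins for `dbo.GEODATA_CB.fe`.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; "geodata" is a hypothetical
# stand-in for dbo.GEODATA_CB, with made-up sample values in column fe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geodata (fe REAL)")
conn.executemany("INSERT INTO geodata (fe) VALUES (?)",
                 [(0.2,), (0.7,), (1.5,), (2.0,), (2.999,), (3.1,)])

# CAST truncates toward zero, so restrict to non-negative values;
# each value then falls into exactly one bin [n, n + 1).
rows = conn.execute("""
    SELECT CAST(fe AS INTEGER) AS bin, COUNT(*) AS n
    FROM geodata
    WHERE fe >= 0 AND fe < 100
    GROUP BY bin
    ORDER BY bin
""").fetchall()
print(rows)  # [(0, 2), (1, 1), (2, 2), (3, 1)]
```

Bins that contain no values are omitted, just as in the plain GROUP BY answers.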
