Using a custom aggregate function in a GROUP BY?

Time: 2021-02-27 22:45:28

I have a simple MEDIAN calculation function:

IF OBJECT_ID(N'COMPUTEMEDIAN', N'FN') IS NOT NULL
    DROP FUNCTION dbo.COMPUTEMEDIAN;
GO
CREATE FUNCTION dbo.COMPUTEMEDIAN(@VALUES NVARCHAR(MAX))
-- Explicit scale: plain DECIMAL means DECIMAL(18, 0), which would round 2.5 to 3.
RETURNS DECIMAL(18, 4)
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @MEDIAN DECIMAL(18, 4);
    SET @MEDIAN = 0.0;

    DECLARE @MEDIAN_TEMP TABLE (RawValue DECIMAL(18, 4));

    -- This is the killer: splitting the comma-separated string back into rows.
    INSERT INTO @MEDIAN_TEMP
    SELECT s FROM master.dbo.Split(',', @VALUES) OPTION(MAXRECURSION 0);

    -- Average the largest value of the bottom half and the smallest value of the top half.
    SELECT @MEDIAN =
    (
     (SELECT MAX(RawValue) FROM
       (SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue) AS BottomHalf)
     +
     (SELECT MIN(RawValue) FROM
       (SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue DESC) AS TopHalf)
    ) / 2;

    RETURN @MEDIAN;
END;
GO

However, my table is of the following form:

CREATE TABLE #TEMP (GroupName VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES ('A', 1.0)
INSERT INTO #TEMP VALUES ('A', 2.0)
INSERT INTO #TEMP VALUES ('A', 3.0)
INSERT INTO #TEMP VALUES ('A', 4.0)
INSERT INTO #TEMP VALUES ('B', 10.0)
INSERT INTO #TEMP VALUES ('B', 11.0)
INSERT INTO #TEMP VALUES ('B', 12.0)

SELECT * FROM #TEMP

DROP TABLE #TEMP

What is the best way to invoke the MEDIAN function on this table using a GROUP BY on the GroupName column? I am looking for something like this:

SELECT GroupName, COMPUTEMEDIAN(Value)
FROM #TEMP
GROUP BY GroupName

My current approach uses FOR XML PATH to concatenate all values of each group into one large string, which I then pass to the function. The function has to split that string back into rows, and for large strings this slows everything down. Any suggestions?
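
For illustration, the approach described above probably looks something like the following sketch. It is not taken from the question: the correlated subquery, the `STUFF` call, and the `VARCHAR(40)` cast are assumptions about how the values are concatenated, and it relies on the asker's own `dbo.COMPUTEMEDIAN` and `master.dbo.Split` functions being present.

```sql
-- Sketch only: concatenate each group's values into 'v1,v2,...' and pass the string to the UDF.
SELECT g.GroupName,
       dbo.COMPUTEMEDIAN(
           STUFF((SELECT ',' + CAST(t.Value AS VARCHAR(40))
                  FROM #TEMP AS t
                  WHERE t.GroupName = g.GroupName
                  FOR XML PATH('')), 1, 1, '')   -- STUFF strips the leading comma
       ) AS Median
FROM #TEMP AS g
GROUP BY g.GroupName;
```

Every value is converted to text, joined, and then parsed again inside the function, which is why the cost grows with the size of each group.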

3 solutions

#1

Since you're using SQL Server 2008, I would suggest writing the aggregate function as a CLR function.

http://msdn.microsoft.com/en-us/library/91e6taax(v=vs.80).aspx
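
As a rough sketch of the T-SQL side of that suggestion, a compiled CLR aggregate would be registered and then used like a built-in aggregate. The assembly name `MedianAggregate`, the file path, the class name `Median`, and the `FLOAT` signature are all hypothetical; the .NET class itself (which must implement `Init`, `Accumulate`, `Merge`, and `Terminate`) is not shown here.

```sql
-- Allow CLR code to run on this instance (requires sufficient permissions).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Hypothetical assembly name and path; the DLL contains the compiled aggregate class.
CREATE ASSEMBLY MedianAggregate
FROM 'C:\clr\MedianAggregate.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Hypothetical class name "Median" (no namespace) inside that assembly.
CREATE AGGREGATE dbo.Median (@value FLOAT)
RETURNS FLOAT
EXTERNAL NAME MedianAggregate.Median;
GO

-- Once registered, the aggregate can be used directly with GROUP BY.
SELECT GroupName, dbo.Median(Value) AS Median
FROM #TEMP
GROUP BY GroupName;
```

This avoids the string concatenation and splitting entirely, at the cost of deploying and maintaining an assembly on the server.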

Also, people have asked this question before. Perhaps their answers would be helpful:

Function to Calculate Median in Sql Server

#2

EDIT: I can confirm this works very well on a large data set (30,000 values).

I just came across this. The following works perfectly fine, but I am not sure how expensive it can turn out to be:

SELECT
   GroupName,
   AVG(Value)
FROM
(
   SELECT
      GroupName,
      cast(Value as decimal(5,2)) Value,
      ROW_NUMBER() OVER (
         PARTITION BY GroupName
         ORDER BY Value ASC) AS RowAsc,
      ROW_NUMBER() OVER (
         PARTITION BY GroupName 
         ORDER BY Value DESC) AS RowDesc
   FROM #TEMP SOH
) x
WHERE 
   RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY GroupName
ORDER BY GroupName;
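
One caveat with the query above: when a group contains duplicate values, the ascending and descending `ROW_NUMBER()` orderings are not guaranteed to mirror each other, so the filter can select the wrong rows. A possible variant, offered as a sketch rather than as part of the original answer, uses a single row number plus a windowed count (`Cnt` is a name introduced here):

```sql
SELECT GroupName,
       AVG(Value) AS Median
FROM (
    SELECT GroupName,
           CAST(Value AS DECIMAL(18, 2)) AS Value,
           ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY Value ASC) AS RowAsc,
           COUNT(*)     OVER (PARTITION BY GroupName)                    AS Cnt
    FROM #TEMP
) AS x
-- Integer division: odd Cnt yields the same middle position twice,
-- even Cnt yields the two middle positions.
WHERE RowAsc IN ((Cnt + 1) / 2, (Cnt + 2) / 2)
GROUP BY GroupName
ORDER BY GroupName;
```

With the sample data this averages rows 2 and 3 of group A (2 and 3, giving 2.5) and picks row 2 of group B (11).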

#3

No need to use a user-defined function! Here's how I would do it:

CREATE TABLE #TEMP (id VARCHAR(MAX), Value DECIMAL)

INSERT INTO #TEMP VALUES('A', 1.0)

INSERT INTO #TEMP VALUES('A', 2.0)
INSERT INTO #TEMP VALUES('A', 3.0)
INSERT INTO #TEMP VALUES('A', 4.0)
INSERT INTO #TEMP VALUES('B', 10.0)
INSERT INTO #TEMP VALUES('B', 11.0)
INSERT INTO #TEMP VALUES('B', 12.0)

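-- For each id, take the first medianIndex rows in ascending order and return the last of them.
SELECT 
    (SELECT TOP 1 Value 
        FROM (SELECT TOP(calcs.medianIndex) Value 
                FROM #TEMP 
                WHERE #TEMP.id = calcs.id ORDER BY Value ASC) AS subSet
        ORDER BY subSet.Value DESC) AS Median, id
FROM
(SELECT 
    -- Position of the median row: the middle row for odd counts, the lower middle row for even counts.
    CASE WHEN count(*) % 2 = 1 THEN count(*)/2 + 1
        ELSE count(*)/2
    END AS medianIndex,
    id
FROM #TEMP 
GROUP BY id) AS calcs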

DROP TABLE #TEMP

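You might want to double-check the behavior when a group has an even number of records: as written, the query returns the lower of the two middle values (2 for group A) rather than their average.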
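
If the average of the two middle values is wanted for even counts, one possible variant is sketched below. It is an adaptation, not the answerer's code: the `CROSS APPLY` structure and the names `n`, `lo`, and `hi` are introduced here, and it assumes the `#TEMP (id, Value)` table from this answer.

```sql
SELECT calcs.id,
       (lo.Value + hi.Value) / 2.0 AS Median
FROM (SELECT id, COUNT(*) AS n
      FROM #TEMP
      GROUP BY id) AS calcs
-- Lower middle row: the last of the first (n + 1) / 2 rows in ascending order.
CROSS APPLY (SELECT TOP 1 s.Value
             FROM (SELECT TOP ((calcs.n + 1) / 2) t.Value
                   FROM #TEMP AS t
                   WHERE t.id = calcs.id
                   ORDER BY t.Value ASC) AS s
             ORDER BY s.Value DESC) AS lo
-- Upper middle row: the last of the first (n + 2) / 2 rows; same row as lo when n is odd.
CROSS APPLY (SELECT TOP 1 s.Value
             FROM (SELECT TOP ((calcs.n + 2) / 2) t.Value
                   FROM #TEMP AS t
                   WHERE t.id = calcs.id
                   ORDER BY t.Value ASC) AS s
             ORDER BY s.Value DESC) AS hi;
```

For the sample data, group A has four rows, so `lo` is 2 and `hi` is 3 and the median is 2.5; group B has three rows, so both point at 11.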

EDIT: After reviewing the work in your median function, I realize that my answer basically just moves that work out of the function and into a regular query. So why does your median calculation have to be inside a user-defined function? It seems a lot more difficult that way.
