I have a simple MEDIAN calculation function:
IF OBJECT_ID(N'COMPUTEMEDIAN', N'FN') IS NOT NULL
    DROP FUNCTION dbo.COMPUTEMEDIAN;
GO
CREATE FUNCTION dbo.COMPUTEMEDIAN(@VALUES NVARCHAR(MAX))
RETURNS DECIMAL
WITH EXECUTE AS CALLER
AS
BEGIN
    DECLARE @SQL NVARCHAR(MAX)
    DECLARE @MEDIAN DECIMAL
    SET @MEDIAN = 0.0;
    DECLARE @MEDIAN_TEMP TABLE (RawValue DECIMAL);
    -- This is the Killer!
    INSERT INTO @MEDIAN_TEMP
    SELECT s FROM master.dbo.Split(',', @VALUES) OPTION(MAXRECURSION 0)
    SELECT @MEDIAN =
    (
        (SELECT MAX(RawValue) FROM
            (SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue) AS BottomHalf)
        +
        (SELECT MIN(RawValue) FROM
            (SELECT TOP 50 PERCENT RawValue FROM @MEDIAN_TEMP ORDER BY RawValue DESC) AS TopHalf)
    ) / 2
    --PRINT @SQL
    RETURN @MEDIAN;
END;
GO
However, my table is of the following form:
CREATE TABLE #TEMP (GroupName VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES ('A', 1.0)
INSERT INTO #TEMP VALUES ('A', 2.0)
INSERT INTO #TEMP VALUES ('A', 3.0)
INSERT INTO #TEMP VALUES ('A', 4.0)
INSERT INTO #TEMP VALUES ('B', 10.0)
INSERT INTO #TEMP VALUES ('B', 11.0)
INSERT INTO #TEMP VALUES ('B', 12.0)
SELECT * FROM #TEMP
DROP TABLE #TEMP
What is the best way to invoke the MEDIAN function on this table using a GROUP BY on the id column? So, I am looking for something like this:
SELECT id, COMPUTEMEDIAN(Values)
FROM #TEMP
GROUP BY id
My current approach uses FOR XML PATH to combine all values from each GROUP BY group into one large string and then passes that string to the function. This means the function has to split the string again, and for large strings that slows everything down. Any suggestions?
3 solutions
#1
1
Since you're using SQL Server 2008, I would suggest writing the aggregate function as a CLR function.
http://msdn.microsoft.com/en-us/library/91e6taax(v=vs.80).aspx
Also, people have asked this question before. Perhaps their answers would be helpful:
Function to Calculate Median in Sql Server
#2
1
EDIT: I can confirm this works very well on a large database (30,000 values).
I just came across this. The following works perfectly fine, but I am not sure how expensive it can turn out to be:
SELECT
    GroupName,
    AVG(Value)
FROM
(
    SELECT
        GroupName,
        CAST(Value AS DECIMAL(5,2)) Value,
        ROW_NUMBER() OVER (
            PARTITION BY GroupName
            ORDER BY Value ASC) AS RowAsc,
        ROW_NUMBER() OVER (
            PARTITION BY GroupName
            ORDER BY Value DESC) AS RowDesc
    FROM #TEMP SOH
) x
WHERE
    RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY GroupName
ORDER BY GroupName;
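One thing to watch with the query above: the ascending and descending ROW_NUMBER() orderings are computed independently, so when a group contains duplicate values the two numberings may not be exact mirror images of each other. A minimal sketch of a variant that avoids this by using a single ROW_NUMBER() plus a windowed COUNT(*) is shown here. It assumes the #TEMP (GroupName, Value) table from the question; the names rn, cnt and MedianValue are made up for illustration.

```sql
-- Sketch: per-group median with one row numbering and a windowed count.
-- Assumes the #TEMP (GroupName, Value) table from the question exists.
SELECT
    GroupName,
    AVG(CAST(Value AS DECIMAL(18, 4))) AS MedianValue
FROM
(
    SELECT
        GroupName,
        Value,
        ROW_NUMBER() OVER (PARTITION BY GroupName ORDER BY Value) AS rn,
        COUNT(*)     OVER (PARTITION BY GroupName)                AS cnt
    FROM #TEMP
) AS x
-- Odd cnt:  both expressions name the same middle row.
-- Even cnt: they name the two middle rows, which AVG then averages.
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY GroupName
ORDER BY GroupName;
```

With the sample data from the question, group A keeps rows 2 and 3 (values 2 and 3, median 2.5) and group B keeps row 2 (value 11).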
#3
1
No need to use a user-defined function! Here's how I would do it:
CREATE TABLE #TEMP (id VARCHAR(MAX), Value DECIMAL)
INSERT INTO #TEMP VALUES('A', 1.0)
INSERT INTO #TEMP VALUES('A', 2.0)
INSERT INTO #TEMP VALUES('A', 3.0)
INSERT INTO #TEMP VALUES('A', 4.0)
INSERT INTO #TEMP VALUES('B', 10.0)
INSERT INTO #TEMP VALUES('B', 11.0)
INSERT INTO #TEMP VALUES('B', 12.0)
SELECT
    (SELECT TOP 1 Value
     FROM (SELECT TOP(calcs.medianIndex) Value
           FROM #TEMP
           WHERE #TEMP.id = calcs.id
           ORDER BY Value ASC) AS subSet
     ORDER BY subSet.Value DESC),
    id
FROM
    (SELECT
         CASE WHEN COUNT(*) % 2 = 1 THEN COUNT(*)/2 + 1
              ELSE COUNT(*)/2
         END AS medianIndex,
         id
     FROM #TEMP
     GROUP BY id) AS calcs
DROP TABLE #TEMP
You might want to double-check the behavior when a group has an even number of records.
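For the even case: with the sample data, group A has four rows, so medianIndex is 2 and the query above returns 2 (the lower of the two middle values) rather than 2.5. A rough sketch of one way to average the two middle values while keeping the same TOP-based style follows. It assumes the #TEMP (id, Value) table from this answer is still in place (that is, run before the DROP TABLE); the names halfIndex, lowerHalf, upperHalf and MedianValue are invented for illustration.

```sql
-- Sketch: average the largest value of the lower half and the smallest
-- value of the upper half, so even-sized groups get the mean of the two
-- middle values. Assumes #TEMP (id, Value) from this answer exists.
SELECT
    calcs.id,
    (
        (SELECT MAX(Value)
         FROM (SELECT TOP (calcs.halfIndex) Value
               FROM #TEMP t
               WHERE t.id = calcs.id
               ORDER BY Value ASC) AS lowerHalf)
      + (SELECT MIN(Value)
         FROM (SELECT TOP (calcs.halfIndex) Value
               FROM #TEMP t
               WHERE t.id = calcs.id
               ORDER BY Value DESC) AS upperHalf)
    ) / 2.0 AS MedianValue
FROM
    -- (COUNT(*) + 1) / 2 is the middle position for odd counts and the
    -- lower-middle position for even counts.
    (SELECT id, (COUNT(*) + 1) / 2 AS halfIndex
     FROM #TEMP
     GROUP BY id) AS calcs
ORDER BY calcs.id;
```

For an odd-sized group both subqueries land on the same middle value, so the result is unchanged; for group A they pick 2 and 3, giving 2.5.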
EDIT: After reviewing your work in your Median function, I realize that my answer basically just moved your work out of the function and into your regular query. So... why does your median calculation have to be inside a user-defined function? It seems a lot more difficult that way.