I have a table that resembles the following (the code below creates a table called #Temp with 160,000 rows, which is roughly the same number of rows as my real dataset, although the real dataset has more columns):
/* Create dummy employees*/
;WITH employeeNumbers
AS ( SELECT 1 AS employeeId
UNION ALL
SELECT employeeNumbers.employeeId + 1
FROM employeeNumbers
WHERE employeeNumbers.employeeId < 16000 )
SELECT *
INTO #employeeId
FROM employeeNumbers
OPTION ( MAXRECURSION 16000 )
/*Create saleItems*/
CREATE TABLE #SalesItems
(
category VARCHAR(100)
, subCategory VARCHAR(100)
, productName VARCHAR(1000)
)
INSERT INTO #SalesItems ( category
, subCategory
, productName )
VALUES ( 'Furniture', 'Bookcases', 'Bush Somerset Collection Bookcase' )
, ( 'Furniture', 'Chairs', 'Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back' )
, ( 'Office Supplies', 'Labels', 'Self-Adhesive Address Labels for Typewriters by Universal' )
, ( 'Furniture', 'Tables', 'Bretford CR4500 Series Slim Rectangular Table' )
, ( 'Office Supplies', 'Storage', 'Eldon Fold n Roll Cart System' )
, ( 'Furniture', 'Furnishings', 'Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood' )
, ( 'Office Supplies', 'Art', 'Newell 322' )
, ( 'Technology', 'Phones', 'Mitel 5320 IP Phone VoIP phone' )
, ( 'Office Supplies', 'Binders', 'DXL Angle-View Binders with Locking Rings by Samsill' )
, ( 'Technology', 'Phones', 'Samsung Galaxy S8' )
-- Create some random sales figures between 10 and 100
SELECT employeeId
, category
, subCategory
, productName
, CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Jul 2017]
, CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Aug 2017]
, CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Sep 2017]
, CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Oct 2017]
, CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Nov 2017]
, CONVERT(DECIMAL(13, 2), 10 + ( 100 - 10 ) * RAND(CHECKSUM(NEWID()))) [Dec 2017]
INTO #Temp
FROM #employeeId
JOIN #SalesItems ON 1 = 1
CREATE INDEX empId
ON #Temp ( employeeId )
SELECT *
FROM #Temp
What I am doing is converting these results into a single JSON string for every employeeId in the table. My query is as follows:
SELECT DISTINCT x.employeeId
, ( SELECT y.category
, y.subCategory
, y.productName
, [Jul 2017] AS 'salesAmounts.Jul 2017'
, [Aug 2017] AS 'salesAmounts.Aug 2017'
, [Sep 2017] AS 'salesAmounts.Sep 2017'
, [Oct 2017] AS 'salesAmounts.Oct 2017'
, [Nov 2017] AS 'salesAmounts.Nov 2017'
, [Dec 2017] AS 'salesAmounts.Dec 2017'
FROM #Temp y
WHERE y.employeeId = x.employeeId
FOR JSON PATH, INCLUDE_NULL_VALUES ) data
FROM #Temp x
which works, but its performance is not great. On this sample it takes 25 seconds, and on my real dataset it takes a lot longer; by comparison, returning all the rows from #Temp takes only 1 second. Is there any way I can redesign my query to improve the query time? I did try using a cursor to iterate through each employeeId and generate the JSON string that way, but it was still slow.
1 Answer
#1
Read "Performance Surprises and Assumptions : GROUP BY vs. DISTINCT" by Aaron Bertrand.
Try to use GROUP BY instead of DISTINCT. DISTINCT throws duplicates away after the result set has been created, so your JSON subquery is called more often than needed. GROUP BY should first reduce the set to distinct employeeId values and then perform the JSON conversion only once for each.
Cannot test it at the moment, but this should do the same, just faster:
SELECT x.employeeId
, ( SELECT y.category
, y.subCategory
, y.productName
, [Jul 2017] AS 'salesAmounts.Jul 2017'
, [Aug 2017] AS 'salesAmounts.Aug 2017'
, [Sep 2017] AS 'salesAmounts.Sep 2017'
, [Oct 2017] AS 'salesAmounts.Oct 2017'
, [Nov 2017] AS 'salesAmounts.Nov 2017'
, [Dec 2017] AS 'salesAmounts.Dec 2017'
FROM #Temp y
WHERE y.employeeId = x.employeeId
FOR JSON PATH, INCLUDE_NULL_VALUES ) data
FROM #Temp x
GROUP BY x.employeeId
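Another variation worth trying (a sketch of the same idea, not tested here) is to build the distinct list of employeeIds in a derived table first, so the outer scan never carries the duplicate rows at all, and the correlated FOR JSON subquery from the question runs exactly once per employee:

```sql
-- Untested sketch: deduplicate employeeId first, then attach the JSON.
-- Assumes the #Temp table and its empId index from the question.
SELECT ids.employeeId
     , ( SELECT y.category
              , y.subCategory
              , y.productName
              , [Jul 2017] AS 'salesAmounts.Jul 2017'
              , [Aug 2017] AS 'salesAmounts.Aug 2017'
              , [Sep 2017] AS 'salesAmounts.Sep 2017'
              , [Oct 2017] AS 'salesAmounts.Oct 2017'
              , [Nov 2017] AS 'salesAmounts.Nov 2017'
              , [Dec 2017] AS 'salesAmounts.Dec 2017'
         FROM #Temp y
         WHERE y.employeeId = ids.employeeId
         FOR JSON PATH, INCLUDE_NULL_VALUES ) AS data
FROM ( SELECT DISTINCT employeeId
       FROM #Temp ) AS ids;
```

Here the DISTINCT is applied to a single narrow column before the expensive JSON work, which is the same reduction GROUP BY achieves; compare the actual execution plans of both forms to see which the optimizer handles better on your data.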