Excuse the long post, but below I have included a full script to generate and populate my test harness.
请原谅长篇文章,但下面我已经包含了一个完整的脚本来生成和填充我的测试工具。
My test harness has the following tables
我的测试工具有以下表格
|--------| |-------------| |-----| |--------------|
|Column | |ColumnValue | |Row | |RowColumnValue|
|--------| |-------------| |-----| |--------------|
|ColumnId| |ColumnValueId| |RowId| |RowId |
|Name | |ColumnId | |Name | |ColumnValueId |
|--------| |Value | |-----| |--------------|
|-------------|
They represent Rows and Columns in a table. The possible values of a cell in a column are stored in ColumnValue. The selected values for a Row are stored in RowColumnValue. (I hope that is clear)
它们表示表中的行和列。列中单元格的可能值存储在ColumnValue中。行的选定值存储在RowColumnValue中。 (我希望很清楚)
I have populated the data with 10 Columns, 10,000 Rows, 50 Column Values per column (500) and 25 selected column values per Row (250,000).
我已经使用10列,10,000行,每列50列值(500)和每行25个选定列值(250,000)填充数据。
I have some dynamic sql which returns all the rows, pivotted with the columns and contains an XML list of the selected column values for each column.
我有一些动态sql,它返回所有行,与列一起旋转,并包含每列所选列值的XML列表。
Note: For performance testing purposes I have wrapped the query in a SELECT COUNT(*)
so that the query will not return a large amount of data over the network.
注意:出于性能测试的目的,我将查询包装在SELECT COUNT(*)中,以便查询不会通过网络返回大量数据。
My test harness runs this query (with the count) in approx 5-6 seconds. The execution plan shows that 92% of the query is spent on a Clustered Index Seek on [ColumnValue].[PK_ColumnValue]
. The Client Statistics show a Client processing time, Total execution time and Wait time on server replies all at 0.
我的测试工具在大约5-6秒内运行此查询(带计数)。执行计划显示92%的查询花费在[ColumnValue]上的Clustered Index Seek上。[PK_ColumnValue]。客户端统计信息显示客户端处理时间,服务器上的总执行时间和等待时间全部为0。
I realise that 250k rows in the RowColumnValue table is quite a lot and I might be expecting too much from SQL Server. However, my expectation is that query should be able to run a lot faster than this. Or at the very least the execution plan should be presenting a different bottle neck rather than the Clustered Index Seek.
我意识到RowColumnValue表中的250k行非常多,而且我可能期望从SQL Server中获得太多。但是,我的期望是查询应该能够比这更快地运行。或者至少执行计划应该呈现一个不同的瓶颈,而不是聚集指数寻求。
Can anyone shed any light on the problem or give me some suggestions on how to make this more efficient?
任何人都可以对这个问题有所了解,或者就如何提高效率给我一些建议?
Dynamic SQL that runs the pivot to display the table:
运行数据透视表以显示表的动态SQL:
DECLARE @columnDataList NVARCHAR(MAX)
SELECT
@columnDataList =
CAST
(
(
SELECT
', CONVERT(xml, [PVT].[' + [Column].[Name] + ']) [Column.' + [Column].[Name] + ']'
FROM
[Column]
ORDER BY
[Column].[Name]
FOR XML PATH('')
) AS XML
).value('.', 'NVARCHAR(MAX)')
DECLARE @columnPivotList NVARCHAR(MAX)
SELECT
@columnPivotList =
CAST
(
(
SELECT
', [' + [Column].[Name] + ']'
FROM
[Column]
ORDER BY
[Column].[Name]
FOR XML PATH('')
) AS XML
).value('.', 'NVARCHAR(MAX)')
EXEC('
SELECT
COUNT(*)
FROM
(
SELECT
[PVT].[RowId]
' + @columnDataList + '
FROM
(
SELECT
[Row].[RowId],
[Column].[Name] [ColumnName],
[XmlRowColumnValues].[XmlRowColumnValues] [XmlRowColumnValues]
FROM
[Row]
CROSS JOIN
[Column]
CROSS APPLY
(
SELECT
[ColumnValue].[Value] [Value]
FROM
[RowColumnValue]
INNER JOIN
[ColumnValue]
ON
[ColumnValue].[ColumnValueId] = [RowColumnValue].[ColumnValueId]
WHERE
[RowColumnValue].[RowId] = [Row].[RowId]
AND
[ColumnValue].[ColumnId] = [Column].[ColumnId]
FOR XML PATH (''''), ROOT(''Values'')
) [XmlRowColumnValues] ([XmlRowColumnValues])
) [PivotData]
PIVOT
(
MAX([PivotData].[XmlRowColumnValues])
FOR
[ColumnName]
IN
([0]' + @columnPivotList + ')
) PVT
) RowColumnData
')
Script to generate and populate database:
生成和填充数据库的脚本:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Row](
[RowId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Row] PRIMARY KEY CLUSTERED
(
[RowId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Column](
[ColumnId] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Column] PRIMARY KEY CLUSTERED
(
[ColumnId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[RowColumnValue](
[RowId] [int] NOT NULL,
[ColumnValueId] [int] NOT NULL,
CONSTRAINT [PK_RowColumnValue] PRIMARY KEY CLUSTERED
(
[RowId] ASC,
[ColumnValueId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ColumnValue](
[ColumnValueId] [int] IDENTITY(1,1) NOT NULL,
[ColumnId] [int] NOT NULL,
[Value] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_ColumnValue] PRIMARY KEY CLUSTERED
(
[ColumnValueId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [FKIX_ColumnValue_ColumnId] ON [dbo].[ColumnValue]
(
[ColumnId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
ALTER TABLE [dbo].[ColumnValue] WITH CHECK ADD CONSTRAINT [FK_ColumnValue_Column] FOREIGN KEY([ColumnId])
REFERENCES [dbo].[Column] ([ColumnId])
GO
ALTER TABLE [dbo].[ColumnValue] CHECK CONSTRAINT [FK_ColumnValue_Column]
GO
ALTER TABLE [dbo].[RowColumnValue] WITH CHECK ADD CONSTRAINT [FK_RowColumnValue_ColumnValue] FOREIGN KEY([ColumnValueId])
REFERENCES [dbo].[ColumnValue] ([ColumnValueId])
GO
ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_ColumnValue]
GO
ALTER TABLE [dbo].[RowColumnValue] WITH CHECK ADD CONSTRAINT [FK_RowColumnValue_Row] FOREIGN KEY([RowId])
REFERENCES [dbo].[Row] ([RowId])
GO
ALTER TABLE [dbo].[RowColumnValue] CHECK CONSTRAINT [FK_RowColumnValue_Row]
GO
DECLARE @columnLoop INT
DECLARE @columnValueLoop INT
DECLARE @rowLoop INT
DECLARE @columnId INT
DECLARE @columnValueId INT
DECLARE @rowId INT
SET @columnLoop = 0
WHILE @columnLoop < 10
BEGIN
INSERT INTO [Column] ([Name]) VALUES(NEWID())
SET @columnId = @@IDENTITY
SET @columnValueLoop = 0
WHILE @columnValueLoop < 50
BEGIN
INSERT INTO [ColumnValue] ([ColumnId], [Value]) VALUES(@columnId, NEWID())
SET @columnValueLoop = @columnValueLoop + 1
END
SET @columnLoop = @columnLoop + 1
END
SET @rowLoop = 0
WHILE @rowLoop < 10000
BEGIN
INSERT INTO [Row] ([Name]) VALUES(NEWID())
SET @rowId = @@IDENTITY
INSERT INTO [RowColumnValue] ([RowId], [ColumnValueId]) SELECT TOP 25 @rowId, [ColumnValueId] FROM [ColumnValue] ORDER BY NEWID()
SET @rowLoop = @rowLoop + 1
END
1 个解决方案
#1
2
I agree with @marc_s and @KM that this grand design is doomed from the start.
我同意@marc_s和@KM这个宏伟的设计从一开始就注定要失败。
Millions of Microsoft's developer-hours have gone into building and fine-tuning a robust and powerful database engine, but you're going to reinvent it all by cramming everything into a small number of generic tables and re-implementing everything that SQL Server is already engineered to do for you.
数以百万计的微软开发人员已经开始构建和微调强大而强大的数据库引擎,但是你要通过将所有东西塞进少量通用表并重新实现SQL Server已经完成的所有内容来重新创建它们。专为你而设计。
SQL Server already has tables containing names of entities, names of columns, and so on. The fact that you don't normally interact directly with these system tables is a good thing: It's called abstraction. And it's unlikely that you're going to do a better job of implementing that abstraction than SQL Server does.
SQL Server已经有包含实体名称,列名称等的表。您通常不直接与这些系统表交互的事实是一件好事:它被称为抽象。并且你不太可能比SQL Server更好地实现抽象。
At the end of the day, with your approach (a) even the simplest queries will be monstrous; and (b) you're never going to come close to optimal performance, because you're forgoing all of the query optimization that you'd otherwise get for free.
在一天结束时,用你的方法(a)即使最简单的查询也是怪异的; (b)你永远不会接近最佳性能,因为你放弃了所有你免费获得的查询优化。
Without knowing anything more about your application or your requirements, it's hard to give any sort of specific advice. But I'd suggest that some good old normalization would go a long way. Any well-implemented, non-trivial database has a lot of tables; ten tables plus ten xtab tables shouldn't scare you away.
如果不了解您的应用程序或您的要求,很难提供任何具体的建议。但我建议一些好的旧规范化会有很长的路要走。任何执行良好,非平凡的数据库都有很多表;十张桌子加十张xtab桌子不应该吓跑你。
And don't be afraid of SQL code generation as a way of implementing common interfaces across disparate tables. A little can go a long way.
并且不要害怕SQL代码生成作为跨不同表实现公共接口的方式。有点可以走很长的路。
#1
2
I agree with @marc_s and @KM that this grand design is doomed from the start.
我同意@marc_s和@KM这个宏伟的设计从一开始就注定要失败。
Millions of Microsoft's developer-hours have gone into building and fine-tuning a robust and powerful database engine, but you're going to reinvent it all by cramming everything into a small number of generic tables and re-implementing everything that SQL Server is already engineered to do for you.
数以百万计的微软开发人员已经开始构建和微调强大而强大的数据库引擎,但是你要通过将所有东西塞进少量通用表并重新实现SQL Server已经完成的所有内容来重新创建它们。专为你而设计。
SQL Server already has tables containing names of entities, names of columns, and so on. The fact that you don't normally interact directly with these system tables is a good thing: It's called abstraction. And it's unlikely that you're going to do a better job of implementing that abstraction than SQL Server does.
SQL Server已经有包含实体名称,列名称等的表。您通常不直接与这些系统表交互的事实是一件好事:它被称为抽象。并且你不太可能比SQL Server更好地实现抽象。
At the end of the day, with your approach (a) even the simplest queries will be monstrous; and (b) you're never going to come close to optimal performance, because you're forgoing all of the query optimization that you'd otherwise get for free.
在一天结束时,用你的方法(a)即使最简单的查询也是怪异的; (b)你永远不会接近最佳性能,因为你放弃了所有你免费获得的查询优化。
Without knowing anything more about your application or your requirements, it's hard to give any sort of specific advice. But I'd suggest that some good old normalization would go a long way. Any well-implemented, non-trivial database has a lot of tables; ten tables plus ten xtab tables shouldn't scare you away.
如果不了解您的应用程序或您的要求,很难提供任何具体的建议。但我建议一些好的旧规范化会有很长的路要走。任何执行良好,非平凡的数据库都有很多表;十张桌子加十张xtab桌子不应该吓跑你。
And don't be afraid of SQL code generation as a way of implementing common interfaces across disparate tables. A little can go a long way.
并且不要害怕SQL代码生成作为跨不同表实现公共接口的方式。有点可以走很长的路。