Help with an SQL query (joining views?)

Time: 2021-02-27 01:58:49

I have a table with columns

Index, Date

where an Index may have multiple Dates, and my goal is the following: select a list that looks like

Index, MinDate, MaxDate

where each Index is listed only once, and MinDate (MaxDate) represents the earliest (latest) date present in the entire table for that Index. That's easy enough, but I then want to restrict the list to Indexes that have at least one Date within a given range.
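
To make the target result concrete, here is a minimal Python sketch of the semantics described above, run over a few hypothetical rows (the `ROWS` data and the `min_max_for_active` name are illustrative, not from the question). Dates are ISO-8601 strings so that string comparison matches chronological order.

```python
from collections import defaultdict

# Hypothetical sample rows: (Index, Date) with ISO-8601 date strings.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

def min_max_for_active(rows, start, end):
    """Whole-table MIN/MAX per Index, kept only if the Index has a Date in [start, end]."""
    dates = defaultdict(list)
    for idx, d in rows:
        dates[idx].append(d)
    return [
        (idx, min(ds), max(ds))
        for idx, ds in sorted(dates.items())
        if any(start <= d <= end for d in ds)
    ]

print(min_max_for_active(ROWS, "2000-01-01", "2000-12-31"))
# [(1, '1999-06-01', '2003-01-01'), (4, '2000-07-04', '2000-07-04')]
```

Index 1 keeps its whole-table range (1999 to 2003) because some of its dates fall in 2000; Indexes 2 and 3 drop out.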

So far, I have the following:

SELECT
    [Index],
    MIN([Date]),
    MAX([Date])
FROM myTable
WHERE
    [Index] IN
    (SELECT [Index] FROM myTable WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000')
GROUP BY [Index]
ORDER BY [Index] ASC

This is excruciatingly slow. Any way to speed this up? [I am running SQL Server 2000.]

Thanks!

Edited: For clarity.

8 solutions

#1


4  

I would recommend a derived table approach. Like this:

SELECT
     myTable.[Index],
     MIN(myTable.[Date]),
     MAX(myTable.[Date])
FROM myTable
     INNER JOIN (
       -- DISTINCT keeps the join from repeating rows once per matching date
       SELECT DISTINCT [Index]
       FROM myTable
       WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000') AS AliasName
       ON myTable.[Index] = AliasName.[Index]
GROUP BY myTable.[Index]
ORDER BY myTable.[Index] ASC
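
One way to sanity-check the derived-table query is to run it against SQLite from Python's standard library. This is a sketch under assumptions: SQLite is not SQL Server 2000, so it says nothing about performance; the sample rows are hypothetical and the dates are ISO text. SQLite does accept the same `[bracketed]` identifiers as T-SQL.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

# Derived-table join: the inner DISTINCT yields one row per active Index.
rows = conn.execute("""
    SELECT t.[Index], MIN(t.[Date]), MAX(t.[Date])
    FROM myTable AS t
    INNER JOIN (SELECT DISTINCT [Index]
                FROM myTable
                WHERE [Date] BETWEEN '2000-01-01' AND '2000-12-31') AS a
        ON t.[Index] = a.[Index]
    GROUP BY t.[Index]
    ORDER BY t.[Index]
""").fetchall()
print(rows)  # [(1, '1999-06-01', '2003-01-01'), (4, '2000-07-04', '2000-07-04')]
```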

EDIT: On further review, there is another way to write this query. It may be faster, slower, or about the same; that, of course, depends on how the table is indexed.

Select [Index],
       Min([Date]),
       Max([Date])
From   myTable
Group By [Index]
Having Sum(Case When [Date] Between '1/1/2000' And '12/31/2000' Then 1 Else 0 End) > 0

Under the best circumstances, this query will cause an index scan (not a seek) to filter out the rows you don't want to display. I encourage you to run both queries and pick the one that executes fastest.
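
The same kind of hypothetical SQLite harness can confirm that the `HAVING SUM(CASE ...)` variant returns the same rows as the join version. It checks only the logic, not which plan SQL Server would choose.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

# One pass over the table: group everything, then keep groups that have
# at least one date inside the range (the CASE counts those dates).
rows = conn.execute("""
    SELECT [Index], MIN([Date]), MAX([Date])
    FROM myTable
    GROUP BY [Index]
    HAVING SUM(CASE WHEN [Date] BETWEEN '2000-01-01' AND '2000-12-31'
                    THEN 1 ELSE 0 END) > 0
    ORDER BY [Index]
""").fetchall()
print(rows)  # [(1, '1999-06-01', '2003-01-01'), (4, '2000-07-04', '2000-07-04')]
```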

#2


2  

I am not an SQL Server expert, but if you can do sub-selects like this, it is potentially faster.

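-- DISTINCT: one row per Index even if it has several dates in the range
SELECT DISTINCT
  m.[Index],
  (SELECT MIN([Date]) FROM myTable WHERE [Index] = m.[Index]) AS MinDate,
  (SELECT MAX([Date]) FROM myTable WHERE [Index] = m.[Index]) AS MaxDate
FROM myTable m
WHERE m.[Date] BETWEEN '1/1/2000' AND '12/31/2000'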
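
Without `DISTINCT`, the correlated-subquery version returns one row per matching date rather than one per Index. A small SQLite sketch (hypothetical data in which Index 1 has two dates in 2000) shows the difference:

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

# Inside each subquery, the unqualified [Index] refers to the inner myTable.
QUERY = """
    SELECT {distinct} m.[Index],
           (SELECT MIN([Date]) FROM myTable WHERE [Index] = m.[Index]) AS MinDate,
           (SELECT MAX([Date]) FROM myTable WHERE [Index] = m.[Index]) AS MaxDate
    FROM myTable AS m
    WHERE m.[Date] BETWEEN '2000-01-01' AND '2000-12-31'
    ORDER BY m.[Index]
"""
dupes = conn.execute(QUERY.format(distinct="")).fetchall()
rows = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()
print(len(dupes))  # 3 (Index 1 appears twice)
print(rows)        # [(1, '1999-06-01', '2003-01-01'), (4, '2000-07-04', '2000-07-04')]
```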

#3


1  

Jake,

I think you may need to approach this problem from a different point of view.

The grouped select of **Index, Min(Date), Max(Date)** isn't going to change drastically over the course of a day, compared with the range of data it covers (presumably many years).

So one option is to build a summary table from the data in the main table, e.g.:

   SELECT
       [Index],
       MIN([Date]) AS MinDate,
       MAX([Date]) AS MaxDate
   INTO
      MySummaryTable
   FROM
      MyOriginalTable
   GROUP BY
      [Index]

This table could be dropped and recreated on a semi-regular (daily) basis via a SQL job. I'd also put an index on its Index column.

Then, when you need to run your daily query:

SELECT DISTINCT  -- one row per Index, not one per matching date
   summary.[Index],
   summary.MinDate,
   summary.MaxDate
FROM
   MyOriginalTable mot
   INNER JOIN MySummaryTable summary
      ON mot.[Index] = summary.[Index]  --THIS IS WHERE YOUR CLUSTERED INDEX WILL PAY OFF
WHERE
   mot.[Date] BETWEEN '2000-01-01' AND '2000-12-31' --THIS IS WHERE A SECOND NC INDEX WILL PAY OFF
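
The summary-table idea can be sketched in SQLite too. SQLite has no `SELECT ... INTO`, so `CREATE TABLE ... AS SELECT` stands in for it; the table names follow the answer, while the index name `ix_summary_index` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyOriginalTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO MyOriginalTable VALUES (?, ?)", ROWS)

# Periodic job: rebuild the per-Index summary and index it.
conn.execute("DROP TABLE IF EXISTS MySummaryTable")
conn.execute("""
    CREATE TABLE MySummaryTable AS
    SELECT [Index], MIN([Date]) AS MinDate, MAX([Date]) AS MaxDate
    FROM MyOriginalTable
    GROUP BY [Index]
""")
conn.execute("CREATE UNIQUE INDEX ix_summary_index ON MySummaryTable ([Index])")

# Daily query: look up precomputed ranges for Indexes active in 2000.
rows = conn.execute("""
    SELECT DISTINCT s.[Index], s.MinDate, s.MaxDate
    FROM MyOriginalTable AS mot
    INNER JOIN MySummaryTable AS s ON mot.[Index] = s.[Index]
    WHERE mot.[Date] BETWEEN '2000-01-01' AND '2000-12-31'
    ORDER BY s.[Index]
""").fetchall()
print(rows)  # [(1, '1999-06-01', '2003-01-01'), (4, '2000-07-04', '2000-07-04')]
```

The trade-off is freshness: rows added after the last rebuild are not reflected in MinDate/MaxDate until the job runs again.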

#4


1  

This should do it in two table scans.

SELECT
    [Index],
    MIN([Date]),
    MAX([Date])
FROM myTable
WHERE
    [Index] IN
    (SELECT [Index] FROM myTable WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000')
GROUP BY [Index]
ORDER BY [Index] ASC
OPTION (MERGE JOIN)

Here's another query. It returns different results from what was originally asked for: it gets every Index whose date range overlaps the period of interest, even if that Index has no actual activity within the period.

SELECT
    [Index],
    MIN([Date]),
    MAX([Date])
FROM myTable
GROUP BY [Index]
HAVING MIN([Date]) < '2001-01-01' AND MAX([Date]) >= '2000-01-01'
ORDER BY [Index] ASC

So this will return the following row for Index 3, even if Index 3 has no data in the year 2000:

3, 1998-01-01, 2005-01-01
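
Running the overlap query on hypothetical SQLite data reproduces that behaviour: Index 3 has dates only in 1998 and 2005, yet its range spans 2000, so it is returned, while Index 2 (last date 1999-12-31) is not.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

# Keep an Index if its [MinDate, MaxDate] interval overlaps the year 2000.
rows = conn.execute("""
    SELECT [Index], MIN([Date]), MAX([Date])
    FROM myTable
    GROUP BY [Index]
    HAVING MIN([Date]) < '2001-01-01' AND MAX([Date]) >= '2000-01-01'
    ORDER BY [Index]
""").fetchall()
print(rows)
# [(1, '1999-06-01', '2003-01-01'), (3, '1998-01-01', '2005-01-01'), (4, '2000-07-04', '2000-07-04')]
```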

#5


0  

Putting a clustered index on the date column could greatly speed up this query, but it may also slow down other queries on the table that currently run fast.
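
SQLite has no clustered indexes in the SQL Server sense, so the sketch below only illustrates the general point: once the date column is indexed, the range predicate can be answered by searching the index instead of scanning the whole table. The index name `idx_date` and the rows are hypothetical, and the wording of SQLite's `EXPLAIN QUERY PLAN` output varies between versions; on SQL Server you would compare the execution plans instead.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

RANGE_QUERY = (
    "SELECT [Index] FROM myTable "
    "WHERE [Date] BETWEEN '2000-01-01' AND '2000-12-31'"
)

def plan_details(sql):
    """Return the detail text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan_details(RANGE_QUERY)  # a full scan of myTable
conn.execute("CREATE INDEX idx_date ON myTable ([Date])")
after = plan_details(RANGE_QUERY)   # a search that uses idx_date
print(before)
print(after)
```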

#6


0  

Your explanation isn't very clear:

where each Index is listed only once, and MinDate (MaxDate) represents the earliest (latest) date present in the entire table.

If that is the case, you should either return two result sets or store the table-wide values in variables, for example:

DECLARE @MaxDate datetime, @MinDate datetime
SELECT
    @MinDate = MIN([Date]),
    @MaxDate = MAX([Date])
FROM myTable
--
SELECT DISTINCT  -- list each Index once
    [Index],
    @MinDate AS MinDate,
    @MaxDate AS MaxDate
FROM myTable
WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000'

If you want to know the minimum/maximum for the entire table as well as for the [Index], then try the following in combination with the previous code:

SELECT
    [Index],
    MIN([Date]) AS IndexMinDate,
    MAX([Date]) AS IndexMaxDate,
    @MinDate AS TableMinDate,
    @MaxDate AS TableMaxDate
FROM myTable
WHERE [Date] BETWEEN '1/1/2000' AND '12/31/2000'
GROUP BY [Index]
ORDER BY [Index] ASC
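
To see how the two kinds of minimum/maximum differ, here is a SQLite sketch. SQLite has no `DECLARE`, so uncorrelated scalar subqueries stand in for `@MinDate` and `@MaxDate`, and the rows are hypothetical. The per-Index columns are limited to 2000 by the `WHERE` clause, whereas the question asks for each Index's range over the entire table.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

rows = conn.execute("""
    SELECT [Index],
           MIN([Date]) AS IndexMinDate,                  -- within 2000 only
           MAX([Date]) AS IndexMaxDate,                  -- within 2000 only
           (SELECT MIN([Date]) FROM myTable) AS TableMinDate,
           (SELECT MAX([Date]) FROM myTable) AS TableMaxDate
    FROM myTable
    WHERE [Date] BETWEEN '2000-01-01' AND '2000-12-31'
    GROUP BY [Index]
    ORDER BY [Index]
""").fetchall()
for r in rows:
    print(r)
```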

Also look into indexing the columns if possible, and check the query plan. Good luck.

#7


0  

An EXISTS operator might be faster than the IN subquery:

SELECT
     t1.[Index],
     MIN(t1.[Date]),
     MAX(t1.[Date])
FROM
     myTable t1
WHERE
     EXISTS (SELECT * FROM myTable t2 WHERE t2.[Index] = t1.[Index] AND t2.[Date] >= '1/1/2000' AND t2.[Date] < '1/1/2001')
GROUP BY
     t1.[Index]
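
A SQLite sketch of the `EXISTS` version with the half-open date range, using hypothetical data; it returns the same rows as the `IN` and join forms.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

# EXISTS only needs to find one qualifying row per Index, then stops looking.
rows = conn.execute("""
    SELECT t1.[Index], MIN(t1.[Date]), MAX(t1.[Date])
    FROM myTable AS t1
    WHERE EXISTS (SELECT 1 FROM myTable AS t2
                  WHERE t2.[Index] = t1.[Index]
                    AND t2.[Date] >= '2000-01-01'
                    AND t2.[Date] <  '2001-01-01')
    GROUP BY t1.[Index]
    ORDER BY t1.[Index]
""").fetchall()
print(rows)  # [(1, '1999-06-01', '2003-01-01'), (4, '2000-07-04', '2000-07-04')]
```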

It would depend on table size and indexing, I suppose. I like G Mastros's HAVING clause solution too.

Another important note: if your date column is actually a DATETIME and any of your dates (now or in the future) have a time component, you could miss some results, because BETWEEN '1/1/2000' AND '12/31/2000' excludes any row dated 12/31/2000 with a time later than midnight. Just something to keep in mind. You could alternatively use YEAR([Date]) = 2000 (assuming MS SQL Server here), but wrapping the column in a function generally prevents the optimizer from seeking on an index on the date column, which is why the half-open range (>= '1/1/2000' AND < '1/1/2001') is usually the better choice.
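
The time-component pitfall can be reproduced in SQLite with hypothetical rows stored as ISO text. In SQLite this is plain string comparison, which happens to mimic SQL Server converting `'12/31/2000'` to midnight: a row at 2000-12-31 15:30 falls outside `BETWEEN ... '2000-12-31'` but inside the half-open range.

```python
import sqlite3

# Hypothetical rows; Index 5 has a time component on the last day of 2000.
ROWS = [(1, "2000-03-15"), (2, "1999-12-31"), (5, "2000-12-31 15:30:00")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

def indexes(where):
    """Distinct Index values whose rows satisfy the given (trusted) predicate."""
    sql = f"SELECT DISTINCT [Index] FROM myTable WHERE {where} ORDER BY [Index]"
    return [r[0] for r in conn.execute(sql)]

closed = indexes("[Date] BETWEEN '2000-01-01' AND '2000-12-31'")
half_open = indexes("[Date] >= '2000-01-01' AND [Date] < '2001-01-01'")
print(closed)     # [1]
print(half_open)  # [1, 5]
```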

EDIT: Added GROUP BY and changed the date logic, thanks to a comment.

#8


-1  

You don't need the sub-select in the WHERE clause. Also, you could add an index on the date column. How many rows are in the table?

SELECT
    [Index],
    MIN([Date]),
    MAX([Date])
FROM
    myTable
WHERE
    [Date] BETWEEN '1/1/2000' AND '12/31/2000'
GROUP BY
    [Index]
ORDER BY
    [Index] ASC
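
For comparison, a SQLite sketch of this query on hypothetical data shows that filtering in `WHERE` before grouping yields each Index's minimum and maximum within 2000 only, not across the whole table as the question asks.

```python
import sqlite3

# Hypothetical sample rows: (Index, Date) as ISO-8601 text.
ROWS = [
    (1, "1999-06-01"), (1, "2000-03-15"), (1, "2000-08-01"), (1, "2003-01-01"),
    (2, "1998-01-01"), (2, "1999-12-31"),
    (3, "1998-01-01"), (3, "2005-01-01"),
    (4, "2000-07-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable ([Index] INTEGER, [Date] TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", ROWS)

# WHERE runs before GROUP BY, so MIN/MAX only see the 2000 rows.
rows = conn.execute("""
    SELECT [Index], MIN([Date]), MAX([Date])
    FROM myTable
    WHERE [Date] BETWEEN '2000-01-01' AND '2000-12-31'
    GROUP BY [Index]
    ORDER BY [Index]
""").fetchall()
print(rows)  # [(1, '2000-03-15', '2000-08-01'), (4, '2000-07-04', '2000-07-04')]
```

Index 1 comes back as 2000-03-15 to 2000-08-01 rather than its whole-table range of 1999-06-01 to 2003-01-01.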
