Is there efficient SQL to query a portion of a large table?

Time: 2021-06-04 01:07:33

The typical way of selecting data is:

select * from my_table

But what if the table contains 10 million records and you only want records 300,010 to 300,020?

Is there a way to write a SQL statement for Microsoft SQL Server that fetches only 10 records at a time?

E.g.

select * from my_table from records 300,010 to 300,020

This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.

6 solutions

#1


Try looking at info about pagination. Here's a short summary of it for SQL Server: http://www.singingeels.com/Articles/Pagination_In_SQL_Server_2005.aspx.

#2


SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking about a table with an identity field for the primary key, you can just say:

SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020

You should also know that selecting * is considered poor practice in many circles. They want you to specify the exact column list.
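
A minimal sketch of that key-range lookup, using Python's built-in sqlite3 module as a stand-in for SQL Server so it can be run anywhere. The `payload` column, the 1,000-row demo table, and the helper names `build_demo_table` and `fetch_id_range` are made up for illustration; only `my_table` and the idea of filtering on an ID range come from the answer above.

```python
import sqlite3


def build_demo_table(row_count=1000):
    """Create an in-memory table with ids 1..row_count (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO my_table (id, payload) VALUES (?, ?)",
        ((i, f"row {i}") for i in range(1, row_count + 1)),
    )
    return conn


def fetch_id_range(conn, first_id, last_id):
    """Return rows whose id lies in [first_id, last_id], ordered by id."""
    cur = conn.execute(
        # Explicit column list instead of SELECT *, as the answer recommends.
        "SELECT id, payload FROM my_table "
        "WHERE id >= ? AND id <= ? ORDER BY id",
        (first_id, last_id),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = build_demo_table()
    rows = fetch_id_range(conn, 310, 320)
    print(len(rows), rows[0], rows[-1])
```

Two things worth noticing: an inclusive range such as 310 to 320 returns 11 rows, not 10, and the ID range only matches row positions if the identity values have no gaps (deleted rows leave holes).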

#3


Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be

SELECT [columns] FROM table LIMIT 10 OFFSET 300010;

On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.

Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.

This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
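
To illustrate the difference between skipping rows and seeking by key, here is a rough sketch using Python's built-in sqlite3 module as a stand-in database (SQLite accepts the same LIMIT ... OFFSET form shown above). The demo table, the `payload` column, and the helpers `page_by_offset` and `page_after_key` are hypothetical. On SQL Server 2012 and later, the offset form is written `ORDER BY ... OFFSET n ROWS FETCH NEXT m ROWS ONLY`.

```python
import sqlite3


def build_demo_table(row_count=1000):
    """Create an in-memory table with ids 1..row_count (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO my_table (id, payload) VALUES (?, ?)",
        ((i, f"row {i}") for i in range(1, row_count + 1)),
    )
    return conn


def page_by_offset(conn, offset, page_size=10):
    """Offset paging: the engine steps past `offset` rows before returning any."""
    return conn.execute(
        "SELECT id, payload FROM my_table ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


def page_after_key(conn, last_seen_id, page_size=10):
    """Key-based paging: seek to the key via the index, then read one page."""
    return conn.execute(
        "SELECT id, payload FROM my_table WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_table()
    by_offset = page_by_offset(conn, 309)
    by_key = page_after_key(conn, 309)
    print(by_offset == by_key, by_key[0], by_key[-1])
```

With contiguous ids both calls return the same page. The key-based version needs the last key of the previous page as input, which is the price for not scanning and discarding the skipped rows.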

#4


When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.

The rules of your partition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID, for example.

In order to select from a particular partition you would use a query similar to the following.

SELECT <Column Name1>, …
FROM <Table Name> 
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>

Take a look at the following white paper for more detailed information on partitioning in SQL Server 2005.

http://msdn.microsoft.com/en-us/library/ms345146.aspx

I hope this helps; please feel free to pose further questions.

Cheers, John

#5


I use wrapper queries around the core query and then isolate just the row numbers that I wish to take from it. This allows SQL Server to do all the heavy lifting inside the CORE query and pass out only the small part of the table that I have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.

NOTE: The ORDER BY clause ([sql_order_clause]) is specified OUTSIDE the core query.

w1 and w2 are aliases for the derived tables (subqueries) that serve as the wrappers.

SELECT
    w1.*
FROM(   
    SELECT w2.*, 
    ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
    FROM (

        -- CORE QUERY START
        SELECT [columns]
        FROM [table_name]
        WHERE [sql_string]
        -- CORE QUERY END

   ) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]

This method has hugely optimized my database systems. It works very well.

IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query, as fetching unnecessary data in these CORE queries can cost you serious overhead.
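
For anyone who wants to try the wrapper pattern without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module as a stand-in. It assumes an SQLite build with window-function support (3.25 or newer). The demo data, the `payload` column, the `LIKE 'keep%'` filter, and the helper names are invented for illustration; the w1/w2 structure mirrors the query above.

```python
import sqlite3

# Same shape as the wrapper query above: core query -> w2 -> numbered rows -> w1.
PAGE_SQL = """
SELECT w1.id, w1.payload
FROM (
    SELECT w2.*,
           ROW_NUMBER() OVER (ORDER BY w2.id) AS row_num
    FROM (
        -- core query: only the columns that are actually needed
        SELECT id, payload
        FROM my_table
        WHERE payload LIKE 'keep%'
    ) AS w2
) AS w1
WHERE w1.row_num BETWEEN ? AND ?
ORDER BY w1.row_num
"""


def build_demo_table(row_count=100):
    """Even ids get a 'keep ...' payload, odd ids a 'skip ...' payload."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO my_table (id, payload) VALUES (?, ?)",
        (
            (i, f"keep {i}" if i % 2 == 0 else f"skip {i}")
            for i in range(1, row_count + 1)
        ),
    )
    return conn


def fetch_rows(conn, start_row, end_row):
    """Return rows start_row..end_row (1-based, inclusive) of the filtered set."""
    return conn.execute(PAGE_SQL, (start_row, end_row)).fetchall()


if __name__ == "__main__":
    conn = build_demo_table()
    for row in fetch_rows(conn, 3, 5):
        print(row)
```

Because the row numbers are assigned after the WHERE filter in the core query, rows 3 to 5 here refer to positions in the filtered result, not to ID values.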

#6


Use TOP to select only a limited number of rows, like:

SELECT TOP 10 * FROM my_table WHERE ID >= 300010

Add an ORDER BY if you want the results in a particular order.

To be efficient there has to be an index on the ID column.
