How to select the first 'N' records from a database containing a million records?

Date: 2022-03-24 22:14:57

I have an Oracle database populated with a million records. I am trying to write a SQL query that returns the first N sorted records (say, 100 records) from the database, based on a certain condition.

SELECT * 
FROM myTable 
WHERE SIZE > 2000
ORDER BY NAME DESC

I then select the first N records programmatically.

The problems with this approach are:

  • The query returns half a million records, and "ORDER BY NAME" causes all of them to be sorted on NAME in descending order. This sort takes a lot of time: nearly 30-40 seconds, whereas the query takes only 1 second if I omit the ORDER BY.
  • After the sort I am interested in only the first N (100) records, so sorting the complete result set is wasted work.

My questions are:

  1. Is it possible to specify N in the query itself, so that the sort applies to only N records and the query becomes faster?
  2. Is there a better way in SQL to rewrite the query so that it sorts only N elements and returns quickly?

5 answers

#1


19  

If your purpose is to find 100 random rows and sort them afterwards, then Lasse's solution is correct. If, as I think, you want the first 100 rows sorted by name while discarding the others, you would build a query like this:

SELECT * 
  FROM (SELECT * 
          FROM myTable 
         WHERE SIZE > 2000 ORDER BY NAME DESC) 
 WHERE ROWNUM <= 100

The optimizer will understand that it is a top-N query and will be able to use an index on NAME. It won't have to sort the entire result set: it will just start at the end of the index, read it backwards, and stop after 100 rows.

You could also add a hint to your original query to let the optimizer understand that you are interested in the first rows only. This will probably generate a similar access path:

SELECT /*+ FIRST_ROWS*/* FROM myTable WHERE SIZE > 2000 ORDER BY NAME DESC
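
The hint also has a parameterised form, FIRST_ROWS(n), which asks the optimizer to favour a plan that returns the first n rows quickly. A minimal sketch, assuming the same myTable as in the question. The hint only influences the plan and does not limit the result, so the ROWNUM filter is still needed. SIZE is a reserved word in Oracle, so the sketch assumes the column was created as the quoted identifier "SIZE".

```sql
-- FIRST_ROWS(100) is only a request to optimise for fetching the first 100 rows;
-- the outer ROWNUM predicate is what actually caps the result.
SELECT /*+ FIRST_ROWS(100) */ *
  FROM (SELECT *
          FROM myTable
         WHERE "SIZE" > 2000   -- quoted: SIZE is an Oracle reserved word (assumption about the schema)
         ORDER BY NAME DESC)
 WHERE ROWNUM <= 100;
```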

Edit: just adding AND rownum <= 100 to the query won't work, since in Oracle ROWNUM is assigned before sorting; this is why you have to use a subquery. Without the subquery, Oracle will select 100 random rows and then sort them.

#2


5  

This shows how to pick the top N rows depending on your version of Oracle.

From Oracle 9i onwards, the RANK() and DENSE_RANK() functions can be used to determine the TOP N rows. Examples:

Get the top 10 employees based on their salary

SELECT ename, sal
  FROM (SELECT ename, sal,
               RANK() OVER (ORDER BY sal DESC) sal_rank
          FROM emp)
 WHERE sal_rank <= 10;

Select the employees making the top 10 salaries

SELECT ename, sal
  FROM (SELECT ename, sal,
               DENSE_RANK() OVER (ORDER BY sal DESC) sal_dense_rank
          FROM emp)
 WHERE sal_dense_rank <= 10;

The difference between the two is explained here
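
In short, the two differ only when rows tie: RANK() leaves a gap after a tie, DENSE_RANK() does not. A small self-contained sketch, using made-up rows selected from DUAL instead of the real emp table (the name emp_demo and its four rows are purely illustrative):

```sql
-- Made-up data: A and B tie on the highest salary.
WITH emp_demo AS (
  SELECT 'A' AS ename, 5000 AS sal FROM dual UNION ALL
  SELECT 'B', 5000 FROM dual UNION ALL
  SELECT 'C', 4000 FROM dual UNION ALL
  SELECT 'D', 3000 FROM dual
)
SELECT ename, sal,
       RANK()       OVER (ORDER BY sal DESC) AS sal_rank,
       DENSE_RANK() OVER (ORDER BY sal DESC) AS sal_dense_rank
  FROM emp_demo
 ORDER BY sal DESC, ename;

-- ENAME   SAL  SAL_RANK  SAL_DENSE_RANK
-- A      5000         1               1
-- B      5000         1               1
-- C      4000         3               2
-- D      3000         4               3
```

With this data, filtering on sal_rank <= 3 keeps A, B and C, while filtering on sal_dense_rank <= 3 keeps all four rows.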

#3


4  

Add this:

 AND rownum <= 100

to your WHERE-clause.

However, this won't do what you're asking.

If you want to pick 100 random rows, sort those, and then return them, you'll have to formulate a query without the ORDER BY first, then limit that to 100 rows, then select from that and sort.

This could work, but unfortunately I don't have an Oracle server available to test:

SELECT *
FROM (
    SELECT *
    FROM myTable
    WHERE SIZE > 2000
      AND rownum <= 100
    ) x
ORDER BY NAME DESC

But note the "random" part there: you're saying "give me 100 rows with SIZE > 2000, I don't care which 100".

Is that really what you want?

And no, you won't actually get a random result, in the sense that it'll change each time you query the server, but you are at the mercy of the query optimizer. If the data load and index statistics for that table change over time, at some point you might get different data than you did on the previous query.

#4


0  

Your problem is that the sort is being done every time the query is run. You can eliminate the sort operation by using an index: the optimiser can use an index to avoid a sort, provided the sorted column is declared NOT NULL.

(If the column is nullable, it is still possible, by either (a) adding a NOT NULL predicate to the query, or (b) adding a function-based index and modifying the ORDER BY clause accordingly).
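
Option (a) might look like the following sketch. The index name mytable_name_idx is made up, and "SIZE" is written as a quoted identifier on the assumption that the column was created that way, since SIZE is a reserved word in Oracle. Whether the optimiser actually walks the index instead of sorting depends on statistics and on how selective the SIZE filter is, so check the execution plan.

```sql
-- Hypothetical index on the sort column.
CREATE INDEX mytable_name_idx ON myTable (NAME);

-- NAME IS NOT NULL tells the optimiser that every qualifying row is in the index,
-- so it may read the index in descending order instead of sorting.
SELECT *
  FROM (SELECT *
          FROM myTable
         WHERE "SIZE" > 2000
           AND NAME IS NOT NULL
         ORDER BY NAME DESC)
 WHERE ROWNUM <= 100;
```

Note that the extra predicate also drops rows whose NAME is null, which would otherwise sort first under ORDER BY NAME DESC.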

#5


0  

Just for reference, in Oracle 12c this task can be done using the FETCH clause. See here for examples and additional reference links on this matter.
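
A minimal sketch of the 12c row-limiting clause applied to the query from the question ("SIZE" is quoted on the assumption that the column was created as a quoted identifier, since SIZE is a reserved word in Oracle):

```sql
-- Oracle 12c and later: sort, then keep only the first 100 rows.
SELECT *
  FROM myTable
 WHERE "SIZE" > 2000
 ORDER BY NAME DESC
 FETCH FIRST 100 ROWS ONLY;
```

Unlike a bare ROWNUM predicate, the limit here is applied after the ORDER BY, so no subquery is needed.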
