I have code like this:
try (Connection connection = this.getDataSource().getConnection();
     PreparedStatement statement = connection.prepareStatement(sqlQuery)) {
    statement.setFetchSize(10000); // Set fetch size
    // resultSet was not declared in the original snippet; declared here so it is also closed
    try (ResultSet resultSet = statement.executeQuery()) {
        while (true) {
            resultSet.setFetchSize(10000);
            boolean more = resultSet.next();
            if (!more) {
                break;
            }
            // populate an ArrayList from the values in resultSet
        }
    } catch (Exception e) {
        LOGGER.error("Exception : " + e);
    }
} catch (SQLException e) {
    LOGGER.error("Exception : " + e);
}
My understanding is as follows:
The statement fetch size is 10000. When statement.executeQuery() is executed, it returns the ResultSet cursor, which holds 10000 rows in memory. Each call to resultSet.next() takes one row from the memory buffer. When no rows are left in memory, the query is fired again and another 10000 rows are fetched from the database and stored in the buffer. This continues until there are no more rows to fetch from the DB.
So if my understanding is correct, how many actual DB calls will there be for a total of 210000 rows? Is it 21 (210000 / 10000)?
Also, once all the rows in the buffer have been read, who calls the DB to get more rows (10000 in my case) and store them in the buffer, and when does that happen? And when is the buffer cleared?
Please correct me if I am wrong in my understanding.
I need to work with millions of rows in an Oracle database.
Thanks for any pointers/info
Regards,
SD
2 Solutions
#1
4
Sorry, but your understanding is wrong. There is no such thing as "query is fired again".
The query is executed once. Processing it takes an initial amount of time (which you can't do anything about other than optimizing your query), and then the server begins producing rows, which need to be transferred to the client. While rows are being transferred, the server will probably continue generating more rows and buffering them on its side. This server-side buffering is totally unrelated to the kind of buffering we are talking about in this Q&A, and you have very little control over it (perhaps by means of server configuration, if at all). At some point all rows will have been collected on the server, and the only thing left to do is transfer the remaining rows from the server to the client.
So, as far as the client can tell, once it has sent the query to the server there is a certain delay while the server is thinking about it, after which rows become available at a rate that is usually as fast as the wire can carry them. The client then starts reading these rows with resultSet.next().
Without any buffering, each call to resultSet.next() would send a request from the client to the server, telling it to send the next row, and the server would respond with just that row. That would yield the first row very quickly, but it would be very inefficient in the long run, because it would cause too many round trips between the client and the server.
With buffering, the first call to resultSet.next() requests a whole batch of rows from the server. This imposes a penalty on the time to receive the first row, because you have to wait for the entire batch (say, 100 rows) to be sent over the wire, but in the long run it significantly reduces total network overhead, because there is only one round trip between the client and the server per batch. When the buffered rows have all been read, the next call to resultSet.next() requests the following batch from the same open cursor; the query itself is not executed again.
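To make the round-trip arithmetic concrete, here is a toy simulation of client-side buffering. It is only a sketch, not how the Oracle driver is implemented: FakeServerCursor, countRoundTrips and the "client is told when no rows remain" rule are assumptions made up for illustration. A real driver may piggyback the first batch on the execute call or need one extra fetch to discover the end of the data, so treat the counts as approximate.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of client-side row buffering; NOT the Oracle driver's actual implementation. */
class FetchSizeDemo {

    /** Hypothetical stand-in for the server-side cursor of a query that was executed once. */
    static final class FakeServerCursor {
        private final int totalRows;
        private int nextRow = 0;
        int roundTrips = 0; // number of fetch requests received so far

        FakeServerCursor(int totalRows) {
            this.totalRows = totalRows;
        }

        /** One round trip: hand over up to maxRows of the rows not yet sent. */
        Deque<Integer> fetch(int maxRows) {
            roundTrips++;
            Deque<Integer> batch = new ArrayDeque<>();
            while (batch.size() < maxRows && nextRow < totalRows) {
                batch.add(nextRow++);
            }
            return batch;
        }

        boolean exhausted() {
            return nextRow >= totalRows;
        }
    }

    /** Reads every row through a client-side buffer and returns the number of round trips. */
    static int countRoundTrips(int totalRows, int fetchSize) {
        if (fetchSize < 1) {
            throw new IllegalArgumentException("fetchSize must be at least 1");
        }
        FakeServerCursor cursor = new FakeServerCursor(totalRows);
        Deque<Integer> buffer = new ArrayDeque<>();
        int rowsRead = 0;
        while (true) {
            if (buffer.isEmpty()) {
                if (cursor.exhausted()) {
                    break; // assumption: the client is told when no rows remain
                }
                buffer = cursor.fetch(fetchSize); // refill from the same cursor, no re-execution
            }
            buffer.poll(); // what next() does: consume one buffered row
            rowsRead++;
        }
        if (rowsRead != totalRows) {
            throw new IllegalStateException("row count mismatch");
        }
        return cursor.roundTrips;
    }

    public static void main(String[] args) {
        System.out.println(countRoundTrips(210_000, 10_000)); // 21
        System.out.println(countRoundTrips(210_000, 100));    // 2100
        System.out.println(countRoundTrips(210_000, 1));      // 210000
    }
}
```

Under this model the asker's 210000 rows with a fetch size of 10000 cost 21 fetches, all against a single execution of the query, while a fetch size of 1 costs one round trip per row.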
The ideal strategy for resultSet.setFetchSize() is to leave the fetch size at its default and not worry too much about it.
But if you are paranoid about performance, a good strategy would be to begin with a fairly small fetch size (say 10), so as to get your first row quickly, and then keep doubling it until it reaches a certain maximum (say 100), beyond which there is really no improvement.
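The doubling strategy can be sketched as a simple schedule of fetch sizes. The class and method names here (FetchSizeRamp, schedule) are hypothetical, and the values 10 and 100 are just the ones mentioned above. With JDBC, each value would be applied with resultSet.setFetchSize(int) before the next batch is requested; that call is only a hint, so whether a change made in the middle of iteration takes effect depends on the driver.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: computes the fetch size to use for each successive batch. */
class FetchSizeRamp {

    /** Start at {@code start}, double after every batch, and never exceed {@code max}. */
    static List<Integer> schedule(int start, int max, int batches) {
        List<Integer> sizes = new ArrayList<>();
        int size = Math.min(start, max);
        for (int i = 0; i < batches; i++) {
            sizes.add(size);
            size = Math.min(size * 2, max); // double, capped at the maximum
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(schedule(10, 100, 6)); // [10, 20, 40, 80, 100, 100]
    }
}
```

The first batch stays small so the first row arrives quickly, and after four doublings every later batch uses the capped size.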
#2
3
The only people who can reply to your question are the authors of the Oracle JDBC driver.
That being said, a call to the DB to read the next chunk of data won't take more than a few milliseconds (or less); the bulk of the time will depend on the transfer rate, and possibly on how you get data out of the result set.
I think that once you go above a few hundred records per call, setting a bigger fetch size gives diminishing returns.
As for clearing the buffer, that is mostly the garbage collector's domain, once you lose the reference to the result set.
Just make sure your statement is forward-only, for the sake of both performance and memory footprint:
connection.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);