I'm executing lots of batches of prepared insert statements:
public static void main(String... args) throws Exception {
    // con, ps, count, tupleNum, tupleCache, BATCH_SIZE, jdbcUrl, jdbcUser,
    // jdbcPassword and insertQuery are static fields (declarations omitted here)
    Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
    BufferedReader csv = new BufferedReader(new InputStreamReader(
            Main.class.getClassLoader().getResourceAsStream("records.csv")));
    String line;
    createConnectionAndPreparedStatement();
    while ((line = csv.readLine()) != null) {
        tupleNum++;
        count++;
        List<String> row = new ArrayList<String>(Arrays.asList(line.split(";")));
        tupleCache.add(row); // remember the current batch so it can be replayed on failure
        addBatch(row, ps);
        if (count > BATCH_SIZE) {
            count = 0;
            executeBatch(ps);
            tupleCache.clear();
        }
    }
    // flush the rows left over after the last full batch
    if (count > 0) {
        executeBatch(ps);
        tupleCache.clear();
    }
}
protected static void createConnectionAndPreparedStatement() throws SQLException {
    System.out.println("Opening new connection!");
    con = DriverManager.getConnection(jdbcUrl, jdbcUser, jdbcPassword);
    con.setAutoCommit(false); // batches are committed explicitly in executeBatch()
    ps = con.prepareStatement(insertQuery);
    count = 0;
}
private static void executeBatch(PreparedStatement ps) throws SQLException, IOException, InterruptedException {
    try {
        ps.executeBatch();
    } catch (BatchUpdateException bue) {
        if (bue.getMessage() != null && bue.getMessage().contains("Exceeded the memory limit")) {
            // silently close the old connection to free resources
            try {
                con.close();
            } catch (Exception ex) {
                // ignore: the connection is already unusable
            }
            createConnectionAndPreparedStatement();
            // the parameter still points at the statement of the closed connection,
            // so rebind it to the freshly prepared statement before replaying
            ps = Main.ps;
            for (List<String> t : tupleCache) {
                addBatch(t, ps);
            }
            // let's retry once on the new connection
            ps.executeBatch();
        } else {
            throw bue; // don't swallow unrelated batch failures
        }
    }
    System.out.println("Batch succeeded! --> " + tupleNum);
    con.commit();
    ps.clearWarnings();
    ps.clearBatch();
    ps.clearParameters();
}
private static void addBatch(List<String> tuple, PreparedStatement ps) throws SQLException {
    int sqlPos = 1;
    for (String field : tuple) {
        if (field != null) {
            ps.setString(sqlPos, field);
        } else {
            ps.setNull(sqlPos, java.sql.Types.VARCHAR);
        }
        sqlPos++;
    }
    ps.addBatch();
}
So in a standalone application everything is fine, and no exception occurs after 700k batch insertions. But when I execute essentially the same code in a custom Pig StoreFunc, I get the following exception after about 6-7k batch insertions:
java.sql.BatchUpdateException: 112007;Exceeded the memory limit of 20 MB per session for prepared statements. Reduce the number or size of the prepared statements.
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1824)
And only restarting the connection helps. Can someone give me ideas on why this is happening and how to fix it?
1 Answer
#1
Based on your description and the error message, in my experience the issue is caused by the server-side memory configuration of SQL Azure, such as the memory limit for connections within the server resource pool.
I tried to follow that clue and search for a specific explanation of connection memory limits, but failed, apart from the content below from here.
Connection Memory
SQL Server sets aside three packet buffers for every connection made from a client. Each buffer is sized according to the default network packet size specified by the sp_configure stored procedure. If the default network packet size is less than 8KB, the memory for these packets comes from SQL Server's buffer pool. If it's 8KB or larger, the memory is allocated from SQL Server's MemToLeave region.
And I continued to search for packet size & MemToLeave and viewed them.
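For reference, on a regular SQL Server instance (SQL Azure may not expose this view the same way), the packet size the quote refers to can be inspected from JDBC via sys.configurations; a minimal sketch, with the connection details reused from the question as placeholders:

try (Connection c = DriverManager.getConnection(jdbcUrl, jdbcUser, jdbcPassword);
     Statement st = c.createStatement();
     ResultSet rs = st.executeQuery(
             "SELECT name, value_in_use FROM sys.configurations " +
             "WHERE name = 'network packet size (B)'")) {
    while (rs.next()) {
        // value_in_use is the packet size currently in effect, in bytes
        System.out.println(rs.getString("name") + " = " + rs.getObject("value_in_use"));
    }
}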
Based on the above information, I guess that "Exceeded the memory limit of 20 MB per session for prepared statements" means that the total memory used by parallel connections exceeds the maximum memory buffer pool of the SQL Azure instance.
So here are two solutions I suggest you try.
- Reduce the value of the BATCH_SIZE variable so that the server-side memory cost stays below the maximum size of the memory buffer pool (a sketch follows this list).
- Try to scale up your SQL Azure instance.
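A minimal sketch of the first suggestion, reusing the question's executeBatch/tupleCache setup; the starting value and floor are placeholder numbers, not taken from the original post:

// Use a smaller, adaptive batch size instead of one fixed large constant.
private static int batchSize = 500; // placeholder starting value

private static void executeBatchWithBackoff(PreparedStatement ps) throws SQLException {
    try {
        ps.executeBatch();
    } catch (BatchUpdateException bue) {
        if (bue.getMessage() != null && bue.getMessage().contains("Exceeded the memory limit")) {
            // halve the batch size for the following batches, down to a floor of 50
            batchSize = Math.max(50, batchSize / 2);
            System.out.println("Memory limit hit, next batches will use size " + batchSize);
        }
        throw bue; // the caller still reconnects and replays tupleCache as before
    }
}

The loop in main would then compare count against batchSize instead of the fixed BATCH_SIZE constant.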
Hope it helps.
Here are two new suggestions.
- I'm really not sure whether the MS JDBC driver supports the current scenario of using Apache Pig like a parallel ETL job. Please try the jTDS JDBC driver instead of the MS one (a connection sketch follows this list).
- A better way, I think, is to use more specialized tools for this, such as sqoop or kettle.
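If you switch to jTDS, only the driver class and the JDBC URL change relative to the question's code; the host, port, database, and credentials below are placeholders:

// Swapping in the jTDS driver: same java.sql API, different driver and URL.
Class.forName("net.sourceforge.jtds.jdbc.Driver");
Connection con = DriverManager.getConnection(
        "jdbc:jtds:sqlserver://yourserver:1433/yourdb",
        jdbcUser, jdbcPassword);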