I am trying to export 250 MB of data (75 chararray columns) from HDFS to SQL Server. It failed with the error below:
Caused by: java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
Then I passed "-D sqoop.export.records.per.statement=10" along with the sqoop export and it worked, but it is very slow: it took 15 minutes to load 250 MB of data.
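For reference, the 2100 cap in the error above is on JDBC bind parameters per statement: each exported row binds one parameter per column, so at 10 records per statement this table uses only 75 × 10 = 750 parameters, and floor(2100 / 75) = 28 records per statement is the most it could take. A sketch of a less conservative setting, hedging 28 down to 25 in case the driver reserves a few parameters for itself, and adding sqoop.export.statements.per.transaction, Sqoop's companion knob for grouping several statements into one commit (the rest of the command is unchanged):

sqoop export -D sqoop.export.records.per.statement=25 -D sqoop.export.statements.per.transaction=10 --connect '...' --table Facttable ...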
Is there any way we can improve the performance?
Below is the actual sqoop command:
sqoop export -D sqoop.export.records.per.statement=10 --connect 'jdbc:sqlserver://199.198.165.191:1433;username=;password=;database=database' --table Facttable --columns DimDateID,DimQHourID,ETLMergedFileQHourlyNortelID,DimSWVersionID,DimFreqCellRelationID,OSSC_RC,SubNetwork1,SubNetwork2,MeContext,ENodeBFunction,EUtranCellFDD,EUtranFreqRelation,EUtranCellRelation,Time,GmtOffset,ffv,sn,st,vn,cbt,ts,neun,nedn,nesw,mts,gp,sf,pmHoExeAttLteInterF,pmHoExeAttLteIntraF,pmHoExeSuccLteInterF,pmHoExeSuccLteIntraF,pmHoPrepAttLteInterF,pmHoPrepAttLteIntraF,pmHoPrepSuccLteInterF,pmHoPrepSuccLteIntraF,Count_Null,Count_Negative,Count_Threshold,pmHoExeAttLteInterFLb,pmHoExeSuccLteInterFLb,pmHoOscInterF,pmHoOscIntraF,pmHoPrepAttLteInterFLb,pmHoPrepSuccLteInterFLb,pmHoPrepTNotAllowedLteInterF,pmHoPrepTNotAllowedLteIntraF,pmHoTooEarlyHoInterF,pmHoTooEarlyHoIntraF,pmHoTooLateHoInterF,pmHoTooLateHoIntraF,pmHoWrongCellInterF,pmHoWrongCellIntraF,pmHoWrongCellReestInterF,pmHoWrongCellReestIntraF,pmLbQualifiedUe,pmZtemporary36,pmHoExeAttLteIntraFTuneOut,pmHoExeSuccLteIntraFTuneOut --export-dir /Fact_Peg --direct -m 8 --input-fields-terminated-by "," --input-lines-terminated-by "\n";
"
”
2 Answers
#1
1
Bulk inserts are the fastest way. Currently Sqoop and the default drivers that come with it for SQL Server do not support bulk inserts. You may want to try the third-party Type 5 JDBC drivers from DataDirect.
https://www.progress.co.uk/sitecore/content/Progress%20Root/Home/support-and-services/evaluation-support/support-matrices/jdbc-xe
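For illustration, a rough sketch of pointing Sqoop at such a driver: copy the driver jar into Sqoop's lib directory, then pass the driver class explicitly with --driver (which makes Sqoop fall back to its generic JDBC manager) and use the vendor's connection URL. The com.ddtek.jdbc.sqlserver.SQLServerDriver class name, the jdbc:datadirect:sqlserver URL prefix, and the EnableBulkLoad property are assumptions based on DataDirect's documented conventions; verify them against the driver version you actually download.

sqoop export \
  --driver com.ddtek.jdbc.sqlserver.SQLServerDriver \
  --connect 'jdbc:datadirect:sqlserver://199.198.165.191:1433;databaseName=database;EnableBulkLoad=true' \
  --table Facttable --export-dir /Fact_Peg \
  --input-fields-terminated-by "," --input-lines-terminated-by "\n"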
#2
0
Looking at your sqoop command, you are specifying 8 mappers. First, 8 is probably too many for your DB to handle concurrently. Second, there is no split-by specification for those 8 mappers to divide the export work equally. I would remove the -m 8 parameter and run again. It is only 250 MB; depending on your cluster, it shouldn't take very long at all.
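A minimal sketch of that suggestion, assuming everything else in the command stays as posted (with -m 8 removed, Sqoop falls back to its default of 4 mappers; the long --columns list is elided here for readability):

sqoop export -D sqoop.export.records.per.statement=10 --connect 'jdbc:sqlserver://199.198.165.191:1433;username=;password=;database=database' --table Facttable --columns ... --export-dir /Fact_Peg --direct --input-fields-terminated-by "," --input-lines-terminated-by "\n"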