In Pentaho, when I run a Cassandra Input step that fetches around 50,000 rows, I get the exception below.
Is there a way to control the query result size in Pentaho? Or is there a way to stream the query result instead of fetching it all in bulk?
2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Unexpected error
2014/10/09 15:14:09 - Cassandra Input.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.pentaho.di.core.exception.KettleException:
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 - Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 -
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:355)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.processRow(CassandraInput.java:234)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2014/10/09 15:14:09 - Cassandra Input.0 - at java.lang.Thread.run(Unknown Source)
2014/10/09 15:14:09 - Cassandra Input.0 - Caused by: org.apache.thrift.transport.TTransportException: Frame size (17727647) larger than max length (16384000)!
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql_query(Cassandra.java:1656)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.apache.cassandra.thrift.Cassandra$Client.execute_cql_query(Cassandra.java:1642)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.cassandra.legacy.LegacyCQLRowHandler.newRowQuery(LegacyCQLRowHandler.java:289)
2014/10/09 15:14:09 - Cassandra Input.0 - at org.pentaho.di.trans.steps.cassandrainput.CassandraInput.initQuery(CassandraInput.java:333)
2014/10/09 15:14:09 - Cassandra Input.0 - ... 3 more
2014/10/09 15:14:09 - Cassandra Input.0 - Finished processing (I=0, O=0, R=0, W=0, U=0, E=1)
2014/10/09 15:14:09 - all customer data - Transformation detected one or more steps with errors.
2014/10/09 15:14:09 - all customer data - Transformation is killing the other steps!
4 Answers
#1
org.apache.thrift.transport.TTransportException:
Frame size (17727647) larger than max length (16384000)!
A limit is enforced on how large frames (Thrift messages) can be, to avoid performance degradation. You can tweak this by modifying a few settings. The important thing to note is that you need to change the settings on both the client side and the server side.
Server side, in cassandra.yaml:
# Frame size for thrift (maximum field length).
# default is 15mb, you'll have to increase this to at-least 18.
thrift_framed_transport_size_in_mb: 18
# The max length of a thrift message, including all fields and
# internal thrift overhead.
# default is 16, try to keep it to thrift_framed_transport_size_in_mb + 1
thrift_max_message_length_in_mb: 19
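As a sanity check, the numbers in the error message line up with these settings; a quick sketch (treating one MB as 10^6 bytes, which is what the "at least 18" advice above implies):

```java
public class FrameSizeCheck {
    public static void main(String[] args) {
        long frameBytes = 17_727_647L;  // "Frame size" from the error message
        long limitBytes = 16_384_000L;  // "max length" from the error message
        // The frame really is larger than the configured limit:
        System.out.println(frameBytes > limitBytes);         // true
        // Round the frame size up to whole MB to pick a new yaml value:
        long neededMb = (frameBytes + 999_999L) / 1_000_000L; // 18
        System.out.println("thrift_framed_transport_size_in_mb >= " + neededMb);
    }
}
```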
Setting the client side limit depends on what driver you're using.
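For example, with the raw Thrift Java client (the API the stack trace above goes through), the client-side cap is the maxLength argument of TFramedTransport. A minimal sketch, assuming a Cassandra Thrift server on localhost:9160; the host, port, and the 20 MiB value are placeholders, not from the original post:

```java
import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class LargeFrameClient {
    public static void main(String[] args) throws Exception {
        // Client-side frame cap; must be at least as large as the server's
        // thrift_framed_transport_size_in_mb for big responses to be accepted.
        int maxLengthBytes = 20 * 1024 * 1024; // 20 MiB (placeholder value)
        TTransport transport =
            new TFramedTransport(new TSocket("localhost", 9160), maxLengthBytes);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        // ... run queries via client ...
        transport.close();
    }
}
```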
#2
I resolved this problem by upgrading to PDI 5.2, whose Cassandra Input step has a max length property; setting this property to a higher value, such as 1GB, solves the problem.
#3
You can try the following method on the server side:
import org.apache.thrift.TProcessor;
import org.apache.thrift.protocol.TCompactProtocol;
import org.apache.thrift.server.TNonblockingServer;
import org.apache.thrift.server.TServer;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TNonblockingServerSocket;

TNonblockingServerSocket tnbSocketTransport = new TNonblockingServerSocket(listenPort);
TNonblockingServer.Args tnbArgs = new TNonblockingServer.Args(tnbSocketTransport);
// maxLength is configured to 1GB, while the default size is 16MB
tnbArgs.transportFactory(new TFramedTransport.Factory(1024 * 1024 * 1024));
tnbArgs.protocolFactory(new TCompactProtocol.Factory());
TProcessor processor = new UcsInterfaceThrift.Processor<UcsInterfaceHandler>(ucsInterfaceHandler);
tnbArgs.processor(processor);
TServer server = new TNonblockingServer(tnbArgs);
server.serve();
#4
Well, it did work for me:
Cassandra Version: [cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]
Pentaho PDI Version: pdi-ce-5.4.0.1-130
Changed Settings in cassandra.yaml:
# Whether to start the thrift rpc server.
start_rpc: true
# Frame size for thrift (maximum message length).
thrift_framed_transport_size_in_mb: 35
Cassandra Output Step Settings Changed to:
Port: 9160
"Use CQL Version 3": checked