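I have recently been scraping web pages with Python. Because I wanted the HTML as it looks after the JavaScript has been executed, I tried web testing frameworks such as selenium, windmill, and htmlunit in turn. Since I only need the resulting HTML and no rendered UI, I settled on htmlunit. htmlunit is only implemented in Java, however, so I decided to use RPC to connect Python and Java.
I first tried ICE. The Java side worked without problems, but when I went to write the client in Python I found that ICE did not yet support Python 2.7, so I gave up on it and turned to Thrift.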
Download: http://thrift.apache.org/download/
First, write the IDL interface definition, demo.thrift:

    namespace java service.demo

    service Hello {
        string helloString(1:string word)
    }

Then generate the Java and Python sources. The compiler writes them to the gen-java and gen-py directories:

    thrift --gen java demo.thrift
    thrift --gen py demo.thrift

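The two generator commands above can also be driven from a small script, which is handy when the IDL changes often. The following is only a sketch: the helper names `gen_command` and `run_codegen` are made up for illustration, and it assumes the `thrift` compiler is on the PATH.

```python
import subprocess

# Target languages used in this post.
LANGUAGES = ("java", "py")


def gen_command(language, idl_file):
    """Build the argument list for one `thrift --gen <language> <idl>` call."""
    return ["thrift", "--gen", language, idl_file]


def run_codegen(idl_file, languages=LANGUAGES):
    """Run the thrift compiler once per language (hypothetical helper).

    Assumes the `thrift` executable is on the PATH; raises
    subprocess.CalledProcessError if the compiler reports an error.
    """
    for language in languages:
        subprocess.check_call(gen_command(language, idl_file))
```

Calling `run_codegen("demo.thrift")` would run the same two commands as listed above.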
Next, build the Thrift Java library: extract thrift-0.9.0.tar.gz and run ant in the thrift-0.9.0\lib\java directory.
Writing the Java server
Interface implementation

    package service.demo;

    import org.apache.thrift.TException;

    import service.demo.Hello.Iface;

    public class HelloImpl implements Iface {

        @Override
        public String helloString(String word) throws TException {
            System.out.println("get " + word);
            return "hello " + word;
        }

    }

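For comparison, here is how the same `helloString` contract might look as a Python handler class. This is an illustration only: the post keeps the server in Java, and the class name `HelloHandler` is made up. In the Python Thrift library a handler is a plain object whose method names match the IDL, so nothing here depends on Thrift itself.

```python
class HelloHandler(object):
    """Plain-Python counterpart of the Java HelloImpl (hypothetical name)."""

    def helloString(self, word):
        # Log the incoming argument, as the Java version does.
        print("get " + word)
        return "hello " + word
```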
Server implementation

    package service.demo;

    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.protocol.TBinaryProtocol.Factory;
    import org.apache.thrift.server.TServer;
    import org.apache.thrift.server.TThreadPoolServer;
    import org.apache.thrift.server.TThreadPoolServer.Args;
    import org.apache.thrift.transport.TServerSocket;
    import org.apache.thrift.transport.TTransportException;

    import service.demo.Hello.Processor;

    public class Server {

        public void startServer() {
            try {
                // Listen on port 1234; the clients connect to the same port.
                TServerSocket serverTransport = new TServerSocket(1234);
                // Typed processor that dispatches incoming calls to HelloImpl.
                Processor<Hello.Iface> processor = new Processor<Hello.Iface>(new HelloImpl());
                // Binary protocol with strict read and strict write enabled.
                Factory protocolFactory = new TBinaryProtocol.Factory(true, true);
                Args args = new Args(serverTransport);
                args.processor(processor);
                args.protocolFactory(protocolFactory);
                TServer server = new TThreadPoolServer(args);
                // serve() blocks and handles requests until the server stops.
                server.serve();
            } catch (TTransportException e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            Server server = new Server();
            server.startServer();
        }
    }

Java client implementation

    package service.demo;

    import org.apache.thrift.TException;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.protocol.TProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;
    import org.apache.thrift.transport.TTransportException;

    public class Client {

        public void startClient() {
            TTransport transport = null;
            try {
                // Connect to the server started above.
                transport = new TSocket("localhost", 1234);
                TProtocol protocol = new TBinaryProtocol(transport);
                Hello.Client client = new Hello.Client(protocol);
                transport.open();
                System.out.println(client.helloString("panguso"));
            } catch (TTransportException e) {
                e.printStackTrace();
            } catch (TException e) {
                e.printStackTrace();
            } finally {
                // Close the socket even if the call fails.
                if (transport != null) {
                    transport.close();
                }
            }
        }

        public static void main(String[] args) {
            Client client = new Client();
            client.startClient();
        }
    }

Writing the Python client
First install Thrift's Python support by running python setup.py install in thrift-0.9.0\lib\py. Note that if you write the code in Eclipse, you need to add C:\Python27\Lib\site-packages\thrift-0.9.0-py2.7.egg under PyDev -> Interpreter - Python -> System PYTHONPATH.
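A quick way to confirm that the install (or the PYTHONPATH entry) took effect is to try importing the package from the interpreter you plan to use. A minimal sketch; the function name `thrift_installed` is made up for illustration.

```python
def thrift_installed():
    """Return True if the `thrift` Python package can be imported."""
    try:
        import thrift  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


if thrift_installed():
    print("thrift package found")
else:
    print("thrift package missing: run `python setup.py install` first")
```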
Python client implementation
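
    import sys

    # The thrift compiler writes the generated package to gen-py/demo
    # (named after demo.thrift, since the IDL declares no py namespace).
    sys.path.append("gen-py")

    from demo import Hello
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket

    transport = TSocket.TSocket("localhost", 1234)
    transport.open()
    try:
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = Hello.Client(protocol)
        print(client.helloString("python"))
    finally:
        # Close the socket even if the call fails.
        transport.close()
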
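The Python client fails immediately if the Java server is not yet listening on port 1234. One possible way to guard against that is a short pre-flight check with the standard `socket` module before opening the Thrift transport. This is a sketch under assumptions: the helper name `wait_for_port`, the timeout, and the polling interval are all illustrative and not part of Thrift.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.2):
    """Poll until a TCP connection to (host, port) succeeds or time runs out.

    Returns True if something is listening, False otherwise.
    """
    deadline = time.time() + timeout
    while True:
        try:
            # Raises socket.error (an alias of OSError on Python 3) on failure.
            conn = socket.create_connection((host, port), timeout=interval)
        except (socket.error, socket.timeout):
            if time.time() >= deadline:
                return False
            time.sleep(interval)
        else:
            conn.close()
            return True
```

With the server from this post, `wait_for_port("localhost", 1234)` could be called just before `transport.open()`, and the client could exit with a clear message when it returns False.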