How to handle the C++ return type std::vector in Python ctypes?

Time: 2023-01-22 16:38:15

I cannot find how ctypes bridges the gap between std::vector and Python; nowhere on the internet is the combination mentioned. Is this bad practice, does it not exist, or am I missing something?

C++ : xxx.cpp

#include <fstream>
#include <string>
#include <vector>
using namespace std;
extern "C" std::vector<int> foo(const char* FILE_NAME)
{
    string line;
    std::vector<int> result;
    ifstream myfile(FILE_NAME);
    while (getline(myfile, line)) {
      result.push_back(1);
    }

    return(result);
}

Python: xxx.py

import ctypes
xxx = ctypes.CDLL("./libxxx.so")
xxx.foo.argtypes = ??????
xxx.foo.restype = ??????

3 Answers

#1


6  

The particular reason is that speed is important. I'm creating an application that should be able to handle big data. On 200,000 rows, the missing values have to be counted across 300 values (a 200k by 300 matrix). I believe, but correct me if I'm wrong, that C++ will be significantly faster.

Well, if you're reading from a large file, your process is going to be mostly IO-bound, so the timings between Python and C probably won't be significantly different.

The following code...

result = []
for line in open('test.txt'):
    result.append(line.count('NA'))

...seems to run just as fast as anything I can hack together in C, although it's using some optimized algorithm I'm not really familiar with.

It takes less than a second to process 200,000 lines, although I'd be interested to see if you can write a C function which is significantly faster.
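
To check the timing claim on your own machine, here is a minimal sketch that writes a synthetic file and times the same counting loop. The row layout (300 comma-separated fields, every tenth one "NA") and the row count are assumptions made up for illustration; raise N_ROWS toward 200,000 to mirror the question.

```python
import os
import tempfile
import time

N_ROWS = 2000   # assumption: small default so the sketch runs quickly
N_COLS = 300    # matches the 300 values per row mentioned in the question


def count_na_per_line(path):
    """Return the number of 'NA' occurrences on each line of the file."""
    with open(path) as handle:
        return [line.count("NA") for line in handle]


# Synthetic row: every tenth field is "NA", the rest are "1.5".
row = ",".join("NA" if col % 10 == 0 else "1.5" for col in range(N_COLS)) + "\n"

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(row * N_ROWS)

start = time.perf_counter()
counts = count_na_per_line(tmp.name)
elapsed = time.perf_counter() - start
os.remove(tmp.name)

print("{} lines counted in {:.3f} s".format(len(counts), elapsed))
```

The measured time depends on disk cache and hardware, so treat it as a rough baseline to compare a C or C++ version against, not as a benchmark.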


Update

If you want to do it in C, and end up with a Python list, it's probably more efficient to use the Python/C API to build the list yourself, rather than building a std::vector then converting to a Python list later on.

An example which just returns a list of integers from 0 to 99...

// hack.c

#include <python2.7/Python.h>

PyObject* foo(const char* filename)
{
    PyObject* result = PyList_New(0);
    int i;

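    for (i = 0; i < 100; ++i)
    {
        /* PyList_Append does not steal the reference, so release ours. */
        PyObject* item = PyInt_FromLong(i);
        PyList_Append(result, item);
        Py_DECREF(item);
    }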

    return result;
}

Compiled with...

$ gcc -c hack.c -fPIC
$ ld -o hack.so -shared hack.o -lpython2.7

Example of usage...

>>> from ctypes import *
>>> dll = PyDLL('./hack.so')  # unlike CDLL, PyDLL keeps the GIL held, which Python/C API calls need
>>> dll.foo.restype = py_object
>>> dll.foo('foo')
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...]
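
The same Python/C API calls can be tried without compiling anything, because ctypes exposes the interpreter's own C API as ctypes.pythonapi (a PyDLL, so the GIL stays held during each call). The sketch below is only meant to illustrate how restype = py_object hands a list built on the C side back to Python; it is not a replacement for the compiled version above.

```python
import ctypes

api = ctypes.pythonapi  # PyDLL instance: calls run with the GIL held

# Declare the signatures of the two Python/C API functions used here.
api.PyList_New.argtypes = [ctypes.c_ssize_t]
api.PyList_New.restype = ctypes.py_object
api.PyList_Append.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyList_Append.restype = ctypes.c_int

# Build the list through the C API, as foo() in hack.c does.
result = api.PyList_New(0)
for i in range(100):
    # PyList_Append takes its own reference to the item and returns 0 on success.
    if api.PyList_Append(result, i) != 0:
        raise RuntimeError("PyList_Append failed")

print(result[:10])
```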

#2


12  

Whether or not this approach actually provides faster execution time, I'll explain a bit about how you could go about doing it. Basically, create a pointer to a C++ vector which can interface with Python through C functions. You can then wrap the C++ code in a Python class, hiding the implementation details of ctypes.

I added the magic methods I thought would be helpful to the Python class. You can remove them or add more to suit your needs. The destructor is important to keep, though.

C++

// vector_python.cpp
#include <vector>
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

extern "C" void foo(vector<int>* v, const char* FILE_NAME){
    string line;
    ifstream myfile(FILE_NAME);
    while (getline(myfile, line)) v->push_back(1);
}

extern "C" {
    vector<int>* new_vector(){
        return new vector<int>;
    }
    void delete_vector(vector<int>* v){
        cout << "destructor called in C++ for " << v << endl;
        delete v;
    }
    int vector_size(vector<int>* v){
        return v->size();
    }
    int vector_get(vector<int>* v, int i){
        return v->at(i);
    }
    void vector_push_back(vector<int>* v, int i){
        v->push_back(i);
    }
}

Compile it as a shared library. On Mac OS X this might look like,

g++ -c -fPIC vector_python.cpp -o vector_python.o
g++ -shared -Wl,-install_name,vector_python_lib.so -o vector_python_lib.so vector_python.o

Python

from ctypes import *

class Vector(object):
    lib = cdll.LoadLibrary('vector_python_lib.so') # class level loading lib
    lib.new_vector.restype = c_void_p
    lib.new_vector.argtypes = []
    lib.delete_vector.restype = None
    lib.delete_vector.argtypes = [c_void_p]
    lib.vector_size.restype = c_int
    lib.vector_size.argtypes = [c_void_p]
    lib.vector_get.restype = c_int
    lib.vector_get.argtypes = [c_void_p, c_int]
    lib.vector_push_back.restype = None
    lib.vector_push_back.argtypes = [c_void_p, c_int]
    lib.foo.restype = None
    lib.foo.argtypes = [c_void_p, c_char_p]  # vector pointer, file name

    def __init__(self):
        self.vector = Vector.lib.new_vector()  # pointer to new vector

    def __del__(self):  # when reference count hits 0 in Python,
        Vector.lib.delete_vector(self.vector)  # call C++ vector destructor

    def __len__(self):
        return Vector.lib.vector_size(self.vector)

    def __getitem__(self, i):  # access elements in vector at index
        if 0 <= i < len(self):
            return Vector.lib.vector_get(self.vector, c_int(i))
        raise IndexError('Vector index out of range')

    def __repr__(self):
        return '[{}]'.format(', '.join(str(self[i]) for i in range(len(self))))

    def push(self, i):  # push calls vector's push_back
        Vector.lib.vector_push_back(self.vector, c_int(i))

    def foo(self, filename):  # foo in Python calls foo in C++
        Vector.lib.foo(self.vector, c_char_p(filename))

You can then test it out in the interpreter (file.txt just consists of three lines of gibberish).

>>> from vector import Vector
>>> a = Vector()
>>> a.push(22)
>>> a.push(88)
>>> a
[22, 88]
>>> a[1]
88
>>> a[2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "vector.py", line 30, in __getitem__
    raise IndexError('Vector index out of range')
IndexError: Vector index out of range
>>> a.foo('file.txt')
>>> a
[22, 88, 1, 1, 1]
>>> b = Vector()
>>> ^D
destructor called in C++ for 0x1003884d0
destructor called in C++ for 0x10039df10
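
The session above looks like Python 2, where a plain string can be passed to c_char_p. On Python 3, c_char_p accepts bytes and rejects str, so the file name has to be encoded before it reaches the C++ foo. A minimal sketch, where as_c_path is a hypothetical helper name and not part of the answer's class:

```python
import ctypes
import os


def as_c_path(filename):
    """Convert a str or bytes file name into a c_char_p for a const char* argument."""
    # os.fsencode accepts both str and bytes and returns bytes.
    return ctypes.c_char_p(os.fsencode(filename))


path_arg = as_c_path("file.txt")
print(path_arg.value)
```

Inside the Vector class, the call would then presumably become Vector.lib.foo(self.vector, as_c_path(filename)).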

#3


3  

Basically, returning a C++ object from a dynamically loaded library is not a good idea. To use the C++ vector in Python code, you must teach Python to deal with C++ objects (and this includes the binary representation of the objects, which can change with a new version of a C++ compiler or STL).

ctypes allows you to interact with a library using C types. Not C++.

Maybe the problem is solvable via boost::python, but it looks more reliable to use plain C for the interaction.
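
One way to read "plain C for the interaction" is a function that fills a buffer allocated by the caller, so no C++ object ever crosses the library boundary. The sketch below assumes a hypothetical C signature, int count_lines(const char *path, int *out, size_t capacity), returning the number of entries written. So that it runs without a compiled library, a Python callback wrapped with CFUNCTYPE stands in for the C function; with a real library you would set the same argtypes and restype on the loaded symbol instead.

```python
import ctypes
import os
import tempfile

# Hypothetical C signature:
#   int count_lines(const char *path, int *out, size_t capacity);
FILL_FUNC = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_char_p, ctypes.POINTER(ctypes.c_int), ctypes.c_size_t
)


def _stand_in(path, out, capacity):
    """Python stand-in for the C function: writes a 1 into out for each line."""
    written = 0
    with open(path, "rb") as handle:
        for _ in handle:
            if written >= capacity:
                break
            out[written] = 1
            written += 1
    return written


count_lines = FILL_FUNC(_stand_in)  # callable through the C calling convention


def read_counts(path, capacity=1024):
    """Allocate a C int array, let the callee fill it, return a Python list."""
    buffer = (ctypes.c_int * capacity)()
    n = count_lines(os.fsencode(path), buffer, capacity)
    return list(buffer[:n])


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
counts = read_counts(tmp.name)
os.remove(tmp.name)
print(counts)
```

The caller owns the memory here, so nothing has to be freed on the C++ side and the layout of std::vector never matters to Python.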
