Using C/C++ for heavy calculations in Python (and MySQL)

Time: 2022-09-06 13:25:28

I'm implementing an algorithm in my Python web application, and it involves some (possibly) large clustering and matrix calculations. I've seen that Python can use C/C++ libraries, and thought it might be a good idea to use them to speed things up.


First: Are there any reasons not to, or anything I should keep in mind while doing this?


Second: I am somewhat reluctant to connect C to MySQL (where I would get the data for the calculations). Is this in any way justified?


3 Answers



Use the ecosystem.


For matrices, using numpy and scipy can provide approximately the same range of functionality as tools like Matlab. If you learn to write idiomatic code with these modules, the inner loops can take place in the C or FORTRAN implementations of the modules, resulting in C-like overall performance with Python expressiveness for most tasks. You may also be interested in numexpr, which can further accelerate and in some cases parallelize numpy/scipy expressions.
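To make "idiomatic code" concrete, here is a minimal sketch comparing a pure-Python loop with the equivalent numpy expression. The function names and the random test matrix are made up for illustration; the point is only that the vectorized version runs its inner loop inside numpy's compiled code.

```python
import numpy as np


def row_norms_loop(matrix):
    """Euclidean norm of each row, with every inner loop running in Python."""
    norms = []
    for row in matrix:
        total = 0.0
        for x in row:
            total += x * x
        norms.append(total ** 0.5)
    return norms


def row_norms_vectorized(matrix):
    """Same result, but the loops run inside numpy's compiled routines."""
    return np.sqrt((matrix * matrix).sum(axis=1))


# Hypothetical input: 1000 points with 50 features each.
rng = np.random.default_rng(0)
data = rng.random((1000, 50))

loop_result = row_norms_loop(data)
vectorized_result = row_norms_vectorized(data)
print(np.allclose(loop_result, vectorized_result))
```

Timing both versions on your own data (for example with timeit) is the honest way to see how much the vectorized form gains in your case.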


If you must write compute-intensive inner loops in Python, think hard about it first. Maybe you can reformulate the problem in a way more suited to numpy/scipy. Or maybe you can use the data structures available in Python to come up with a better algorithm rather than a faster implementation of the same algorithm. If not, there’s Cython, which compiles Python code, optionally annotated with static C types, to C and from there to machine code.
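As a small illustration of "a better algorithm rather than a faster implementation", the sketch below answers the same question twice: once with nested loops and once with a set. The problem (does any pair of values sum to a target?) is an invented stand-in, not something from the question.

```python
def has_pair_with_sum_quadratic(values, target):
    """Check every pair: O(n^2) comparisons."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] + values[j] == target:
                return True
    return False


def has_pair_with_sum_linear(values, target):
    """Remember values seen so far in a set: O(n) on average."""
    seen = set()
    for value in values:
        if target - value in seen:
            return True
        seen.add(value)
    return False


numbers = [1, 4, 6, 9]
print(has_pair_with_sum_quadratic(numbers, 10), has_pair_with_sum_linear(numbers, 10))
```

For large inputs the second version will usually beat a C port of the first, which is why it pays to revisit the algorithm before reaching for a compiler.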


Only as a last resort, and after profiling to identify the absolute worst bottlenecks, should you consider writing an extension module in C/C++. There are just so many easier ways to meet the vast majority of performance requirements, and numeric/mathematical code is an area with very good existing library support.
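One way to do the profiling step is the standard library's cProfile module. This is a minimal sketch; slow_sum_of_squares and pipeline are hypothetical placeholders for your real code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop, standing in for a real hotspot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline():
    """Hypothetical entry point of the computation being profiled."""
    return slow_sum_of_squares(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The functions at the top of the report are the only ones worth considering for a rewrite in numpy, Cython, or C/C++.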




Not the answer you expected, but I have been down that road and advise KISS:


  • First, make it work in the simplest way possible.
  • Only then look into speeding things up or complicating the design.

There are lots of other ways to phrase this, such as "do not fix hypothetical problems unless resources are unlimited".




Cython support for C++ is much better than it used to be. You can use most of the C++ standard library in Cython seamlessly. Speedups of up to 500x are possible in the extreme best case.


My experience is that it is best to keep the Cython code extremely thin and forward all arguments to C++. It is much easier to debug C++ directly, and the syntax is better understood. Having to maintain a code base unnecessarily in three different languages is a pain.


Using C++/Cython means that you have to spend a little time thinking about ownership issues. For example, it is often safest not to allocate anything in C++ but to prepare the memory in Python/Cython (use array.array or numpy.array). Alternatively, make a C++ object wrapped in Cython which has a deallocation function. All this means that your application will be more fragile than if it were written only in Python or only in C++: you are giving up both RAII and garbage collection.
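The "prepare the memory in Python" idea can be sketched without any C++ at all. Below, ctypes.memset stands in for a hypothetical native routine that writes into a caller-provided buffer; fill_with_zeros and samples are illustrative names. The buffer is an array.array owned by Python, so its lifetime is handled by the interpreter and the native side never allocates or frees anything.

```python
import array
import ctypes


def fill_with_zeros(buf):
    """Let native code write into memory that Python allocated and owns."""
    # Zero-copy ctypes view onto the array's existing buffer.
    c_view = (ctypes.c_double * len(buf)).from_buffer(buf)
    # ctypes.memset is a stand-in here for a real C/C++ function taking
    # a pointer and a size; it overwrites the buffer in place.
    ctypes.memset(ctypes.addressof(c_view), 0, ctypes.sizeof(c_view))


# Python allocates the buffer and remains responsible for freeing it.
samples = array.array("d", [1.0, 2.0, 3.0, 4.0])
fill_with_zeros(samples)
print(samples.tolist())
```

With this split, the native function only needs a pointer and a length, which keeps the wrapper layer thin.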


On the other hand, your Python code should translate line for line into modern C++. This is a reminder not to use old-fashioned new or delete etc. in your new C++ code, but to make things fast and clean by keeping the abstractions at a high level.


Remember, too, to re-examine the assumptions behind your original algorithmic choices. What is sensible for Python might be foolish for C++.


Finally, Python makes everything significantly simpler, cleaner, and faster to debug than C++. But in many ways, C++ encourages more powerful abstractions and better separation of concerns.


When you programme with Python, Cython, and C++, it slowly comes to feel like taking the worst bits of both approaches. It might be worth biting the bullet and rewriting completely in C++. You can keep the Python test harness and use the original design as a prototype / testbed.




