Using C/C++ for heavy calculations in Python (also MySQL)

I'm implementing an algorithm into my Python web application, and it includes doing some (possibly) large clustering and matrix calculations. I've seen that Python can use C/C++ libraries, and thought that it might be a good idea to utilize this to speed things up.

First: Are there any reasons not to, or anything I should keep in mind while doing this?

Second: I am reluctant to connect C to MySQL (where I would get the data for the calculations). Is this in any way justified?

3 solutions

#1

Use the ecosystem.

For matrices, using numpy and scipy can provide approximately the same range of functionality as tools like Matlab. If you learn to write idiomatic code with these modules, the inner loops can take place in the C or FORTRAN implementations of the modules, resulting in C-like overall performance with Python expressiveness for most tasks. You may also be interested in numexpr, which can further accelerate and in some cases parallelize numpy/scipy expressions.
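
As a rough illustration of what "idiomatic" means here, the sketch below computes the Euclidean norm of each row of a matrix twice: once with explicit Python loops, and once with a single numpy expression whose inner loop runs in compiled code. The function names and the test data are made up for this example.

```python
import numpy as np

def row_norms_loop(matrix):
    """Pure-Python version: every multiply and add goes through the interpreter."""
    out = []
    for row in matrix:
        total = 0.0
        for x in row:
            total += x * x
        out.append(total ** 0.5)
    return out

def row_norms_vectorized(matrix):
    """numpy version: the per-element work happens inside numpy's compiled loops."""
    return np.sqrt((matrix * matrix).sum(axis=1))

rng = np.random.default_rng(0)
data = rng.random((1000, 50))  # hypothetical input: 1000 rows, 50 columns

loop_result = row_norms_loop(data)
vec_result = row_norms_vectorized(data)
print(np.allclose(loop_result, vec_result))
```

Both versions give the same numbers; timing them with `timeit` on your own data is the way to see how large the gap is in practice.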

If you must write compute-intensive inner loops in Python, think hard about it first. Maybe you can reformulate the problem in a way more suited to numpy/scipy. Or, maybe you can use data structures available in Python to come up with a better algorithm rather than a faster implementation of the same algorithm. If not, there’s Cython, which uses a restricted subset of Python to compile to machine code.
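
To make the "better algorithm rather than faster implementation" point concrete, here is a small sketch on a toy problem (not taken from the question): checking whether any two values in a list sum to a target. The nested-loop version is quadratic; using a built-in `set` makes it a single pass. The function names are invented for the example.

```python
def has_pair_quadratic(values, target):
    """Compare every pair: about n*(n-1)/2 additions in the worst case."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] + values[j] == target:
                return True
    return False

def has_pair_with_set(values, target):
    """One pass: remember what has been seen and look up the complement."""
    seen = set()
    for v in values:
        if target - v in seen:
            return True
        seen.add(v)
    return False

print(has_pair_quadratic([3, 8, 12, 5], 17), has_pair_with_set([3, 8, 12, 5], 17))
```

A change like this often beats porting the quadratic loop to C, because the amount of work itself shrinks as the input grows.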

Only as a last resort, and after profiling to identify the absolute worst bottlenecks, should you consider writing an extension module in C/C++. There are just so many easier ways to meet the vast majority of performance requirements, and numeric/mathematical code is an area with very good existing library support.
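
Profiling does not need anything beyond the standard library. The sketch below uses `cProfile` and `pstats` on a made-up workload (`slow_part`, `fast_part` and `workload` are placeholders for your own code) and prints the functions with the highest cumulative time.

```python
import cProfile
import io
import pstats

def slow_part(n):
    """Placeholder for the expensive step."""
    return sum(i * i for i in range(n))

def fast_part(n):
    """Placeholder for a cheap step."""
    return n * 2

def workload():
    slow_part(200_000)
    fast_part(10)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report into a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Only the functions that dominate this report are worth considering for a rewrite in C/C++.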

#2

Not the answer you expected, but I have been down that road and advise KISS:

First make it work in the simplest way possible.
Only then look into speeding things up / complicating the design.

There are lots of other ways to phrase this such as "do not fix hypothetical problems unless resources are unlimited".

#3

Cython's support for C++ is much better than it used to be. You can use most of the standard library in Cython seamlessly. There are up to 500x speedups in the extreme best case.

My experience is that it is best to keep the cython code extremely thin, and forward all arguments to c++. It is much easier to debug c++ directly, and the syntax is better understood. Having to maintain a code base unnecessarily in three different languages is a pain.

Using c++/cython means that you have to spend a little time thinking about ownership issues. I.e. it is often safest not to allocate anything in c++ but prepare the memory in python / cython. (Use array.array or numpy.array). Alternatively, make a c++ object wrapped in cython which has a deallocation function. All this means that your application will be more fragile than if it is written only in python or c++: You are abandoning both RAII / gc.
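
The "prepare the memory in Python" idea can be sketched without any compiled code. Below, Python allocates an `array.array`, exposes it as a C-compatible `double` array through `ctypes` without copying, and hands it to `native_style_fill`, a hypothetical Python stand-in for a C++ routine with a `(double*, size_t)` signature. The callee only writes into the buffer; it never allocates or frees anything, so ownership stays with Python.

```python
import array
import ctypes

def make_buffer(n):
    """Allocate n doubles on the Python side; Python owns and frees this memory."""
    return array.array("d", [0.0] * n)

def native_style_fill(ptr, n):
    """Stand-in for a C++ routine taking (double*, size_t).

    It writes into memory it does not own and never frees it.
    """
    for i in range(n):
        ptr[i] = float(i * i)

buf = make_buffer(5)

# from_buffer shares the array's memory with ctypes; no copy is made.
c_view = (ctypes.c_double * len(buf)).from_buffer(buf)
native_style_fill(c_view, len(buf))

# Drop the ctypes view so the array is no longer pinned by an exported buffer.
del c_view

print(list(buf))  # [0.0, 1.0, 4.0, 9.0, 16.0]
```

With a real extension the call would go to compiled code instead, but the ownership rule is the same: the caller allocates, the callee fills.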

On the other hand, your python code should translate line for line into modern c++. So this reminds you not to use old fashioned new or delete etc in your new c++ code but make things fast and clean by keeping the abstractions at a high level.

Remember too to re-examine the assumptions behind your original algorithmic choices. What is sensible for python might be foolish for c++.

Finally, python makes everything significantly simpler and cleaner and faster to debug than c++. But in many ways, c++ encourages more powerful abstractions and better separation of concerns.

When you programme with python and cython and c++, it slowly comes to feel like taking the worst bits of both approaches. It might be worth biting the bullet and rewriting completely in c++. You can keep the python test harness and use the original design as a prototype / testbed.
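
One way to keep the Python prototype useful after a rewrite is to treat it as the reference that any faster implementation must agree with. The sketch below assumes a hypothetical pairwise-distance routine; `pairwise_distances_reference` plays the role of the original prototype, and `candidate` would be the wrapped C++ version (here replaced by a second Python implementation so the example runs on its own).

```python
import math
import random

def pairwise_distances_reference(points):
    """Original pure-Python prototype: the behaviour a port must reproduce."""
    return [[math.dist(p, q) for q in points] for p in points]

def check_against_reference(candidate, trials=20, seed=0):
    """Feed random inputs to both implementations and compare the results."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 8)
        points = [(rng.random(), rng.random()) for _ in range(n)]
        expected = pairwise_distances_reference(points)
        actual = candidate(points)
        assert len(actual) == len(expected)
        for row_e, row_a in zip(expected, actual):
            assert len(row_a) == len(row_e)
            for e, a in zip(row_e, row_a):
                assert math.isclose(e, a, rel_tol=1e-9, abs_tol=1e-12)
    return True

def candidate_impl(points):
    """Stand-in for the rewritten routine; a real harness would call the C++ binding."""
    return [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]

print(check_against_reference(candidate_impl))
```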
