While researching performance trade-offs between Python and C++, I devised a small example that mostly focuses on dumb substring matching.
Here is the relevant C++:
using std::string;
std::vector<string> matches;
std::copy_if(patterns.cbegin(), patterns.cend(), back_inserter(matches),
             [&fileContents] (const string &pattern) { return fileContents.find(pattern) != string::npos; });
The above is built with -O3.
And here is Python:
def getMatchingPatterns(patterns, text):
    return filter(text.__contains__, patterns)
Both take a largish set of patterns and an input file, and filter the list of patterns down to those found in the file, using a dumb substring search.
The versions are:
- gcc - 4.8.2 (Ubuntu) and 4.9.2 (Cygwin)
- python - 2.7.6 (Ubuntu) and 2.7.8 (Cygwin)
What surprised me was the performance. I ran both on a low-spec Ubuntu machine, and Python was faster by about 20%. On a mid-spec PC under Cygwin, Python was about twice as fast. The profiler shows that 99+% of cycles are spent in string matching (string copying and list comprehensions are insignificant).
Obviously, the Python implementation is native C, and I expected it to be roughly on par with C++, but I did not expect it to be this fast.
Any insight into the relevant CPython optimisations, compared to gcc, would be most welcome.
For reference, here are the full examples. The inputs are a set of 50K HTML files (all read from disk in each test, no special caching):
Python:
import sys

def getMatchingPatterns(patterns, text):
    return filter(text.__contains__, patterns)

def serialScan(filenames, patterns):
    return zip(filenames, [getMatchingPatterns(patterns, open(filename).read()) for filename in filenames])

if __name__ == "__main__":
    with open(sys.argv[1]) as filenamesListFile:
        filenames = filenamesListFile.read().split()
    with open(sys.argv[2]) as patternsFile:
        patterns = patternsFile.read().split()
    resultTuple = serialScan(filenames, patterns)
    for filename, patterns in resultTuple:
        print ': '.join([filename, ','.join(patterns)])
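For anyone who wants to reproduce the effect without the 50K-file corpus, a minimal sketch along these lines isolates just the substring-matching cost (the haystack size and patterns here are made up, not the original inputs):

```python
import timeit

# Synthetic stand-ins for the real corpus (sizes chosen arbitrarily).
text = "abcdefgh" * 100000                    # ~800 KB haystack
patterns = ["zzz%d" % i for i in range(100)]  # needles that never match

# Same work as the filter() version above.
matches = [p for p in patterns if p in text]

elapsed = timeit.timeit(lambda: [p for p in patterns if p in text], number=10)
print("10 scans took %.3f seconds" % elapsed)
```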
C++:
#include <iostream>
#include <iterator>
#include <fstream>
#include <string>
#include <vector>
#include <unordered_map>
#include <algorithm>

using namespace std;

using MatchResult = unordered_map<string, vector<string>>;
static const size_t PATTERN_RESERVE_DEFAULT_SIZE = 5000;

MatchResult serialMatch(const vector<string> &filenames, const vector<string> &patterns)
{
    MatchResult res;
    for (auto &filename : filenames)
    {
        ifstream file(filename);
        const string fileContents((istreambuf_iterator<char>(file)),
                                  istreambuf_iterator<char>());
        vector<string> matches;
        std::copy_if(patterns.cbegin(), patterns.cend(), back_inserter(matches),
                     [&fileContents] (const string &pattern) { return fileContents.find(pattern) != string::npos; });
        res.insert(make_pair(filename, std::move(matches)));
    }
    return res;
}

int main(int argc, char **argv)
{
    vector<string> filenames;
    ifstream filenamesListFile(argv[1]);
    std::copy(istream_iterator<string>(filenamesListFile), istream_iterator<string>(),
              back_inserter(filenames));

    vector<string> patterns;
    patterns.reserve(PATTERN_RESERVE_DEFAULT_SIZE);
    ifstream patternsFile(argv[2]);
    std::copy(istream_iterator<string>(patternsFile), istream_iterator<string>(),
              back_inserter(patterns));

    auto matchResult = serialMatch(filenames, patterns);
    for (const auto &matchItem : matchResult)
    {
        cout << matchItem.first << ": ";
        for (const auto &matchString : matchItem.second)
            cout << matchString << ",";
        cout << endl;
    }
}
1 Answer

#1 (13 votes)
The Python 3.4 code b'abc' in b'abcabc' (or b'abcabc'.__contains__(b'abc'), as in your example) executes the bytes_contains method, which in turn calls the inlined function stringlib_find, which delegates the search to FASTSEARCH.
The FASTSEARCH function then uses a modified Boyer-Moore search algorithm:
fast search/count implementation, based on a mix between boyer-moore and horspool, with a few more bells and whistles on the top. for some more background, see: http://effbot.org/zone/stringlib.htm
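To make the quoted description concrete, here is a simplified Horspool-style search sketched in Python. This is only an illustration of the skip-table idea; the real FASTSEARCH is C code and layers further tricks (such as a Bloom-filter-like character mask) on top:

```python
def horspool_find(text, pattern):
    """Boyer-Moore-Horspool sketch: on a mismatch, jump ahead based on
    the character under the end of the window, so many positions in the
    text are never examined at all."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Skip table: for each pattern character (except the last),
    # how far the window may safely jump when it appears last.
    skip = {pattern[i]: m - 1 - i for i in range(m - 1)}
    i = m - 1  # index of the last character of the current window
    while i < n:
        # Compare the window against the pattern, right to left.
        j, k = m - 1, i
        while j >= 0 and text[k] == pattern[j]:
            j -= 1
            k -= 1
        if j < 0:
            return k + 1  # full match; return its start position
        i += skip.get(text[i], m)  # characters not in the table jump by m
    return -1
```

Its results agree with str.find(), but on a mismatch it can advance by up to the full pattern length instead of a single character.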
There are some modifications too, as noted by the comments:
note: fastsearch may access s[n], which isn't a problem when using Python's ordinary string types, but may cause problems if you're using this code in other contexts. also, the count mode returns -1 if there cannot possibly be a match in the target string, and 0 if it has actually checked for matches, but didn't find any. callers beware!
The GNU C++ Standard Library basic_string<T>::find() implementation is as generic (and dumb) as possible; it just tries dumbly matching the pattern at each and every consecutive character position until it finds a match.
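For contrast, that fully generic approach amounts to something like the following (sketched in Python for comparison with the previous snippet; libstdc++'s actual find() is C++, but the shape is the same):

```python
def naive_find(text, pattern):
    """Generic quadratic search: try the pattern at every position,
    with no skipping. Worst case is O(len(text) * len(pattern))
    character comparisons."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            return i
    return -1
```

On a mismatch the window always slides forward by exactly one character, which is where the time goes on large inputs.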
TL;DR: The reason the C++ standard library is so slow compared to Python is that it tries to run a generic algorithm on top of std::basic_string<char>, but fails to do so efficiently for the more interesting cases; whereas in Python, the programmer gets the most efficient algorithms on a case-by-case basis for free.