C++ "yield" keyword: how do I return an iterator from a function?

Consider the following code.

std::vector<result_data> do_processing() 
{
    pqxx::result input_data = get_data_from_database();
    return process_data(input_data);
}

std::vector<result_data> process_data(pqxx::result const & input_data)
{
    std::vector<result_data> ret;
    pqxx::result::const_iterator row;
    for (row = input_data.begin(); row != input_data.end(); ++row)
    {
        // somehow populate output vector
    }
    return ret;
}

While I was thinking about whether or not I could expect Return Value Optimization (RVO) to happen, I found this answer by Jerry Coffin [emphasis mine]:

At least IMO, it's usually a poor idea, but not for efficiency reasons. It's a poor idea because the function in question should usually be written as a generic algorithm that produces its output via an iterator. Almost any code that accepts or returns a container instead of operating on iterators should be considered suspect.

Don't get me wrong: there are times it makes sense to pass around collection-like objects (e.g., strings) but for the example cited, I'd consider passing or returning the vector a poor idea.

Having some Python background, I like generators very much. If this were Python, I would have written the above function as a generator, to avoid having to process the entire data set before anything else can happen. For example:

def process_data(input_data):
    for item in input_data:
        # somehow process items
        yield result_data

If I interpreted Jerry Coffin's note correctly, this is what he suggested, isn't it? If so, how can I implement this in C++?

4 solutions

#1

No, that’s not what Jerry means, at least not directly.

yield in Python implements coroutines. C++ has no built-in equivalent before C++20 (coroutines can of course be emulated, but doing that cleanly is a bit involved).

But what Jerry meant is simply that you should pass in an output iterator which is then written to:

template <typename O>
void process_data(pqxx::result const & input_data, O iter) {
    // some_value is a placeholder for whatever is computed from *row
    for (pqxx::result::const_iterator row = input_data.begin(); row != input_data.end(); ++row)
        *iter++ = some_value;
}

And call it:

std::vector<result_data> result;
process_data(input, std::back_inserter(result));
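
For readers without libpqxx at hand, the same output-iterator pattern can be sketched with standard containers only. The names process_numbers and squares_of, and the squaring step itself, are hypothetical stand-ins for process_data and the real per-row work; they are not part of the original answer.

```cpp
#include <iterator>
#include <vector>

// Hypothetical stand-in for process_data: squares each input value and
// writes the result through whatever output iterator the caller supplies.
template <typename InputRange, typename OutputIt>
OutputIt process_numbers(InputRange const& input, OutputIt out) {
    for (auto const& value : input)
        *out++ = value * value;   // placeholder for the real per-row work
    return out;                   // returning the iterator mirrors std::copy
}

// Usage: the caller decides where the results go, here a vector.
std::vector<int> squares_of(std::vector<int> const& input) {
    std::vector<int> result;
    process_numbers(input, std::back_inserter(result));
    return result;
}
```

Because the destination is just an iterator, the same function could write to a std::ostream_iterator or a std::deque without being changed. The input is still processed in one go, though, so this is not lazy the way a Python generator is.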

I’m not convinced though that this is generally better than just returning the vector.

#2

There is a blog post by Boost.Asio author Chris Kohlhoff about this: http://blog.think-async.com/2009/08/secret-sauce-revealed.html

He simulates yield with a macro:

#define yield \
  if ((_coro_value = __LINE__) == 0) \
  { \
    case __LINE__: ; \
    (void)&you_forgot_to_add_the_entry_label; \
  } \
  else \
    for (bool _coro_bool = false;; \
         _coro_bool = !_coro_bool) \
      if (_coro_bool) \
        goto bail_out_of_coroutine; \
      else

This has to be used in conjunction with a coroutine class. See the blog for more details.
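
The underlying trick is to store a resume point in an object and jump back to it with a switch statement. The following is a minimal sketch of that idea only. It is not Kohlhoff's coroutine class, and the names counter, next and collect are made up for illustration.

```cpp
#include <vector>

// Hand-written generator that yields 0, 1, ..., limit - 1.
struct counter {
    int state = 0;   // line number to resume at; 0 means "start from the top"
    int i = 0;       // loop variable lives in the object so it survives a yield

    // Stores the next value in out and returns true, or returns false
    // once the sequence is exhausted.
    bool next(int limit, int& out) {
        switch (state) {
        case 0:
            for (i = 0; i < limit; ++i) {
                out = i;
                // "yield": remember this line, return, and resume right here
                state = __LINE__; return true; case __LINE__:;
            }
        }
        state = -1;  // matches no case label, so later calls fall through
        return false;
    }
};

// Usage: drain the generator into a vector.
std::vector<int> collect(int limit) {
    counter gen;
    std::vector<int> values;
    int v = 0;
    while (gen.next(limit, v))
        values.push_back(v);
    return values;
}
```

The macro above hides the state assignment, the return and the case label behind a single yield keyword. The restriction visible in this sketch still applies: any variable that must keep its value across a yield has to be a member rather than a local.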

#3

When you parse something recursively, or when the processing carries state, the generator pattern can simplify the code greatly: plain iteration is hard to write in those cases, and callbacks are the usual alternative. I wanted yield, and Boost.Coroutine2 now seems like a good way to get it.

The code below is an example that concatenates files, like cat. On its own it is pointless; it only pays off once you want to process the text lines further:

#include <fstream>
#include <functional>
#include <iostream>
#include <string>
#include <boost/coroutine2/all.hpp>

using namespace std;

typedef boost::coroutines2::coroutine<const string&> coro_t;

void cat(coro_t::push_type& yield, int argc, char* argv[])
{
    for (int i = 1; i < argc; ++i) {
        ifstream ifs(argv[i]);
        for (;;) {
            string line;
            if (getline(ifs, line)) {
                yield(line);
            } else {
                break;
            }
        }
    }
}

int main(int argc, char* argv[])
{
    using namespace std::placeholders;
    coro_t::pull_type seq(
            boost::coroutines2::fixedsize_stack(),
            bind(cat, _1, argc, argv));
    for (auto& line : seq) {
        cout << line << endl;
    }
}

#4

I found that istream-like behavior comes close to what I had in mind. Consider the following (untested) code:

struct data_source {
public:
    // for delivering data items; like std::istream, extracting from an
    // empty queue sets a fail state instead of touching the container
    data_source& operator>>(input_data_t & i) {
        if (input_data.empty()) {
            ok = false;
        } else {
            i = input_data.front();
            input_data.pop_front();
        }
        return *this;
    }
    // for boolean evaluation: true while the last extraction succeeded
    explicit operator bool() const { return ok; }

private:
    std::deque<input_data_t> input_data;
    bool ok = true;

    // appends new data to private input_data
    // potentially asynchronously
    void get_data_from_database();
};

Now I can use it as the following example shows:

int main () {
    data_source d;
    input_data_t i;
    while (d >> i) {
        // somehow process items
        result_data_t r(i);
        std::cout << r << '\n';
    }
}

This way the data acquisition is decoupled from the processing and can therefore happen lazily or asynchronously. That is, I can process the items as they arrive, and I don't have to wait until the vector is completely filled as in the other example.
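
A self-contained variant of the same idea might look as follows. It is a synchronous sketch that holds plain ints. The names int_source and sum_all are hypothetical, and the fail flag imitates how std::istream reports end of input, so that the last item is still delivered before the loop ends.

```cpp
#include <deque>
#include <initializer_list>

// Hypothetical, synchronous stand-in for data_source holding plain ints.
class int_source {
public:
    int_source(std::initializer_list<int> items) : items_(items) {}

    // Extracts the next item; on an empty queue it only sets the fail
    // state and leaves out untouched.
    int_source& operator>>(int& out) {
        if (items_.empty()) {
            ok_ = false;
        } else {
            out = items_.front();
            items_.pop_front();
        }
        return *this;
    }

    // True while the most recent extraction succeeded.
    explicit operator bool() const { return ok_; }

private:
    std::deque<int> items_;
    bool ok_ = true;
};

// Usage: consume the items one at a time, as with a std::istream.
int sum_all(int_source& src) {
    int total = 0;
    int item = 0;
    while (src >> item)
        total += item;
    return total;
}
```

A real asynchronous source would additionally need synchronization around the queue and a way to tell "no data yet" apart from "no more data". Both are left out of this sketch.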
