How do I write a boost::spirit::qi parser that does what '?' does in a regex?

Time: 2022-09-06 20:58:48

Let's say we have a regex "start:(?: ([0-9]{1,2}))? ([0-9].*)".

It will match

std::string string1 = "start: 01 0ab";

and

std::string string2 = "start: 0ab";

We can also retrieve the two captured strings separately.
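
For reference, the regex behaviour described above can be reproduced with std::regex. This is only a minimal sketch: the `Captures` struct and the `regex_captures` helper are hypothetical names introduced for illustration, and an unmatched optional group is reported as an empty string.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: whether the whole input matched, plus both groups.
struct Captures {
    bool matched;
    std::string first;   // optional 1-2 digit field, empty if absent
    std::string second;  // mandatory field starting with a digit
};

// Hypothetical helper that applies the regex from the question to the whole input.
Captures regex_captures(std::string const& input) {
    static std::regex const re("start:(?: ([0-9]{1,2}))? ([0-9].*)");
    std::smatch m;
    if (!std::regex_match(input, m, re))
        return {false, "", ""};
    // An unmatched sub-expression yields an empty string from str().
    return {true, m[1].str(), m[2].str()};
}
```

With this sketch, "start: 01 0ab" yields the groups "01" and "0ab", while "start: 0ab" leaves the first group empty and puts "0ab" in the second, because the regex engine backtracks out of the optional group.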

I tried to use a boost::spirit::qi parser to parse string2, but it fails to match.

qi::rule<std::string::const_iterator, std::string()> rule1 = qi::repeat(1,2)[qi::digit];
qi::rule<std::string::const_iterator, std::string()> rule2 = qi::digit >> *qi::char_;
std::vector<std::string> attr;
auto it_begin = string2.begin();
auto it_end = string2.end();
if (qi::parse(
    it_begin,
    it_end,
    qi::lit("start:")
         >> -(qi::lit(" ") >> rule1)
         >> qi::lit(" ") >> rule2
         >> qi::eoi,
    attr))
    std::cout<<"match"<<std::endl;
else
    std::cout<<"not match"<<std::endl;

We can of course use a look-ahead operator to check what follows rule1, but is there a more generic approach to implementing the regex operator '?'? Thanks!

1 Solution

#1


3  

I'm not sure what's wrong with the look-ahead. It is the only way to disambiguate otherwise ambiguous rules, since PEG grammars are always greedy: once the optional part has matched, the parser does not go back and retry without it.

However, maybe you didn't arrive at the most elegant form, since you were looking for something "better". Here's what I'd do.

I'd use a skipper to match spaces¹:

    if (qi::phrase_parse(it_begin, it_end,
                "start:" >> -rule1 >> rule2 >> qi::eoi,
                qi::space, attr))

Where the rules are still lexemes (because they were declared without the skipper):

qi::rule<It, std::string()> const 
    rule1 = qi::digit >> qi::digit >> &qi::space,
    rule2 = qi::digit >> *qi::graph;

Note that qi::graph doesn't match whitespace, whereas *qi::char_ simply matches anything at all, greedily.

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;

int main() {
    using It = std::string::const_iterator;

    // implicitly lexemes (no skipper in rule declaration)
    qi::rule<It, std::string()> const 
        rule1 = qi::digit >> qi::digit >> &qi::space,
        rule2 = qi::digit >> *qi::graph;

    for (std::string const input : { "start: 01 0ab", "start: 0ab", }) {
        std::vector<std::string> attr;

        auto it_begin = input.begin();
        auto it_end   = input.end();

        if (qi::phrase_parse(it_begin, it_end, "start:" >> -rule1 >> rule2 >> qi::eoi, qi::space, attr))
            std::cout << "match\n";
        else
            std::cout << "not match\n";

        if (it_begin!=it_end)
            std::cout<<"Remaining unparsed input: '" << std::string(it_begin, it_end) << "'\n";
    }
}

Prints

match
match

¹ This assumes that multiple/different whitespace characters are okay. If newlines should not count as whitespace, use qi::blank instead of qi::space.
