I can't see my error here. This rule parses some inputs correctly, but not the last two samples. Could somebody please give me a hint?
The goal is a parser that can identify member property access and member function calls, including chained ones.
a()
a(para)
x.a()
x.a(para)
x.a(para).g(para).j()
x.y
x.y.z
x.y.z() <---fail
y.z.z(para) <--- fail
lvalue =
iter_pos >> name[_val = _1]
>> *(lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_parameter, construct<std::vector<common_node> >(_1))]
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')') >> iter_pos)[_val = construct<common_node>(type_cmd_fnc_call, LOCATION_NODE_ITER(_val, _3), key_this, construct<common_node>(_val), key_callname, construct<std::wstring>(_1), key_parameter, construct<std::vector<common_node> >(_2))]
>> *(lit('.') >> name_pure >> iter_pos)[_val = construct<common_node>(type_cmd_dot_call, LOCATION_NODE_ITER(_val, _2), key_this, construct<common_node>(_val), key_propname, construct<std::wstring>(_1))]
;
Thank you, Markus
2 Answers
#1
1
You provide very little information to go on. Let me humor you with my entry into this guessing game:
Let's assume you want to parse a simple "language" that merely allows member expressions and function invocations, but chained.
Now, your grammar says nothing about the parameters (though it's clear the param list can be empty), so let me go the extra mile and assume that you want to accept the same kind of expressions there (so foo(a) is okay, but also bar(foo(a)) or bar(b.foo(a))).
Since you accept chaining of function calls, it appears that functions are first-class objects (and functions can return functions), so foo(a)(b, c, d) should be accepted as well.
You didn't mention it, but parameters often include literals (sqrt(9) comes to mind, or println("hello world")).
Other items:
- you didn't say so, but you likely want to ignore whitespace in certain spots
- from the iter_pos (ab)use it seems you're interested in tracking the original source location inside the resulting AST.
1. Define An AST
As ever, we should keep it simple:
namespace Ast {
using Identifier = boost::iterator_range<It>;
struct MemberExpression;
struct FunctionCall;
using Expression = boost::variant<
double, // some literal types
std::string,
// non-literals
Identifier,
boost::recursive_wrapper<MemberExpression>,
boost::recursive_wrapper<FunctionCall>
>;
struct MemberExpression {
Expression object; // antecedent
Identifier member; // function or field
};
using Parameter = Expression;
using Parameters = std::vector<Parameter>;
struct FunctionCall {
Expression function; // could be a member function
Parameters parameters;
};
}
NOTE We're not going to focus on showing source locations, but we already made one provision: identifiers are stored as an iterator range.
NOTE Fusion-adapting the only types not directly supported by Spirit:
BOOST_FUSION_ADAPT_STRUCT(Ast::MemberExpression, object, member)
BOOST_FUSION_ADAPT_STRUCT(Ast::FunctionCall, function, parameters)
We will find that we don't use these, because semantic actions are more convenient here.
2. A Matching Grammar
Grammar() : Grammar::base_type(start) {
using namespace qi;
start = skip(space) [expression];
identifier = raw [ (alpha|'_') >> *(alnum|'_') ];
parameters = -(expression % ',');
expression
= literal
| identifier >> *(
('.' >> identifier)
| ('(' >> parameters >> ')')
);
literal = double_ | string_;
string_ = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';
BOOST_SPIRIT_DEBUG_NODES(
(identifier)(start)(parameters)(expression)(literal)(string_)
);
}
In this skeleton most rules benefit from automatic attribute propagation. The one that doesn't is expression:
qi::rule<It, Expression()> start;
using Skipper = qi::space_type;
qi::rule<It, Expression(), Skipper> expression, literal;
qi::rule<It, Parameters(), Skipper> parameters;
// lexemes
qi::rule<It, Identifier()> identifier;
qi::rule<It, std::string()> string_;
So, let's create some helpers for the semantic actions.
NOTE An important takeaway here is to create your own higher-level building blocks instead of toiling away with boost::phoenix::construct<> etc.
Define two simple construction functions:
struct mme_f { MemberExpression operator()(Expression lhs, Identifier rhs) const { return { lhs, rhs }; } };
struct mfc_f { FunctionCall operator()(Expression f, Parameters params) const { return { f, params }; } };
phx::function<mme_f> make_member_expression;
phx::function<mfc_f> make_function_call;
Then use them:
expression
= literal [_val=_1]
| identifier [_val=_1] >> *(
('.' >> identifier) [ _val = make_member_expression(_val, _1)]
| ('(' >> parameters >> ')') [ _val = make_function_call(_val, _1) ]
);
That's all. We're ready to roll!
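For readers without Boost at hand, here is a minimal hand-written sketch of the same idea: one loop that, after the leading identifier, accepts either `.member` or `(args)` in any order and folds each into the value built so far. That single loop is what lets `x.y.z()` through. In the question's rule the three `*(...)` groups are tried strictly one after another, so once the property group has consumed `.y.z` nothing remains to match `()`. The names `Node`, `MiniParser` and `to_string` are made up for this illustration, and literals are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, Boost-free miniature of the answer's AST (no literals).
struct Node {
    enum Kind { Identifier, Member, Call } kind;
    std::string name;                         // Identifier or Member name
    std::shared_ptr<Node> object;             // Member: antecedent, Call: callee
    std::vector<std::shared_ptr<Node>> args;  // Call: parameters
};
using NodePtr = std::shared_ptr<Node>;

class MiniParser {
public:
    explicit MiniParser(std::string text) : src_(std::move(text)) {}

    // Returns the tree, or nullptr unless the whole input was consumed.
    NodePtr parse() {
        pos_ = 0;
        NodePtr e = expression();
        skip_space();
        return (e && pos_ == src_.size()) ? e : nullptr;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skip_space() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    bool eat(char c) {  // consume c (after optional whitespace) if present
        skip_space();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    std::string identifier() {  // (alpha|'_') >> *(alnum|'_'), empty on failure
        skip_space();
        std::size_t start = pos_;
        auto is_start = [](unsigned char c) { return std::isalpha(c) || c == '_'; };
        auto is_rest  = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
        if (pos_ < src_.size() && is_start(src_[pos_])) {
            ++pos_;
            while (pos_ < src_.size() && is_rest(src_[pos_])) ++pos_;
        }
        return src_.substr(start, pos_ - start);
    }
    NodePtr expression() {
        std::string head = identifier();
        if (head.empty()) return nullptr;
        NodePtr val = std::make_shared<Node>(Node{Node::Identifier, head, nullptr, {}});
        for (;;) {  // the single repetition over both alternatives
            if (eat('.')) {
                std::string member = identifier();
                if (member.empty()) return nullptr;
                val = std::make_shared<Node>(Node{Node::Member, member, val, {}});
            } else if (eat('(')) {
                std::vector<NodePtr> args;
                if (!eat(')')) {
                    do {
                        NodePtr a = expression();
                        if (!a) return nullptr;
                        args.push_back(a);
                    } while (eat(','));
                    if (!eat(')')) return nullptr;
                }
                val = std::make_shared<Node>(Node{Node::Call, "", val, args});
            } else {
                return val;
            }
        }
    }
};

// Debug rendering, mirroring the answer's output format.
std::string to_string(const NodePtr& n) {
    switch (n->kind) {
    case Node::Identifier: return n->name;
    case Node::Member:     return to_string(n->object) + "." + n->name;
    case Node::Call: {
        std::string out = to_string(n->object) + "(";
        for (std::size_t i = 0; i < n->args.size(); ++i) {
            if (i) out += ", ";
            out += to_string(n->args[i]);
        }
        return out + ")";
    }
    }
    return {};
}
```

With this, `to_string(MiniParser("x.y.z()").parse())` gives back `x.y.z()`, and `parse()` returns a null pointer when input is malformed or left over, for example for `x.`.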
3. DEMO
Live On Coliru
I created a test bed looking like this:
int main() {
using It = std::string::const_iterator;
Parser::Grammar<It> const g;
for (std::string const input : {
"a()", "a(para)", "x.a()", "x.a(para)", "x.a(para).g(para).j()", "x.y", "x.y.z",
"x.y.z()",
"y.z.z(para)",
// now let's add some funkyness that you didn't mention
"bar(foo(a))",
"bar(b.foo(a))",
"foo(a)(b, c, d)", // first class functions
"sqrt(9)",
"println(\"hello world\")",
"allocate(strlen(\"aaaaa\"))",
"3.14",
"object.rotate(180)",
"object.rotate(event.getAngle(), \"torque\")",
"app.mainwindow().find_child(\"InputBox\").font().size(12)",
"app.mainwindow().find_child(\"InputBox\").font(config().preferences.baseFont(style.PROPORTIONAL))"
}) {
std::cout << " =========== '" << input << "' ========================\n";
It f(input.begin()), l(input.end());
Ast::Expression parsed;
bool ok = parse(f, l, g, parsed);
if (ok) {
std::cout << "Parsed: " << parsed << "\n";
}
else
std::cout << "Parse failed\n";
if (f != l)
std::cout << "Remaining unparsed input: '" << std::string(f, l) << "'\n";
}
}
Incredible as it may appear, this already parses all the test cases and prints:
=========== 'a()' ========================
Parsed: a()
=========== 'a(para)' ========================
Parsed: a(para)
=========== 'x.a()' ========================
Parsed: x.a()
=========== 'x.a(para)' ========================
Parsed: x.a(para)
=========== 'x.a(para).g(para).j()' ========================
Parsed: x.a(para).g(para).j()
=========== 'x.y' ========================
Parsed: x.y
=========== 'x.y.z' ========================
Parsed: x.y.z
=========== 'x.y.z()' ========================
Parsed: x.y.z()
=========== 'y.z.z(para)' ========================
Parsed: y.z.z(para)
=========== 'bar(foo(a))' ========================
Parsed: bar(foo(a))
=========== 'bar(b.foo(a))' ========================
Parsed: bar(b.foo(a))
=========== 'foo(a)(b, c, d)' ========================
Parsed: foo(a)(b, c, d)
=========== 'sqrt(9)' ========================
Parsed: sqrt(9)
=========== 'println("hello world")' ========================
Parsed: println(hello world)
=========== 'allocate(strlen("aaaaa"))' ========================
Parsed: allocate(strlen(aaaaa))
=========== '3.14' ========================
Parsed: 3.14
=========== 'object.rotate(180)' ========================
Parsed: object.rotate(180)
=========== 'object.rotate(event.getAngle(), "torque")' ========================
Parsed: object.rotate(event.getAngle(), torque)
=========== 'app.mainwindow().find_child("InputBox").font().size(12)' ========================
Parsed: app.mainwindow().find_child(InputBox).font().size(12)
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
4. Too Good To Be True?
You're right. I cheated. I didn't show you the code required to debug-print the parsed AST:
namespace Ast {
static inline std::ostream& operator<<(std::ostream& os, MemberExpression const& me) {
return os << me.object << "." << me.member;
}
static inline std::ostream& operator<<(std::ostream& os, FunctionCall const& fc) {
os << fc.function << "(";
bool first = true;
for (auto& p : fc.parameters) { if (!first) os << ", "; first = false; os << p; }
return os << ")";
}
}
It's only debug printing, as string literals aren't correctly roundtripped. But it's only 10 lines of code, that's a bonus.
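If round-tripping string literals matters, one option (not part of the code above, just a sketch assuming C++14) is to route the std::string alternative through `std::quoted`, which by default wraps the text in double quotes and backslash-escapes embedded quotes and backslashes, the same two escapes the `string_` rule accepts. The helper name `quote_literal` is hypothetical; it would have to be called from a custom printing visitor.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders s as a double-quoted literal, escaping
// embedded '"' and '\\' with a backslash (std::quoted's defaults).
std::string quote_literal(const std::string& s) {
    std::ostringstream os;
    os << std::quoted(s);
    return os.str();
}
```

For instance, `quote_literal("hello world")` produces the text `"hello world"` including the quotes.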
5. The Full Monty: Source Locations
This had your interest, so let's show it working. Let's add a simple loop to print all locations of identifiers:
using IOManip::showpos;
for (auto& id : all_identifiers(parsed)) {
std::cout << " - " << id << " at " << showpos(id, input) << "\n";
}
Of course, this raises the question: what are showpos and all_identifiers?
namespace IOManip {
struct showpos_t {
boost::iterator_range<It> fragment;
std::string const& source;
friend std::ostream& operator<<(std::ostream& os, showpos_t const& manip) {
auto ofs = [&](It it) { return it - manip.source.begin(); };
return os << "[" << ofs(manip.fragment.begin()) << ".." << ofs(manip.fragment.end()) << ")";
}
};
showpos_t showpos(boost::iterator_range<It> fragment, std::string const& source) {
return {fragment, source};
}
}
As for the identifier extraction:
std::vector<Identifier> all_identifiers(Expression const& expr) {
std::vector<Identifier> result;
struct Harvest {
using result_type = void;
std::back_insert_iterator<std::vector<Identifier> > out;
void operator()(Identifier const& id) { *out++ = id; }
void operator()(MemberExpression const& me) { apply_visitor(*this, me.object); *out++ = me.member; }
void operator()(FunctionCall const& fc) {
apply_visitor(*this, fc.function);
for (auto& p : fc.parameters) apply_visitor(*this, p);
}
// non-identifier expressions
void operator()(std::string const&) { }
void operator()(double) { }
} harvest { back_inserter(result) };
boost::apply_visitor(harvest, expr);
return result;
}
That's a tree visitor that harvests all identifiers recursively, inserting them into the back of a container.
Live On Coliru
The output looks like this (excerpt):
=========== 'app.mainwindow().find_child("InputBox").font(config().preferences.baseFont(style.PROPORTIONAL))' ========================
Parsed: app.mainwindow().find_child(InputBox).font(config().preferences.baseFont(style.PROPORTIONAL))
- app at [0..3)
- mainwindow at [4..14)
- find_child at [17..27)
- font at [40..44)
- config at [45..51)
- preferences at [54..65)
- baseFont at [66..74)
- style at [75..80)
- PROPORTIONAL at [81..93)
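The identifier-harvesting visitor above can also be written against the standard library alone. The following is only a sketch, assuming C++17: it swaps boost::variant for std::variant, uses std::shared_ptr where the original relies on boost::recursive_wrapper, and stores identifier names as plain std::string instead of iterator ranges, so source positions are not tracked here.

```cpp
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Standard-library stand-ins for the answer's AST types (assumed shapes).
struct Identifier { std::string name; };
struct MemberExpression;
struct FunctionCall;

using Expression = std::variant<
    double, std::string,                  // literals
    Identifier,
    std::shared_ptr<MemberExpression>,    // replaces recursive_wrapper
    std::shared_ptr<FunctionCall>>;

struct MemberExpression { Expression object; Identifier member; };
struct FunctionCall     { Expression function; std::vector<Expression> parameters; };

// Visitor that walks the tree depth-first and appends every identifier name.
struct Harvest {
    std::vector<std::string>& out;

    void operator()(double) const {}              // non-identifier expressions
    void operator()(const std::string&) const {}
    void operator()(const Identifier& id) const { out.push_back(id.name); }
    void operator()(const std::shared_ptr<MemberExpression>& me) const {
        std::visit(*this, me->object);
        out.push_back(me->member.name);
    }
    void operator()(const std::shared_ptr<FunctionCall>& fc) const {
        std::visit(*this, fc->function);
        for (const auto& p : fc->parameters) std::visit(*this, p);
    }
};

std::vector<std::string> all_identifiers(const Expression& expr) {
    std::vector<std::string> result;
    std::visit(Harvest{result}, expr);
    return result;
}
```

Identifiers come out in source order because each node visits its antecedent (or callee) before its own member name or parameters.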
#2
-1
Try changing
>> *(lit('.') >> name_pure >> lit('(') > paralistopt > lit(')'))
to
>> *(*(lit('.') >> name_pure) >> lit('(') > paralistopt > lit(')'))