I've been looking through the ANTLR v3 documentation (and my trusty copy of "The Definitive ANTLR reference"), and I can't seem to find a clean way to implement escape sequences in string literals (I'm currently using the Java target). I had hoped to be able to do something like:
我一直在查看ANTLR v3文档(以及我对“The Definitive ANTLR reference”的可靠副本),我似乎无法找到一种在字符串文字中实现转义序列的简洁方法(我目前正在使用Java目标)。我希望能够做到这样的事情:
fragment
ESCAPE_SEQUENCE
: '\\' '\'' { setText("'"); }
;
STRING
: '\'' (ESCAPE_SEQUENCE | ~('\'' | '\\'))* '\''
{
// strip the quotes from the resulting token
setText(getText().substring(1, getText().length() - 1));
}
;
For example, I would want the input token "'Foo\'s House'
" to become the String "Foo's House
".
例如,我希望输入标记“'Foo \'s House'”成为String“Foo's House”。
Unfortunately, the setText(...)
call in the ESCAPE_SEQUENCE
fragment sets the text for the entire STRING
token, which is obviously not what I want.
不幸的是,ESCAPE_SEQUENCE片段中的setText(...)调用设置了整个STRING令牌的文本,这显然不是我想要的。
Is there a way to implement this grammar without adding a method to go back through the resulting string and manually replace escape sequences (e.g., with something like setText(escapeString(getText()))
in the STRING
rule)?
有没有办法实现这个语法而不添加一个方法来返回结果字符串并在STRING规则中手动替换转义序列(例如,使用类似setText(escapeString(getText()))?
4 个解决方案
#1
14
Here is how I accomplished this in the JSON parser I wrote.
以下是我在我编写的JSON解析器中完成此操作的方法。
STRING
@init{StringBuilder lBuf = new StringBuilder();}
:
'"'
( escaped=ESC {lBuf.append(getText());} |
normal=~('"'|'\\'|'\n'|'\r') {lBuf.appendCodePoint(normal);} )*
'"'
{setText(lBuf.toString());}
;
fragment
ESC
: '\\'
( 'n' {setText("\n");}
| 'r' {setText("\r");}
| 't' {setText("\t");}
| 'b' {setText("\b");}
| 'f' {setText("\f");}
| '"' {setText("\"");}
| '\'' {setText("\'");}
| '/' {setText("/");}
| '\\' {setText("\\");}
| ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
{setText(ParserUtil.hexToChar(i.getText(),j.getText(),
k.getText(),l.getText()));}
)
;
#2
4
Another (possibly more efficient) alternative is to use rule arguments:
另一种(可能更有效)的替代方法是使用规则参数:
STRING
@init { final StringBuilder buf = new StringBuilder(); }
:
'"'
(
ESCAPE[buf]
| i = ~( '\\' | '"' ) { buf.appendCodePoint(i); }
)*
'"'
{ setText(buf.toString()); };
fragment ESCAPE[StringBuilder buf] :
'\\'
( 't' { buf.append('\t'); }
| 'n' { buf.append('\n'); }
| 'r' { buf.append('\r'); }
| '"' { buf.append('\"'); }
| '\\' { buf.append('\\'); }
| 'u' a = HEX_DIGIT b = HEX_DIGIT c = HEX_DIGIT d = HEX_DIGIT { buf.append(ParserUtil.hexChar(a, b, c, d)); }
);
#3
4
For ANTLR4, Java target and standard escaped string grammar, I used a dedicated singleton class : CharSupport to translate string. It is available in antlr API :
对于ANTLR4,Java目标和标准转义字符串语法,我使用了一个专用的单例类:CharSupport来翻译字符串。它在antlr API中可用:
STRING : '"'
( ESC
| ~('"'|'\\'|'\n'|'\r')
)*
'"' {
setText(
org.antlr.v4.misc.CharSupport.getStringFromGrammarStringLiteral(
getText()
)
);
}
;
As I saw in V4 documentation and by experiments, @init is no longer supported in lexer part!
正如我在V4文档和实验中看到的那样,lexer部分不再支持@init!
#4
3
I needed to do just that, but my target was C and not Java. Here's how I did it based on answer #1 (and comment), in case anyone needs something alike:
我需要这样做,但我的目标是C而不是Java。以下是我根据答案#1(和评论)做的事情,以防任何人需要类似的东西:
QUOTE : '\'';
STR
@init{ pANTLR3_STRING unesc = GETTEXT()->factory->newRaw(GETTEXT()->factory); }
: QUOTE ( reg = ~('\\' | '\'') { unesc->addc(unesc, reg); }
| esc = ESCAPED { unesc->appendS(unesc, GETTEXT()); } )+ QUOTE { SETTEXT(unesc); };
fragment
ESCAPED : '\\'
( '\\' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\\")); }
| '\'' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\'")); }
)
;
HTH.
HTH。
#1
14
Here is how I accomplished this in the JSON parser I wrote.
以下是我在我编写的JSON解析器中完成此操作的方法。
STRING
@init{StringBuilder lBuf = new StringBuilder();}
:
'"'
( escaped=ESC {lBuf.append(getText());} |
normal=~('"'|'\\'|'\n'|'\r') {lBuf.appendCodePoint(normal);} )*
'"'
{setText(lBuf.toString());}
;
fragment
ESC
: '\\'
( 'n' {setText("\n");}
| 'r' {setText("\r");}
| 't' {setText("\t");}
| 'b' {setText("\b");}
| 'f' {setText("\f");}
| '"' {setText("\"");}
| '\'' {setText("\'");}
| '/' {setText("/");}
| '\\' {setText("\\");}
| ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
{setText(ParserUtil.hexToChar(i.getText(),j.getText(),
k.getText(),l.getText()));}
)
;
#2
4
Another (possibly more efficient) alternative is to use rule arguments:
另一种(可能更有效)的替代方法是使用规则参数:
STRING
@init { final StringBuilder buf = new StringBuilder(); }
:
'"'
(
ESCAPE[buf]
| i = ~( '\\' | '"' ) { buf.appendCodePoint(i); }
)*
'"'
{ setText(buf.toString()); };
fragment ESCAPE[StringBuilder buf] :
'\\'
( 't' { buf.append('\t'); }
| 'n' { buf.append('\n'); }
| 'r' { buf.append('\r'); }
| '"' { buf.append('\"'); }
| '\\' { buf.append('\\'); }
| 'u' a = HEX_DIGIT b = HEX_DIGIT c = HEX_DIGIT d = HEX_DIGIT { buf.append(ParserUtil.hexChar(a, b, c, d)); }
);
#3
4
For ANTLR4, Java target and standard escaped string grammar, I used a dedicated singleton class : CharSupport to translate string. It is available in antlr API :
对于ANTLR4,Java目标和标准转义字符串语法,我使用了一个专用的单例类:CharSupport来翻译字符串。它在antlr API中可用:
STRING : '"'
( ESC
| ~('"'|'\\'|'\n'|'\r')
)*
'"' {
setText(
org.antlr.v4.misc.CharSupport.getStringFromGrammarStringLiteral(
getText()
)
);
}
;
As I saw in V4 documentation and by experiments, @init is no longer supported in lexer part!
正如我在V4文档和实验中看到的那样,lexer部分不再支持@init!
#4
3
I needed to do just that, but my target was C and not Java. Here's how I did it based on answer #1 (and comment), in case anyone needs something alike:
我需要这样做,但我的目标是C而不是Java。以下是我根据答案#1(和评论)做的事情,以防任何人需要类似的东西:
QUOTE : '\'';
STR
@init{ pANTLR3_STRING unesc = GETTEXT()->factory->newRaw(GETTEXT()->factory); }
: QUOTE ( reg = ~('\\' | '\'') { unesc->addc(unesc, reg); }
| esc = ESCAPED { unesc->appendS(unesc, GETTEXT()); } )+ QUOTE { SETTEXT(unesc); };
fragment
ESCAPED : '\\'
( '\\' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\\")); }
| '\'' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\'")); }
)
;
HTH.
HTH。