I have 9 different grammars. One of them is loaded depending on the first line of text in the file being parsed.
I was thinking about deriving the lexer/parser spawning into separate classes and then instantiating them as soon as I get a match, but I'm not sure whether that would slow me down. I guess some benchmarking is in order.
Speed is definitely my goal here, but I know this is ugly code.
Right now the code looks something like this:
sin.mark(0);
site = findsite(txt);
sin.reset();
if (site == "site1") {
    loadlexer1;
    loadparser1;
} else if (site == "site2") {
    loadlexer2;
    loadparser2;
}
.................
} else if (site == "site8") {
    loadlexer8;
    loadparser8;
}

findsite(txt) {
    ...................
    if (line.indexOf("site1-identifier") != -1) {
        site = site1;
    } else if (line.indexOf("site2-identifier") != -1) {
        site = site2;
    } else if (line.indexOf("site3-identifier") != -1) {
        site = site3;
    }
    .........................
    } else if (line.indexOf("site8-identifier") != -1) {
        site = site8;
    }
}
Some clarifications:
1) Yes, I truly have 9 different grammars that I built with ANTLR, so they will ALL have their own lexer/parser objects.
2) Yes, right now we are comparing strings, and obviously that will be replaced with some sort of integer map. I've also considered combining the site identifiers into one regex, but I don't believe that will speed anything up.
3) Yes, this is pseudocode, so I wouldn't get too picky about the semantics here.
4) kdgregory is correct in noting that I am unable to create one instance of the lexer/parser pair and reuse it.
I like the hash idea for making the code a little better looking, but I don't think it will speed me up at all.
11 solutions
#1
The standard approach is to use a Map to connect the key strings to the lexers that will handle them:
Map<String,Lexer> lexerMap = new HashMap<String,Lexer>();
lexerMap.put("source1", new Lexer01());
lexerMap.put("source2", new Lexer02());
// and so on
Once you've retrieved the string that identifies the lexer to use, you'd look it up in the Map like so:
String grammarId = ...; // read it from a file, whatever
Lexer myLexer = lexerMap.get(grammarId);
Your example code has a few quirks, however. First, the indexOf() calls indicate that you don't have a stand-alone string, and Map won't look inside the string. So you need some way to extract the actual key from whatever string you read.
Second, lexers and parsers usually maintain state, so you won't be able to create a single instance and reuse it. That means you need a factory class for each grammar, and you store the factories in the map (this is the Abstract Factory pattern).
If you expect to have lots of different lexers/parsers, a map-driven approach makes sense. For a small number, a properly encapsulated if-else chain is probably your best bet (this is the Factory Method pattern).
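A minimal sketch of the factory-in-a-map idea described above. The Lexer interface and the Lexer01/Lexer02 classes are stand-ins for the ANTLR-generated classes (the real constructors take an input stream, which is left out here), and LexerRegistry is a hypothetical name, not something from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry: maps a grammar key to a factory, not to an instance,
// so every lookup yields a fresh lexer with no leftover state.
class LexerRegistry {
    // Stand-ins for the generated lexers; the real ANTLR classes differ.
    interface Lexer { String name(); }
    static class Lexer01 implements Lexer { public String name() { return "Lexer01"; } }
    static class Lexer02 implements Lexer { public String name() { return "Lexer02"; } }

    private static final Map<String, Supplier<Lexer>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("source1", Lexer01::new);
        FACTORIES.put("source2", Lexer02::new);
        // and so on
    }

    // Builds a new lexer for the given key, or fails loudly for an unknown key.
    static Lexer create(String grammarId) {
        Supplier<Lexer> factory = FACTORIES.get(grammarId);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown grammar: " + grammarId);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        Lexer a = create("source1");
        Lexer b = create("source1");
        System.out.println(a.name());  // Lexer01
        System.out.println(a != b);    // true: each call builds a new instance
    }
}
```

Storing a Supplier rather than a lexer instance is what sidesteps the state problem: the map is built once, and each file gets its own lexer.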
#2
Using polymorphism is almost guaranteed to be faster than string manipulation, and it will be checked for correctness at compile time. Is site really a String? If so, FindSite should be called GetSiteName. I would expect FindSite to return a Site object that knows the appropriate lexer and parser.
Another speed issue is speed of coding. It would definitely be better to have your different lexers and parsers in individual classes (perhaps with shared functionality in another). It will make your code slightly smaller and significantly easier for someone else to understand.
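One way a Site object that "knows the appropriate lexer and parser" might look, sketched as a Java enum. Everything named here (SiteDemo, SiteLexer, SiteParser, Lexer1, Parser1, and so on) is a placeholder for illustration; the generated ANTLR classes would take their place.

```java
import java.util.function.Supplier;

class SiteDemo {
    // Stand-ins for the generated ANTLR classes.
    interface SiteLexer { }
    interface SiteParser { }
    static class Lexer1 implements SiteLexer { }
    static class Parser1 implements SiteParser { }
    static class Lexer2 implements SiteLexer { }
    static class Parser2 implements SiteParser { }

    // Each constant carries its identifier plus factories for its own lexer/parser pair.
    enum Site {
        SITE1("site1-identifier", Lexer1::new, Parser1::new),
        SITE2("site2-identifier", Lexer2::new, Parser2::new);

        private final String identifier;
        private final Supplier<SiteLexer> lexerFactory;
        private final Supplier<SiteParser> parserFactory;

        Site(String identifier, Supplier<SiteLexer> lexerFactory, Supplier<SiteParser> parserFactory) {
            this.identifier = identifier;
            this.lexerFactory = lexerFactory;
            this.parserFactory = parserFactory;
        }

        SiteLexer newLexer() { return lexerFactory.get(); }
        SiteParser newParser() { return parserFactory.get(); }

        // Returns the first site whose identifier occurs in the line.
        static Site find(String firstLine) {
            for (Site site : values()) {
                if (firstLine.contains(site.identifier)) {
                    return site;
                }
            }
            throw new IllegalArgumentException("No known site in: " + firstLine);
        }
    }

    public static void main(String[] args) {
        Site site = Site.find("header site2-identifier v1");
        System.out.println(site);                               // SITE2
        System.out.println(site.newLexer() instanceof Lexer2);  // true
    }
}
```

With this shape the caller never compares strings after the lookup: a misspelled site name becomes a compile error rather than a branch that silently never matches.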
#3
Something like:
Map<String,LexerParserTuple> lptmap = new HashMap<String,LexerParserTuple>();
lpt = lptmap.get(site);
lpt.loadlexer();
lpt.loadparser();
combined with some regex magic rather than string.indexOf() to grab the names of the sites should dramatically clean up your code.
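The "regex magic" could be as small as one precompiled pattern with a capture group. This sketch assumes the identifiers really have the shape "siteN-identifier" used in the question's pseudocode; the real markers are probably different, and SiteNameExtractor is a made-up name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SiteNameExtractor {
    // Assumed identifier shape: "site<digit 1-9>-identifier". Compiled once and reused.
    private static final Pattern SITE_ID = Pattern.compile("(site[1-9])-identifier");

    // Returns the site name (e.g. "site3") if the line contains a known marker.
    static Optional<String> extract(String line) {
        Matcher m = SITE_ID.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("# generated by site3-identifier").orElse("unknown")); // site3
        System.out.println(extract("no marker here").orElse("unknown"));                  // unknown
    }
}
```

The extracted name can then be used directly as the key for a map lookup such as lptmap.get(site).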
#4
Replace Conditional With Polymorphism
As a half-measure for findsite(), you could simply set up a HashMap that takes you from site identifier to site. An alternative cleanup would be simply to return the site string, thus:
String findsite(txt) {
    ...................
    if (line.indexOf("site1-identifier") != -1)
        return site1;
    if (line.indexOf("site2-identifier") != -1)
        return site2;
    if (line.indexOf("site3-identifier") != -1)
        return site3;
    ...
}
Using indexOf() in this way isn't really expressive; I'd use equals() or contains().
#5
Suppose your code is inefficient.
Will it take more than (say) 1% of the time needed to actually parse the input?
If not, you've got bigger fish to fry.
#6
"I was thinking about deriving the lexer/parser spawning into separate classes and then instantiating them as soon as I get a match"
It looks like you have the answer already. That would create code that is more flexible, but not necessarily faster.
"I guess some benchmarking is in order"
Yes, measure both approaches and make an informed decision. My guess is that the way you have it already would be enough.
Perhaps, if what bothers you is having a "kilometric" method, you could break it up into separate functions with the Extract Method refactoring.
The most important thing is to first have a solution that does the job, even if it is slow. Once you have it working, profile it and find the points where performance could be improved. Remember the "Rules of optimization".
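For a first, rough measurement, something like the following may be enough to tell whether the dispatch code matters at all next to the parse itself. It is only a sketch: findSite here is a cut-down stand-in for the question's if-else chain, DispatchTiming is a hypothetical name, and a hand-rolled loop like this is sensitive to JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
class DispatchTiming {
    // Stand-in for findsite(): a shortened version of the if-else chain.
    static String findSite(String line) {
        if (line.contains("site1-identifier")) return "site1";
        if (line.contains("site2-identifier")) return "site2";
        if (line.contains("site3-identifier")) return "site3";
        return "unknown";
    }

    // Average nanoseconds per findSite call over the given number of iterations.
    static double nanosPerCall(String line, int iterations) {
        long sink = 0; // accumulate results so the JIT cannot discard the work
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += findSite(line).length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never reached; keeps sink live
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        String line = "first line with site3-identifier in it";
        nanosPerCall(line, 1_000_000); // warm-up pass, result ignored
        System.out.printf("findSite: %.1f ns per call%n", nanosPerCall(line, 1_000_000));
    }
}
```

Timing one full lex-and-parse of a representative file the same way gives the other half of the comparison.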
#7
I would change findsite to return a Site type (a superclass) and then leverage polymorphism. That should be faster than string manipulation.
Do you need separate lexers?
#8
Use a Map to configure a site-to-load-strategy structure. Then all that is required is a simple lookup based on 'site', after which you execute the appropriate strategy. The same can be done for findSite().
#9
You could have a map of identifiers to sites, then just iterate over the map entries.
// define this as a static somewhere ... build from a properties file
Map<String,String> m = new HashMap<String,String>(){{
    put("site1-identifier","site1");
    put("site2-identifier","site2");
}};
// in your method
for (Map.Entry<String,String> entry : m.entrySet()) {
    if (line.contains(entry.getKey())) {
        return entry.getValue();
    }
}
Cleaner: yes. Faster: dunno... should be fast enough.
#10
You could possibly use reflection:
char site = line.charAt(4);
Method lexerMethod = this.getClass().getMethod("loadlexer" + site /*, parameter types here */);
Method parserMethod = this.getClass().getMethod("loadparser" + site /*, parameter types here */);
lexerMethod.invoke(this /*, parameters here */);
parserMethod.invoke(this /*, parameters here */);
#11
I don't know about Java, but some languages allow switch to take strings.
switch (site)
{
    case "site1": loadlexer1; loadparser1; break;
    case "site2": loadlexer2; loadparser2; break;
    ...
}
As for the second bit, use a regex to extract the identifier and switch on that. You might be better off using an enum.
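For what it's worth, Java has accepted String values in switch since Java 7, and enums work in every version that has them. A small sketch of the enum variant follows; the Site constants, the parse helper, and the returned description strings are all illustrative placeholders for the real load calls.

```java
import java.util.Locale;

class SiteSwitch {
    enum Site { SITE1, SITE2, SITE8 }

    // Converts a name such as "site2" to the matching constant.
    // Throws IllegalArgumentException for an unknown name.
    static Site parse(String name) {
        return Site.valueOf(name.toUpperCase(Locale.ROOT));
    }

    // Placeholder for the real load calls: reports which pair would be loaded.
    static String loaderFor(Site site) {
        switch (site) {
            case SITE1: return "lexer1/parser1";
            case SITE2: return "lexer2/parser2";
            case SITE8: return "lexer8/parser8";
            default: throw new AssertionError("unhandled site: " + site);
        }
    }

    public static void main(String[] args) {
        System.out.println(loaderFor(parse("site2"))); // lexer2/parser2
    }
}
```

The enum version moves the string handling to a single place (parse), so a typo in a case label is caught by the compiler.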