I'm working on a script for work to extract data from an old template engine schema:
[%price%]
{
$54.99
}
[%/price%]
[%model%]
{
WRT54G
}
[%/model%]
[%brand%]{
LINKSYS
}
[%/brand%]
Everything within the [% %] is the key, and everything in the { } is the value. Using Python and regex, I was able to get this far: (?<=\[%)(?P<key>\w*?)(?=\%\])
which returns ['price', 'model', 'brand']
I'm just having trouble getting it to match the data in the braces as the value.
3 solutions
#1
just for grins:
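    import re

    # `test` is assumed to hold the template text shown in the question.
    # Flags go on compile(): re.findall() rejects a flags argument
    # when it is handed an already compiled pattern.
    RE_kv = re.compile(r"\[%(.*)%\].*?\n?\s*{\s*(.*)", re.M)
    matches = RE_kv.findall(test)
    for k, v in matches:
        print(k, v)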
output:
price $54.99
model WRT54G
brand LINKSYS
Note that I wrote just enough regex to get the matches to show up; the pattern is not even bounded at the end by the closing brace. Use at your own risk.
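For comparison, here is a minimal sketch of a pattern that is bounded at both ends. It assumes every block has the shape shown in the question (opening tag, braced value, closing tag with the same key). The names TEMPLATE, BLOCK_RE and extract_pairs are made up for illustration, and the pattern has not been tested against real-world files.

```python
import re

# Sample input copied from the question.
TEMPLATE = """[%price%]
{
$54.99
}
[%/price%]
[%model%]
{
WRT54G
}
[%/model%]
[%brand%]{
LINKSYS
}
[%/brand%]
"""

BLOCK_RE = re.compile(
    r"\[%(?P<key>\w+)%\]\s*"        # opening tag: [%key%]
    r"\{\s*(?P<value>.*?)\s*\}\s*"  # shortest value that is followed by } and the closing tag
    r"\[%/(?P=key)%\]",             # closing tag must repeat the same key
    re.DOTALL,                      # let . cross line breaks inside the braces
)

def extract_pairs(text):
    """Return a dict mapping each key to its whitespace-stripped value."""
    return {m.group("key"): m.group("value") for m in BLOCK_RE.finditer(text)}

print(extract_pairs(TEMPLATE))
# {'price': '$54.99', 'model': 'WRT54G', 'brand': 'LINKSYS'}
```

Blocks whose closing tag is missing or names a different key are silently skipped, which is exactly the kind of quiet failure to watch out for with a single regex.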
#2
I agree with Devin that a single regex isn't the best solution. If there do happen to be any strange cases that aren't handled by your regex, there's a real risk that you won't find out.
I'd suggest using a finite state machine approach. Parse the file line by line, first looking for a price-model-brand block, then parse whatever is within the braces. Also, make sure to note if any blocks aren't opened or closed correctly as these are probably malformed.
You should be able to write something like this in Python in about 30-40 lines of code.
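A rough sketch of what such a state machine could look like follows. It assumes the tags and braces each sit on their own line (apart from the "[%brand%]{" variant from the question) and that values never contain a line consisting only of "}". The function name parse_blocks and the state names are invented for this example.

```python
import re

OPEN_TAG = re.compile(r"^\[%(\w+)%\]\s*(\{?)$")  # [%key%], optionally followed by {
CLOSE_TAG = re.compile(r"^\[%/(\w+)%\]$")        # [%/key%]

def parse_blocks(lines):
    """Return (pairs, errors) parsed from an iterable of text lines."""
    pairs, errors = {}, []
    state, key, value = "outside", None, []
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines in every state
        if state == "outside":
            m = OPEN_TAG.match(line)
            if m:
                key, value = m.group(1), []
                # The brace may already be on the tag line, as in [%brand%]{
                state = "in_value" if m.group(2) else "want_brace"
            else:
                errors.append(f"line {lineno}: unexpected {line!r}")
        elif state == "want_brace":
            if line == "{":
                state = "in_value"
            else:
                errors.append(f"line {lineno}: expected '{{' after [%{key}%]")
                state = "outside"
        elif state == "in_value":
            if line == "}":
                state = "want_close"
            else:
                value.append(line)
        elif state == "want_close":
            m = CLOSE_TAG.match(line)
            if m and m.group(1) == key:
                pairs[key] = "\n".join(value)
            else:
                errors.append(f"line {lineno}: expected [%/{key}%], got {line!r}")
            state = "outside"
    if state != "outside":
        errors.append(f"end of input: block [%{key}%] was never closed")
    return pairs, errors
```

Calling parse_blocks(open(path)) on a file, where path is whatever file you are reading, returns the extracted pairs together with a list of complaints about malformed blocks, so strange cases get reported instead of being dropped.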
#3
It looks like it'd be easier to do with re.Scanner (sadly undocumented) than with a single regular expression.
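Here is a small sketch of how that might look. Because re.Scanner is undocumented, the usage below relies on how it behaves in CPython's re module: a list of (pattern, action) pairs, where the action is called with the scanner and the matched text, and scan() returns the token list plus any unmatched remainder. The token names and the scan_pairs helper are invented for this example, and values are assumed not to contain "}".

```python
import re

scanner = re.Scanner([
    (r"\[%/\w+%\]", lambda s, tok: ("close", tok[3:-2])),         # [%/key%]
    (r"\[%\w+%\]", lambda s, tok: ("open", tok[2:-2])),           # [%key%]
    (r"\{[^}]*\}", lambda s, tok: ("value", tok[1:-1].strip())),  # { value }
    (r"\s+", None),                                               # skip whitespace
])

def scan_pairs(text):
    """Tokenize text and pair up open/value/close triples into a dict."""
    tokens, remainder = scanner.scan(text)
    if remainder:
        raise ValueError(f"unrecognised input starting at {remainder[:20]!r}")
    if len(tokens) % 3:
        raise ValueError("incomplete block at end of input")
    pairs = {}
    for i in range(0, len(tokens), 3):
        (k1, name), (k2, value), (k3, closing) = tokens[i:i + 3]
        if (k1, k2, k3) != ("open", "value", "close") or name != closing:
            raise ValueError(f"malformed block near [%{name}%]")
        pairs[name] = value
    return pairs
```

The scanner only does the tokenizing; checking that the tokens arrive as open, value, close and that the closing key matches the opening one is left to the plain Python loop.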