I'm trying to use the Earley parser in NLTK to parse sentences such as:
If date is before 12/21/2010 then serial = 10
To do this, I'm trying to write a CFG, but the problem is that I need a general format for dates and integers as terminals, rather than specific values. Is there a way to specify the right-hand side of a production rule as a regular expression, which would allow this kind of processing?
Something like:
S -> '[0-9]+'
which would handle all integers.
1 solution
For this to work, you'll need to tokenize the date so that each digit and slash is a separate token.
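The tokenization step could be sketched as follows. This is a minimal sketch, not part of NLTK: `tokenize_date` is a hypothetical helper name, and it assumes the date contains only ASCII digits and slashes.

```python
import re

def tokenize_date(text):
    """Split a date such as '1/10/1987' into single-character tokens."""
    # Hypothetical helper: accept only digits and slashes, reject anything else.
    if not re.fullmatch(r"[0-9/]+", text):
        raise ValueError("unexpected character in date: %r" % text)
    # Every character is already one terminal of the grammar.
    return list(text)

print(tokenize_date("1/10/1987"))
# ['1', '/', '1', '0', '/', '1', '9', '8', '7']
```

The resulting list has the same shape as the token list passed to the parser in this answer.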
import nltk
from nltk.parse.earleychart import EarleyChartParser

# nltk.parse_cfg is the NLTK 2 name; NLTK 3 uses nltk.CFG.fromstring instead.
grammar = nltk.CFG.fromstring("""
DATE -> MONTH SEP DAY SEP YEAR
SEP -> "/"
MONTH -> DIGIT | DIGIT DIGIT
DAY -> DIGIT | DIGIT DIGIT
YEAR -> DIGIT DIGIT DIGIT DIGIT
DIGIT -> '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '0'
""")
parser = EarleyChartParser(grammar)
# In NLTK 3, parse() returns an iterator over trees, so print each one.
for tree in parser.parse(["1", "/", "1", "0", "/", "1", "9", "8", "7"]):
    print(tree)
The output is:
(DATE
  (MONTH (DIGIT 1))
  (SEP /)
  (DAY (DIGIT 1) (DIGIT 0))
  (SEP /)
  (YEAR (DIGIT 1) (DIGIT 9) (DIGIT 8) (DIGIT 7)))
This also affords some flexibility, since it allows days and months to be single-digit.
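For a whole sentence like the one in the question, the same idea might be applied with a regular expression that keeps words intact but splits numbers and symbols into single characters. This is only a sketch under assumptions: `TOKEN_PATTERN` and `tokenize_sentence` are hypothetical names, words are assumed to be plain ASCII letters, and the grammar would still need its own rules for the words and for `=`.

```python
import re

# Words stay whole; each digit and each other non-space symbol is its own token.
TOKEN_PATTERN = re.compile(r"[A-Za-z]+|[0-9]|\S")

def tokenize_sentence(sentence):
    """Return the tokens of a sentence, with numbers split digit by digit."""
    return TOKEN_PATTERN.findall(sentence)

tokens = tokenize_sentence("If date is before 12/21/2010 then serial = 10")
print(tokens)
# ['If', 'date', 'is', 'before', '1', '2', '/', '2', '1', '/',
#  '2', '0', '1', '0', 'then', 'serial', '=', '1', '0']
```

Integers could then be handled by a recursive rule over DIGIT, in the same style as the DATE grammar.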