I have a PEG grammar for a toy DSL, written for the Arpeggio Python package:
from arpeggio.cleanpeg import ParserPEG
grammar = r"""
root = block* EOF
block = header (item1+ / item2+)
header = "block"
item1 = number name comment?
item2 = number name list comment?
number = r"\d+"
name = r"\w+"
list = r"\[.*\]"
comment = r"\/\/.*"
"""
doc = """
block
5 alpha [] //
3 beta [a, b, c] // this is an item2
block
6 foo
1 bar // This is an item1
4 baz // more stuff
"""
parser = ParserPEG(grammar, 'root', debug=True)
parse_tree = parser.parse(doc)
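print('Tree:', parse_tree)

This gives a strange result when parsing the test document: it does not correctly match item1 in the ordered choice, yet it wrongly claims (on the line marked xxxx) that the choice did match, without ever testing item2, which would have matched.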
>> Matching rule root=Sequence at position 0 => * block 5
>> Matching rule ZeroOrMore in root at position 0 => * block 5
>> Matching rule block=Sequence in root at position 0 => * block 5
?? Try match rule header=StrMatch(block) in block at position 1 => *block 5
++ Match 'block' at 1 => ' *block* 5 '
>> Matching rule OrderedChoice in block at position 6 => block* 5 alpha
>> Matching rule OneOrMore in block at position 6 => block* 5 alpha
>> Matching rule item1=Sequence in block at position 6 => block* 5 alpha
?? Try match rule number=RegExMatch(\d+) in item1 at position 9 => block *5 alpha []
++ Match '5' at 9 => ' block *5* alpha []'
?? Try match rule name=RegExMatch(\w+) in item1 at position 11 => block 5 *alpha []
++ Match 'alpha' at 11 => 'block 5 *alpha* [] '
>> Matching rule Optional in item1 at position 16 => 5 alpha* []
?? Try match rule comment=RegExMatch(\/\/.*) in item1 at position 17 => 5 alpha *[]
-- NoMatch at 17
<<- Not matched rule Optional in item1 at position 16 => 5 alpha* []
<<+ Matched rule item1=Sequence in item1 at position 16 => 5 alpha* []
>> Matching rule item1=Sequence in block at position 16 => 5 alpha* []
?? Try match rule number=RegExMatch(\d+) in item1 at position 17 => 5 alpha *[]
-- NoMatch at 17
<<- Not matched rule item1=Sequence in item1 at position 16 => 5 alpha* []
xxxx--> <<+ Matched rule OneOrMore in block at position 16 => 5 alpha* []
<<+ Matched rule OrderedChoice in block at position 16 => 5 alpha* []
<<+ Matched rule block=Sequence in block at position 16 => 5 alpha* []
>> Matching rule block=Sequence in root at position 16 => 5 alpha* []
?? Try match rule header=StrMatch(block) in block at position 17 => 5 alpha *[]
-- No match 'block' at 17 => ' 5 alpha *[] * '
<<- Not matched rule block=Sequence in block at position 16 => 5 alpha* []
<<+ Matched rule ZeroOrMore in root at position 16 => 5 alpha* []
?? Try match rule EOF in root at position 17 => 5 alpha *[]
!! EOF not matched.
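<<- Not matched rule root=Sequence in root at position 0 => * block 5

As a result, the parser never uses item2, which it would have used had it actually tried item2 after item1 failed.
Is this a bug in the parser package, or an error in my grammar?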
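Note that reversing the order of the choice:
block = header (item2+ / item1+)
parses the example document correctly. However, an anomaly that is easy to work around in a toy problem could be much harder to deal with in a real grammar. An item is unambiguously either an item1 or an item2, so the order in which they are checked should be irrelevant, and the code that parses them should behave consistently.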
Posted on 2020-06-23 10:39:48
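In PEG, the order of the expressions in an OrderedChoice matters. When the parser tries item1+, matching at least one item1 is enough for it to succeed, and at that point the whole ordered choice is considered successful, so item2+ is never tried. In your document the first item1 match consumes only "5 alpha" and stops in front of "[]"; the failure surfaces later, when neither another block nor EOF can match there.
In general, in an ordered choice always put the more specific alternatives first and the more general ones last.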
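
To make the behaviour described above concrete, here is a minimal sketch of PEG semantics in plain Python. It does not use Arpeggio; the combinators `token`, `seq`, `optional`, `one_or_more` and `choice` are hypothetical helpers written only for this illustration, and whitespace skipping is an assumed simplification of what the real parser does. It shows that `item1+ / item2+` succeeds after consuming only "5 alpha", while `item2+ / item1+` consumes both items.

```python
import re

# Hypothetical mini-combinators for illustration only (not Arpeggio's API).
# Each parser takes (text, pos) and returns the new position, or None on failure.

def token(pattern):
    # Skip leading whitespace, then match the pattern at the current position.
    rx = re.compile(r"[ \t\r\n]*(?:" + pattern + ")")
    def parse(text, pos):
        m = rx.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def optional(p):
    def parse(text, pos):
        new = p(text, pos)
        return pos if new is None else new
    return parse

def one_or_more(p):
    def parse(text, pos):
        pos = p(text, pos)
        if pos is None:
            return None          # not even one match: fail
        while True:
            new = p(text, pos)
            if new is None or new == pos:
                return pos       # one or more matches: succeed here
            pos = new
    return parse

def choice(*parsers):
    def parse(text, pos):
        for p in parsers:
            new = p(text, pos)
            if new is not None:
                return new       # first success wins; later alternatives are never tried
        return None
    return parse

number = token(r"\d+")
name = token(r"\w+")
lst = token(r"\[.*\]")
comment = token(r"//.*")

item1 = seq(number, name, optional(comment))
item2 = seq(number, name, lst, optional(comment))

general_first = choice(one_or_more(item1), one_or_more(item2))
specific_first = choice(one_or_more(item2), one_or_more(item1))

text = "5 alpha [] //\n3 beta [a, b, c] // this is an item2"

stopped_at = general_first(text, 0)
print(repr(text[:stopped_at]))                 # only the prefix matched by item1
print(specific_first(text, 0) == len(text))    # whether everything was consumed
```

The ordered choice never backtracks into its second alternative once the first has succeeded, even if the enclosing rule fails afterwards. That is why the result depends on the order, and why the more specific `item2` has to come first.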
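Update: the Wikipedia article on parsing expression grammars has a good explanation in its section on ambiguity detection and the influence of rule order on the matched language.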
https://stackoverflow.com/questions/62525111