I'm re-writing GraphViz2::Marpa (the new version will be V 2.00), and have a BNF (Marpa::R2-style grammar) which allows Marpa to trigger 2 events simultaneously as it parses a single lexeme.
For more on these BNFs, see the docs for Marpa's DSL.
Here's one way to handle such a problem.
The input stream is in the DOT language.
In GraphViz2::Marpa, sometimes an identifier needs to be classified as an 'attribute name' or a 'node name'.
Typical syntax, in pseudo-Perl, is:

    $attr_name = $attr_value

Here, $attr_name is an attribute and $attr_value is its value.

    $node_name [...]

Here, $node_name is a node with the given attributes.
So the 1st non-whitespace char after $attr_name/$node_name, here '=' or '[', differentiates between the 2 cases.

    :lexeme         ~ attribute_name  pause => before  event => attribute_name
    attribute_name  ~ string_char_set+

    :lexeme         ~ node_name  pause => before  event => node_name
    node_name       ~ string_char_set+

    escaped_char    ~ '\' [[:print:]]

    # Neither a separator [;] nor a terminator [\s\[\]\{\}].
    string_char_set ~ escaped_char | [^;\s\[\]\{\}]

The next snippet just runs the parse, so that the events are triggered. Because the grammar uses pauses, we call Marpa's read() method once, and then call resume() in a loop.

    for
    (
        my $pos = $self -> recce -> read(\$string);
        $pos < $length;
        $pos = $self -> recce -> resume($pos)
    )
    {
        ($start, $span) = $self -> recce -> pause_span;
        $event_name     = $self -> _validate_event($string, $start, $span);
        ...
    }

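Besides constructing nice error messages, _validate_event() steps through these tests: Is the event name known? Did more than 1 event fire? If exactly 2 fired, are they the pair ('attribute_name', 'node_name')?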

    sub _validate_event
    {
        my($self, $string, $start, $span) = @_;
        my(@event)         = @{$self -> recce -> events};
        my($event_name)    = ${$event[0]}[0]; # The default.
        my($lexeme)        = substr($string, $start, $span);
        my($line, $column) = $self -> recce -> line_column($start);
        my($literal)       = substr($string, $start + $span, 20);
        $literal           =~ tr/\n/ /;
        $literal           =~ s/^\s+//;
        $literal           =~ s/\s+$//;
        my($message)       = "Location: ($line, $column). Lexeme: !$lexeme!. Next few chars: !$literal!";

        if (! ${$self -> known_events}{$event_name})
        {
            $message = "$message. Unexpected event name '$event_name'";

            $self -> log(error => $message);

            die "$message\n";
        }

        my($event_count) = scalar @event;

        if ($event_count > 1)
        {
            # We can handle ambiguous events when they are 'attribute_name' and 'node_name'.
            # 'attribute_name' is followed by '=', and 'node_name' is followed by anything else.
            # Often, 'node_name' is followed by '[' to indicate the start of its attributes.

            if ($event_count == 2)
            {
                my(@event_name) = sort (${$event[0]}[0], ${$event[1]}[0]);
                my($expected)   = "$event_name[0].$event_name[1]";

                if ($expected eq 'attribute_name.node_name')
                {
                    $self -> log(debug => $message);

                    # This might return undef.

                    $event_name = $self -> _identify_lexeme($string, $start, $span);
                }
                else
                {
                    $event_name = undef;
                }

                if (! defined $event_name)
                {
                    $message = "$message. Events triggered: $event_count. Names: ";

                    $self -> log(error => $message . join(', ', map{${$_}[0]} @event) . '.');

                    die "Cannot identify lexeme as either 'attribute_name' or 'node_name'. \n";
                }
            }
            else
            {
                $message = "$message. Events triggered: $event_count. Names: ";

                $self -> log(error => $message . join(', ', map{${$_}[0]} @event) . '.');

                die "The code only handles 1 event at a time, or the pair ('attribute_name', 'node_name'). \n";
            }
        }

        return $event_name;

    } # End of _validate_event.

Luckily, in this case, just one token (after the lexeme which triggered the events) needs to be examined, to differentiate between the 2 cases.
And because the grammar uses pause => before, we're classifying a lexeme which technically we haven't even read yet!

    sub _identify_lexeme
    {
        my($self, $string, $start, $span) = @_;

        # Set pos() in preparation for the \G in the regexp.

        pos($string) = $start + $span;

        $string =~ /\G\s*(\S)/ || return; # Return undef for failure.

        my($type) = ($1 eq '=') ? 'attribute_name' : 'node_name';

        $self -> log(debug => "Disambiguated lexeme as '$type'");

        return $type;

    } # End of _identify_lexeme.

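To see the look-ahead on its own, here is a minimal sketch which applies the same \G regexp to a few small DOT-like fragments, without Marpa. The helper name classify_after() and the sample strings are assumptions made up for this illustration; they are not part of GraphViz2::Marpa.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper (not part of GraphViz2::Marpa).
# The lexeme occupies $span chars of $string, beginning at offset $start.
# We look at the 1st non-whitespace char after it.

sub classify_after
{
	my($string, $start, $span) = @_;

	# Set pos() so that \G anchors just past the lexeme.

	pos($string) = $start + $span;

	return if ($string !~ /\G\s*(\S)/); # Nothing but whitespace follows.

	return ($1 eq '=') ? 'attribute_name' : 'node_name';
}

# Sample inputs, each with the (start, span) of its 1st lexeme.

my(@sample) =
(
	['rankdir = LR',       0, 7],
	['node_1 [color=red]', 0, 6],
	['trailing   ',        0, 8],
);

for my $item (@sample)
{
	my($type) = classify_after(@$item);

	print "'$$item[0]' => ", defined($type) ? $type : 'undef', "\n";
}
```

In the real code, Marpa's pause_span() supplies $start and $span, so nothing has to be counted by hand.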
When the number of events goes up, we would like to have a data structure which helps manage any number of them. Here's an (untested) code suggestion, which goes in _validate_event().

    my($events_triggered) = join('.', sort map{$$_[0]} @event);
    my(%cases)            =
    (
        a =>
        {
            event_list => [qw/attribute_name node_name/],
            handler    => sub{...},
        },
        b =>
        {
            ...
        },
    );

    my($events_handled);

    for my $case (keys %cases)
    {
        $events_handled = join('.', sort @{$cases{$case}{event_list} });

        if ($events_handled eq $events_triggered)
        {
            $event_name = $cases{$case}{handler} -> (...);
        }
    }

Of course, the handler sub could be a closure which assigns directly to $event_name.
So, as far as possible, we extend the hash %cases rather than needing to add complexity to the loop which considers the list of cases.
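Here is a runnable sketch of one variation on that idea. Instead of looping over the cases, it keys the hash directly by the sorted, dot-joined event names, so finding the handler is a single lookup. The names look_ahead() and resolve_events() are hypothetical, and the event list is hand-made data which only mimics the shape returned by Marpa's events() method (an arrayref of arrayrefs, each starting with an event name).

```perl
use strict;
use warnings;

# Hypothetical handler: the same 1-char look-ahead as _identify_lexeme().

sub look_ahead
{
	my($string, $start, $span) = @_;

	pos($string) = $start + $span;

	return if ($string !~ /\G\s*(\S)/);

	return ($1 eq '=') ? 'attribute_name' : 'node_name';
}

# Each key is a sorted, dot-joined list of event names.
# Supporting a new combination of events means adding 1 entry here.

my(%cases) =
(
	'attribute_name.node_name' => \&look_ahead,
);

# Hypothetical driver. $events is mock data shaped like Marpa's events().
# Returns the chosen event name, or undef if the combination is unknown.

sub resolve_events
{
	my($events, $string, $start, $span) = @_;
	my(@name) = sort map{$$_[0]} @$events;

	return $name[0] if (scalar @name == 1); # Nothing to disambiguate.

	my($handler) = $cases{join('.', @name)};

	return $handler ? $handler -> ($string, $start, $span) : undef;
}

# The order in which the events arrive does not matter, because of the sort.

my($name) = resolve_events([ ['node_name'], ['attribute_name'] ], 'rankdir = LR', 0, 7);

print defined($name) ? $name : 'undef', "\n";
```

The trade-off is that every handler must accept the same parameters. If the handlers need differing context, closures (as mentioned above) are one way to supply it.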
Ron Savage.
Marpa's homepage: http://savage.net.au/Marpa.html
Homepage: http://savage.net.au/index.html
Australian Copyright © 2014 Ron Savage. All rights reserved.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html