I'm rewriting GraphViz2::Marpa (the new version will be V 2.00), and its BNF (a Marpa::R2-style grammar) allows Marpa to trigger 2 events simultaneously as it parses a single lexeme.
For more on these BNFs, see the docs for Marpa's DSL.
Here's one way to handle such a problem.
The input stream is in the DOT language.
In GraphViz2::Marpa, sometimes an identifier needs to be classified as an 'attribute name' or a 'node name'.
Typical syntax, in pseudo-Perl, is:
$attr_name = $attr_value
Here, $attr_name is an attribute and $attr_value is its value.
$node_name [$attr_name = $attr_value]
Here, $node_name is a node with the given attributes.
So the 1st non-whitespace char after $attr_name/$node_name, here '=' or '[', differentiates between the 2 cases.
:lexeme ~ attribute_name pause => before event => attribute_name
attribute_name ~ string_char_set+
:lexeme ~ node_name pause => before event => node_name
node_name ~ string_char_set+
escaped_char ~ '\' [[:print:]]
string_char_set ~ escaped_char
| [^;\s\[\]\{\}] # Neither a separator [;] nor a terminator [\s\[\]\{\}].
This code just drives the parse so that the events are triggered. Because the grammar uses pauses, we call Marpa's read() method once, and then call resume() in a loop until the whole string has been consumed.
for
(
my $pos = $self -> recce -> read(\$string);
$pos < $length;
$pos = $self -> recce -> resume($pos)
)
{
($start, $span) = $self -> recce -> pause_span;
$event_name = $self -> _validate_event($string, $start, $span);
...
}
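Besides constructing nice error messages, _validate_event() steps through these tests: is the event name known, was more than 1 event triggered and, if so, is it the one pair we know how to disambiguate?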
sub _validate_event
{
my($self, $string, $start, $span) = @_;
my(@event) = @{$self -> recce -> events};
my($event_name) = ${$event[0]}[0]; # The default.
my($lexeme) = substr($string, $start, $span);
my($line, $column) = $self -> recce -> line_column($start);
my($literal) = substr($string, $start + $span, 20);
$literal =~ tr/\n/ /;
$literal =~ s/^\s+//;
$literal =~ s/\s+$//;
my($message) = "Location: ($line, $column). Lexeme: !$lexeme!. Next few chars: !$literal!";
if (! ${$self -> known_events}{$event_name})
{
$message = "$message. Unexpected event name '$event_name'";
$self -> log(error => $message);
die "$message\n";
}
my($event_count) = scalar @event;
if ($event_count > 1)
{
# We can handle ambiguous events when they are 'attribute_name' and 'node_name'.
# 'attribute_name' is followed by '=', and 'node_name' is followed by anything else.
# Often, 'node_name' is followed by '[' to indicate the start of its attributes.
if ($event_count == 2)
{
my(@event_name) = sort (${$event[0]}[0], ${$event[1]}[0]);
my($expected) = "$event_name[0].$event_name[1]";
if ($expected eq 'attribute_name.node_name')
{
$self -> log(debug => $message);
# This might return undef.
$event_name = $self -> _identify_lexeme($string, $start, $span);
}
else
{
$event_name = undef;
}
if (! defined $event_name)
{
$message = "$message. Events triggered: $event_count. Names: ";
$self -> log(error => $message . join(', ', map{${$_}[0]} @event) . '.');
die "Cannot identify lexeme as either 'attribute_name' or 'node_name'. \n";
}
}
else
{
$message = "$message. Events triggered: $event_count. Names: ";
$self -> log(error => $message . join(', ', map{${$_}[0]} @event) . '.');
die "The code only handles 1 event at a time, or the pair ('attribute_name', 'node_name'). \n";
}
}
return $event_name;
} # End of _validate_event.
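The pair test above sorts the event names before joining them, so the order in which the events are reported does not matter. Here is a minimal stand-alone sketch of just that step. It assumes, as the code above does, that each event is an array ref whose first element is the event name. The helper name event_key() is hypothetical, and the event lists are mocked rather than taken from a real recognizer.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a list of event descriptors to a canonical,
# order-independent string such as 'attribute_name.node_name'.
sub event_key
{
	my(@event) = @_;

	return join('.', sort map{$$_[0]} @event);
}

# Mocked event lists, in the shape ([name], [name], ...).
my(@one_way)   = (['node_name'], ['attribute_name']);
my(@other_way) = (['attribute_name'], ['node_name']);

print event_key(@one_way), "\n";   # attribute_name.node_name
print event_key(@other_way), "\n"; # attribute_name.node_name
```

Both calls produce the same key, which is why a single string comparison is enough to recognise the pair.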
Luckily, in this case, just one token (after the lexeme which triggered the events) needs to be examined, to differentiate between the 2 cases.
And because the grammar uses pause => before, we're classifying a lexeme which technically we haven't even read yet!
sub _identify_lexeme
{
my($self, $string, $start, $span) = @_;
# Set pos() in preparation for the \G in the regexp.
pos($string) = $start + $span;
$string =~ /\G\s*(\S)/ || return; # Return undef for failure.
my($type) = ($1 eq '=') ? 'attribute_name' : 'node_name';
$self -> log(debug => "Disambiguated lexeme as '$type'");
return $type;
} # End of _identify_lexeme.
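To try the lookahead without a recognizer, here is a stand-alone sketch of the same idea. The function name classify_identifier() and the sample DOT fragment are made up for illustration, and the offsets are worked out by hand rather than coming from pause_span().

```perl
use strict;
use warnings;

# Classify the identifier occupying [$start, $start + $span) by looking at
# the first non-whitespace char which follows it.
sub classify_identifier
{
	my($string, $start, $span) = @_;

	# Set pos() in preparation for the \G in the regexp.
	pos($string) = $start + $span;

	$string =~ /\G\s*(\S)/ || return; # Nothing follows the identifier.

	return ($1 eq '=') ? 'attribute_name' : 'node_name';
}

my($dot) = 'rankdir = LR; node_1 [color = red]';

print classify_identifier($dot, 0, 7), "\n";  # 'rankdir' is followed by '='.
print classify_identifier($dot, 14, 6), "\n"; # 'node_1' is followed by '['.
```

The first call prints attribute_name and the second prints node_name. An identifier at the very end of the input has nothing after it, so the function returns undef (an empty list in list context) and the caller must decide how to report that.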
When the number of events goes up, we would like to have a data structure which helps manage any number of them. Here's an (untested) code suggestion, which goes in _validate_event().
my($events_triggered) = join('.', sort map{$$_[0]} @event);
my(%cases) =
(
a =>
{
event_list => [qw/attribute_name node_name/],
handler => sub{...},
},
b =>
{
...
},
);
my($events_handled);
for my $case (keys %cases)
{
$events_handled = join('.', sort @{$cases{$case}{event_list} });
if ($events_handled eq $events_triggered)
{
$event_name = $cases{$case}{handler} -> (...);
}
}
Of course, the handler sub could be a closure which assigns directly to $event_name.
So, as far as possible, we extend the hash %cases rather than needing to add complexity to the loop which considers the list of cases.
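Here is one way the suggestion above might be made runnable, as a sketch only. The sub name resolve_events() and the handler's single parameter (the next non-whitespace char) are assumptions made for this example. They are not part of GraphViz2::Marpa.

```perl
use strict;
use warnings;

# Each case lists the events it handles, plus a handler which returns the
# event name to use. The handler here is an illustrative placeholder.
my(%cases) =
(
	a =>
	{
		event_list => [qw/node_name attribute_name/],
		handler    => sub
		{
			my($next_char) = @_;

			return ($next_char eq '=') ? 'attribute_name' : 'node_name';
		},
	},
);

# Return the handler's verdict for the triggered events, or nothing if no
# case covers that combination.
sub resolve_events
{
	my($triggered, @args) = @_;
	my($events_triggered)  = join('.', sort @$triggered);

	for my $case (sort keys %cases)
	{
		my($events_handled) = join('.', sort @{$cases{$case}{event_list} });

		return $cases{$case}{handler} -> (@args) if ($events_handled eq $events_triggered);
	}

	return;
}

print resolve_events([qw/attribute_name node_name/], '='), "\n"; # attribute_name
print resolve_events([qw/node_name attribute_name/], '['), "\n"; # node_name
```

Supporting another combination of events then means adding another entry to %cases. The loop itself stays unchanged.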
Ron Savage
Marpa's homepage: http://savage.net.au/Marpa.html
Homepage: http://savage.net.au/index.html
Australian Copyright © 2014 Ron Savage. All rights reserved.
All Programs of mine are 'OSI Certified Open Source Software';
you can redistribute them and/or modify them under the terms of
The Artistic License, a copy of which is available at:
http://www.opensource.org/licenses/index.html