Disambiguating Simultaneous Events

The Context

The DOT Syntax in Question

The Grammar

Processing the Input Stream

Analyzing Simultaneous Events

Identifying the Lexeme via the Next Token

Automating Things

Author

Licence

Disambiguating Simultaneous Events

The Context

I'm rewriting GraphViz2::Marpa (the new version will be V 2.00), and its BNF (a Marpa::R2-style grammar) lets Marpa trigger 2 events simultaneously as it parses a single lexeme.

For more on these BNFs, see the docs for Marpa's DSL.

Here's one way to handle such a problem.

The DOT Syntax in Question

The input stream is in the DOT language.

In GraphViz2::Marpa, sometimes an identifier needs to be classified as an 'attribute name' or a 'node name'.

Typical syntax, in pseudo-Perl, is:

o $attr_name = $attr_value: Here, $attr_name is an attribute and $attr_value is its value.
o $node_name [$attr_name = $attr_value; ...]: Here, $node_name is a node with the given attributes.

So the 1st non-whitespace character after $attr_name/$node_name, here '=' or '[', distinguishes the 2 cases.

The Grammar

        :lexeme             ~ attribute_name     pause => before     event => attribute_name
        attribute_name      ~ string_char_set+

        :lexeme             ~ node_name          pause => before     event => node_name
        node_name           ~ string_char_set+

        escaped_char        ~ '\' [[:print:]]

        string_char_set     ~ escaped_char
                          | [^;\s\[\]\{\}] # Neither a separator [;] nor a terminator [\s\[\]\{\}].
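To get a feel for which characters end a lexeme, the two lexer rules can be approximated with an ordinary Perl regexp. This is only a sketch: $STRING_CHAR_SET and lexeme_at() are names invented for this illustration, not part of GraphViz2::Marpa. Also, Marpa picks the longest match whereas a regexp alternation is ordered, so unusual backslash sequences may lex differently.

```perl
use strict;
use warnings;

# Hypothetical approximation of: string_char_set ~ escaped_char | [^;\s\[\]\{\}]
my $STRING_CHAR_SET = qr/(?:\\[[:print:]]|[^;\s\[\]\{\}])/;

# Return the run of string_char_set chars starting at $offset, or undef if there is none.
sub lexeme_at
{
        my($string, $offset) = @_;

        pos($string) = $offset; # Anchor for the \G below.

        return ($string =~ /\G($STRING_CHAR_SET+)/) ? $1 : undef;
}

print lexeme_at('node_1 [color = red]', 0), "\n"; # node_1
print lexeme_at('node_1 [color = red]', 8), "\n"; # color
print lexeme_at('a\;b;', 0), "\n";                # a\;b
```

Note that both attribute_name and node_name match exactly the same text, which is why Marpa fires both events for one lexeme.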

Processing the Input Stream

This just runs the code to trigger the events. Having used pauses in the grammar, we call Marpa's read() method once, and then call resume() in a loop until the whole input string has been consumed.

        for
        (
                my $pos = $self -> recce -> read(\$string);
                $pos < $length;
                $pos = $self -> recce -> resume($pos)
        )
        {
                ($start, $span) = $self -> recce -> pause_span;
                $event_name     = $self -> _validate_event($string, $start, $span);
                ...
        }

Analyzing Simultaneous Events

Besides constructing nice error messages, we will step through these tests:

o Check the # of events
o If there is just 1 event, return it
o If there are more than 2 events, die
o So now there are 2 events
o If it's a pair we can't handle, die
o So now it's the pair we can handle
o Use the 1st non-whitespace token after the lexeme to choose an event

        sub _validate_event
        {
                my($self, $string, $start, $span) = @_;
                my(@event)         = @{$self -> recce -> events};
                my($event_name)    = ${$event[0]}[0]; # The default.
                my($lexeme)        = substr($string, $start, $span);
                my($line, $column) = $self -> recce -> line_column($start);
                my($literal)       = substr($string, $start + $span, 20);
                $literal           =~ tr/\n/ /;
                $literal           =~ s/^\s+//;
                $literal           =~ s/\s+$//;
                my($message)       = "Location: ($line, $column). Lexeme: !$lexeme!. Next few chars: !$literal!";

                if (! ${$self -> known_events}{$event_name})
                {
                        $message = "$message. Unexpected event name '$event_name'";

                        $self -> log(error => $message);

                        die "$message\n";
                }

                my($event_count) = scalar @event;

                if ($event_count > 1)
                {
                        # We can handle ambiguous events when they are 'attribute_name' and 'node_name'.
                        # 'attribute_name' is followed by '=', and 'node_name' is followed by anything else.
                        # Often, 'node_name' is followed by '[' to indicate the start of its attributes.

                        if ($event_count == 2)
                        {
                                my(@event_name) = sort (${$event[0]}[0], ${$event[1]}[0]);
                                my($expected)   = "$event_name[0].$event_name[1]";

                                if ($expected eq 'attribute_name.node_name')
                                {
                                        $self -> log(debug => $message);

                                        # This might return undef.

                                        $event_name = $self -> _identify_lexeme($string, $start, $span);
                                }
                                else
                                {
                                        $event_name = undef;
                                }

                                if (! defined $event_name)
                                {
                                        $message = "$message. Events triggered: $event_count. Names: ";

                                        $self -> log(error => $message . join(', ', map{${$_}[0]} @event) . '.');

                                        die "Cannot identify lexeme as either 'attribute_name' or 'node_name'. \n";
                                }
                        }
                        else
                        {
                                $message = "$message. Events triggered: $event_count. Names: ";

                                $self -> log(error => $message . join(', ', map{${$_}[0]} @event) . '.');

                                die "The code only handles 1 event at a time, or the pair ('attribute_name', 'node_name'). \n";
                        }
                }

                return $event_name;

        } # End of _validate_event.

Identifying the Lexeme via the Next Token

Luckily, in this case, just one token (after the lexeme which triggered the events) needs to be examined, to differentiate between the 2 cases.

And because the grammar uses pause => before, we're classifying a lexeme which technically we haven't even read yet!

        sub _identify_lexeme
        {
                my($self, $string, $start, $span) = @_;

                # Set pos() in preparation for the \G in the regexp.

                pos($string) = $start + $span;
                $string      =~ /\G\s*(\S)/ || return; # Return undef for failure.
                my($type)    = ($1 eq '=') ? 'attribute_name' : 'node_name';

                $self -> log(debug => "Disambiguated lexeme as '$type'");

                return $type;

        } # End of _identify_lexeme.
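To try the look-ahead outside the module, here is a stand-alone sketch of the same idea. The function name identify_lexeme() (no $self, no logging) and the sample DOT text are assumptions made for this illustration. The offsets are what pause_span would report for each lexeme.

```perl
use strict;
use warnings;

# Stand-alone variant of _identify_lexeme(), without $self or logging.
sub identify_lexeme
{
        my($string, $start, $span) = @_;

        # Set pos() just past the lexeme, in preparation for the \G.

        pos($string) = $start + $span;

        return if ($string !~ /\G\s*(\S)/); # Nothing but whitespace follows.

        return ($1 eq '=') ? 'attribute_name' : 'node_name';
}

my $dot = "rankdir = LR\nnode_1 [color = red]";

print identify_lexeme($dot, 0, 7), "\n";  # attribute_name ('rankdir' is followed by '=').
print identify_lexeme($dot, 13, 6), "\n"; # node_name ('node_1' is followed by '[').
```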

Automating Things

When the number of events goes up, we would like to have a data structure which helps manage any number of them. Here's an (untested) code suggestion, which goes in _validate_event().

        my($events_triggered) = join('.', sort map{$$_[0]} @event);
        my(%cases)            =
        (
                a =>
                {
                        event_list => [qw/attribute_name node_name/],
                        handler    => sub{...},
                },
                b =>
                {
                        ...
                },
        );

        my($events_handled);

        for my $case (keys %cases)
        {
                $events_handled = join('.', sort @{$cases{$case}{event_list}});

                if ($events_handled eq $events_triggered)
                {
                        $event_name = $cases{$case}{handler} -> (...);
                }
        }

Of course, the handler sub could be a closure which assigns directly to $event_name.

So, as far as possible, we extend the hash %cases rather than needing to add complexity to the loop which considers the list of cases.
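Here is a runnable sketch of that idea, with one change which is my own assumption rather than anything in GraphViz2::Marpa: the hash is keyed by the sorted event names joined with '.', so the lookup needs no loop at all. The names %handler_for, by_next_char() and choose_event() are hypothetical, and the events are passed in as plain data with the same shape as the result of $recce -> events.

```perl
use strict;
use warnings;

# Hypothetical handler: pick an event by looking at the next non-whitespace char.
sub by_next_char
{
        my($string, $start, $span) = @_;

        pos($string) = $start + $span;

        return if ($string !~ /\G\s*(\S)/);

        return ($1 eq '=') ? 'attribute_name' : 'node_name';
}

# Keys are sorted event names joined by '.'. Add a line here for each new case.
my(%handler_for) =
(
        'attribute_name.node_name' => \&by_next_char,
);

# $events is an arrayref of arrayrefs, with the event name as the 1st element of each.
sub choose_event
{
        my($events, $string, $start, $span) = @_;
        my(@name) = sort(map{${$_}[0]} @$events);

        return $name[0] if (@name == 1);

        my($handler) = $handler_for{join('.', @name)};

        die 'Cannot disambiguate events: ' . join(', ', @name) . "\n" if (! $handler);

        return $handler -> ($string, $start, $span);
}

my $dot = 'node_1 [color = red]';

print choose_event([['attribute_name'], ['node_name']], $dot, 0, 6), "\n"; # node_name
print choose_event([['attribute_name'], ['node_name']], $dot, 8, 5), "\n"; # attribute_name
print choose_event([['open_bracket']], $dot, 7, 1), "\n";                  # open_bracket
```

An unknown combination of events dies with the list of names, so a missing case shows up as soon as the grammar grows.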

Author

Ron Savage.

Marpa's homepage: http://savage.net.au/Marpa.html

Homepage: http://savage.net.au/index.html

Licence

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Artistic License, a copy of which is available at:
        http://www.opensource.org/licenses/index.html
