Chapter 2: Marpa's Advantages
Of course, the heading means advantages over other methods of parsing.
So - the advantages:
It is available as a linkable library
It is written in C for speed
Marpa parses in times considered theoretically optimal
Marpa parses all classes of grammar in practical use today in linear time, O(n)
Grammars can be ambiguous
This means Marpa can tell you how many different ways there are of interpreting the input stream.
Or, in other words...
Marpa tries everything, all at once. That's why it can parse such a wide class of grammars. And, that's why some parses generate multiple parse trees.
See the FAQ for more on this intriguing topic.
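Here is a minimal sketch of what that looks like (the grammar and input are invented for illustration): an ambiguous rule, a check with ambiguity_metric(), and repeated calls to value() to walk the alternative parse trees.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;
    use Marpa::R2;

    # '2 - 1 - 1' can be grouped as (2 - 1) - 1 or as 2 - (1 - 1),
    # so this grammar is ambiguous.
    my $dsl = q{
        :default ::= action => [name,values]
        lexeme default = latm => 1

        E          ::= E '-' E | number
        number       ~ [\d]+
        :discard     ~ whitespace
        whitespace   ~ [\s]+
    };

    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
    my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });
    my $input   = '2 - 1 - 1';
    $recce->read(\$input);

    # 1 means unambiguous; anything higher means multiple parse trees.
    print 'Ambiguity metric: ', $recce->ambiguity_metric(), "\n";

    # Each call to value() returns the next parse tree, until none are left.
    my $tree = 0;
    while (defined(my $value_ref = $recce->value())) {
        print 'Parse tree ', ++$tree, ":\n", Dumper(${$value_ref});
    }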
Marpa supports grammars having nullable symbols
Rules can be recursive
Try that with a regexp!
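A small sketch of both points (the grammar is invented for illustration): balanced parentheses, where Contents is a nullable symbol (a '*' sequence may be empty) and Block is recursive (a Block may contain further Blocks).

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Marpa::R2;

    # Contents can derive the empty string; Block is defined in terms of itself.
    my $dsl = q{
        lexeme default = latm => 1

        Block    ::= '(' Contents ')'
        Contents ::= Block*
    };
    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });

    for my $input ('()', '(()(()))', '(()') {
        my $recce  = Marpa::R2::Scanless::R->new({ grammar => $grammar });
        my $parsed = eval { $recce->read(\$input); $recce->value() };
        print "$input => ", (defined $parsed ? 'balanced' : 'not balanced'), "\n";
    }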
You write your grammar in BNF: you provide the rules as text, not as code. Marpa provides the code to implement these rules. That is, Marpa generates a parser tailored to your BNF.
This is declarative parsing.
Rules have an optional action adverb, indicating where Marpa is to trigger your callback to handle intermediate results.
This provides an exceptionally clean interface between the rules and any code you need to handle specific cases.
Your action subs can communicate with each other via a 'per-parse argument', which is passed to each action sub as its first parameter. This per-parse argument defaults to an empty hashref. Marpa itself never modifies it: it is reserved for the exclusive use of your code.
This Marpa-specific BNF is called SLIF-DSL (Scanless Interface Domain-Specific Language).
This means you do not have to provide a separate lexer.
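Pulling those points together, a sketch (the grammar, the package name and the action sub are invented for illustration): the rules are plain SLIF-DSL text, the action adverb names a callback, the per-parse argument reaches every action sub as its first parameter, and the lexical rules at the bottom of the grammar do the tokenising - no separate lexer.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Marpa::R2;

    # The grammar is text, not code; the 'action' adverb points at a callback.
    my $dsl = q{
        :default ::= action => ::first
        lexeme default = latm => 1

        Expression ::= Term
                     | Expression '+' Term action => do_add
        Term       ::= number
        number       ~ [\d]+
        :discard     ~ whitespace
        whitespace   ~ [\s]+
    };

    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
    my $recce   = Marpa::R2::Scanless::R->new({
        grammar           => $grammar,
        semantics_package => 'My_Actions',   # where do_add() lives
    });

    my $input = '1 + 2 + 39';
    $recce->read(\$input);      # no separate lexer: Marpa tokenises as well

    # The per-parse argument is handed, untouched by Marpa, to every action sub.
    my $per_parse = { additions => 0 };
    my $value_ref = $recce->value($per_parse);
    die 'No parse' if not defined $value_ref;

    print 'Result: ', ${$value_ref}, "\n";                      # 42
    print 'Additions seen: ', $per_parse->{additions}, "\n";    # 2

    package My_Actions;

    sub do_add {
        my ($per_parse, $left, undef, $right) = @_;   # middle child is the '+'
        $per_parse->{additions}++;
        return $left + $right;
    }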
In special cases you can of course take over the lexing yourself.
For that, start with the Marpa::R2::Scanless::R documentation. (The author of Marpa also writes a Marpa-oriented blog.)
This is procedural parsing.
Consequently, you can freely switch back and forth between declarative and procedural parsing.
Lexemes can take an optional pause adverb, indicating where Marpa is to return to your calling code until you call resume(). This is not the same as the callback trigger mechanism mentioned above.
During the pause, a set of methods allows you to interrogate Marpa about the current state of the parse.
It also means you can parse the next part of the input stream yourself, and tell Marpa to skip what you've handled.
This means the strings just preceding and just following the error can be retrieved, as well as a list of the tokens Marpa was expecting to see at that point.
Even better, Marpa can tell you which rule(s) it was evaluating.
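A sketch of that flow (the grammar and event name are invented for illustration): a lexeme declared with pause => before, a read()/resume() loop that interrogates the parse state at each pause, and lexeme_read() to hand Marpa a stretch of input handled by your own code.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Marpa::R2;

    my $dsl = q{
        lexeme default = latm => 1

        Pairs    ::= Pair+
        Pair     ::= name '=' number
        :lexeme    ~ number pause => before event => 'before_number'
        name       ~ [a-zA-Z]+
        number     ~ [\d]+
        :discard   ~ whitespace
        whitespace ~ [\s]+
    };

    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
    my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });

    my $input  = 'a = 1 b = 22';
    my $length = length $input;

    # read() returns as soon as a pause fires; resume() carries on afterwards.
    for (
        my $pos = $recce->read(\$input);
        $pos < $length;
        $pos = $recce->resume()
    ) {
        my ($start, $span_length) = $recce->pause_span();
        my $text = $recce->literal($start, $span_length);

        # Interrogate the parse state during the pause.
        printf "Paused before <%s> '%s' at %d\n",
            $recce->pause_lexeme(), $text, $start;
        print 'Expecting: ', join(' ', @{ $recce->terminals_expected() }), "\n";
        # print $recce->show_progress();   # which rules are in play, and how far

        # Handle this stretch of input ourselves, then tell Marpa about it.
        $recce->lexeme_read('number', $start, $span_length, $text + 0)
            // die "lexeme_read failed at position $start";
    }

    my $value_ref = $recce->value();
    die 'No parse' if not defined $value_ref;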
Telling Marpa it has just seen the token it was hoping for - even when that token is not actually in the input - is called Ruby Slippers parsing.
Rules can share events via their (event) name.
Events can be activated and deactivated.
The current status of each event can be interrogated by your code during a pause.
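A sketch (the event name is invented for illustration): a named completion event, read()/resume() returning whenever it fires, events() reporting it during the pause, and activate() switching it off part-way through.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Marpa::R2;

    my $dsl = q{
        lexeme default = latm => 1

        Sentence ::= Element+
        Element  ::= Number | Name
        Number   ::= digits
        Name     ::= letters

        event 'number_seen' = completed Number

        digits     ~ [\d]+
        letters    ~ [a-zA-Z]+
        :discard   ~ whitespace
        whitespace ~ [\s]+
    };

    my $grammar = Marpa::R2::Scanless::G->new({ source => \$dsl });
    my $recce   = Marpa::R2::Scanless::R->new({ grammar => $grammar });

    my $input  = 'abc 42 de 7 xyz 9';
    my $length = length $input;
    my $seen   = 0;

    for (
        my $pos = $recce->read(\$input);
        $pos < $length;
        $pos = $recce->resume()
    ) {
        # events() lists every event that fired at this pause, by name.
        for my $event (@{ $recce->events() }) {
            my ($name) = @{$event};
            next if $name ne 'number_seen';
            $seen++;
            # After the second number, stop listening for this event;
            # the third number no longer interrupts the read.
            $recce->activate('number_seen', 0) if $seen == 2;
        }
    }

    print "Numbers that triggered the event: $seen\n";   # 2
    die 'No parse' if not defined $recce->value();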
LTM is Longest Token Matching. LTM is the tradition and the current standard practice in parsing, and the default in Marpa.
LATM is Longest Acceptable Token Matching: only tokens acceptable to the parser at the current position are candidates for the match. It is activated by using the forgiving adverb in your grammar.
See the announcement for V 2.080, and the docs for forgiving.
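In later releases of Marpa::R2 this adverb appears as latm, and a grammar typically switches every lexeme over to LATM with a single lexeme default statement, something like:

    lexeme default = latm => 1   # every lexeme now uses Longest Acceptable Token Matching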
Marpa has built-in tracing facilities
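For example (a sketch, with $grammar standing in for an already-built grammar): trace_terminals logs lexemes as they are read, and show_progress() reports, rule by rule, how far the recognizer had got.

    my $recce = Marpa::R2::Scanless::R->new({
        grammar         => $grammar,   # an already-built Marpa::R2::Scanless::G
        trace_terminals => 1,          # log each lexeme as it is read
    });
    # ... after read():
    print $recce->show_progress();     # which rules were in play, and how far along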
Marpa has a 'thin' interface, for direct access to libmarpa
Marpa prunes all nulled subtrees back to their topmost nulled node
In Marpa parses, rules and symbols can be nulled. In other words they can derive the zero-length, or null, string.
You can abandon a parse part-way through the Evaluation phase: see 'Bailing out of parse evaluation' in the docs for the Marpa::R2::Semantics package.
(Recall from Chapter 1 that Marpa has 3 phases: Grammar pre-processing; Recognition; and Evaluation.)
All of this greatly expands the range of problems that fit the definition of 'parseable'.