best grammar for handling optional symbols?

This is a discussion on best grammar for handling optional symbols? within the Compilers forums in Theory and Concepts category; I've been using BNF. up until now, things have been OK with the optional symbols. what is the proper way to implement an optional symbol where <code40> is optional? <b> ::= <code40> "r" | "r" but I would prefer <b> ::= <a> "r" <a> ::= <code40> | <nullsymbol???> I don't know. I don't think this is right when it occurs 0 or 1 times. I seem to remember there was a symbol in BNF used to represent the null terminal or something - it's not listed in wikipedia (lousy wikipedia). what was it? Anyway, the problem I am up against ...

Go Back   Application Development Forum > Theory and Concepts > Compilers

Object Mix

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-15-2008, 09:41 PM
jmichae3@yahoo.com
Guest
 
Default best grammar for handling optional symbols?

I've been using BNF.
up until now, things have been OK with the optional symbols.

what is the proper way to implement an optional symbol where <code40>
is optional?
<b> ::= <code40> "r" | "r"
but I would prefer
<b> ::= <a> "r"
<a> ::= <code40> | <nullsymbol???>

I don't know. I don't think this is right when it occurs 0 or 1
times. I seem to remember there was a symbol in BNF used to represent
the null terminal or something - it's not listed in wikipedia (lousy
wikipedia). what was it?

Anyway, the problem I am up against is that I have to define a symbol
that has 18 options. fully expanded, that makes 2^18 options, and I
have 200 more symbols to define for autocad, many with more options...

[Either is OK for optional symbols. For your Autocad problem, I would
suggest a much looser grammar that just accepts a list of symbols, and
check semantically to knock out duplicates. That both keeps the
grammar managable, and lets you produce much better errors, e.g.
"duplicate foo" or "foo not allowed after bar" rather than just
"syntax error". -John]

Reply With Quote
  #2  
Old 08-16-2008, 08:22 AM
Max Hailperin
Guest
 
Default Re: best grammar for handling optional symbols?

jmichae3@yahoo.com writes:

> I've been using BNF.
> up until now, things have been OK with the optional symbols.
>
> what is the proper way to implement an optional symbol where <code40>
> is optional?
> <b> ::= <code40> "r" | "r"
> but I would prefer
> <b> ::= <a> "r"
> <a> ::= <code40> | <nullsymbol???>
>
> I don't know. I don't think this is right when it occurs 0 or 1
> times. I seem to remember there was a symbol in BNF used to represent
> the null terminal or something - it's not listed in wikipedia (lousy
> wikipedia). what was it?


I suspect you may be remembering two different things, either of which
might be useful to you:

(1) If you use your second option, then you are looking for some
conventional symbol to use for the empty string, in the place where
you have <nullsymbol???>. In grammars that are written in relatively
mathematical contexts, the conventional symbol to use for this is the
Greek letter epsilon. On the other hand, people who use the Greek
letter also tend to use an arrow symbol where you have ::=, etc. So,
what do people do who are limiting themselves to the ASCII character
set? The answer is that most adopt some particular nonterminal like
<epsilon> or <empty> or <null> to mean the same thing -- but there is
no standardization of this. Whichever one you use, you don't need to
just rely on readers to understand it, nor do you need to include a
prose explanation. Instead, you can define your nonterminal using a
production with a totally empty right-hand side:

<nullsymbol> ::=

Of course, that may need a prose explanation too, depending on your
audience. But the good news is that it appears only once in your
grammar, and elsewhere you use the relatively self-documenting
<nullsymbol> (or whatever you call it) as opposed to having the whole
grammar strewn with productions with empty right-hand sides, as in

<a> ::= <code40> |

where the second alternative for <a> is empty, but looks like a mistake.

(2) The other thing you may be remembering is that there are extended
versions of BNF, known as EBNF, where additional notations can be
included on the right-hand side of productions in order to indicate
such matters as optional portions. This would allow a third
alternative way of writing your grammar, conceptually similar to your
third one but a little tidier, because it doesn't require introducing
the nonterminal <a>. Unfortunately, there is again no standardization
of EBNF notation. The two most common ways of indicating optional
elements are probably enclosing them in square brackets or following
them with question marks. So, your example would wind up looking like

<b> ::= [<code40>] "r"

or

<b> ::= <code40>? "r"

>
> Anyway, the problem I am up against is that I have to define a symbol
> that has 18 options. fully expanded, that makes 2^18 options, and I
> have 200 more symbols to define for autocad, many with more options...
>
> [Either is OK for optional symbols. For your Autocad problem, I would
> suggest a much looser grammar that just accepts a list of symbols, and
> check semantically to knock out duplicates. That both keeps the
> grammar managable, and lets you produce much better errors, e.g.
> "duplicate foo" or "foo not allowed after bar" rather than just
> "syntax error". -John]


John's use of the phrase "produce much better errors" sounds like he
thinks you are using your grammar to produce a parser, as opposed to
just writing the grammar down as documentation. I'm not clear whether
that is in fact the case. But even if you are just writing the
grammar down for humans to read, there's a lot to what John says.

John's sugestion is particularly apt if the options can appear in any
order -- which I suspect is the case. In that case, you have far more
than 2^18 fully-expanded possibilities -- not that you would want to
write even 2^18 productions. Moreover, there is no nice trick of BNF
or EBNF that would come to your rescue.

Your approach using <a> to represent an optional <code40>, defined
using <nullsymbol> (or some other notation for the empty string) would
work OK for 18 options, which need to appear in a fixed order if used.
It would be mildly clutzy, since you'd need 18 new nonterminals like
your <a>. (I'd name them <optionalCode40> and so forth.)

If you are fortunate enough to use EBNF, but again have a required
fixed order, the clutziness would go away. You could just write the
18 options out with each in brackets or followed by a question mark,
and life would be good. Or to adopt John's phrase, you would have
kept your grammar manageable.

But nothing like that is possible if the order is variable. In that
case, even if you are writing for humans, your best bet is to escape
from the BNF or EBNF into some more general means of communication --
which for a human, might well be a natural-language explanation.

-Max Hailperin
Professor of Computer Science
Gustavus Adolphus College
Reply With Quote
  #3  
Old 08-17-2008, 04:16 PM
Johannes
Guest
 
Default Re: best grammar for handling optional symbols?

On Aug 16, 2:22 pm, Max Hailperin <m...@gustavus.edu> wrote:>
> John's sugestion is particularly apt if the options can appear in any
> order -- which I suspect is the case. In that case, you have far more
> than 2^18 fully-expanded possibilities -- not that you would want to
> write even 2^18 productions. Moreover, there is no nice trick of BNF
> or EBNF that would come to your rescue.


Well, theoretically one could write a program to generate the output
but I think with 2^18 alternatives no one is going to read that file
anyway.

> Your approach using <a> to represent an optional <code40>, defined
> using <nullsymbol> (or some other notation for the empty string) would
> work OK for 18 options, which need to appear in a fixed order if used.
> It would be mildly clutzy, since you'd need 18 new nonterminals like
> your <a>. (I'd name them <optionalCode40> and so forth.)
>
> If you are fortunate enough to use EBNF, but again have a required
> fixed order, the clutziness would go away. You could just write the
> 18 options out with each in brackets or followed by a question mark,
> and life would be good. Or to adopt John's phrase, you would have
> kept your grammar manageable.
>
> But nothing like that is possible if the order is variable. In that
> case, even if you are writing for humans, your best bet is to escape
> from the BNF or EBNF into some more general means of communication --
> which for a human, might well be a natural-language explanation.


In other grammars I have seen the preferred method to deal with this
situation is to write something like this:

rule: modifier+;

modifier: A | B;

One has to check afterwards though, if a modifier is never repeated
(also add a note in the grammar). I actually would choose this
solution even for a fixed order, as one can provide a better error
message if you provide the input in the wrong order.

Johannes
Reply With Quote
  #4  
Old 08-19-2008, 01:49 AM
jmichae3@yahoo.com
Guest
 
Default Re: best grammar for handling optional symbols?

> If you are fortunate enough to use EBNF, but again have a required
> fixed order, the clutziness would go away. You could just write the
> 18 options out with each in brackets or followed by a question mark,
> and life would be good. Or to adopt John's phrase, you would have
> kept your grammar manageable.
>
> But nothing like that is possible if the order is variable. In that
> case, even if you are writing for humans, your best bet is to escape
> from the BNF or EBNF into some more general means of communication --
> which for a human, might well be a natural-language explanation.
>
> -Max Hailperin
> Professor of Computer Science
> Gustavus Adolphus College


I did not know about following an optional symbol with ? but I did
miss in wikipedia where they surround optionals with [].
My options, as far as I can tell from the reference, are in a sequence
- fortunately for me - I think.
otherwise I would write it as
<option> := "a" | "b" | "c" | "d" "e" | "f"
<sequence> ::= <option>*

I guess this means I am going to have to "look ahead"
Reply With Quote
  #5  
Old 08-19-2008, 05:07 PM
Chris F Clark
Guest
 
Default Re: best grammar for handling optional symbols?

On minor point is response to what Max Hailperin <max@gustavus.edu>
writes:

> But nothing like that is possible if the order is variable. In that
> case, even if you are writing for humans, your best bet is to escape
> from the BNF or EBNF into some more general means of communication --
> which for a human, might well be a natural-language explanation.


That's not completely true. I distinctly recall reading a paper where
the author introduced an EBNF notation for generating permuations,
which is exactly what this problem calls for. Now, permutations don't
have trivial tranformations into BNF, in the sense that to describe
permutations in BNF one must list each of the alternatives and that
grows quite rapidly as the number of symbols to be permuted increases.
However, it is quite feasible to generate code that recognizes
permutations, and thus adding an operator that does such to one's EBNF
is reasonable to my mind. Others may object that EBNF should be
restricted to allowing only things expressible as regular expressions
on the right-hand-side. However, as I remember the notation, it was
restricted to combining fixed lists of symbols, so it technically was
a regular expression, just not a traditional one.

I did a Google search looking for permutation grammars but didn't find
anything on the first page, although the link to Royal Holloway seemed
promising, as Adrian Johnstone and Elizabeth Scott could easily be
interested in such an extension.

Note, if someone find this paper, it would be nice to see the
reference posted here. I wouldn't mind reading that paper again.

Hope this helps,
-Chris

************************************************** ****************************
Chris Clark Internet: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. or: compres@world.std.com
23 Bailey Rd Web Site: http://world.std.com/~compres
Berlin, MA 01503 voice: (508) 435-5016
USA fax: (978) 838-0263 (24 hours)
Reply With Quote
  #6  
Old 08-20-2008, 01:17 AM
Quinn Tyler Jackson
Guest
 
Default RE: best grammar for handling optional symbols?

> > But nothing like that is possible if the order is variable. In that
> > case, even if you are writing for humans, your best bet is to escape
> > from the BNF or EBNF into some more general means of communication --
> > which for a human, might well be a natural-language explanation.

>
> That's not completely true. I distinctly recall reading a paper where
> the author introduced an EBNF notation for generating permuations,
> which is exactly what this problem calls for. Now, permutations don't
> have trivial tranformations into BNF, in the sense that to describe
> permutations in BNF one must list each of the alternatives and that
> grows quite rapidly as the number of symbols to be permuted increases.
> However, it is quite feasible to generate code that recognizes
> permutations, and thus adding an operator that does such to one's EBNF
> is reasonable to my mind. Others may object that EBNF should be
> restricted to allowing only things expressible as regular expressions
> on the right-hand-side. However, as I remember the notation, it was
> restricted to combining fixed lists of symbols, so it technically was
> a regular expression, just not a traditional one.


I believe the reference Chris is referring to is:

[Cameron 1993] Robert D. Cameron, "Extending Context-Free Grammars with
Permutation Phrases," LOPLAS, Vol. 2 No. 1-4, pp. 85-94, 1993.

Meta-S grammars have both strict and relaxed permutation phrases.

S ::= <<A B C>>;
A ::= "amor";
B ::= "vincit";
C ::= "omnia";

Ignoring whitespace rules, the above would match any permutation of "amor
vincit omnia". A, B, and C must all generate strings, once and only once,
and must not generate the empty string, but the order of A, B, and C is
free.

S ::= <<{A B C}>>; // the { } syntax means that one of the terms can
generate lambda
A ::= "amor";
B ::= "vincit";
C ::= ["omnia"];

In the above, C is optional. So, the grammar can generate "vincit amor" (for
instance).

Several other grammar formalisms have allowed for permutation phrases, as
well. I am not certain if they allow for both the lambda-generating and
non-lambda stricter form.

Cheers,

--
Quinn Tyler Jackson
Reply With Quote
Reply


Thread Tools
Display Modes


All times are GMT -5. The time now is 06:35 AM.


Powered by vBulletin® Version 3.7.2
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
vB Ad Management by =RedTyger=

In an effort to better serve ads to our visitors, cookies are used on objectmix.com. For more information, check out our Privacy Policy.