Oracle grammar ambiguities and conflicts - Compilers
This is a discussion on Oracle grammar ambiguities and conflicts - Compilers ; >I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using
>Visual Parse++ 4.0. I have run into a problem where non-reserved
>keywords must be included as identifiers, causing millions of conflicts
>in the grammar. Does anyone have experience ...
-
Oracle grammar ambiguities and conflicts
>I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using
>Visual Parse++ 4.0. I have run into a problem where non-reserved
>keywords must be included as identifiers, causing millions of conflicts
>in the grammar. Does anyone have experience with either using VP++
>or with writing grammars for SQL?
>
>Thanks,
>
>John Fairhall
>Quest Software.
>[That's a famous problem. There's a variety of less than fabulous
>solutions, hacks in the lexer to guess when to return a keyword and
>when to return a symbol, parser backup if it gags on a keyword, or
>bushy parsers that put all of the allowed keywords into the grammar
>in places where they can be used as symbols. -John]
Sorry for resurrecting an old post from 2001, but I am wrestling with
this same famous problem and have had no luck (beyond the original
post) finding help on this famous problem. I'm currently working on
an oracle 10g sql parser using flex/bison. Of the three solutions
mentioned by the moderator (are there more?), here are the stumbling
blocks I've encountered:
1) "parser backup if it gags on a keyword" -- not feasible in
bison/flex (to my knowledge), and most tools capable of producing backups
use exception based processing, which results in unacceptable performance
for my task.
2) "hacks in the lexer" -- trying this currently, but I find the regex
I use matching too much, or not catching enough. Additionally, trying lexer
"states" seems to replicate all the work in the parser for determining the
context for determining whether the word is a keyword or symbol. Should I
have the parser switch the state of the lexer depending on which rule it is
pursuing? This seems messy, causing some instances where I'd have to back
up the lexer and re-lex.
3) "bushy parsers that put all of the allowed keywords into the
grammar in places where they can be used as symbols" -- I'm currently doing
this in my parser (I think), but it leads to some ambiguities in some cases,
particularly with words like DEFAULT.
Lastly, would a LL grammar be more befitting this problem than a LR?
Any feedback or pointers you could give me would be greatly appreciated as
my searching of this group, other groups, and the internet as a whole has
been fruitless.
Thanks,
Matt Blackmon
mattblackmon AT NOhotmailJUNK.com
-
Re: Oracle grammar ambiguities and conflicts
matt blackmon <mattblackmon@hotmail.com> wrote:
>
> ... I'm currently working on
> an oracle 10g sql parser using flex/bison. Of the three solutions
> mentioned by the moderator (are there more?), ...
We have written an SQL parser for ORACLE 10. It accepts all DML and DDL
statements including PL/SQL. The parser also accepts the SQL dialects
DB2 8, Informix 7 and Sybase 6. It is implemented using the Cocktail Toolbox.
We solve the problem of non-reserved keywords with a variant of
"parser backup if it gags on a keyword". The handling of syntax errors
can be configured in the parsers generated by the LALR parser generator
'lark'. We have done this using the following "hack in the parser":
At a syntax error we check whether the current token is a keyword.
If an identifier token would be acceptable at this location then
we convert the keyword token into an identifier token and continue parsing.
We are successfully using this hack in parsers for SQL, COBOL, NATURAL and
CICS.
Best regards
Dr. Josef Grosch
CoCoLab - Datenverarbeitung
Hoehenweg 6
77855 Achern
Germany
Phone : +49-7841-669144
Fax : +49-7841-669145
Email : grosch@cocolab.com
Internet: www.cocolab.com
-
Re: Oracle grammar ambiguities and conflicts
Matt,
> Sorry for resurrecting an old post from 2001, but I am wrestling with
> this same famous problem and have had no luck (beyond the original
> post) finding help on this famous problem. I'm currently working on
> an oracle 10g sql parser using flex/bison. Of the three solutions
> mentioned by the moderator (are there more?), here are the stumbling
> blocks I've encountered:
After spending several months I managed to create a grammar for SQL/2
that had a small number of reduce/reduce and shift/reduce conflicts.
Adding support for Oracle 7 (later 8) caused a few more conflicts to
be added.
My main problem with Oracle is all the undocumented constructs. Some
are simply stuff that their tool does not recognize and simply ignores
(very confusing since some developers have the attitude of "no tool
complaint -> it must do something; don't know what since I didn't write
the code"). Other constructs are simply supported because they were in
a very early release of the product.
I am happy to license you the source if you are interested in a
commercial solution.
Some background here: http://www.knosof.co.uk/sqlport.html
-
Re: Oracle grammar ambiguities and conflicts
Matt Blackmon reasked:
>I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using
>Visual Parse++ 4.0. I have run into a problem where non-reserved
>keywords must be included as identifiers, causing millions of conflicts
>in the grammar. Does anyone have experience with either using VP++
>or with writing grammars for SQL?
Generally the cleanest solution is:
create a non-terminal id which accepts all the keywords (you may have
done this) and use it in place of your token ID.
e.g.
id: ID | KW1 | KW2 | KW3 | ... | FROM | ... | SELECT | ... ;
select: SELECT id FROM table;
Now, this will give you conflicts at several (many) points in the grammar.
For each such point in the grammar, look at the tables and conflict messages.
e.g.
conflict between shifting "FROM" and reducing id at state 3.
conflict between shifting "KW3" and reducing id at state 3.
That conflict means you need another id non-terminal, say id_1 which
does not allow FROM or KW3, as in:
id_1: ID | KW1 | KW2 | ... | SELECT | ... ;
You then have to use that in the proper spot(s) in the grammar. That
is not the rule which is shifting the grammar, but the rule that has
the id reference. Generally, when I am looking for such conflicts I
back-up in the grammar and find the state which shifted to this state
and in that state one of the items (dotted rules) will have a dot (I
think bison and byacc use _ for dot, Yacc++ uses ., other tools could
use other characters) before the id nonterminal that needs to be
changed to id_1. Fix that rule.
Repeat this process until it converges (i.e. you get no more
conflicts).
Alternately, if all you conflicts are shift-reduce conflicts between
keywords and the id non-terminal, you can simply ignore the conflicts
as the default rule of preferring to shift over reducing has exactly
the right semantics to resolve this problem. Of course, that can
leave you with a huge pile of warnings which you want to ignore. You
can improve on that with associativity and precedence directives, so
that the shifts are prefered to reduces of "id" by default and with no
warning.
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
-
Re: Oracle grammar ambiguities and conflicts
matt blackmon wrote:
> Lastly, would a LL grammar be more befitting this problem than a LR?
Yes, IMO. It's not so much a matter of the grammar, but a recursive
descent parser is easier to modify than the automatons, produced from LR
grammars. You may need an different parser generator?
DoDi
-
Re: Oracle grammar ambiguities and conflicts
Hi all,
Hi Matt,
"matt blackmon" <mattblackmon@hotmail.com> wrote
>Of the three solutions
>mentioned by the moderator (are there more?), here are the stumbling
>blocks I've encountered:
My opinion is you combine solution 3 with a GLR and avoid headaches. I
think it will work. At least you will not have to deal with one million
conflicts.
I've worked a little with SQL-92, not Oracle Syntax, but I believe the
problem with the non-reserved words is common.
Also, I hope Oracle Syntax is less ambiguous than SQL-92, otherwise you
have to make excessive restatements, if you want to solve them at the
syntax level.
Anyway, let us see solution-3.
"matt blackmon" wrote
>3) "bushy parsers that put all of the allowed keywords into the
>grammar in places where they can be used as symbols" -- I'm currently doing
>this in my parser (I think), but it leads to some ambiguities in some
>cases,
>particularly with words like DEFAULT.
I will try to give you an idea what to check again.
I think, even in Oracle, 'DEFAULT' is a reserved word.
Further, 'Bison' is a LALR Parser, according to the description at
http://www.gnu.org/software/bison/bi...l#introduction .
Thus, you possibly have encountered conflicts rather ambiguities due
to keywords.
What you could possibly try to deal with these problems?
Let us assume the rule that describes the Lexical Conventions of an
identifier that is not delimited is named "regular identifier", set:
(a) <regular identifier> ::= <_regular identifier>
| <non reserved word>
(b) <_regular identifier> ::=
"here place the old definition of <regular identifier>"
(c) All reserved & non-reserved words should be placed in flex
before the definition of <_regular identifier>.
According to the Flex documentation, when input matches multiple
declarations, it gives priority to the rule stated first.
So, such a restatement might work.
After, I am not certain what conflicts you might face with
Oracle Syntax. One solution is, you let a GLR deal with conflicts
and you just have to deal with ambiguities, if any.
I hope this helps,
Ev. Drikos
-
Re: Oracle grammar ambiguities and conflicts
Dr. Josef Grosch wrote:
> We solve the problem of non-reserved keywords with a variant of
> "parser backup if it gags on a keyword". The handling of syntax errors
> can be configured in the parsers generated by the LALR parser generator
> 'lark'. We have done this using the following "hack in the parser":
>
> At a syntax error we check whether the current token is a keyword.
> If an identifier token would be acceptable at this location then
> we convert the keyword token into an identifier token and continue parsing.
That "hack" should be fully general and work for any tool that you can
make the appropriate patches to. Moreover, it will work even if you
change the grammar so that the legal set of keywords at different
places change. It is about as elegant and robust a solution as one
could find. In fact, it's so ideal a solution, it's hard to consider
it a hack.
If you can do it this way, I recommend you do.
I'm going to add it to my bag-of-tricks.
The only "downside" of this solution is that it is outside the grammar
and involves patching the tool, so if you switch parser generators
(and very few people ever do), you have to reinvent the "hack" for the
new parser generator. However, if you switch parser generators, you
are likely to have to redo portions of the grammar and auxillary code
just to the idiosyncarcies of the new tool, such that this effort will
be negligible. In other words, there is no downside to this technique.
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
-
Re: Oracle grammar ambiguities and conflicts
John Fairhall wrote:
>> I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using
>> Visual Parse++ 4.0. I have run into a problem where non-reserved
>> keywords must be included as identifiers, causing millions of conflicts
>> in the grammar. Does anyone have experience with either using VP++
>> or with writing grammars for SQL?
>
matt blackmon wrote:
> Lastly, would a LL grammar be more befitting this problem than a LR?
Don't know if this will help, but I wrote a top-down recursive-descent
parser in Java for SQL-like queries, at:
http://david.tribble.com/src/java/tr...eryParser.java
which is part of my 'tribble.parse.sql' package at:
http://david.tribble.com/src/java
The parse_operand() and parse_name_expr() methods treat a
keyword as an error when an operand (identifier or literal) is
expected, but they can be modified fairly easily (although you
may give up some syntax checking if you do this).
(My code is open source.)
-drt
-
Re: Oracle grammar ambiguities and conflicts
>Generally the cleanest solution is:
>create a non-terminal id which accepts all the keywords (you may have
>done this) and use it in place of your token ID.
And applicable to top-down parsing as well. I used this technique
successfully on a large commercial translator. But it does require some
parsing/grammar knowledge.
The SLK parser generator: http://home.earthlink.net/~slkpg/
-
Re: Oracle grammar ambiguities and conflicts
>At a syntax error we check whether the current token is a keyword.
>If an identifier token would be acceptable at this location then
>we convert the keyword token into an identifier token and continue parsing.
I also use this technique to handle typedef in C. SLK facilitates this with
the following retry rules on return from the user error routine:
If the same token is returned, the algorithm proceeds as if nothing
happened. The parse stack is popped. For mismatch(), this is one way to
insert a missing token.
If a different token is returned, the algorithm does a retry using the new
token and the same nonterminal.
If zero is returned, the algorithm retries using the old token/nonterminal
pair. This is useful when the action required is to change the symbol table
information.
The SLK parser generator: http://home.earthlink.net/~slkpg/
Similar Threads
-
By Application Development in forum c++
Replies: 2
Last Post: 08-24-2007, 09:50 AM
-
By Application Development in forum c++
Replies: 2
Last Post: 08-01-2007, 01:06 PM
-
By Application Development in forum Compilers
Replies: 1
Last Post: 01-17-2006, 09:41 PM
-
By Application Development in forum Compilers
Replies: 3
Last Post: 03-31-2005, 11:35 PM