Oracle grammar ambiguities and conflicts - Compilers

This is a discussion on Oracle grammar ambiguities and conflicts - Compilers ; >I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using >Visual Parse++ 4.0. I have run into a problem where non-reserved >keywords must be included as identifiers, causing millions of conflicts >in the grammar. Does anyone have experience ...

+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 10 of 11

Oracle grammar ambiguities and conflicts

  1. Default Oracle grammar ambiguities and conflicts


    >I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using
    >Visual Parse++ 4.0. I have run into a problem where non-reserved
    >keywords must be included as identifiers, causing millions of conflicts
    >in the grammar. Does anyone have experience with either using VP++
    >or with writing grammars for SQL?
    >
    >Thanks,
    >
    >John Fairhall
    >Quest Software.
    >[That's a famous problem. There's a variety of less than fabulous
    >solutions, hacks in the lexer to guess when to return a keyword and
    >when to return a symbol, parser backup if it gags on a keyword, or
    >bushy parsers that put all of the allowed keywords into the grammar
    >in places where they can be used as symbols. -John]


    Sorry for resurrecting an old post from 2001, but I am wrestling with
    this same famous problem and have had no luck (beyond the original
    post) finding help on this famous problem. I'm currently working on
    an oracle 10g sql parser using flex/bison. Of the three solutions
    mentioned by the moderator (are there more?), here are the stumbling
    blocks I've encountered:


    1) "parser backup if it gags on a keyword" -- not feasible in
    bison/flex (to my knowledge), and most tools capable of producing backups
    use exception based processing, which results in unacceptable performance
    for my task.

    2) "hacks in the lexer" -- trying this currently, but I find the regex
    I use matching too much, or not catching enough. Additionally, trying lexer
    "states" seems to replicate all the work in the parser for determining the
    context for determining whether the word is a keyword or symbol. Should I
    have the parser switch the state of the lexer depending on which rule it is
    pursuing? This seems messy, causing some instances where I'd have to back
    up the lexer and re-lex.

    3) "bushy parsers that put all of the allowed keywords into the
    grammar in places where they can be used as symbols" -- I'm currently doing
    this in my parser (I think), but it leads to some ambiguities in some cases,
    particularly with words like DEFAULT.

    Lastly, would a LL grammar be more befitting this problem than a LR?

    Any feedback or pointers you could give me would be greatly appreciated as
    my searching of this group, other groups, and the internet as a whole has
    been fruitless.

    Thanks,
    Matt Blackmon
    mattblackmon AT NOhotmailJUNK.com

  2. Default Re: Oracle grammar ambiguities and conflicts

    matt blackmon <mattblackmon@hotmail.com> wrote:
    >
    > ... I'm currently working on
    > an oracle 10g sql parser using flex/bison. Of the three solutions
    > mentioned by the moderator (are there more?), ...


    We have written an SQL parser for ORACLE 10. It accepts all DML and DDL
    statements including PL/SQL. The parser also accepts the SQL dialects
    DB2 8, Informix 7 and Sybase 6. It is implemented using the Cocktail Toolbox.

    We solve the problem of non-reserved keywords with a variant of
    "parser backup if it gags on a keyword". The handling of syntax errors
    can be configured in the parsers generated by the LALR parser generator
    'lark'. We have done this using the following "hack in the parser":

    At a syntax error we check whether the current token is a keyword.
    If an identifier token would be acceptable at this location then
    we convert the keyword token into an identifier token and continue parsing.

    We are successfully using this hack in parsers for SQL, COBOL, NATURAL and
    CICS.


    Best regards

    Dr. Josef Grosch

    CoCoLab - Datenverarbeitung
    Hoehenweg 6
    77855 Achern
    Germany

    Phone : +49-7841-669144
    Fax : +49-7841-669145
    Email : grosch@cocolab.com
    Internet: www.cocolab.com

  3. Default Re: Oracle grammar ambiguities and conflicts

    Matt,

    > Sorry for resurrecting an old post from 2001, but I am wrestling with
    > this same famous problem and have had no luck (beyond the original
    > post) finding help on this famous problem. I'm currently working on
    > an oracle 10g sql parser using flex/bison. Of the three solutions
    > mentioned by the moderator (are there more?), here are the stumbling
    > blocks I've encountered:


    After spending several months I managed to create a grammar for SQL/2
    that had a small number of reduce/reduce and shift/reduce conflicts.
    Adding support for Oracle 7 (later 8) caused a few more conflicts to
    be added.

    My main problem with Oracle is all the undocumented constructs. Some
    are simply stuff that their tool does not recognize and simply ignores
    (very confusing since some developers have the attitude of "no tool
    complaint -> it must do something; don't know what since I didn't write
    the code"). Other constructs are simply supported because they were in
    a very early release of the product.

    I am happy to license you the source if you are interested in a
    commercial solution.

    Some background here: http://www.knosof.co.uk/sqlport.html


  4. Default Re: Oracle grammar ambiguities and conflicts

    Matt Blackmon reasked:
    >I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using
    >Visual Parse++ 4.0. I have run into a problem where non-reserved
    >keywords must be included as identifiers, causing millions of conflicts
    >in the grammar. Does anyone have experience with either using VP++
    >or with writing grammars for SQL?


    Generally the cleanest solution is:

    create a non-terminal id which accepts all the keywords (you may have
    done this) and use it in place of your token ID.

    e.g.

    id: ID | KW1 | KW2 | KW3 | ... | FROM | ... | SELECT | ... ;

    select: SELECT id FROM table;

    Now, this will give you conflicts at several (many) points in the grammar.

    For each such point in the grammar, look at the tables and conflict messages.

    e.g.
    conflict between shifting "FROM" and reducing id at state 3.
    conflict between shifting "KW3" and reducing id at state 3.

    That conflict means you need another id non-terminal, say id_1 which
    does not allow FROM or KW3, as in:

    id_1: ID | KW1 | KW2 | ... | SELECT | ... ;

    You then have to use that in the proper spot(s) in the grammar. That
    is not the rule which is shifting the grammar, but the rule that has
    the id reference. Generally, when I am looking for such conflicts I
    back-up in the grammar and find the state which shifted to this state
    and in that state one of the items (dotted rules) will have a dot (I
    think bison and byacc use _ for dot, Yacc++ uses ., other tools could
    use other characters) before the id nonterminal that needs to be
    changed to id_1. Fix that rule.

    Repeat this process until it converges (i.e. you get no more
    conflicts).

    Alternately, if all you conflicts are shift-reduce conflicts between
    keywords and the id non-terminal, you can simply ignore the conflicts
    as the default rule of preferring to shift over reducing has exactly
    the right semantics to resolve this problem. Of course, that can
    leave you with a huge pile of warnings which you want to ignore. You
    can improve on that with associativity and precedence directives, so
    that the shifts are prefered to reduces of "id" by default and with no
    warning.

    Hope this helps,
    -Chris

    *****************************************************************************
    Chris Clark Internet : compres@world.std.com
    Compiler Resources, Inc. Web Site : http://world.std.com/~compres
    23 Bailey Rd voice : (508) 435-5016
    Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
    ------------------------------------------------------------------------------

  5. Default Re: Oracle grammar ambiguities and conflicts

    matt blackmon wrote:

    > Lastly, would a LL grammar be more befitting this problem than a LR?


    Yes, IMO. It's not so much a matter of the grammar, but a recursive
    descent parser is easier to modify than the automatons, produced from LR
    grammars. You may need an different parser generator?

    DoDi


  6. Default Re: Oracle grammar ambiguities and conflicts

    Hi all,
    Hi Matt,

    "matt blackmon" <mattblackmon@hotmail.com> wrote
    >Of the three solutions
    >mentioned by the moderator (are there more?), here are the stumbling
    >blocks I've encountered:


    My opinion is you combine solution 3 with a GLR and avoid headaches. I
    think it will work. At least you will not have to deal with one million
    conflicts.

    I've worked a little with SQL-92, not Oracle Syntax, but I believe the
    problem with the non-reserved words is common.

    Also, I hope Oracle Syntax is less ambiguous than SQL-92, otherwise you
    have to make excessive restatements, if you want to solve them at the
    syntax level.

    Anyway, let us see solution-3.

    "matt blackmon" wrote
    >3) "bushy parsers that put all of the allowed keywords into the
    >grammar in places where they can be used as symbols" -- I'm currently doing
    >this in my parser (I think), but it leads to some ambiguities in some
    >cases,
    >particularly with words like DEFAULT.


    I will try to give you an idea what to check again.

    I think, even in Oracle, 'DEFAULT' is a reserved word.

    Further, 'Bison' is a LALR Parser, according to the description at
    http://www.gnu.org/software/bison/bi...l#introduction .

    Thus, you possibly have encountered conflicts rather ambiguities due
    to keywords.

    What you could possibly try to deal with these problems?

    Let us assume the rule that describes the Lexical Conventions of an
    identifier that is not delimited is named "regular identifier", set:

    (a) <regular identifier> ::= <_regular identifier>
    | <non reserved word>

    (b) <_regular identifier> ::=
    "here place the old definition of <regular identifier>"

    (c) All reserved & non-reserved words should be placed in flex
    before the definition of <_regular identifier>.

    According to the Flex documentation, when input matches multiple
    declarations, it gives priority to the rule stated first.

    So, such a restatement might work.

    After, I am not certain what conflicts you might face with
    Oracle Syntax. One solution is, you let a GLR deal with conflicts
    and you just have to deal with ambiguities, if any.


    I hope this helps,

    Ev. Drikos

  7. Default Re: Oracle grammar ambiguities and conflicts

    Dr. Josef Grosch wrote:
    > We solve the problem of non-reserved keywords with a variant of
    > "parser backup if it gags on a keyword". The handling of syntax errors
    > can be configured in the parsers generated by the LALR parser generator
    > 'lark'. We have done this using the following "hack in the parser":
    >
    > At a syntax error we check whether the current token is a keyword.
    > If an identifier token would be acceptable at this location then
    > we convert the keyword token into an identifier token and continue parsing.


    That "hack" should be fully general and work for any tool that you can
    make the appropriate patches to. Moreover, it will work even if you
    change the grammar so that the legal set of keywords at different
    places change. It is about as elegant and robust a solution as one
    could find. In fact, it's so ideal a solution, it's hard to consider
    it a hack.

    If you can do it this way, I recommend you do.

    I'm going to add it to my bag-of-tricks.

    The only "downside" of this solution is that it is outside the grammar
    and involves patching the tool, so if you switch parser generators
    (and very few people ever do), you have to reinvent the "hack" for the
    new parser generator. However, if you switch parser generators, you
    are likely to have to redo portions of the grammar and auxillary code
    just to the idiosyncarcies of the new tool, such that this effort will
    be negligible. In other words, there is no downside to this technique.

    Hope this helps,
    -Chris

    *****************************************************************************
    Chris Clark Internet : compres@world.std.com
    Compiler Resources, Inc. Web Site : http://world.std.com/~compres
    23 Bailey Rd voice : (508) 435-5016
    Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
    ------------------------------------------------------------------------------

  8. Default Re: Oracle grammar ambiguities and conflicts

    John Fairhall wrote:
    >> I am attempting to write a parser for Oracle 7.3.4 SQL/PLSQL using
    >> Visual Parse++ 4.0. I have run into a problem where non-reserved
    >> keywords must be included as identifiers, causing millions of conflicts
    >> in the grammar. Does anyone have experience with either using VP++
    >> or with writing grammars for SQL?

    >


    matt blackmon wrote:
    > Lastly, would a LL grammar be more befitting this problem than a LR?


    Don't know if this will help, but I wrote a top-down recursive-descent
    parser in Java for SQL-like queries, at:
    http://david.tribble.com/src/java/tr...eryParser.java
    which is part of my 'tribble.parse.sql' package at:
    http://david.tribble.com/src/java

    The parse_operand() and parse_name_expr() methods treat a
    keyword as an error when an operand (identifier or literal) is
    expected, but they can be modified fairly easily (although you
    may give up some syntax checking if you do this).

    (My code is open source.)

    -drt

  9. Default Re: Oracle grammar ambiguities and conflicts

    >Generally the cleanest solution is:

    >create a non-terminal id which accepts all the keywords (you may have
    >done this) and use it in place of your token ID.


    And applicable to top-down parsing as well. I used this technique
    successfully on a large commercial translator. But it does require some
    parsing/grammar knowledge.

    The SLK parser generator: http://home.earthlink.net/~slkpg/

  10. Default Re: Oracle grammar ambiguities and conflicts

    >At a syntax error we check whether the current token is a keyword.
    >If an identifier token would be acceptable at this location then
    >we convert the keyword token into an identifier token and continue parsing.


    I also use this technique to handle typedef in C. SLK facilitates this with
    the following retry rules on return from the user error routine:

    If the same token is returned, the algorithm proceeds as if nothing
    happened. The parse stack is popped. For mismatch(), this is one way to
    insert a missing token.

    If a different token is returned, the algorithm does a retry using the new
    token and the same nonterminal.

    If zero is returned, the algorithm retries using the old token/nonterminal
    pair. This is useful when the action required is to change the symbol table
    information.

    The SLK parser generator: http://home.earthlink.net/~slkpg/

+ Reply to Thread
Page 1 of 2 1 2 LastLast

Similar Threads

  1. Avoiding namespace ambiguities
    By Application Development in forum c++
    Replies: 2
    Last Post: 08-24-2007, 09:50 AM
  2. Avoiding ambiguities
    By Application Development in forum c++
    Replies: 2
    Last Post: 08-01-2007, 01:06 PM
  3. Oracle grammar ambiguities and conflicts
    By Application Development in forum Compilers
    Replies: 1
    Last Post: 01-17-2006, 09:41 PM
  4. grammar conflicts
    By Application Development in forum Compilers
    Replies: 3
    Last Post: 03-31-2005, 11:35 PM