Solved

Best Way to parse SQL

Posted on 2004-09-30
10
246 Views
Last Modified: 2010-04-15
Hi,

I was working on a SQL query generator, and was wondering what'd be the best way to parse SQL using ANSI C.

Right now what i do is:

Tokenize the query based on spaces.
Determine which SQL statement it is based on these tokens.
Evaluate it accordingly.I have different syntax check mechanisms for different statements,

I dont think this is the best way to do it.

What would be a better,faster method of parsing SQL?
0
Comment
Question by:ankuratvb
  • 3
  • 2
  • 2
  • +2
10 Comments
 
LVL 45

Accepted Solution

by:
Kdo earned 200 total points
ID: 12189201
Hi ankuratvb,

My, but this is an ambitious project.  (One that is certainly worthy of one of the board's stars!)

SQL is recursive, so tokenizing "on the fly" has its drawbacks.  Besides, it's pretty easy to generate a token list.  But you will have to base your tokens on more than spaces.  (Semicolons, parenthesis, equals, etc are all stand-alone tokens than don't require leading spaces.)

The next thing that you're going to need is a language definition.  Parsing the SQL will be much easier with a defined set of rules that your parser can traverse.  Again, the recursive process comes to mind.

In a lot of ways, there's some comfort in performing "brute force" syntax checking, but it gets very messy, particularly as language elements change.  It's much easier to modifhy the rules tables and insert exits at the appropriate spot for code generation.


Are you planning to support user functions and stored procedures, too?

Kent


0
 
LVL 55

Expert Comment

by:Jaime Olivares
ID: 12189507
If you can parse it correctly you can execute it! (at least in theory), so I recommend you to use a genuine SQL parser from any SQL implementation.
0
 
LVL 9

Author Comment

by:ankuratvb
ID: 12190709
Hi Kent,

Basically,i look at the first word of the query string to determine which statement it is,then i extract the substrings required and process them.

For e.g.:
char query[]="insert into emp values('as',1234)";

By looking at the first word, i know that its an insert command,so i extract the content between "insert into" and "values" to get the table name,validate it and then move on to validating the values within the parentheses.

Right now,it works fine but i need to support sub-queries as well.Thats where the problem comes.

For instance,

char query[]="select * from emp where empno=(select empno from emp where empno=1)";


>Are you planning to support user functions and stored procedures, too?
I might, but sub-queries is first priority right now.

jaime,
If i use an SQL parser written by somebody else, half my work would've been done by somebody else.Yes,i can definitely learn from their parser's implementation.
Do you have links to any simple SQL parser's implementation?
I've seen MySQL's code but it was way too complicated for me.
I've downloaded SQLite's code which is supposed to be a very small,simple and fast database engine(adopted by PHP in place of MySQL).I'll be studying the code in the next few days.
0
 
LVL 45

Expert Comment

by:Kdo
ID: 12190750

Hi Ankur,

That's what I was referring to when I said that SQL is recursive.  Just like evaluating the expression (A*(B+C)), subqueries force you to process outward from the middle.

Unless you want to get REALLY ugly, you're going to have to utilize a stack and recursive evaluation techniques.  It adds a bunch of work to your project, but in the long run it's the only practical way to process a recursive language.


Kent
0
 
LVL 55

Assisted Solution

by:Jaime Olivares
Jaime Olivares earned 100 total points
ID: 12191042
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 4

Expert Comment

by:alikoank
ID: 12200922
Parsers are generally written as "Finite State Machine" implementations. This is a very detailed subject. You may want to google abit.

0
 
LVL 1

Assisted Solution

by:gseidman
gseidman earned 200 total points
ID: 12201015
When creating a parser for use in a C program which parses any moderately complex language, you generally want to use lex and yacc. They are a lexical scanner and a parser generator, respectively, and are pretty standard for such things. I recommend using (and modifying, if necessary) the lex and yacc sources from PostgreSQL. It is under the BSD license, so they should be suitable for inclusion in your work. Looking at the source tree, you probably want scan.l and gram.y from src/backend/parser
0
 
LVL 9

Author Comment

by:ankuratvb
ID: 12201057
Hi gseidman,

I was looking at lex and yacc for doing the job,i even have the language definition for SQL for them.

But,i was aiming for a portable solution, and i guess there arent such tools as lex and yacc in Windoze.
0
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 200 total points
ID: 12202124

Hi Ankur,

lex(1) and yaxx(1) don't exist on Winblows, but that doesn't mean that your solution can't exist there.  Use the tools to build the definitions on *nix and then simply ?:) recompile it in Windows.

Kent
0
 
LVL 1

Assisted Solution

by:gseidman
gseidman earned 200 total points
ID: 12203409
Though lex and yacc may not be available on Windows by default, there are implementations available for Windows. In particular, flex and bison have been ported. See:

http://www.monmouth.com/~wstreett/lex-yacc/lex-yacc.html

or

http://gnuwin32.sourceforge.net/packages/flex.htm
http://gnuwin32.sourceforge.net/packages/bison.htm
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Summary: This tutorial covers some basics of pointer, pointer arithmetic and function pointer. What is a pointer: A pointer is a variable which holds an address. This address might be address of another variable/address of devices/address of fu…
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
The goal of this video is to provide viewers with basic examples to understand how to use strings and some functions related to them in the C programming language.
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use nested-loops in the C programming language.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now