Link to home
Start Free TrialLog in
Avatar of headzoo
headzoo

asked on

RegExp Question

Howdy,
  I'm pretty comfortable with regexps, but I don't know how to do this:  Using PHP's preg_match(_all) to parse a SQL SELECT statement.  Specifically, a SELECT statement can pretty much have an unlimited number of AND/OR clauses.  So how do you write a regexp that can handle zero or more AND/OR statements?  Keep in mind I need to capture back references for each one.

So here is the regexp I use to get a one or none AND/OR statements:

((AND|OR) ([a-zA-Z0-9\-_]+) ([=<>\!]+) \'([a-zA-Z0-9\-_]+)\')?

This is part of a larger regexp, but it's the part that stumps me.  So that's fine for a SQL query like this:

SELECT column FROM table
WHERE column = 'something'
AND column = 'something'

But what if I have a query like this

SELECT column FROM table
WHERE column = 'something'
AND column = 'something'
AND column = 'something'
OR column = 'something'
AND column 'something'

That's a ridiculous query, but it gets the point across that the query may have zero, one, or more AND/OR statements.
Avatar of snoyes_jw
snoyes_jw
Flag of United States of America image

Change the ? at the end to a *, and add something just inside the opening bracket to handle whitespace.

Of course, it becomes even more complicated when you consider that you need to handle parentheses inside the query:
SELECT column FROM table WHERE condition1 AND (condition2 OR condition3);

In that case, you turn to something like BNF.
Avatar of headzoo
headzoo

ASKER

I tried that, but it didn't work.  The regexp matches, but it doesn't capture the back references in the second, third, etc, AND/OR statements.  I'm not worried about all that other wacky things that could be in an SQL query.  This is more about me just being curious how something like this works.
Avatar of Roonaan
You can also use preg_split with PREG_SPLIT_DELIM_CAPTURE. That always works out nice in this kind of situations.

-r-
Avatar of headzoo

ASKER

What I was thinking is using a regexp to bet all the AND/OR statements in one back reference, and then run another one to break it up.  So if the query looks like this:

SELECT column FROM table
WHERE column = 'something'
AND column = 'something'
OR column = 'something'
LIMIT 1

I could get just the "AND column = 'something' OR column = 'something'", and then run another regexp on that. I'll have to look into it.
how's about using preg_match_all y'all?

$regex = "((AND|OR)\\s+([a-zA-Z0-9\-_]+)\\s*([=<>\!]+)\\s*\'?([a-zA-Z0-9\-_]+)\'?(\\s+|$))";

$num_of_matches = preg_match_all($regex, $query, $matches);

print_r($matches);
of course you need to make the regex correct :S


$regex = "/((AND|OR)\\s+([a-zA-Z0-9\-_]+)\\s*([=<>\!]+)\\s*\'?([a-zA-Z0-9\-_]+)\'?(\\s+|$))/is";
here's my more complete version:

<pre>
<?php
$query = "SELECT * FROM `tablename` WHERE `foo`='bar' AND `bar`=5 OR `cheese`='nice' AND beer='too'";
$regex = "/(WHERE|AND|OR)\\s+`?(\\w+)`?\\s*=\\s*'?(\\w+)'?(\\s*|$)/is";

preg_match_all($regex, $query, $matches, PREG_SET_ORDER);

print_r($matches);

?>
</pre>

test it here:

http://www.keteracel.com/test/test681.php

cheers,

keteracel
ASKER CERTIFIED SOLUTION
Avatar of keteracel
keteracel
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Have you concidered working your code out locally before posting keteracel?

It is nice to view your progress and incremental code, but might be easier for the Questioner to keep track of what is happening. From the emails he received he could have deducted that there are 4 more alternative solutions, which isn't the case.

-r-
Well, now I have another email from your flame ;)
<?
die("lol");
?>