Solved

Implement Boolean Search with Regular Expression

Posted on 2014-09-04
Last Modified: 2014-09-05
Hi RegEx Experts,

I want to implement a general Boolean search (AND, OR, NOT) by using a Regular Expression. Ideally, I'd like it to support parentheses to determine precedence, but if that's too tough, I can live with evaluating NOT first, AND second, OR third. I'm fine with saying that the Boolean search cannot search for AND, OR, NOT — that is, they are always interpreted as Boolean operators, not as search strings.

I understand that "|" is OR, but I don't know how to use RegEx to implement either NOT or AND. An example will be helpful. Let's say the Boolean search is:

microsoft AND windows OR vbscript NOT unix

Using default precedence of NOT first, AND second, OR third, this would get hits that do not have "unix" but do have either (1) both "microsoft" and "windows" or (2) "vbscript". What is the RegEx equivalent search?

Likewise, if the Boolean expression allows parentheses to determine precedence, let's say the Boolean search is:

microsoft AND (windows OR vbscript) NOT unix

Using this precedence, it would get hits that do not have "unix" but do have both "microsoft" and either "windows" or "vbscript". What is the RegEx equivalent search?

Thanks, Joe
Question by:Joe Winograd, EE MVE
10 Comments
 
LVL 84

Accepted Solution

by:ozo
ozo earned 250 total points
ID: 40305101
^(?=.*microsoft)(?=.*(windows|vbscript))(?!.*unix)
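A quick way to check this pattern is to run it against a few sample strings. The following is a minimal sketch in Python, assuming single-line, case-sensitive input; the `matches` helper and the sample strings are illustrative and not part of the original answer.

```python
import re

# microsoft AND (windows OR vbscript) AND NOT unix
# Each (?=...) is a positive lookahead and (?!...) a negative lookahead,
# so every condition is tested from the start of the string.
PATTERN = re.compile(r"^(?=.*microsoft)(?=.*(windows|vbscript))(?!.*unix)")

def matches(text):
    """Return True when the text satisfies the Boolean search (hypothetical helper)."""
    return PATTERN.search(text) is not None

samples = [
    "microsoft windows server",       # True: microsoft and windows, no unix
    "vbscript on microsoft systems",  # True: microsoft and vbscript, no unix
    "microsoft windows and unix",     # False: contains unix
    "windows vbscript",               # False: microsoft is missing
]
for s in samples:
    print(s, "->", matches(s))
```

Because `.` does not match a newline by default, multi-line text would likely need the engine's "dot matches all" option (`re.DOTALL` in Python).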
 
LVL 84

Assisted Solution

by:ozo
ozo earned 250 total points
ID: 40305107
If you want
((microsoft AND windows) OR vbscript) AND NOT unix
that would be
^(((?=.*microsoft)(?=.*windows))|(?=.*vbscript))(?!.*unix)
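One detail worth checking in any engine: the NOT test only covers the whole string if every term is a zero-width lookahead. A branch that consumes text, such as `.*vbscript`, moves the match position, so a trailing `(?!.*unix)` would then inspect only what follows "vbscript". The sketch below, in Python with illustrative pattern names and sample strings, compares the two forms.

```python
import re

# ((microsoft AND windows) OR vbscript) AND NOT unix, every term a lookahead
ALL_LOOKAHEAD = re.compile(
    r"^(?:(?=.*microsoft)(?=.*windows)|(?=.*vbscript))(?!.*unix)"
)
# Same logic, except the vbscript branch consumes characters
CONSUMING = re.compile(
    r"^(?:(?=.*microsoft)(?=.*windows)|.*vbscript)(?!.*unix)"
)

for text in ["vbscript only", "microsoft windows",
             "unix and vbscript", "vbscript on unix"]:
    print(text,
          bool(ALL_LOOKAHEAD.search(text)),
          bool(CONSUMING.search(text)))
```

The two patterns agree on every sample except "unix and vbscript", where the consuming form still matches because "unix" appears before "vbscript".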
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 100 total points
ID: 40305167
Joe, this can get very confusing very quickly. If you need to maintain that code, I suggest you first read about lookaheads, and about the fact that alternation is not a true OR: the order of the alternatives can change what gets matched.
A simple example:
test string: aba
regex1: (a|b)  -> result: a
regex2: (b|a) -> result: b

HTH,
Dan
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 75 total points
ID: 40305168
For a well-formed Boolean expression, you will need to connect the NOT unix condition to the rest of the expression with an AND operator.
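To illustrate why the explicit AND matters once a query is parsed mechanically, here is a minimal sketch of a translator from a Boolean query to a lookahead-only regex. It is not from the thread: `boolean_to_regex` is a hypothetical helper, and it assumes whitespace-separated terms, uppercase AND/OR/NOT as reserved words, and the precedence NOT, then AND, then OR, with parentheses for grouping.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")

def boolean_to_regex(query):
    """Translate a Boolean query into a regex built only from lookaheads."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():                      # lowest precedence
        parts = [parse_and()]
        while peek() == "OR":
            take()
            parts.append(parse_and())
        return parts[0] if len(parts) == 1 else "(?:" + "|".join(parts) + ")"

    def parse_and():                     # AND is plain concatenation
        parts = [parse_not()]
        while peek() == "AND":
            take()
            parts.append(parse_not())
        return "".join(parts)

    def parse_not():                     # highest precedence
        if peek() == "NOT":
            take()
            return "(?!" + parse_not() + ")"
        if peek() == "(":
            take()
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return inner
        tok = peek()
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError("expected a search term")
        take()
        return "(?=.*" + re.escape(tok) + ")"

    pattern = parse_or()
    if peek() is not None:               # e.g. a NOT with no AND before it
        raise ValueError("unexpected token: " + peek())
    return "^" + pattern

print(boolean_to_regex("microsoft AND (windows OR vbscript) AND NOT unix"))
```

With this grammar, "microsoft AND windows OR vbscript NOT unix" raises a `ValueError` at the NOT, while "... AND NOT unix" translates cleanly. Since every term is zero-width, AND can be expressed as simple concatenation.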
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 75 total points
ID: 40305237
@Dan Craciun

What regex engine are you using such that that is the case (concerning alternation)?
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 100 total points
ID: 40305248
Yup, I messed up the example :)
test string: aba
regex1: (a|ab)  -> result: a
regex2: (ab|a) -> result: ab
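The corrected example can be reproduced in Python, whose `re` module is a backtracking engine that tries alternatives left to right; other engine families (POSIX leftmost-longest, for instance) may behave differently. The `first_match` helper is only for illustration.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# The first alternative that succeeds at a given position wins.
print(first_match(r"(a|ab)", "aba"))   # a
print(first_match(r"(ab|a)", "aba"))   # ab

# For a yes/no Boolean test built from lookaheads, only the existence
# of a match is used, so swapping the terms does not change the answer.
print(first_match(r"^(?=.*a)(?=.*b)", "aba") is not None)
print(first_match(r"^(?=.*b)(?=.*a)", "aba") is not None)
```

So the order of alternatives affects which text is captured, which matters when extracting or replacing, but it should not change whether a lookahead-based Boolean search reports a hit.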

PS: @kaufmed, can you please use copy/paste for my name? Obviously Crucian means something to you (I think it's the 3rd time you've spelled it like that), but it's not my name. Thanks.
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40305262
Just simply reading too fast. I usually try to copy/paste for that reason = )
 
LVL 52

Author Closing Comment

by:Joe Winograd, EE MVE
ID: 40306726
First, my thanks to everyone who replied! And now the details:

o  Both RegEx searches from ozo worked perfectly! Well done!

o  I appreciate the links from Dan for lookaheads and alternation.

o  Thanks to aikimark for correcting my "NOT" Boolean syntax (which was also noted by ozo in his second post).

o  Thanks to kaufmed for the comment about Dan's alternation example, resulting in Dan's update to it.

Regards, Joe
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 40306936
This is why I love E-E.  Great question, great dialog and great learning example!

Best to all, ~Ray
 
LVL 52

Author Comment

by:Joe Winograd, EE MVE
ID: 40306953
Agree completely, Ray! Regards, Joe
Question has a verified solution.