Solved

Implement Boolean Search with Regular Expression

Posted on 2014-09-04
10
280 Views
Last Modified: 2014-09-05
Hi RegEx Experts,

I want to implement a general Boolean search (AND, OR, NOT) by using a Regular Expression. Ideally, I'd like it to support parentheses to determine precedence, but if that's too tough, I can live with evaluating NOT first, AND second, OR third. I'm fine with saying that the Boolean search cannot search for AND, OR, NOT — that is, they are always interpreted as Boolean operators, not as search strings.

I understand that "|" is OR, but I don't know how to use RegEx to implement either NOT or AND. An example will be helpful. Let's say the Boolean search is:

microsoft AND windows OR vbscript NOT unix

Using default precedence of NOT first, AND second, OR third, this would get hits that do not have "unix" but do have either (1) both "microsoft" and "windows" or (2) "vbscript". What is the RegEx equivalent search?

Likewise, if the Boolean expression allows parentheses to determine precedence, let's say the Boolean search is:

microsoft AND (windows OR vbscript) NOT unix

Using this precedence, it would get hits that do not have "unix" but do have both "microsoft" and either "windows" or "vbscript". What is the RegEx equivalent search?

Thanks, Joe
0
Comment
Question by:Joe Winograd, EE MVE
  • 2
  • 2
  • 2
  • +3
10 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
Comment Utility
^(?=.*microsoft)(?=.*(windows|vbscript))(?!.*unix)
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 250 total points
Comment Utility
If you want
((microsoft AND windows) OR vbscript) AND NOT unix
that would be
^(((?=.*microsoft)(?=.*windows))|.*vbscript)(?!.*unix)
0
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 100 total points
Comment Utility
Joe, this can get very confusing very quickly. If you need to maintain that code, I suggest you read first about lookaheads and the fact that using alternation is not really an OR, because when you switch the terms you can get different results.
A simple example:
test string: aba
regex1: (a|b)  -> result: a
regex2: (b|a) -> result: b

HTH,
Dan
0
 
LVL 45

Assisted Solution

by:aikimark
aikimark earned 75 total points
Comment Utility
for a well formed boolean expression you will need to connect the NOT unix condition with an AND operator.
0
 
LVL 74

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 75 total points
Comment Utility
@Dan Craciun

What regex engine are you using such that that is the case (concerning alternation)?
0
Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 100 total points
Comment Utility
Yup, I  messed up the example :)
test string: aba
regex1: (a|ab)  -> result: a
regex2: (ab|a) -> result: ab

PS: @kaufmed can you please use copy/paste for my name? Obviously Crucian means something to you (I think it's the 3rd time you spell it like that), but it's not my name. Thanks.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
Comment Utility
Just simply reading too fast. I usually try to copy/paste for that reason  = )
0
 
LVL 51

Author Closing Comment

by:Joe Winograd, EE MVE
Comment Utility
First, my thanks to everyone who replied! And now the details:

o  Both RegEx searches from ozo worked perfectly! Well done!

o  I appreciate the links from Dan for lookaheads and alternation.

o  Thanks to aikimark for correcting my "NOT" Boolean syntax (which was also noted by ozo in his second post).

o  Thanks to kaufmed for the comment about Dan's alternation example, resulting in Dan's update to it.

Regards, Joe
0
 
LVL 108

Expert Comment

by:Ray Paseur
Comment Utility
This is why I love E-E.  Great question, great dialog and great learning example!

Best to all, ~Ray
0
 
LVL 51

Author Comment

by:Joe Winograd, EE MVE
Comment Utility
Agree completely, Ray! Regards, Joe
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Email validation in proper way is  very important validation required in any web pages. This code is self explainable except that Regular Expression which I used for pattern matching. I originally published as a thread on my website : http://www…
Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now