Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Implement Boolean Search with Regular Expression

Posted on 2014-09-04
10
Medium Priority
?
310 Views
Last Modified: 2014-09-05
Hi RegEx Experts,

I want to implement a general Boolean search (AND, OR, NOT) by using a Regular Expression. Ideally, I'd like it to support parentheses to determine precedence, but if that's too tough, I can live with evaluating NOT first, AND second, OR third. I'm fine with saying that the Boolean search cannot search for AND, OR, NOT — that is, they are always interpreted as Boolean operators, not as search strings.

I understand that "|" is OR, but I don't know how to use RegEx to implement either NOT or AND. An example will be helpful. Let's say the Boolean search is:

microsoft AND windows OR vbscript NOT unix

Using default precedence of NOT first, AND second, OR third, this would get hits that do not have "unix" but do have either (1) both "microsoft" and "windows" or (2) "vbscript". What is the RegEx equivalent search?

Likewise, if the Boolean expression allows parentheses to determine precedence, let's say the Boolean search is:

microsoft AND (windows OR vbscript) NOT unix

Using this precedence, it would get hits that do not have "unix" but do have both "microsoft" and either "windows" or "vbscript". What is the RegEx equivalent search?

Thanks, Joe
0
Comment
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
  • +3
10 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 1000 total points
ID: 40305101
^(?=.*microsoft)(?=.*(windows|vbscript))(?!.*unix)
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 1000 total points
ID: 40305107
If you want
((microsoft AND windows) OR vbscript) AND NOT unix
that would be
^(((?=.*microsoft)(?=.*windows))|.*vbscript)(?!.*unix)
0
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 400 total points
ID: 40305167
Joe, this can get very confusing very quickly. If you need to maintain that code, I suggest you read first about lookaheads and the fact that using alternation is not really an OR, because when you switch the terms you can get different results.
A simple example:
test string: aba
regex1: (a|b)  -> result: a
regex2: (b|a) -> result: b

HTH,
Dan
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 46

Assisted Solution

by:aikimark
aikimark earned 300 total points
ID: 40305168
for a well formed boolean expression you will need to connect the NOT unix condition with an AND operator.
0
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 300 total points
ID: 40305237
@Dan Craciun

What regex engine are you using such that that is the case (concerning alternation)?
0
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 400 total points
ID: 40305248
Yup, I  messed up the example :)
test string: aba
regex1: (a|ab)  -> result: a
regex2: (ab|a) -> result: ab

PS: @kaufmed can you please use copy/paste for my name? Obviously Crucian means something to you (I think it's the 3rd time you spell it like that), but it's not my name. Thanks.
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40305262
Just simply reading too fast. I usually try to copy/paste for that reason  = )
0
 
LVL 55

Author Closing Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40306726
First, my thanks to everyone who replied! And now the details:

o  Both RegEx searches from ozo worked perfectly! Well done!

o  I appreciate the links from Dan for lookaheads and alternation.

o  Thanks to aikimark for correcting my "NOT" Boolean syntax (which was also noted by ozo in his second post).

o  Thanks to kaufmed for the comment about Dan's alternation example, resulting in Dan's update to it.

Regards, Joe
0
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 40306936
This is why I love E-E.  Great question, great dialog and great learning example!

Best to all, ~Ray
0
 
LVL 55

Author Comment

by:Joe Winograd, EE MVE 2015&2016
ID: 40306953
Agree completely, Ray! Regards, Joe
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
This article discusses how to create an extensible mechanism for linked drop downs.
The viewer will learn how to count occurrences of each item in an array.
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

722 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question