  • Status: Solved
  • Priority: Medium
  • Security: Public

Implement Boolean Search with Regular Expression

Hi RegEx Experts,

I want to implement a general Boolean search (AND, OR, NOT) by using a Regular Expression. Ideally, I'd like it to support parentheses to determine precedence, but if that's too tough, I can live with evaluating NOT first, AND second, OR third. I'm fine with saying that the Boolean search cannot search for AND, OR, NOT — that is, they are always interpreted as Boolean operators, not as search strings.

I understand that "|" is OR, but I don't know how to use RegEx to implement either NOT or AND. An example will be helpful. Let's say the Boolean search is:

microsoft AND windows OR vbscript NOT unix

Using default precedence of NOT first, AND second, OR third, this would get hits that do not have "unix" but do have either (1) both "microsoft" and "windows" or (2) "vbscript". What is the RegEx equivalent search?

Likewise, if the Boolean expression allows parentheses to determine precedence, let's say the Boolean search is:

microsoft AND (windows OR vbscript) NOT unix

Using this precedence, it would get hits that do not have "unix" but do have both "microsoft" and either "windows" or "vbscript". What is the RegEx equivalent search?

Thanks, Joe
Joe Winograd, EE MVE 2015&2016
6 Solutions
 
ozo commented:
^(?=.*microsoft)(?=.*(windows|vbscript))(?!.*unix)
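A quick way to sanity-check a pattern like this is to run it against a few strings. The following is a minimal sketch, assuming Python's `re` module and single-line input; the `matches` helper and the sample strings are illustrative and not part of the thread. Note that the terms match as substrings, so "unix" would also hit "unixware"; wrapping each term in `\b` is one option if whole-word matching is wanted.

```python
import re

# Pattern from the post above: microsoft AND (windows OR vbscript) AND NOT unix.
# Every condition is a lookahead anchored at the start of the string, so the
# order in which the terms appear in the text does not matter.
PATTERN = re.compile(r"^(?=.*microsoft)(?=.*(windows|vbscript))(?!.*unix)")


def matches(text: str) -> bool:
    """Hypothetical helper: True when the text satisfies the Boolean query."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in [
        "windows is made by microsoft",
        "microsoft supports vbscript",
        "microsoft windows and unix",
        "windows and vbscript",
    ]:
        print(f"{matches(sample)!s:5}  {sample}")
```

Without the `re.DOTALL` flag, `.` stops at a newline, so multi-line text would probably need that flag added.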
 
ozo commented:
If you want
((microsoft AND windows) OR vbscript) AND NOT unix
that would be
^(((?=.*microsoft)(?=.*windows))|(?=.*vbscript))(?!.*unix)
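One detail worth checking in patterns of this shape: if the vbscript branch consumes text (`.*vbscript`) rather than using a lookahead, the trailing `(?!.*unix)` is tested from wherever that branch stopped, so a "unix" that appears before "vbscript" is not seen. The sketch below, assuming Python's `re` module, compares the two variants; the names `CONSUMING`, `LOOKAHEAD_ONLY`, and `hit` are illustrative only.

```python
import re

# Variant A: the vbscript branch consumes text, so the unix check starts
# right after the matched "vbscript".
CONSUMING = re.compile(r"^(((?=.*microsoft)(?=.*windows))|.*vbscript)(?!.*unix)")

# Variant B: every condition is a zero-width lookahead evaluated at position 0,
# so the unix check always scans the whole string.
LOOKAHEAD_ONLY = re.compile(
    r"^(((?=.*microsoft)(?=.*windows))|(?=.*vbscript))(?!.*unix)"
)


def hit(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True when the pattern matches the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["vbscript unix", "unix vbscript", "microsoft windows"]:
        print(sample, hit(CONSUMING, sample), hit(LOOKAHEAD_ONLY, sample))
```

The two variants agree on "vbscript unix" and "microsoft windows" but differ on "unix vbscript", which only the lookahead-only form rejects.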
 
Dan Craciun (IT Consultant) commented:
Joe, this can get very confusing very quickly. If you need to maintain that code, I suggest you first read about lookaheads and about the fact that alternation is not a true OR: it is order-sensitive, so switching the terms can give different results.
A simple example:
test string: aba
regex1: (a|b)  -> result: a
regex2: (b|a) -> result: b

HTH,
Dan
 
aikimark commented:
For a well-formed Boolean expression, you will need to connect the NOT unix condition to the rest of the expression with an AND operator.
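One way to enforce a well-formed expression is to parse the query before building the pattern. The following is a rough sketch, assuming Python and a lookahead-only translation; `boolean_to_regex` and `boolean_search` are hypothetical names, the grammar (NOT binds tightest, then AND, then OR, with parentheses overriding) is an assumption based on the question, and terms are matched as case-sensitive substrings. A query such as `vbscript NOT unix`, with no AND before NOT, is rejected with a `ValueError`.

```python
import re


def boolean_to_regex(query: str) -> str:
    """Hypothetical helper: translate a Boolean query into a lookahead regex."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return tok

    def parse_or():  # lowest precedence
        parts = [parse_and()]
        while peek() == "OR":
            take()
            parts.append(parse_and())
        return parts[0] if len(parts) == 1 else "(?:" + "|".join(parts) + ")"

    def parse_and():  # zero-width assertions are ANDed by concatenation
        parts = [parse_not()]
        while peek() == "AND":
            take()
            parts.append(parse_not())
        return "".join(parts)

    def parse_not():  # highest precedence: NOT, parentheses, plain terms
        tok = take()
        if tok == "NOT":
            return "(?!" + parse_not() + ")"
        if tok == "(":
            inner = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return "(?=.*" + re.escape(tok) + ")"

    pattern = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return "^" + pattern


def boolean_search(query: str, text: str) -> bool:
    """Hypothetical helper: True when text satisfies the Boolean query."""
    # re.DOTALL lets '.' cross newlines, so multi-line text is covered.
    return re.search(boolean_to_regex(query), text, re.DOTALL) is not None


if __name__ == "__main__":
    q = "microsoft AND (windows OR vbscript) AND NOT unix"
    print(boolean_to_regex(q))
    print(boolean_search(q, "microsoft ships vbscript"))
```

Because every piece of the generated pattern is a zero-width assertion tested at the start of the text, the order of the terms in the text and the order of the alternatives do not affect whether a match is found.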
 
käµfm³d 👽 commented:
@Dan Craciun

What regex engine are you using such that that is the case (concerning alternation)?
 
Dan Craciun (IT Consultant) commented:
Yup, I messed up the example :)
test string: aba
regex1: (a|ab)  -> result: a
regex2: (ab|a) -> result: ab

PS: @kaufmed can you please use copy/paste for my name? Obviously Crucian means something to you (I think it's the 3rd time you spell it like that), but it's not my name. Thanks.
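The corrected alternation example above can be checked directly. This is a minimal sketch, assuming Python's `re` module (a backtracking engine that tries alternatives left to right); the variable names are illustrative.

```python
import re

text = "aba"

# The engine takes the first alternative that succeeds at the earliest
# position, so the order of the alternatives changes the matched text.
first = re.search(r"(a|ab)", text).group(0)
second = re.search(r"(ab|a)", text).group(0)

# Whether a match exists at all does not depend on the order, which is
# what matters for a yes/no Boolean search.
same_verdict = bool(re.search(r"(a|ab)", text)) == bool(re.search(r"(ab|a)", text))

if __name__ == "__main__":
    print(first, second, same_verdict)
```

So the ordering issue affects what text is captured, not whether the string counts as a hit.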
 
käµfm³d 👽 commented:
Just simply reading too fast. I usually try to copy/paste for that reason  = )
 
Joe Winograd, EE MVE 2015&2016 (Developer, Author) commented:
First, my thanks to everyone who replied! And now the details:

o  Both RegEx searches from ozo worked perfectly! Well done!

o  I appreciate the links from Dan for lookaheads and alternation.

o  Thanks to aikimark for correcting my "NOT" Boolean syntax (which was also noted by ozo in his second post).

o  Thanks to kaufmed for the comment about Dan's alternation example, resulting in Dan's update to it.

Regards, Joe
 
Ray Paseur commented:
This is why I love E-E.  Great question, great dialog and great learning example!

Best to all, ~Ray
 
Joe Winograd, EE MVE 2015&2016 (Developer, Author) commented:
Agree completely, Ray! Regards, Joe