sscotti (United States of America)


Regular Expression to find search text between parentheses

I am looking for a regular expression to use in a PHP preg_match() call that will find a search text anywhere in a string, provided the search text is between an opening and a closing parenthesis, e.g.:

$postVal = "find me";
$pdfdata = "This is a test of a string that (will find me between) parentheses.";

I have this as a starter, but it is not quite correct. I basically want to find whole words or phrases that are between the parentheses, as this is how the text is formatted in the PDF document that I am searching.

preg_match('/\([^\(\r\n]'.$postVal.'\)/i', $pdfdata)
Terry Woods (New Zealand)

Something like this?

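$postVal = "find me";
$pdfdata = "This is a test of a string that (will find me between) parentheses.";
// preg_quote() escapes regex metacharacters in the search text; [^()\r\n]*
// keeps the match inside a single pair of parentheses on one line.
preg_match('/\([^()\r\n]*' . preg_quote($postVal, '/') . '[^()\r\n]*\)/i', $pdfdata, $matches);
print_r($matches);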

Output:
Array
(
    [0] => (will find me between)
)
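The question asks for whole words or phrases, which the pattern above does not enforce: it would also match "find me" inside "(refind meter)". A minimal sketch, assuming PCRE's `\b` word boundaries are acceptable and that every match (not just the first) is wanted; `findInParens()` is a hypothetical helper name:

```php
<?php
// Sketch: return every parenthesised group (on one line) that contains
// $needle as a whole word or phrase, case-insensitively.
// findInParens() is a hypothetical helper name, not a PHP built-in.
function findInParens(string $needle, string $haystack): array
{
    // preg_quote() escapes regex metacharacters in the search text.
    $quoted = preg_quote($needle, '/');

    // [^()\r\n]* keeps each match inside one pair of parentheses on one line;
    // \b on both sides rejects hits inside longer words ("refind meter").
    $pattern = '/\([^()\r\n]*\b' . $quoted . '\b[^()\r\n]*\)/i';

    preg_match_all($pattern, $haystack, $matches);
    return $matches[0];
}

$pdfdata = "This is a test of a string that (will find me between) parentheses.";
print_r(findInParens("find me", $pdfdata));         // one match: (will find me between)
print_r(findInParens("find me", "(refind meter)")); // empty array
```

Note that `\b` only behaves as expected when the search text starts and ends with a word character; a term such as "M.D." would need a different boundary, for example `(?<!\w)` before and `(?!\w)` after.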
If you want to do that in JavaScript, it is very simple (this extracts the text between the parentheses):
var str = "This is test string (which you are looking for)";
alert(str.replace(/^.*\((.*)\).*$/m, '$1'));

Live demo:

http://jsfiddle.net/R8WGt/
An interesting wrinkle on this question: what if the parenthetical expression itself contains a nested parenthetical expression?
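PHP's preg_* functions use PCRE, which supports recursive patterns, so one way to handle nesting is to match balanced groups with `(?R)` and then filter them for the search text. A minimal sketch; `findInNestedParens()` is a hypothetical helper, and the arrow function needs PHP 7.4 or later:

```php
<?php
// Sketch: find balanced, possibly nested, parenthesised groups with a PCRE
// recursive pattern, then keep the outermost groups that contain $needle.
// findInNestedParens() is a hypothetical helper name; needs PHP 7.4+ for fn.
function findInNestedParens(string $needle, string $haystack): array
{
    // (?R) re-enters the whole pattern, so "(a (b) c)" is matched as one unit;
    // the possessive [^()]++ limits backtracking on unbalanced input.
    $pattern = '/\((?:[^()]++|(?R))*\)/';
    preg_match_all($pattern, $haystack, $matches);

    // Keep only groups whose text contains the search term (case-insensitive).
    return array_values(array_filter(
        $matches[0],
        fn (string $group): bool => stripos($group, $needle) !== false
    ));
}

$text = "Outer (a note (with find me inside) here) and (unrelated).";
print_r(findInNestedParens("find me", $text)); // (a note (with find me inside) here)
```

It reports the outermost balanced group, so the inner "(with find me inside)" comes back as part of its enclosing expression.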
ASKER CERTIFIED SOLUTION
Ray Paseur (United States of America)

This solution is only available to Experts Exchange members.
sscotti (ASKER)

Thanks for the input; I will award points shortly. Just curious: the application here is that I am searching for keywords or text in a converted PDF document that has been OCR'ed, or saved from PowerPoint with Adobe Acrobat. The document contains a lot of formatting data, as well as other data that looks like the text I am searching for, e.g.:


The following looks like the text that appears on my PPT slides:


Q
BT
0.19 0.147 0.152 0 k
/TT0 1 Tf
24 0 0 24 195.873 160.2 Tm
[(Submitted by: Dr. Gay, M.D. )250( )]TJ
3.861 -1.167 Td
[(Professor of Radiology)250( )]TJ
3.333 -1.208 Td
(8/25/11)Tj
0.149 0.113 0.118 0 k
/TT1 1 Tf
2.806 0 Td
( )Tj
ET


......


BT
0 0 0 0 k
/TT0 1 Tf
44 0 0 44 99.125 491.7999 Tm
[(Based on these images, what is)233( )]TJ
1.05 -1.182 Td
[(the most likely diagnosis?)250( )]TJ
0.029 0 0.342 0 k
32 0 0 32 131.125 366.5 Tm
(1.)Tj
0.022 0 0.281 0 k
/C2_0 1 Tf
<0001>Tj
0 0 0 0 k
/TT0 1 Tf
1.361 0 Td
(Invasive ductal carcinoma )Tj

The following looks like text from elsewhere in the document:

(6y´Î"†ú9¿•#Ngì´…°úécè-ïIüñyà·ÿPåÛEné]s˜x¿›/´ÚvªYï)Ò—lF°œp+Å¿¿€3SJ°Û,N¿¿F8F!J[LëÌŸ/ج Gäuuu9Ò˜µC¿p†Áz.MT»±oY<K*S°2o´ª]±Ï€°~=6¿Ä`Í2Ë!/§£›.~¿´=RJTòä*ja.®Ô#i[  ÿÓÏf‚¿‹ôL¶¿ {qÕ› WHpn€is”ƒQë/¥_7*ëKsÿgj¿¿¶5æåµ÷¿ÆOBz7Á•àS#ÁÇ–F Y»0é(óv‡/\g1õ^¿}º)ôœmgaY¿?.ãí4râ)


If I search for "gay" in this case, there is a match in the PPT text portion and another in the "elsewhere" data shown above (gaY, near the end of the string).

Just wondering if there is some standard method for searching for words or phrases within a PDF document, which is what I am trying to do. The solution above works for the most part, since what I am searching for (e.g. idiopathic, lipoma, etc.) will probably only show up in the text data, but some short words may appear in the encoded data as well. I am sure there are tools or methods out there for doing that sort of thing.
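One way to reduce false hits like the "gaY" match is to search only the literal strings that the PDF text operators `Tj` and `TJ` actually draw, and to require whole-word matches. This is a minimal sketch, not a PDF parser: it assumes the content stream is already uncompressed (as in the excerpt above; real files often store streams FlateDecode-compressed), it ignores hex strings such as `<0001>Tj` and custom font encodings, and `pdfShownText()` / `pdfHasWord()` are hypothetical names. A dedicated text-extraction tool such as Poppler's `pdftotext` is generally the more robust route.

```php
<?php
// Sketch, assuming an uncompressed PDF content stream like the excerpt above.
// pdfShownText() and pdfHasWord() are hypothetical helpers, not a PDF library.

// A PDF literal string: "(" ... ")" in which "\" escapes the next character.
// Unescaped nested parentheses and hex strings such as <0001> are not handled.
const PDF_STR = '\((?:[^\\\\()]|\\\\.)*\)';

// Returns the text drawn by the Tj and TJ operators, one operator per line.
function pdfShownText(string $stream): string
{
    // Alternative 1: "(string) Tj". Alternative 2: "[ (string) 250 (string) ] TJ".
    $pattern = '/(' . PDF_STR . ')\s*Tj|\[((?:' . PDF_STR . '|[^\]])*)\]\s*TJ/s';
    preg_match_all($pattern, $stream, $ops, PREG_SET_ORDER);

    $lines = [];
    foreach ($ops as $op) {
        // Group 2 holds the body of a TJ array; otherwise use the Tj string.
        $body = (isset($op[2]) && $op[2] !== '') ? $op[2] : $op[1];

        // Collect every string in the body; TJ kerning numbers are skipped.
        preg_match_all('/' . PDF_STR . '/s', $body, $strings);
        $text = '';
        foreach ($strings[0] as $s) {
            // Drop the outer parentheses and undo the \( \) \\ escapes.
            $text .= preg_replace('/\\\\([()\\\\])/', '$1', substr($s, 1, -1));
        }
        $lines[] = $text;
    }
    return implode("\n", $lines);
}

// Whole-word, case-insensitive search over the drawn text only.
function pdfHasWord(string $stream, string $word): bool
{
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, pdfShownText($stream)) === 1;
}

$stream = "BT\n[(Submitted by: Dr. Gay, M.D. )250( )]TJ\n(8/25/11)Tj\nET\n(not gay text) def\n";
var_dump(pdfHasWord($stream, 'gay'));  // bool(true)
var_dump(pdfHasWord($stream, 'text')); // bool(false): that string is not drawn by Tj/TJ
```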
SOLUTION
SOLUTION