sscotti
asked on
Regular Expression to find search text between parentheses
I am looking for a regular expression to use in a PHP preg_match() call that will find search text anywhere in the subject string, as long as the search text sits between an opening and a closing parenthesis, for example:
$postVal = "find me";
$pdfdata = "This is a test of a string that (will find me between) parentheses.";
I have this as a starter, but it is not quite correct. I basically want to find whole words or phrases that are between the parentheses, as this is how the text is formatted in the PDF document that I am searching.
preg_match('/\([^\(\r\n]'. $postVal.' \)/i', $pdfdata)
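One way to get the whole-word behaviour described above, sketched in PHP: escape the search text with `preg_quote()` so characters such as `.` or `(` in `$postVal` are taken literally, allow only non-parenthesis characters on either side, and wrap the phrase in `\b` word boundaries. The helper name `findInParens` is hypothetical, and the `\b` anchors assume the search text starts and ends with a word character.

```php
<?php
// Hypothetical helper: returns the first "(...)" group that contains $needle
// as a whole word or phrase (case-insensitive), or null if there is none.
function findInParens(string $needle, string $haystack): ?string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $quoted = preg_quote($needle, '/');

    // \(          opening parenthesis
    // [^()\r\n]*  anything except parentheses or line breaks
    // \b...\b     the search text as whole words (assumes word characters at both ends)
    // \)          closing parenthesis
    $pattern = '/\([^()\r\n]*\b' . $quoted . '\b[^()\r\n]*\)/i';

    return preg_match($pattern, $haystack, $m) === 1 ? $m[0] : null;
}

$postVal = "find me";
$pdfdata = "This is a test of a string that (will find me between) parentheses.";
var_dump(findInParens($postVal, $pdfdata)); // string(22) "(will find me between)"
var_dump(findInParens("find", "(finder)"));  // NULL: "find" is not a whole word there
```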
If you want to do that with JavaScript, it is very simple:
Live demo:
http://jsfiddle.net/R8WGt/
var str = "This is test string (which you are looking for)";
alert(str.replace(/^.*\((.*)\).*$/m, '$1'));
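Note that in the JavaScript above the greedy `^.*` runs to the last `(` on the line, so only the last parenthesized group is returned. A rough PHP counterpart that collects every group separately uses a negated character class instead of `.*`; the sample string is made up for illustration, and nested parentheses are not handled.

```php
<?php
// Collect the text inside every non-nested "(...)" group.
// [^()]* cannot cross a parenthesis, so each group is matched on its own.
$str = "This is test string (which you are looking for) and (another one)";

preg_match_all('/\(([^()]*)\)/', $str, $matches);

// $matches[0] holds the full matches, $matches[1] the captured inner text.
print_r($matches[1]);
// Array
// (
//     [0] => which you are looking for
//     [1] => another one
// )
```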
An interesting wrinkle on this question... What if the parenthetical expression contains a parenthetical expression?
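PCRE, the engine behind `preg_match()`, supports recursive patterns, so balanced (nested) parentheses can be matched. A minimal sketch: `(?R)` re-enters the whole pattern for each inner group, and the possessive `[^()]++` avoids needless backtracking. The sample text is illustrative only.

```php
<?php
// \(              an opening parenthesis
// (?:[^()]++      then any run of non-parenthesis characters (possessive)
//   |(?R))*       or a complete nested group, matched by recursing into the whole pattern
// \)              and the matching closing parenthesis
$pattern = '/\((?:[^()]++|(?R))*\)/';

$text = "outer (one (two (three)) back to one) and (four)";
preg_match_all($pattern, $text, $m);
print_r($m[0]);
// Array
// (
//     [0] => (one (two (three)) back to one)
//     [1] => (four)
// )
```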
ASKER CERTIFIED SOLUTION
ASKER
Thanks for the input; I will award points shortly. Just curious: the application here is that I am searching for keywords or text in a converted PDF document that has been OCR'ed or saved from PowerPoint with Adobe Acrobat. The document contains a lot of formatting data, as well as other data that looks like the text I am searching for. For example:
The following looks like the text that appears on my PPT slides:
Q
BT
0.19 0.147 0.152 0 k
/TT0 1 Tf
24 0 0 24 195.873 160.2 Tm
[(Submitted by: Dr. Gay, M.D. )250( )]TJ
3.861 -1.167 Td
[(Professor of Radiology)250( )]TJ
3.333 -1.208 Td
(8/25/11)Tj
0.149 0.113 0.118 0 k
/TT1 1 Tf
2.806 0 Td
( )Tj
ET
......
BT
0 0 0 0 k
/TT0 1 Tf
44 0 0 44 99.125 491.7999 Tm
[(Based on these images, what is)233( )]TJ
1.05 -1.182 Td
[(the most likely diagnosis?)250( )]TJ
0.029 0 0.342 0 k
32 0 0 32 131.125 366.5 Tm
(1.)Tj
0.022 0 0.281 0 k
/C2_0 1 Tf
<0001>Tj
0 0 0 0 k
/TT0 1 Tf
1.361 0 Td
(Invasive ductal carcinoma )Tj
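In the content stream above, slide text is shown by the `Tj` operator (a single `(...)` string) and the `TJ` operator (an array such as `[(...)250( )]`), always between `BT` and `ET`. A rough sketch that collects only those strings might look like this. It assumes an already-decompressed content stream, literal strings without unescaped nested parentheses, and a simple font encoding; hex strings such as `<0001>` are skipped. The helper name `extractShownText` is hypothetical, and this is not a full PDF parser.

```php
<?php
// Hypothetical helper: collect the text shown by Tj and TJ operators inside BT ... ET.
// Sketch only: uncompressed stream, literal (...) strings, hex strings ignored.
function extractShownText(string $stream): array
{
    // One PDF literal string; backslash escapes such as \( and \) may appear inside.
    $lit = '\((?:[^()\\\\]|\\\\.)*\)';
    $texts = [];

    // Only look inside text objects, i.e. between the BT and ET operators.
    preg_match_all('/\bBT\b(.*?)\bET\b/s', $stream, $blocks);
    foreach ($blocks[1] as $block) {
        // Group 1: "(...)" followed by Tj. Group 2: contents of "[...]" followed by TJ.
        preg_match_all('/(' . $lit . ')\s*Tj|\[((?:' . $lit . '|[^\]()])*)\]\s*TJ/s',
            $block, $ops, PREG_SET_ORDER);
        foreach ($ops as $op) {
            // $op[2] is set and non-empty only when the TJ branch matched.
            $source = ($op[2] ?? '') !== '' ? $op[2] : $op[1];
            // Pull every literal string out of the operand.
            preg_match_all('/' . $lit . '/s', $source, $parts);
            $text = '';
            foreach ($parts[0] as $part) {
                // Drop the outer parentheses and undo \( \) \\ escapes; other escapes stay as-is.
                $text .= preg_replace('/\\\\([()\\\\])/', '$1', substr($part, 1, -1));
            }
            $text = trim($text);
            if ($text !== '') {
                $texts[] = $text;
            }
        }
    }
    return $texts;
}

$sample = "BT\n/TT0 1 Tf\n[(Submitted by: Dr. Gay, M.D. )250( )]TJ\n(8/25/11)Tj\n( )Tj\nET\n(stray gaY data)";
print_r(extractShownText($sample));
```

Searching the returned strings (for example with a whole-word pattern) instead of the raw file keeps strings that sit outside text objects, like the stray `(stray gaY data)` in the sample, out of the results.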
The following looks like text found elsewhere in the document:
(6y´Î"†ú9¿•#Ngì´…°úé cè-ïIüñyà· ÿPåÛEné]s ˜x¿›/´ÚvªY ï)Ò—lF°œp +Å¿¿€3SJ°Û ,N¿¿F8F!J [LëÌŸ/ج Gäuuu9Ò˜µC¿p†Áz.MT»±oY<K *S°2o´ª]±Ï€°~=6¿Ä`Í2Ë!/§£›.~ ¿´=RJTòä*j a.®Ô#i[ÿÓÏf‚¿‹ôL¶¿{qÕ›WHpn€is”ƒQë/¥_7*ëKsÿgj ¿¿¶5æå µ÷¿ÆOBz7Á •àS#ÁÇ–FY»0é(óv‡/\g1õ^¿}º)ôœmg aY¿?.ãí4r â)
If I search for "gay" in this case, there is a match in the PPT text portion and a match in the "elsewhere" data shown above (gaY) at the end of the string.
I am just wondering whether there is a standard method for searching for words or phrases within a PDF document, which is what I am trying to do. The solution above works for the most part, since what I am searching for (e.g. idiopathic, lipoma, etc.) will probably only show up in the text data, but some short words may appear in the encoded data as well. I am sure there are tools or methods already out there for doing that sort of thing.
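Part of the answer is that readable operators like the ones above usually appear only after decompression: many PDF content streams are stored with the `FlateDecode` filter, which is zlib-compressed data, and binary runs like the one shown above may be such a stream (or embedded font or image data). A rough sketch, assuming PHP's zlib extension is available and without reading each stream's `/Filter` or `/Length` entries, inflates every `stream ... endstream` body with `gzuncompress()` and skips the ones that fail. Dedicated tools such as Poppler's `pdftotext` handle this extraction far more completely. The helper name `inflateStreams` is hypothetical.

```php
<?php
// Hypothetical helper: inflate every "stream ... endstream" body that decompresses
// as zlib data (the FlateDecode filter). Streams using other filters fail and are skipped.
function inflateStreams(string $pdf): array
{
    $decoded = [];
    // The stream keyword is followed by CRLF or LF; the data runs up to "endstream".
    preg_match_all('/stream\r?\n(.*?)\r?\nendstream/s', $pdf, $m);
    foreach ($m[1] as $data) {
        $plain = @gzuncompress($data); // false (warning suppressed) on non-zlib data
        if ($plain !== false) {
            $decoded[] = $plain;
        }
    }
    return $decoded;
}

// Illustrative input: one zlib-compressed content stream and one uncompressed stream.
$pdf = "1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
     . gzcompress("BT (Invasive ductal carcinoma) Tj ET")
     . "\nendstream\nendobj\n"
     . "2 0 obj\n<< >>\nstream\nnot compressed\nendstream\nendobj\n";

print_r(inflateStreams($pdf));
```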
SOLUTION
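$postVal = "find me";
$pdfdata = "This is a test of a string that (will find me between) parentheses.";
// preg_quote() escapes any regex metacharacters in the search text, and
// [^()\r\n]* keeps the whole match inside a single pair of parentheses.
preg_match('/\([^()\r\n]*' . preg_quote($postVal, '/') . '[^()\r\n]*\)/i', $pdfdata, $matches);
print_r($matches);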
Output:
Array
(
    [0] => (will find me between)
)