Solved

Regular Expression to find search text between parentheses

Posted on 2011-09-12
Last Modified: 2012-08-14
I am looking for a regular expression to use in a PHP preg_match() call that finds a search text anywhere in a string, provided the search text sits between an opening and a closing parenthesis. For example:

$postVal = "find me";
$pdfdata = "This is a test of a string that (will find me between) parentheses.";

I have this as a starting point, but it is not quite correct.  I basically want to find whole words or phrases that are between the parentheses, as this is how the text is formatted in the PDF document that I am searching.

preg_match('/\([^\(\r\n]'.$postVal.'\)/i', $pdfdata)
Question by:sscotti
7 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36526830
Something like this?

$postVal = "find me";
$pdfdata = "This is a test of a string that (will find me between) parentheses.";
preg_match('/\([^\(\r\n]*'.$postVal.'[^\(\r\n]*\)/i', $pdfdata, $matches);
print_r($matches);

Output:
Array
(
    [0] => (will find me between)
)
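
For what it's worth, the original attempt fails because [^\(\r\n] without a quantifier matches exactly one character, so the pattern only matches when a single character separates the open parenthesis from the search text and the closing parenthesis follows immediately. Below is a minimal sketch of the pattern above wrapped in a function. The name matchInParens() is made up for illustration, the search text is passed through preg_quote() in case it contains regex metacharacters, and ")" is also excluded from the character classes (an assumption on my part) so that a match stays inside one pair of parentheses.

```php
<?php
// Hypothetical helper: return the first "( ... )" group that contains $needle, or null.
function matchInParens(string $needle, string $haystack): ?string
{
    // Escape regex metacharacters and the "/" delimiter in the external search text.
    $quoted = preg_quote($needle, '/');

    // "(" + any run of non-paren, non-newline chars + needle + same again + ")"
    $pattern = '/\([^()\r\n]*' . $quoted . '[^()\r\n]*\)/i';

    return preg_match($pattern, $haystack, $m) === 1 ? $m[0] : null;
}

$pdfdata = "This is a test of a string that (will find me between) parentheses.";
var_dump(matchInParens("find me", $pdfdata));
```

Without preg_quote(), a search text such as "a.b" would also match "axb", because the dot would be read as a wildcard.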
 
LVL 17

Expert Comment

by:sonawanekiran
ID: 36527426
If you want to do that with JavaScript, it is very simple:
var str = "This is test string (which you are looking for)";
alert(str.replace(/^.*\((.*)\).*$/m, '$1'));


Live Demo :

http://jsfiddle.net/R8WGt/
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 36528697
An interesting wrinkle on this question... What if the parenthetical expression contains a parenthetical expression?
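
One possible way to handle that case, offered as a sketch rather than a definitive answer: PCRE (the engine behind the preg_* functions) supports recursive patterns, so (?R) can be used to match balanced, nested parentheses. The helper names balancedGroups() and groupsContaining() are hypothetical, and the code assumes you want the outermost balanced group that contains the search text.

```php
<?php
// Hypothetical helper: return every outermost balanced "( ... )" group in $text.
function balancedGroups(string $text): array
{
    // Body is either a run of non-paren characters or, via (?R), a nested balanced group.
    preg_match_all('/\((?:[^()]++|(?R))*\)/', $text, $m);
    return $m[0];
}

// Hypothetical helper: keep only the groups that contain $needle (case-insensitive).
function groupsContaining(string $text, string $needle): array
{
    $hits = array_filter(
        balancedGroups($text),
        function (string $group) use ($needle): bool {
            return $needle !== '' && stripos($group, $needle) !== false;
        }
    );
    return array_values($hits); // re-index after filtering
}

print_r(groupsContaining("This ought to ((find me, too!))", "find me"));
```

An unbalanced open parenthesis is simply skipped, and matching resumes at the next open parenthesis that does have a partner.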
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 450 total points
ID: 36528882
This seems to work OK for at least some of the edge cases.  When you try to parse strings with REGEX the result almost always contains some surprises.  I often like to use explode() first to limit the exposure to unpredictable external data.

You can see the output of this script on my server here.  As you will see, the absence of a closing paren creates ambiguity about the string we want to find.  You might also want to decide whether the search should be case-sensitive or not.  See line 39.
http://www.laprbass.com/RAY_temp_sscotti.php

Best of luck with your project, ~Ray
<?php // RAY_temp_sscotti.php
error_reporting(E_ALL);
echo "<pre>";


// SOME TEST DATA
$pdfdatas = array
( "This is a test of a string that (will find me between) parentheses."
, "This has parentheses but (not the search argument) and also (find me here) the search arg."
, "Here is (a layered (parenthetical (find me expression) in an unbalanced) string"
, "(find me)"
, "This ought to ((find me, too!))"
, "This has multiples of (find me, one) and (find me, too)."
, "This has nothing."
, "This has nothing ()."
, "This has nothing (of value)."
, "This is interesting (I am curious - find me here at the end?"
)
;

// THE SEARCH STRING IS PROBABLY EXTERNAL DATA
$postVal = "find me";

// PREPARE THE EXTERNAL DATA
$s = preg_quote($postVal, '#'); // ALSO ESCAPE THE REGEX DELIMITER


// CONSTRUCT A REGEX
$r
= '#'           // REGEX DELIMITER
. '[(]{1}'      // A CHARACTER CLASS OF OPEN PAREN
. '('           // START A GROUP
. '.*?'         // ANYTHING OR NOTHING
. $s            // THE PREPARED SEARCH STRING
. '.*?'         // ANYTHING OR NOTHING
. ')'           // END OF A GROUP
. '[)]{1}'      // A CHARACTER CLASS OF CLOSING PAREN
. '#'           // REGEX DELIMITER
. 'i'           // CASE-INSENSITIVE
;


// TEST EACH OF THE STRINGS
foreach ($pdfdatas as $p)
{
    // SHOW THE ORIGINAL STRING
    echo PHP_EOL;
    echo htmlentities($p);
    echo PHP_EOL;

    // MAKE THE MATCH
    preg_match_all($r, $p, $m);

    // IF THERE IS A FINDING IT IS IN THE GROUP AT $m[1]
    if (!empty($m[1]))
    {
        // THERE MIGHT BE MULTIPLES
        foreach ($m[1] as $n)
        {
            // HANDLES UNBALANCED PARENTHESES
            $a = explode('(', $n);
            $f = end($a);

            // SHOW THE STRING WE FOUND
            var_dump($f);
        }
    }
    else
    {
        echo "NO MATCH" . PHP_EOL;
    }
}
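
The explode() idea mentioned above can also be used on its own, without any regular expression. This is only a sketch under a few assumptions: the function name parenSegmentsContaining() is made up, a segment with no closing paren is treated as "no match", and for nested parentheses only the innermost text that follows the last open paren is considered.

```php
<?php
// Hypothetical helper: return the text inside each "( ... )" that contains $needle.
function parenSegmentsContaining(string $text, string $needle): array
{
    $found = [];

    // Everything before the first "(" is outside any parentheses, so drop it.
    $pieces = explode('(', $text);
    array_shift($pieces);

    foreach ($pieces as $piece) {
        // Keep only the part before the next ")"; an unclosed piece is skipped.
        $end = strpos($piece, ')');
        if ($end === false) {
            continue;
        }
        $inside = substr($piece, 0, $end);

        // Case-insensitive substring test; an empty needle never matches.
        if ($needle !== '' && stripos($inside, $needle) !== false) {
            $found[] = $inside;
        }
    }
    return $found;
}

print_r(parenSegmentsContaining("This has multiples of (find me, one) and (find me, too).", "find me"));
```

Because no pattern is built from the search text, there is nothing to escape, which sidesteps the preg_quote() question entirely.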

 
LVL 5

Author Comment

by:sscotti
ID: 36531873
Thanks for the input.  Will award points shortly.  Just curious: the application here is that I am searching for keywords or text in a converted PDF document that has been OCR'ed or saved with Adobe Acrobat from PowerPoint.  The document contains a lot of formatting data, as well as other data that looks like the text that I am searching for.  For example:


The following looks like the text that appears on my PPT slides:


Q
BT
0.19 0.147 0.152 0 k
/TT0 1 Tf
24 0 0 24 195.873 160.2 Tm
[(Submitted by: Dr. Gay, M.D. )250( )]TJ
3.861 -1.167 Td
[(Professor of Radiology)250( )]TJ
3.333 -1.208 Td
(8/25/11)Tj
0.149 0.113 0.118 0 k
/TT1 1 Tf
2.806 0 Td
( )Tj
ET


......


BT
0 0 0 0 k
/TT0 1 Tf
44 0 0 44 99.125 491.7999 Tm
[(Based on these images, what is)233( )]TJ
1.05 -1.182 Td
[(the most likely diagnosis?)250( )]TJ
0.029 0 0.342 0 k
32 0 0 32 131.125 366.5 Tm
(1.)Tj
0.022 0 0.281 0 k
/C2_0 1 Tf
<0001>Tj
0 0 0 0 k
/TT0 1 Tf
1.361 0 Td
(Invasive ductal carcinoma )Tj

The following looks like text elsewhere in the document:

(6y´Î"†ú9¿•#Ngì´…°úécè-ïIüñyà·ÿPåÛEné]s˜x¿›/´ÚvªYï)Ò—lF°œp+Å¿¿€3SJ°Û,N¿¿F8F!J[LëÌŸ/ج Gäuuu9Ò˜µC¿p†Áz.MT»±oY<K*S°2o´ª]±Ï€°~=6¿Ä`Í2Ë!/§£›.~¿´=RJTòä*ja.®Ô#i[  ÿÓÏf‚¿‹ôL¶¿ {qÕ› WHpn€is”ƒQë/¥_7*ëKsÿgj¿¿¶5æåµ÷¿ÆOBz7Á•àS#ÁÇ–F Y»0é(óv‡/\g1õ^¿}º)ôœmgaY¿?.ãí4râ)


If I search for "gay" in this case there is a match in the ppt text portion and a match in the "elsewhere" data display above (gaY) at the end of the string.

Just wondering if there is some standard method to search for words or phrases within a PDF document, which is what I am trying to do.  The solution above works for the most part, since what I am searching for (e.g. idiopathic, lipoma, etc.) is probably only going to show up in the text data, but some short words may appear in the encoded data as well.  I am sure there are tools or methods already out there for doing that sort of thing.
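
One way to narrow the search to the slide text, sketched under some loud assumptions: in the excerpt above the visible text always sits in literal strings on lines that end with the text-showing operators Tj or TJ, so the search can be limited to those lines. The code assumes the content stream is already uncompressed and has one operator per line, as in the excerpt; real PDF streams are often compressed and would need decoding first. The helper names pdfTextLines() and searchPdfText() are hypothetical.

```php
<?php
// Hypothetical helper: return the text shown by each Tj / TJ line of an
// uncompressed content stream (assumes one operator per line).
function pdfTextLines(string $stream): array
{
    $lines = [];
    foreach (preg_split('/\R/', $stream) as $line) {
        $line = trim($line);
        // Only lines ending in a text-showing operator (Tj or TJ) are of interest.
        if (!preg_match('/T[jJ]$/', $line)) {
            continue;
        }
        // Collect each literal string "( ... )"; backslash escapes such as \) are tolerated.
        preg_match_all('/\(((?:[^()\\\\]|\\\\.)*)\)/', $line, $m);
        if ($m[1]) {
            $lines[] = implode('', $m[1]);
        }
    }
    return $lines;
}

// Hypothetical helper: keep only the text lines that contain $needle (case-insensitive).
function searchPdfText(string $stream, string $needle): array
{
    return array_values(array_filter(
        pdfTextLines($stream),
        function (string $text) use ($needle): bool {
            return $needle !== '' && stripos($text, $needle) !== false;
        }
    ));
}

$stream = "BT\n[(Submitted by: Dr. Gay, M.D. )250( )]TJ\n(8/25/11)Tj\nET\n(x gaY)";
print_r(searchPdfText($stream, 'gay'));
```

The trailing "(x gaY)" line stands in for the binary data and is ignored because it does not end in Tj or TJ. Strings written in hex form, such as <0001>Tj in the excerpt, are not decoded by this sketch.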
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 50 total points
ID: 36532714
You could use a positive lookahead to ensure that a fixed number of characters following the word you're looking for come from an expected set of characters. If you still want to find a particular word within parentheses, you can add a lookahead to the pattern like this:

preg_match('/\([^\(\r\n]*'.$postVal.'(?=[\w\s!@#$%^&*()\-=+\[\]{};\':",.\/<>?\\|`~]{3})[^\(\r\n]*\)/i', $pdfdata, $matches);

This bit was added:
(?=[\w\s!@#$%^&*()\-=+\[\]{};\':",.\/<>?\\|`~]{3})

The given pattern requires any 3 keyboard characters (also including tab or CR/LF) to follow the given value.
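
Here is a hedged variation of the same lookahead idea wrapped in a function. Note the substitutions, which are mine and not part of the pattern above: the long keyboard class is replaced by the printable ASCII range [\x20-\x7E] plus whitespace, the search text goes through preg_quote(), and the function name findTextInParens() is hypothetical.

```php
<?php
// Hypothetical helper: find $needle inside parentheses, but only when the next
// $lookahead characters are printable ASCII or whitespace.
function findTextInParens(string $needle, string $haystack, int $lookahead = 3): ?string
{
    $quoted = preg_quote($needle, '/');

    $pattern = '/\([^(\r\n]*' . $quoted
        . '(?=[\x20-\x7E\s]{' . $lookahead . '})' // lookahead: consumes nothing
        . '[^(\r\n]*\)/i';

    return preg_match($pattern, $haystack, $m) === 1 ? $m[0] : null;
}

// Readable text follows the word, so this matches.
var_dump(findTextInParens('gay', '[(Submitted by: Dr. Gay, M.D. )250( )]TJ'));

// Control bytes follow the word, so this does not match.
var_dump(findTextInParens('gay', "(xx gaY\x01\x02\x03 more)"));
```

One caveat with any fixed-length lookahead: if fewer than that many characters follow the search text in the data, the match fails. For example, "(find me)" at the very end of the data is not found with a lookahead of 3.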
 
LVL 110

Assisted Solution

by:Ray Paseur
Ray Paseur earned 450 total points
ID: 36535855
Earlier I wrote: "When you try to parse strings with REGEX the result almost always contains some surprises.  I often like to use explode() first to limit the exposure to unpredictable external data." and "You might also want to decide whether the search should be case-sensitive or not.  See line 39."

Parsing text for meaning is difficult enough without the added complexity of PDF and PPT markup, formatting and layout.  That's why I rarely try to do everything in a single statement.  A few extra lines of code add a lot of power and flexibility.

Two suggestions... One, if you want help searching a PDF document, you will get better results if you post a representative PDF document (or three) for us to work with.  And Two, there are many prefabricated search machines that can search PDF documents very capably.  The Atomz engine is one that I have used successfully for many years.  The Wrensoft Zoom indexer does a good job, too.

Best of luck with your project, ~Ray