
Regular Expression for specific matches

Posted on 2013-05-17
Last Modified: 2013-05-24
Hi Experts,

I have an initial regexp that extracts certain matches:

\{\s*(__)\s*([^\}]*)\}([^\{]*)\{\/\1\}



Here is an example of what it is doing:

Source string:
{__ some_text1="some_value1" some_text2=some_value2}Some text inside{/__}



Match result:
arr[0] = '{__ some_text1="some_value1" some_text2=some_value2}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text1="some_value1" some_text2=some_value2';
arr[3] = 'Some text inside';



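To make the starting point concrete, here is a small PHP sketch of how the initial pattern behaves. The `/.../` delimiters and the variable names are additions for the example; the pattern itself is unchanged. It matches the flat block, but finds no match at all in the nested string below, because `[^\}]*` and `[^\{]*` cannot step over a nested brace.

```php
<?php
// The initial pattern from the question, wrapped in PHP delimiters.
$pattern = '/\{\s*(__)\s*([^\}]*)\}([^\{]*)\{\/\1\}/';

// Flat block: no braces inside the attributes or the body.
$flat = '{__ some_text1="some_value1" some_text2=some_value2}Some text inside{/__}';
$flatHit = preg_match($pattern, $flat, $m); // returns 1, $m[1..3] as listed above

// Nested block: [^\}]* stops at the "}" of {any_text1}, then [^\{]* stops at
// the "{" of {/any_text1}, which is not {/__}, so no match is found.
$nested = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}';
$nestedHit = preg_match($pattern, $nested); // returns 0
```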

But I need to rework the existing regexp so that it can match a source string like this:
{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}



The match result should be:
arr[0] = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}';
arr[1] = '__';
arr[2] = 'some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2';
arr[3] = 'Some {any_text3}text{/any_text3} inside';



In other words, I need the captured groups to be able to contain nested { and } characters.

I've already built one regexp pattern:
\{\s*(__)\s*((?!\{\/\1\})[\s\S]*)*\{\/\1\}



However, it does not capture the groups correctly. At the moment, my pattern produces this match result:
arr[0] = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}';
arr[1] = '__';
arr[2] = 'some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside';


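Why the attempt overruns can be reproduced with a short PHP sketch (delimiters and variable names added for the example). The negative lookahead `(?!\{\/\1\})` is only tested where each repetition of group 2 begins, so `[\s\S]*` is free to consume the `}` that closes the opening tag, and the pattern has no third group for the body.

```php
<?php
// The pattern from the attempt above, wrapped in PHP delimiters.
$pattern = '/\{\s*(__)\s*((?!\{\/\1\})[\s\S]*)*\{\/\1\}/';
$nested  = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}';

$hit = preg_match($pattern, $nested, $m);
// $m[0] is the whole string and $m[2] runs from 'some_text1=' to ' inside',
// swallowing the "}" that closes the opening tag. The pattern defines only
// two groups, so there is no $m[3] for the body.
```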

I would be very grateful if somebody could help me finish this.
Question by:lightspeedvt
LVL 76

Expert Comment

by:arnold
ID: 39176968
Trying too.

Could you post the pattern and the resulting match results you want?

The items in () that match are assigned in order.

Your current pattern matches the {} mostly outside the capturing groups, so those parts are not retained.
LVL 6

Author Comment

by:lightspeedvt
ID: 39177026
Actually, I've already posted the example text and the results I need, in order (see the second source string in my question and the match result I listed for it).

Logically, the pattern has to find every {__} ... {/__} block, but capture specific parts of it into groups. Taking this example:
{__ some={text} some2="text"} Here {goes}some{/goes} text {/__}

1. Group 1 should match just "__".
2. Group 2 should match everything after "{__" and before the "}" that closes the opening tag (here: some={text} some2="text"). Any nested "{}" and "{/}" must be kept.
3. Group 3 is almost the same as group 2, but matches only what comes after that "}" and before "{/__}" (here: Here {goes}some{/goes} text).
LVL 108

Expert Comment

by:Ray Paseur
ID: 39177109
This looks highly theoretical and possibly unnecessarily complicated, almost as if it were an academic assignment.  In particular, the semantic meaning of the right curly brace appears to be ambiguous and therefore would require a contextual analysis that may be made more difficult by the insistence on regular expressions.  Can you do two things for us, please?

First, step back from the technical details and just describe the application in business terms.

Second, please post some of the actual data and desired results, not the generalized test data.

Thanks and regards, ~Ray
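Ray's point about the right curly brace can be made concrete: which `}` ends the opening tag depends on how many `{` are still open at that point. A minimal non-regex sketch in PHP, assuming the input is exactly one well-formed block; `splitBlock` is a hypothetical helper name, not something proposed in the thread.

```php
<?php
// Hypothetical helper: finds the "}" that closes the opening {__ ...} tag by
// counting brace depth, then slices out the three parts.
// Assumes $block starts with "{__" and ends with "{/__}".
function splitBlock(string $block): ?array
{
    $open  = '{__';
    $close = '{/__}';
    if (strpos($block, $open) !== 0 || substr($block, -strlen($close)) !== $close) {
        return null;
    }
    $depth = 0;
    $len = strlen($block);
    for ($i = 0; $i < $len; $i++) {
        if ($block[$i] === '{') {
            $depth++;
        } elseif ($block[$i] === '}' && --$depth === 0) {
            // Depth is back to zero: this "}" closes the opening tag.
            $attrs = trim(substr($block, strlen($open), $i - strlen($open)));
            $body  = substr($block, $i + 1, $len - $i - 1 - strlen($close));
            return ['__', $attrs, $body];
        }
    }
    return null; // braces never balanced
}
```

Because it counts depth instead of stopping at the first `}`, this approach is not limited to one level of nesting, at the cost of walking the string character by character.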
LVL 76

Expert Comment

by:arnold
ID: 39177338
What language is this being implemented in?
The matching is a two-step process: the first step extracts the entries of interest, and the second confirms that the extracted items (i.e. item3 and item4) are identical, which confirms the item delimiters.
LVL 6

Author Comment

by:lightspeedvt
ID: 39177536
It is for PHP.

To Ray_Paseur: I don't think this is too complicated for regular expressions, because there are many ways to build the regexp logic. The regexp I provided is not necessarily the one that has to be extended to reach the end result. I have seen really complicated requirements solved with surprisingly simple regexps before. I am not a regular expression guru, which is why I sometimes ask others for help with them.

As for the details, I think I have already explained them quite thoroughly, but here are more examples with the expected results:
{__}Some text {text}inside{/text} inside{/__}

arr[0] = '{__}Some text {text}inside{/text} inside{/__}';
arr[1] = '__';
arr[2] = '';
arr[3] = 'Some text {text}inside{/text} inside';


{__ some_text="{some}Text{/some}"}Some text inside{/__}

arr[0] = '{__ some_text="{some}Text{/some}"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text="{some}Text{/some}"';
arr[3] = 'Some text inside';


{__ some_text="{some}"}Some text inside{/__}

arr[0] = '{__ some_text="{some}"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text="{some}"';
arr[3] = 'Some text inside';


{__ some_text={some}}Some text inside{/__}

arr[0] = '{__ some_text={some}}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text={some}';
arr[3] = 'Some text inside';


{__ some_text={some} some_other_text="Some text"}Some text inside{/__}

arr[0] = '{__ some_text={some} some_other_text="Some text"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text={some} some_other_text="Some text"';
arr[3] = 'Some text inside';


{__ some_text={some} some_other_text="Some {text}text{/text}"}Some text inside{/__}

arr[0] = '{__ some_text={some} some_other_text="Some {text}text{/text}"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text={some} some_other_text="Some {text}text{/text}"';
arr[3] = 'Some text inside';


LVL 108

Expert Comment

by:Ray Paseur
ID: 39177623
OK with me if you don't think it is complicated.  I'll sign off and let someone else deal with it.  Best of luck with your project, ~Ray
LVL 6

Author Comment

by:lightspeedvt
ID: 39177680
Ray, thanks for your time.
LVL 45

Expert Comment

by:aikimark
ID: 39177742
This parses the two major sections.
{__\s*(.*?)="(.*)"\s*(.*?)=(.*){\/__}



Using http://Myregextester.com I got the following parsed results.
$matches Array:
(
    [0] => Array
        (
            [0] => {__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}
        )

    [1] => Array
        (
            [0] => some_text1
        )

    [2] => Array
        (
            [0] => {any_text1}some_value{/any_text1} {$any_text2}
        )

    [3] => Array
        (
            [0] => some_text2
        )

    [4] => Array
        (
            [0] => some_value2}Some {any_text3}text{/any_text3} inside
        )

)



As Ray suggested, you would be better off if you parsed the individual matches with a different pattern.
LVL 76

Expert Comment

by:arnold
ID: 39177753
You can keep using the pattern. The issue is that you want to match {key}entry{/key}, which takes a match on the opening key and then a comparison with the closing key:

/\{(match)\}(item)\{\/(match)\}/ && $1 == $3

This way, if the keys are not the same, the conditional fails.
Are you dealing with XML type data?
What is the source of this data?

I'll take a look at the string later.
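The two-step idea can be sketched in PHP as follows. The function names are hypothetical, and `\w+` for the key is an assumption about what key names look like. The first function extracts candidates and then compares the opening and closing keys in code; the second performs the same comparison inside the pattern with the backreference `\1`, the same mechanism the question's patterns already use for `{/__}`.

```php
<?php
// Step 1: extract every {key}entry{/key} candidate, capturing the opening
// key, the entry, and the closing key separately.
function extractPairs(string $text): array
{
    preg_match_all('/\{(\w+)\}(.*?)\{\/(\w+)\}/', $text, $sets, PREG_SET_ORDER);
    $pairs = [];
    foreach ($sets as $m) {
        // Step 2: keep the candidate only if the closing key equals the opening key.
        if ($m[1] === $m[3]) {
            $pairs[] = ['key' => $m[1], 'entry' => $m[2]];
        }
    }
    return $pairs;
}

// Same check done inside the pattern: \1 must repeat whatever group 1 captured.
function extractPairsBackref(string $text): array
{
    preg_match_all('/\{(\w+)\}(.*?)\{\/\1\}/', $text, $sets, PREG_SET_ORDER);
    return array_map(function (array $m): array {
        return ['key' => $m[1], 'entry' => $m[2]];
    }, $sets);
}
```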
LVL 6

Author Comment

by:lightspeedvt
ID: 39177764
It is the PHP Smarty template format, so it can't be treated as XML; it may contain non-XML constructs.

I built the source data myself; it just follows Smarty template syntax. I can provide more examples.

In PHP I am just using preg_match_all:
preg_match_all(
	$pattern,
	$content,
	$matches
);


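For reference, a sketch of how `preg_match_all` lays out its results, using the initial pattern from the question on a made-up string with two flat blocks. With the default `PREG_PATTERN_ORDER`, `$matches[n]` holds group n for every match; passing `PREG_SET_ORDER` instead gives one array per match, which is the arr[0]..arr[3] layout used throughout this thread.

```php
<?php
// The initial pattern from the question, applied to a string holding two flat blocks.
$pattern = '/\{\s*(__)\s*([^\}]*)\}([^\{]*)\{\/\1\}/';
$content = '{__ a=1}first{/__} and {__ b="2"}second{/__}';

// Default PREG_PATTERN_ORDER: $byGroup[n] lists group n across all matches.
$count = preg_match_all($pattern, $content, $byGroup);
// $byGroup[2] === ['a=1', 'b="2"'], $byGroup[3] === ['first', 'second']

// PREG_SET_ORDER: $bySet[k] is one match laid out like arr[0..3].
preg_match_all($pattern, $content, $bySet, PREG_SET_ORDER);
// $bySet[1] === ['{__ b="2"}second{/__}', '__', 'b="2"', 'second']
```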

Also, we can skip the equality check between the text inside {} and {/}, because by convention the two are always equal.
LVL 6

Author Comment

by:lightspeedvt
ID: 39177812
@aikimark: The last group (#4) is not matched correctly, and the values are not always wrapped in "" quotes. Parsing the individual matches with a different pattern may also put too much load on the server.
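The objection can be checked directly. A PHP sketch (delimiters and variable names added for the example) running aikimark's pattern on two of the posted examples:

```php
<?php
// aikimark's pattern, wrapped in PHP delimiters. The bare "{" characters are
// read literally by PCRE because they do not form a {n,m} quantifier.
$pattern = '/{__\s*(.*?)="(.*)"\s*(.*?)=(.*){\/__}/';

// Example from the question: two attributes, the first one quoted.
$quoted = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}';
$quotedHit = preg_match($pattern, $quoted, $q);
// Group 4 still contains the "}" that closes the opening tag plus the body.

// Example with no quoted value: the pattern needs a literal =" to anchor on.
$unquoted = '{__ some_text={some}}Some text inside{/__}';
$unquotedHit = preg_match($pattern, $unquoted); // returns 0
```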
LVL 45

Expert Comment

by:aikimark
ID: 39178091
Why don't you try it and see.  You've got a few different patterns that you could iterate through and test for a pattern match before actually doing the parsing.  I think you will need to use some backreferences in your patterns.
LVL 76

Expert Comment

by:arnold
ID: 39178693
To accomplish what you want with regex, you have to use sequential comparisons.
There is no single regex expression that will match everything while also providing context, which I believe is important and will likely matter more down the line.

Are you trying to reverse engineer/mimic what Smarty templates do?
LVL 6

Accepted Solution

by:lightspeedvt
ID: 39179509
Well, it seems I was right: a task often looks complicated, but the regexp solution can be very simple. I built a pattern with very basic logic on my own:

\{\s*(__)\s*((?:[^\{\}]*\{[^\}]*\})*[^\}]*)\}(.*?)\{\/\1\}

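A quick PHP check of the accepted pattern against the examples posted in this thread (sources and expected groups 1 to 3 taken from those examples; the `/.../` delimiters and the loop are additions). One thing to keep in mind: without the `s` modifier, `.` in group 3 does not match newlines, so a body spanning several lines would need `/s`.

```php
<?php
// The accepted pattern from the thread, wrapped in PHP delimiters.
$pattern = '/\{\s*(__)\s*((?:[^\{\}]*\{[^\}]*\})*[^\}]*)\}(.*?)\{\/\1\}/';

// [source, expected group 2, expected group 3]
$cases = [
    ['{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}',
     'some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2',
     'Some {any_text3}text{/any_text3} inside'],
    ['{__}Some text {text}inside{/text} inside{/__}', '', 'Some text {text}inside{/text} inside'],
    ['{__ some_text="{some}Text{/some}"}Some text inside{/__}', 'some_text="{some}Text{/some}"', 'Some text inside'],
    ['{__ some_text="{some}"}Some text inside{/__}', 'some_text="{some}"', 'Some text inside'],
    ['{__ some_text={some}}Some text inside{/__}', 'some_text={some}', 'Some text inside'],
    ['{__ some_text={some} some_other_text="Some text"}Some text inside{/__}',
     'some_text={some} some_other_text="Some text"', 'Some text inside'],
    ['{__ some_text={some} some_other_text="Some {text}text{/text}"}Some text inside{/__}',
     'some_text={some} some_other_text="Some {text}text{/text}"', 'Some text inside'],
];

$failures = 0;
foreach ($cases as [$source, $attrs, $body]) {
    // arr[0] must be the whole block, arr[1..3] the three groups.
    $ok = preg_match($pattern, $source, $m) === 1
        && $m[0] === $source && $m[1] === '__' && $m[2] === $attrs && $m[3] === $body;
    echo ($ok ? 'ok   ' : 'FAIL ') . $source . PHP_EOL;
    $failures += $ok ? 0 : 1;
}
```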

LVL 6

Author Closing Comment

by:lightspeedvt
ID: 39193729
Resolved myself.
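The accepted pattern allows only one level of braces inside the opening tag: with a value like {b {c}}, group 2 stops too early. If deeper nesting is ever needed, PCRE (which PHP's preg_* functions use) supports recursive subpatterns. A sketch, not from the thread, with `br` as a hypothetical group name and the `s` modifier added so the body may span lines:

```php
<?php
// Accepted pattern from the thread, for comparison.
$accepted = '/\{\s*(__)\s*((?:[^\{\}]*\{[^\}]*\})*[^\}]*)\}(.*?)\{\/\1\}/';

// Variant with recursion: (?&br) matches a balanced {...} run of any depth.
// The named group lives in (?(DEFINE)...), so groups 1-3 keep their meaning.
$recursive = '/\{\s*(__)\s*((?:[^{}]++|(?&br))*)\}(.*?)\{\/\1\}'
           . '(?(DEFINE)(?<br>\{(?:[^{}]++|(?&br))*\}))/s';

// Made-up input with two levels of nesting inside the opening tag.
$source = '{__ a={b {c}} d="x"}Body {e}text{/e}{/__}';

preg_match($accepted, $source, $a);   // $a[2] is cut short at 'a={b {c}'
$hit = preg_match($recursive, $source, $m);
// $m[2] === 'a={b {c}} d="x"', $m[3] === 'Body {e}text{/e}'
```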