lightspeedvt

asked on

Regular Expression for specific matches

Hi Experts,

I have an initial regexp that captures certain matches:

\{\s*(__)\s*([^\}]*)\}([^\{]*)\{\/\1\}



Here is an example of what it is doing:

Source string:
{__ some_text1="some_value1" some_text2=some_value2}Some text inside{/__}



Match results:
arr[0] = '{__ some_text1="some_value1" some_text2=some_value2}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text1="some_value1" some_text2=some_value2';
arr[3] = 'Some text inside';
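A minimal PHP sketch (assuming PHP 7 or later; only the /.../ delimiters are added to the posted pattern) that reproduces these results with preg_match(). It also checks the nested source string from the question, which this pattern cannot match: [^\}]* stops at the first "}" and [^\{]* at the first "{".

```php
<?php
// The asker's original pattern, with PHP delimiters added.
$pattern = '/\{\s*(__)\s*([^\}]*)\}([^\{]*)\{\/\1\}/';

$simple = '{__ some_text1="some_value1" some_text2=some_value2}Some text inside{/__}';
$found = preg_match($pattern, $simple, $arr);  // 1 when a match is found
print_r($arr);  // [0] whole block, [1] __, [2] attributes, [3] body

// The nested string from the question: group 2 stops at the "}" of {any_text1},
// group 3 stops at the next "{", and the closing \{\/\1\} then sees {/any_text1}
// instead of {/__}, so there is no match at all.
$nested = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}'
        . 'Some {any_text3}text{/any_text3} inside{/__}';
var_dump(preg_match($pattern, $nested));  // int(0)
```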




But I need to rework the existing regexp so that it can also match a source string like this:
{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}



The match results should be:
arr[0] = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}';
arr[1] = '__';
arr[2] = 'some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2';
arr[3] = 'Some {any_text3}text{/any_text3} inside';



In other words, I need a way to allow nested { and } characters inside the captured groups.
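One possible approach, sketched in PHP under the assumption that braces inside the attributes are only ever one level deep (a {...} pair with no braces inside it, as in every example in this thread): let group 2 consume either a non-brace character or a complete {...} pair, so the first "}" that is not part of such a pair has to be the one closing the opening tag. Group 3 is lazy and stops at the first {/__}, located through the backreference \1. This is a hypothetical pattern, not the solution the asker eventually used.

```php
<?php
// Hypothetical one-level pattern (a sketch, not the thread's accepted solution).
//   group 1: the tag name "__"
//   group 2: a run of non-brace characters or complete {...} pairs with no
//            braces inside, so it stops at the "}" that closes the opening tag
//   group 3: the body, lazily up to the closing tag, found via the backreference \1
// The s flag lets "." in group 3 cross newlines in multi-line templates.
$pattern = '/\{\s*(__)\s*((?:[^{}]|\{[^{}]*\})*)\}(.*?)\{\/\1\}/s';

$source = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}'
        . 'Some {any_text3}text{/any_text3} inside{/__}';

if (preg_match($pattern, $source, $arr) === 1) {
    print_r($arr);
}
```

The two alternatives inside group 2 can never start at the same character, so the engine has nothing to backtrack over there. A lone "{" or "}" inside a quoted value, or deeper nesting such as {a {b}}, will still break it.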

I've already built one regexp pattern:
\{\s*(__)\s*((?!\{\/\1\})[\s\S]*)*\{\/\1\}



But the groups are not captured correctly; at the moment, my pattern returns:
arr[0] = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}';
arr[1] = '__';
arr[2] = 'some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside';
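A short PHP check of the pattern above (delimiters added) against the nested string shows where its groups go. The negative lookahead (?!\{\/\1\}) is only tested where each repetition of the group begins, not before every character, so [\s\S]* runs straight over the "}" that ends the opening tag. Everything up to {/__} lands in group 2, and the pattern has no third group for the body.

```php
<?php
// The asker's second pattern, with PHP delimiters added.
$pattern = '/\{\s*(__)\s*((?!\{\/\1\})[\s\S]*)*\{\/\1\}/';

$source = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}'
        . 'Some {any_text3}text{/any_text3} inside{/__}';

preg_match($pattern, $source, $arr);
// Only $arr[0], $arr[1] and $arr[2] exist; $arr[2] runs through "}Some ... inside".
print_r($arr);
```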



I would be very thankful if somebody could help me complete this.
arnold

Trying too.

Could you post the pattern and the match results you want?

The parts in () are captured into numbered groups, in the order of their opening parentheses.

In your current pattern, most of the {} are matched outside the capturing groups, so they are not retained in the results.
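Because groups are numbered by the position of their opening parenthesis, adding or removing a group renumbers everything after it. PCRE also supports named groups, (?<name>...), and preg_match() returns each named group under both its name and its number. A PHP sketch, assuming a one-level brace rule for the attributes (a character that is not a brace, or a {...} pair with no braces inside); the names tag, attrs and body are illustrative only:

```php
<?php
// Named groups: tag, attrs and body are hypothetical names chosen for this example.
// \k<tag> is a named backreference: the closing tag must repeat the opening one.
$pattern = '/\{\s*(?<tag>__)\s*(?<attrs>(?:[^{}]|\{[^{}]*\})*)\}(?<body>.*?)\{\/\k<tag>\}/s';

$source = '{__ some_text={some}}Some text inside{/__}';
preg_match($pattern, $source, $m);

echo $m['attrs'], "\n";  // some_text={some}
echo $m['body'], "\n";   // Some text inside
// The numbered keys are still present as well: $m[2] === $m['attrs'].
```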
lightspeedvt

ASKER

Actually, I've already posted the example text and the results, in the order I need to get them (see the second source string in my question and the match results it should produce).

Coming from the logical scheme: it is necessary to look for every {__} ... {/__} block, but capture certain parts of it into groups. For example, in
{__ some={text} some2="text"} Here {goes}some{/goes} text {/__}
1. Group #1 is just "__".
2. Group #2 is everything after "{__" but before the "}" that closes the opening tag, i.e. some={text} some2="text". The possible nested "{}" and "{/}" must be kept.
3. Group #3 is almost the same as #2, but it is everything after that "}" and before "{/__}", i.e. " Here {goes}some{/goes} text ".
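If the nested {} may themselves contain braces (none of the posted examples shows this, so it is an assumption), PCRE's recursive subpatterns can describe a balanced {...} of any depth. The PHP sketch below puts the helper group in a (?(DEFINE)...) block at the end, so it is numbered after groups #1 to #3 and those keep their meaning; the x flag allows the inline comments.

```php
<?php
// Hypothetical recursive variant. With the x flag, whitespace and # comments
// inside the pattern are ignored (the comments must not contain the "/" delimiter).
$pattern = '/
    \{ \s* (__) \s*                # group 1: the tag name
    ( (?: [^{}] | (?&brace) )* )   # group 2: attributes, balanced braces allowed
    \}                             # the brace that ends the opening tag
    (.*?)                          # group 3: body, lazily up to the closing tag
    \{ \/ \1 \}
    (?(DEFINE) (?<brace> \{ (?: [^{}] | (?&brace) )* \} ) )
/xs';

// Made-up input with braces nested two levels deep inside a quoted value.
$deep = '{__ a="{x {y}z}" b={c}}Body {d}text{/d}{/__}';
preg_match($pattern, $deep, $arr);
echo $arr[2], "\n";  // a="{x {y}z}" b={c}
echo $arr[3], "\n";  // Body {d}text{/d}
```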
This looks highly theoretical and possibly unnecessarily complicated, almost as if it were an academic assignment.  In particular, the semantic meaning of the right curly brace appears to be ambiguous and therefore would require a contextual analysis that may be made more difficult by the insistence on regular expressions.  Can you do two things for us, please?

First, step back from the technical details and just describe the application in business terms.

Second, please post some of the actual data and desired results, not the generalized test data.

Thanks and regards, ~Ray
What language is this being implemented in?
The matching is a two-step process: the first step extracts the entries of interest, and the second confirms that the extracted items (i.e. item3 and item4) are identical, which confirms the item delimiters.
It is for PHP.

To Ray_Paseur: I don't think it is complicated for regular expressions, because there are many ways to build the regexp logic. The regexp I provided is not necessarily the one that should be extended to reach the end result. I have seen really complicated requirements before and how simple the resulting regexp looked. I am not a regular-expression guru, which is why I sometimes ask others to help me with them.

As for the details, I think I have already explained them quite thoroughly. I can provide more data with results:
{__}Some text {text}inside{/text} inside{/__}

arr[0] = '{__}Some text {text}inside{/text} inside{/__}';
arr[1] = '__';
arr[2] = '';
arr[3] = 'Some text {text}inside{/text} inside';


{__ some_text="{some}Text{/some}"}Some text inside{/__}

arr[0] = '{__ some_text="{some}Text{/some}"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text="{some}Text{/some}"';
arr[3] = 'Some text inside';


{__ some_text="{some}"}Some text inside{/__}

arr[0] = '{__ some_text="{some}"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text="{some}"';
arr[3] = 'Some text inside';


{__ some_text={some}}Some text inside{/__}

arr[0] = '{__ some_text={some}}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text={some}';
arr[3] = 'Some text inside';


{__ some_text={some} some_other_text="Some text"}Some text inside{/__}

arr[0] = '{__ some_text={some} some_other_text="Some text"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text={some} some_other_text="Some text"';
arr[3] = 'Some text inside';


{__ some_text={some} some_other_text="Some {text}text{/text}"}Some text inside{/__}

arr[0] = '{__ some_text={some} some_other_text="Some {text}text{/text}"}Some text inside{/__}';
arr[1] = '__';
arr[2] = 'some_text={some} some_other_text="Some {text}text{/text}"';
arr[3] = 'Some text inside';
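The examples above can be turned into a quick regression check. This PHP sketch runs each one through a hypothetical one-level pattern (group 2 accepts a non-brace character or a {...} pair with no braces inside; group 3 stops at {/__} through the backreference \1) and compares arr[1] to arr[3] with the posted expectations. The expected arr[0] is taken to be the whole source string in every case.

```php
<?php
// Hypothetical one-level pattern, checked against the asker's examples.
$pattern = '/\{\s*(__)\s*((?:[^{}]|\{[^{}]*\})*)\}(.*?)\{\/\1\}/s';

// source => [expected arr[2], expected arr[3]]
$cases = [
    '{__}Some text {text}inside{/text} inside{/__}'
        => ['', 'Some text {text}inside{/text} inside'],
    '{__ some_text="{some}Text{/some}"}Some text inside{/__}'
        => ['some_text="{some}Text{/some}"', 'Some text inside'],
    '{__ some_text="{some}"}Some text inside{/__}'
        => ['some_text="{some}"', 'Some text inside'],
    '{__ some_text={some}}Some text inside{/__}'
        => ['some_text={some}', 'Some text inside'],
    '{__ some_text={some} some_other_text="Some text"}Some text inside{/__}'
        => ['some_text={some} some_other_text="Some text"', 'Some text inside'],
    '{__ some_text={some} some_other_text="Some {text}text{/text}"}Some text inside{/__}'
        => ['some_text={some} some_other_text="Some {text}text{/text}"', 'Some text inside'],
];

$failures = 0;
foreach ($cases as $source => [$attrs, $body]) {
    $ok = preg_match($pattern, $source, $arr) === 1
        && $arr[0] === $source
        && $arr[1] === '__'
        && $arr[2] === $attrs
        && $arr[3] === $body;
    echo ($ok ? 'ok   ' : 'FAIL ') . $source . "\n";
    if (!$ok) {
        $failures++;
    }
}
```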


OK with me if you don't think it is complicated.  I'll sign off and let someone else deal with it.  Best of luck with your project, ~Ray
Ray, thanks for your time.
This parses the two major sections.
{__\s*(.*?)="(.*)"\s*(.*?)=(.*){\/__}



Using http://Myregextester.com I got the following parsed results.
$matches Array:
(
    [0] => Array
        (
            [0] => {__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}Some {any_text3}text{/any_text3} inside{/__}
        )

    [1] => Array
        (
            [0] => some_text1
        )

    [2] => Array
        (
            [0] => {any_text1}some_value{/any_text1} {$any_text2}
        )

    [3] => Array
        (
            [0] => some_text2
        )

    [4] => Array
        (
            [0] => some_value2}Some {any_text3}text{/any_text3} inside
        )

)
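A PHP sketch of the pattern above (delimiters added, everything else as posted) reproduces these groups with preg_match(). It also shows two limits: group 4 swallows the "}" that ends the opening tag together with the body, and a tag whose attributes contain no double quotes does not match at all.

```php
<?php
// aikimark's pattern as posted, with PHP delimiters added. PCRE treats the
// unescaped "{" as a literal here because it does not start a valid quantifier.
$pattern = '/{__\s*(.*?)="(.*)"\s*(.*?)=(.*){\/__}/';

$source = '{__ some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2}'
        . 'Some {any_text3}text{/any_text3} inside{/__}';
preg_match($pattern, $source, $arr);
echo $arr[4], "\n";  // some_value2}Some {any_text3}text{/any_text3} inside

// No double quotes in the attributes, so the ="..." part cannot match.
var_dump(preg_match($pattern, '{__ some_text={some}}Some text inside{/__}'));  // int(0)
```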



As Ray suggested, you would be better off if you parsed the individual matches with a different pattern.
You can continue using the pattern; the issue is that you want to match {key}entry{/key}, which involves a match on the first portion and then a comparison on the closing portion:

pattern: /\{(match)\}(item)\{\/(match)\}/ && $1 == $3

This way if the items are not the same, the conditional will fail.
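Both variants can be sketched in PHP on a made-up string in which the second pair is mismatched: capture both tag names and compare them in code (group 1 against group 3), or let the backreference \1 require the closing name to repeat the opening one, so the regex engine does the comparison itself.

```php
<?php
// Made-up input: the first pair matches, the second ({a} ... {/b}) does not.
$text = '{goes}some{/goes} and {a}broken{/b}';

// 1) Two steps: capture both names, then compare them in PHP.
preg_match_all('/\{(\w+)\}(.*?)\{\/(\w+)\}/', $text, $sets, PREG_SET_ORDER);
$checked = [];
foreach ($sets as $s) {
    if ($s[1] === $s[3]) {     // opening name equals closing name
        $checked[] = $s[2];
    }
}

// 2) One step: the backreference \1 only accepts a closing tag with the same name.
preg_match_all('/\{(\w+)\}(.*?)\{\/\1\}/', $text, $backref);

print_r($checked);     // only "some" survives
print_r($backref[2]);  // only "some" matches
```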
Are you dealing with XML type data?
What is the source of this data?

I'll take a look later at the string and.
It is the PHP Smarty template format, so it can't be treated as XML; there may be non-XML constructs inside.

I built the source data myself. It just follows the Smarty template syntax. I can provide more examples.

In PHP I am just using preg_match_all:
preg_match_all(
	$pattern,
	$content,
	$matches
);
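With the default flag, PREG_PATTERN_ORDER, preg_match_all() groups its output by capture group, so $matches[2] lists group 2 of every block. To get one arr[0] to arr[3] layout per block, as in the examples in this thread, PREG_SET_ORDER can be passed. A sketch with a hypothetical one-level pattern and made-up content:

```php
<?php
// Hypothetical one-level pattern: group 2 allows {...} pairs without inner braces.
$pattern = '/\{\s*(__)\s*((?:[^{}]|\{[^{}]*\})*)\}(.*?)\{\/\1\}/s';
$content = '{__ a=1}First{/__} text {__}Second {b}x{/b}{/__}';  // made-up input

// Default PREG_PATTERN_ORDER: $byGroup[n] lists group n of every match.
preg_match_all($pattern, $content, $byGroup);
print_r($byGroup[2]);  // attributes of both blocks: "a=1" and ""

// PREG_SET_ORDER: $bySet[i] is one match, laid out as [0] to [3].
preg_match_all($pattern, $content, $bySet, PREG_SET_ORDER);
echo $bySet[1][3], "\n";  // Second {b}x{/b}
```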



Also, we can skip the equality comparison between the text inside {} and {/}, because by the standard they are always equal.
@aikimark: the last result (#4) is not matched correctly, and the pattern relies on "" characters, which may not be present. Parsing the individual matches with a different pattern may overload the server too much.
Why don't you try it and see.  You've got a few different patterns that you could iterate through and test for a pattern match before actually doing the parsing.  I think you will need to use some backreferences in your patterns.
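A possible second pass, sketched in PHP: once a block pattern has isolated the attribute string (arr[2]), a much simpler pattern can split it into name/value pairs, because the outer tag no longer has to be found. The function name parse_attrs and the value rules (a double-quoted string, a single {...} tag, or a bare run of non-space characters, tried in that order) are assumptions based on the examples in this thread.

```php
<?php
// Hypothetical helper: split an attribute string into name => raw value pairs.
// Values are tried as "quoted string", then {single tag}, then a bare token.
function parse_attrs(string $attrs): array
{
    preg_match_all('/(\w+)\s*=\s*("[^"]*"|\{[^{}]*\}|\S+)/', $attrs, $m, PREG_SET_ORDER);
    $out = [];
    foreach ($m as $pair) {
        $out[$pair[1]] = $pair[2];  // quotes are kept; strip them later if needed
    }
    return $out;
}

print_r(parse_attrs('some_text1="{any_text1}some_value{/any_text1} {$any_text2}" some_text2=some_value2'));
```

Whether the extra pass costs more than a single more complex pattern is easiest to settle by timing both on real templates.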
To accomplish what you want using regex, you have to use sequential comparisons.
There is no single regex that will match everything while also providing context, which I believe is important and likely will be further down the line.

Are you trying to reverse engineer/mimic what smarty template does?
ASKER CERTIFIED SOLUTION
lightspeedvt

Resolved myself.