Christian de Bellefeuille (Canada)

asked on

Stripping a piece of text with RegEx (Part 2)

My previous question was oversimplified, so here is my real situation.

I need to strip part of a text using RegEx.

Example:
"[XXXX_BONJOUR]Everything here should be removed as well[/XXXX_BONJOUR]This part should remain"

As a result, I need "This part should remain".

What's different in this version is that "XXXX" could be anything. Don't worry about matching the opening and closing tags; assume that they will match.

I've tried this expression, but it doesn't work:
\[.+?_BONJOUR\].+?\[\/.+?_BONJOUR\]

Thank you
ddrudik (United States)

using System;
using System.Text.RegularExpressions;
namespace myapp
{
  class Class1
    {
      static void Main(string[] args)
        {
          String sourcestring = "[XXXX_BONJOUR]Everything here should be removed as well[/XXXX_BONJOUR]This part should remain";
          // Group 1 captures the tag name; \1 requires the closing tag to repeat it.
          String matchpattern = @"\[([^_]+_BONJOUR)\].*?\[/\1\]";
          String replacementpattern = @"";
          Console.WriteLine(Regex.Replace(sourcestring,matchpattern,replacementpattern));
        }
    }
}


I've tested your regex on RegexBuddy and here (http://www.myregextester.com/index.php): it works like a charm. Can you show the code you're using?
I used my regex tester site to test the pattern and generate the code; here's a link to your example:

http://www.myregextester.com/?r=a54afc16

Thanks.
-Doug.
Christian de Bellefeuille (ASKER)

@marqusG: As soon as I use a more complex source text, it doesn't work.

Here's my code in C++. I also use the same website you mentioned:
// ioText contains my text
boost::regex wExp("\\[.+?_BONJOUR\\].+?\\[\\/.+?_BONJOUR\\]");
cout << boost::regex_replace(ioText, wExp, "");



When I test it on myregextester, I get "NO MATCHES. SOURCE TEXT UNCHANGED. CHECK FOR DELIMITER COLLISION". Of course, I don't use double backslashes there.

Here's an example source text:
Hi,

Here is some text with $PlaceHolders$.

[DETAIL]
To reach me: http://$PortalURL$
[/DETAIL]

[SOMETHING_BONJOUR]
Some other text here#
[/SOMETHING_BONJOUR]

Thank you
@ddrudik: Same problem with your pattern (I tried it on the website with the sample I just provided to marqusG).
ASKER CERTIFIED SOLUTION
Marco Gasi (Spain)

Interesting. I didn't know we could save the request. How did you do that?

I see one difference between what you did and what I tested: you seem to have checked the "S" flag (dot matches all characters, including newline).

Do you know whether I can set these flags with Boost?
SOLUTION
Thanks for the clarification, ddrudik. I was able to find the match_not_dot_newline flag but couldn't find its opposite, so, as you say, dot matching newlines must be the default. I hadn't noticed the Save Example box either.

I've tried it in my application, and it seems to work perfectly.

I wish I understood regex like you guys do, because I have no clue what the expression marqus left means, especially the [^\]]+? part:

\[[^\]]+?_BONJOUR\].+?\[\/.+?_BONJOUR\]

To me, it reads as: there must be a "[", and then a "]" must come right after it...
I have trouble understanding ^ and $ (start and end of string) in regex.
Copy and paste that pattern into the pattern box on myregextester.com, check the "Explain" box and click "Submit".

From the "Explain" feature:
[^\]]+?    any character except: '\]' (1 or more times (matching the least amount possible))
I really have to look at this website more closely next time :). Thanks!
Thank you.