Solved

Stripping piece of text with RegEx

Posted on 2013-11-21
Last Modified: 2013-11-25
I would like to be able to remove a part of text using RegEx

Example:
"[BONJOUR]Everything here should be removed as well[/BONJOUR]This part should remain"
Would become "This part should remain".

I've tried many things, but none work:
[BONJOUR].+[/BONJOUR]
\x5bBONJOUR\x5d.+\x5b/BONJOUR\x5d

Thanks for your help.
Question by:cdebel
18 Comments
 
LVL 74

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 150 total points
ID: 39666302
Try using the non-greedy version of dot-star:  .*?

Also, you need to escape the brackets since they are special characters in regex land.

e.g.

\[BONJOUR\].+?\[/BONJOUR\]
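
To make the greedy/non-greedy difference visible, here is a minimal sketch. It uses C++11 `std::regex` rather than Boost so that it is self-contained, and the function names and the two-block input are made up for illustration.

```cpp
#include <regex>
#include <string>

// Greedy .+ : runs from the first opening tag to the LAST closing tag.
std::string strip_greedy(const std::string& text) {
    const std::regex re(R"(\[BONJOUR\].+\[/BONJOUR\])");
    return std::regex_replace(text, re, "");
}

// Lazy .+? : stops at the FIRST closing tag, so each block is removed separately.
std::string strip_lazy(const std::string& text) {
    const std::regex re(R"(\[BONJOUR\].+?\[/BONJOUR\])");
    return std::regex_replace(text, re, "");
}
```

With the hypothetical input `[BONJOUR]a[/BONJOUR]keep[BONJOUR]b[/BONJOUR]end`, the greedy version also swallows `keep` and returns `end`, while the lazy version returns `keepend`. With a single tagged block, as in the question, both give the same result.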


0
 
LVL 28

Expert Comment

by:pepr
ID: 39666452
As kaufmed said. Anyway, classic regular expressions are not powerful enough for the more general case, such as tags nested inside each other, because they have no way to describe nested pair structures.
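
If nesting ever matters, one option is to skip the regex and count depth by hand. The following is only a sketch under assumptions: `strip_tagged` is a hypothetical helper, and the nested input is invented for illustration. On `[BONJOUR]a[BONJOUR]b[/BONJOUR]c[/BONJOUR]keep`, the lazy pattern stops at the first closing tag and leaves `c[/BONJOUR]keep` behind.

```cpp
#include <cstddef>
#include <string>

// Removes every [TAG]...[/TAG] region, nested ones included, by tracking depth.
// A stray closing tag outside any region is kept as ordinary text;
// an opening tag that is never closed drops the rest of the input.
std::string strip_tagged(const std::string& text, const std::string& tag) {
    const std::string open = "[" + tag + "]";
    const std::string close = "[/" + tag + "]";
    std::string out;
    std::size_t depth = 0;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text.compare(i, open.size(), open) == 0) {
            ++depth;                    // entering a (possibly nested) region
            i += open.size();
        } else if (depth > 0 && text.compare(i, close.size(), close) == 0) {
            --depth;                    // leaving one nesting level
            i += close.size();
        } else {
            if (depth == 0) out += text[i];  // keep only text outside all regions
            ++i;
        }
    }
    return out;
}
```

Calling `strip_tagged(input, "BONJOUR")` on the nested input above returns `keep`.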
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39666478
@pepr

Actually, some regex libraries support balancing groups, which can be used to match nested structures--albeit in a more complicated fashion. I think the Boost regex engine does, but I'm not 100% on that.
0
 
LVL 10

Author Comment

by:cdebel
ID: 39666690
@kaufmed:  It's close, but it's still not it.  The result I get with your expression is:
[R]This part should remain
0
 
LVL 9

Assisted Solution

by:Derek Jensen
Derek Jensen earned 150 total points
ID: 39666863
@cdebel, have you tried escaping the slash?

\[BONJOUR\].+?\[\/BONJOUR\]


0
 
LVL 10

Author Comment

by:cdebel
ID: 39666934
@bigdogdman: Yes, I've tried.  The only difference between your version and kaufmed's version is the \ right before the /BONJOUR.  But I get the same result.

But there is something different if I test your expression with this website.  It seems to work there.

But does it have anything to do with Boost?  I'm using this library for RegEx.  I thought RegEx was a standard... no matter which language or library I use, I was expecting the same results.

Here's my test code:
void testBoostRegex()
{
    std::string wStr = "[BONJOUR]Everything here should be removed as well[/BONJOUR]This part should remain";
    boost::regex wExp("\[BONJOUR\].+?\[\/BONJOUR\]");
    cout << boost::regex_replace(wStr, wExp, "") << endl;
    return;
}


0
 
LVL 10

Author Comment

by:cdebel
ID: 39666955
@bigdogdman: My bad.  I forgot to double the \.  With the following expression, it works:

    boost::regex wExp("\\[BONJOUR\\].+?\\[\\/BONJOUR\\]");

Thanks a lot!
0
 
LVL 28

Accepted Solution

by:pepr
pepr earned 200 total points
ID: 39667047
@cdebel: The problem is that you have to double backslashes in the C++ literals. It should be:
    boost::regex wExp("\\[BONJOUR\\].+?\\[/BONJOUR\\]");
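
The odd `[R]This part should remain` result reported earlier can be reproduced with a small sketch. It assumes the compiler turns the unrecognized escapes `\[`, `\]` and `\/` into the bare characters (mainstream compilers typically do this and emit a warning), and it uses `std::regex` instead of Boost to stay self-contained. The function names are made up for illustration.

```cpp
#include <regex>
#include <string>

const std::string sample =
    "[BONJOUR]Everything here should be removed as well[/BONJOUR]This part should remain";

// What the regex engine effectively receives when the backslashes are NOT doubled
// (assumption: the compiler reduces \[ \] \/ to the plain characters).
// [BONJOUR] is now a character class: "one of B, O, N, J, U, R".
std::string strip_with_lost_backslashes() {
    const std::regex re("[BONJOUR].+?[/BONJOUR]");
    return std::regex_replace(sample, re, "");
}

// Doubled backslashes: the engine receives \[BONJOUR\].+?\[/BONJOUR\] as intended.
std::string strip_with_doubled_backslashes() {
    const std::regex re("\\[BONJOUR\\].+?\\[/BONJOUR\\]");
    return std::regex_replace(sample, re, "");
}
```

In the first version the engine removes short runs such as `BON` and `JOU`, which leaves `[R]This part should remain`. The second version returns `This part should remain`.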


This is one of the reasons why C++11 introduced raw string literals. Now you can write:
boost::regex wExp(R"(\[BONJOUR\].+?\[/BONJOUR\])");
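
A quick way to convince yourself that the two spellings are the same pattern is to compare them as strings. This is a sketch using `std::regex`, and the names are illustrative only.

```cpp
#include <regex>
#include <string>

// The same pattern written twice: with doubled backslashes and as a raw string literal.
const std::string escaped_pattern = "\\[BONJOUR\\].+?\\[/BONJOUR\\]";
const std::string raw_pattern = R"(\[BONJOUR\].+?\[/BONJOUR\])";

// Removes each [BONJOUR]...[/BONJOUR] block using the raw-literal pattern.
std::string remove_bonjour_block(const std::string& text) {
    const std::regex re(raw_pattern);
    return std::regex_replace(text, re, "");
}
```

Here `escaped_pattern == raw_pattern` holds, and `remove_bonjour_block` applied to the string from the question returns `This part should remain`.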


Anyway, with C++11, regex became part of the standard library (the <regex> header).

As boost is actually a testbed for C++ standards, the C++11 regex is syntactically probably almost identical. (I have used it only the simple way, and it was identical.)

(Microsoft Visual C++ 2013 is a fairly good compiler these days, so I dare to point to their documentation.)

For <regex> http://msdn.microsoft.com/en-us/library/vstudio/bb982382.aspx
and the regex_constants::syntax_option_type http://msdn.microsoft.com/en-us/library/vstudio/bb982516.aspx -- you can set what syntax you want to use. Anyway, the more advanced syntax also causes regular expressions to be more complex in the sense of computational complexity -- see http://swtch.com/~rsc/regexp/regexp1.html
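
As a small illustration of those syntax options, the flags are passed as the second constructor argument. The sketch below uses C++11 `std::regex` with the default ECMAScript grammar plus the `icase` flag. The function name and the mixed-case input are invented for the example.

```cpp
#include <regex>
#include <string>

// Same pattern as before, but compiled with explicit syntax flags:
// ECMAScript grammar (the default) combined with case-insensitive matching.
std::string strip_ignoring_case(const std::string& text) {
    const std::regex re(R"(\[BONJOUR\].+?\[/BONJOUR\])",
                        std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(text, re, "");
}
```

For example, `strip_ignoring_case("[bonjour]gone[/Bonjour]kept")` returns `kept`.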

The slash does not need to be escaped.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39667943
I'm not sure that bigdogdman's post should be the accepted answer here, since it's really just a copy of what I posted. If anything, pepr's last comment should be the answer since it goes into detail about the need for double-escaping of the backslash--something I mistakenly assumed would be understood given the target language of C++.
0
 
LVL 28

Expert Comment

by:pepr
ID: 39668221
@kaufmed: I was late :) Anyway, things are as they are, and it is OK.
0
 
LVL 9

Expert Comment

by:Derek Jensen
ID: 39668971
Sorry kaufmed, I didn't mean to steal any points from you; my bad. :">
I'll try to remember to credit you (or anyone) from now on when I'm merely offering an adjustment to their suggestion.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39669065
It's not so much the points as the correctness. The OP stated that my suggestion didn't work (which we now know was due to lack of escaping). Your suggestion assumes that the issue is with the forward slash, which it is not. The only languages that require a forward slash to be escaped are those which use pattern delimiters (e.g. PHP, Perl, JavaScript, etc.). So in a technical sense, your answer is a repeat of my answer.

I see now that the OP posted info regarding the double-backslash as well.
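
On the forward-slash point, a short sketch shows that the escaped and unescaped spellings behave the same when the pattern is a plain C++ string with no delimiters. It uses `std::regex` with its default ECMAScript grammar as a stand-in for Boost, and the function names are illustrative.

```cpp
#include <regex>
#include <string>

// Closing tag matched with a plain slash.
std::string strip_close_plain(const std::string& text) {
    const std::regex re(R"(\[/BONJOUR\])");
    return std::regex_replace(text, re, "");
}

// Closing tag matched with an escaped slash; \/ is simply a literal '/' here.
std::string strip_close_escaped(const std::string& text) {
    const std::regex re(R"(\[\/BONJOUR\])");
    return std::regex_replace(text, re, "");
}
```

Both functions turn `a[/BONJOUR]b` into `ab`. The escape only becomes necessary when `/` also delimits the pattern, as in a JavaScript regex literal.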
0
 
LVL 10

Author Comment

by:cdebel
ID: 39669067
I just want to point out that:
pepr's reply came after I had accepted the answer.  I would have given him some points for the additional information he provided.
bigdogdman didn't just "copy&paste" your version; he showed the exact version that works.  In kaufmed's version, a \ was missing before the / in [\/BONJOUR].  It wouldn't feel right toward E-E readers if I accepted a "partially working" version just because you answered first.
I found the double backslash myself; you can see it from the timestamps.

I'll ask the moderator to split the points equally between the 3 of you, or to reopen the question so I can accept the answer properly.

By the way, there will be a similar question in the next hour.  This question was "over simplified".  When I try to adapt the answer to the real situation I'm facing, it still doesn't work.
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39669079
Don't sweat it. I've said my piece, and you have your answer. Nobody lost a limb. It's a good day for everyone  ; )
0
 
LVL 10

Author Comment

by:cdebel
ID: 39669129
I just want everyone to feel OK with the answer.  I didn't know that it was different for PHP/Perl/JavaScript.  When I posted this question, I tagged it "boost" because that is the library I use (boost::regex_replace, to be precise), and I posted it under Regular Expressions and "C++ Languages".

But ... I was testing it on the myregextester website, because I thought I had made a mistake in my usage of the Boost library.

I know how it feels when someone accepts the wrong answer or when the OP doesn't specify things.
0
 
LVL 10

Author Comment

by:cdebel
ID: 39669246
There's no way to ask a related question anymore, so here's the link for those who are interested in giving more details:
http://www.experts-exchange.com/Programming/Languages/Regular_Expressions/Q_28301459.html
0
 
LVL 28

Expert Comment

by:pepr
ID: 39676136
Well, thanks for the points. Anyway, you should know I am not doing it for the points. (It is a game.) I am learning and repeating via searching for the answer. That's it. :)

Have a nice time (all of you).
0
