Solved

Using RegEx to replace Whitespace / Tabs / Spaces / Carriage return etc.

Posted on 2008-10-17
Last Modified: 2012-05-05
I need to remove whitespace (tabs, spaces, carriage returns) from a string of text, as in the example below. The RegEx should only remove it from the outside (leading AND trailing) of the data on each line, and not from within the data itself, much like removing indentation in a text editor.

Example 1 BEFORE:

"<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>"

Example 1 DESIRED AFTER:

"<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>"

I know there is a character class in regular expressions that you can use (\s), but I'm not sure how to do the replace, and I'm also worried it would replace whitespace in the actual XML data, which it shouldn't.

Any ideas?

Thanks


Question by:djcheeky
 
LVL 20

Expert Comment

by:informaniac
ID: 22738901
Here is a JS function, if it helps:

function Trim(str){return str.replace(/^\s+|\s+$/g,''); }  
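
A note on scope: without the multiline flag, ^ and $ anchor only to the start and end of the whole string, so the function above trims the string as a whole and leaves per-line indentation alone. A minimal sketch of a per-line variant follows. The name trimEachLine and the use of [ \t] instead of \s are assumptions made here, not something from the thread; [ \t] is chosen so that newlines and blank lines are left untouched.

```javascript
// Hypothetical helper: strips spaces and tabs at the start and end of every line.
// The "m" flag makes ^ and $ match at each line boundary; "g" replaces all matches.
function trimEachLine(str) {
  return str.replace(/^[ \t]+|[ \t]+$/gm, "");
}

const before = "<a>\n  <b>  x  </b>  \n</a>";
const after = trimEachLine(before);

// Whitespace between <b> and </b> is kept because it is not at a line edge.
console.log(JSON.stringify(after));
```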
 

Author Comment

by:djcheeky
ID: 22739034
Hi - I tried that but didn't seem to have any luck:

Regex ws = new Regex(@"/^\s+|\s+$/g");
ws.Replace(myString,"");
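
Two things probably explain why the attempt above has no effect. The pattern string still contains the JavaScript literal delimiters (/ ... /g), which a regex constructor treats as characters to match, and the return value of the replace call is thrown away (strings are immutable in both C# and JavaScript, so the original is never modified in place). The sketch below reproduces both issues in JavaScript as an analogue of the C# code; the variable names are illustrative only.

```javascript
const input = "  <a>\n  <b/>\n</a>  ";

// Mistake 1: delimiters and flag pasted into the pattern string.
// The pattern now requires a literal "/" before ^ and a literal "/g" after $,
// so it can never match.
const wrong = new RegExp("/^\\s+|\\s+$/g");
const unchanged = input.replace(wrong, "");

// Mistake 2 (shown by contrast): replace() returns a new string,
// so the result has to be assigned to something.
const right = new RegExp("^\\s+|\\s+$", "g");
const trimmed = input.replace(right, "");

console.log(unchanged === input);      // the broken pattern changed nothing
console.log(JSON.stringify(trimmed));  // only the outer ends of the string are trimmed
```

In C# the corresponding fix would be to pass only ^\s+|\s+$ as the pattern and to assign the result of ws.Replace(...) back to myString. Note that this still trims only the ends of the whole string, not each line.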
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740960
You just replace any spaces between \n and <

This should work:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;

print $string;
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740975
Oops, didn't realise it's C# regex. This is Perl, but it should be something similar in C#.
 

Author Comment

by:djcheeky
ID: 22741053
Hi - wouldn't that just strip the front indentation out? I need it to do the same for trailing spaces / tabs / whitespace etc.

Thanks
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741205
yes that results in:

<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>

to remove trailing you can use:

$string =~ s/>\s+\n</>\n</g;

oh and if any of the strings have any of those chars before the first tag eg

"   <message>...."

you can do this too:

$string =~ s/^\s+<//;

or the last tag:

$string =~ s/\s+$<//;
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741246
sorry the last 2 examples should be:

$string =~ s/^\s+</</;

$string =~ s/>\s+$/>/;
 

Author Comment

by:djcheeky
ID: 22755576
Hi kmcghee

Those two did not work.
 
LVL 1

Accepted Solution

by:kmcghee (earned 500 total points)
ID: 22755767
How have you used them?

you need to do them all:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;    # remove leading spaces/tabs etc.
$string =~ s/>\s+\n</>\n</g; # remove trailing spaces/tabs etc. (the replacement keeps the "<" the pattern consumed)
$string =~ s/^\s+</</;        # remove spaces/tabs at beginning of string.
$string =~ s/>\s+$/>/;        # remove spaces/tabs at end of string.

print $string;
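
For readers not using Perl, the four substitutions can be sketched in JavaScript as below. The function name stripOuterWhitespace is hypothetical. In this sketch the second replacement writes the "<" back, since the pattern consumes it. Like the Perl version, it assumes every line starts and ends with a tag.

```javascript
// Hypothetical port of the four substitutions; assumes lines begin and end with tags.
function stripOuterWhitespace(xml) {
  return xml
    .replace(/\n\s+</g, "\n<")    // whitespace between a newline and the next tag
    .replace(/>\s+\n</g, ">\n<")  // whitespace between a tag and the end of its line
    .replace(/^\s+</, "<")        // whitespace before the first tag in the string
    .replace(/>\s+$/, ">");       // whitespace after the last tag in the string
}

const messy = "  <m>  \n   <a>  x  </a>\t\n</m>\n ";
const clean = stripOuterWhitespace(messy);

// Whitespace inside <a>...</a> is untouched because it does not border a newline.
console.log(JSON.stringify(clean));
```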
 

Author Comment

by:djcheeky
ID: 22755864
Hi Mcghee!

Ok, it took a bit of messing around, but I managed to get it to work:

Regex precedingWS = new Regex(@"\n\s+<");
myString = precedingWS.Replace(myString, "\n<");
Regex trailingWS = new Regex(@"\s+\n");
myString = trailingWS.Replace(myString, "\n");

But your example nudged me in the right direction - so thanks.
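
Two caveats about the two-pattern approach above may be worth knowing. Because \s also matches newlines, blank lines get collapsed, and whitespace at the very start or very end of the string is not touched (there is no newline next to it). The sketch below is a JavaScript analogue of the C# code, with a hypothetical function name, that shows both effects.

```javascript
// Hypothetical JavaScript analogue of the two C# replacements.
function stripIndentation(text) {
  return text
    .replace(/\n\s+</g, "\n<")  // leading whitespace before a tag
    .replace(/\s+\n/g, "\n");   // trailing whitespace before a newline
}

const sample = "  <a>  \n\n   <b> x </b>\t\n</a>  ";
const result = stripIndentation(sample);

// The blank line is gone, and the outer spaces of the whole string remain.
console.log(JSON.stringify(result));

// A final trim takes care of the ends of the string itself.
const fullyTrimmed = result.trim();
console.log(JSON.stringify(fullyTrimmed));
```

In C#, calling Trim() on the result would play the same role as the final trim() here.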

Question has a verified solution.