Solved

Using RegEx to replace Whitespace / Tabs / Spaces / Carriage returns etc.

Posted on 2008-10-17
Last Modified: 2012-05-05
I need to remove whitespace (tabs, spaces, carriage returns) from a string of text, as in the example below. The RegEx should only remove it from the outside (leading AND trailing) of the data, and not from within the data itself, sort of like when you remove indentation in a text editor.

Example 1 BEFORE:

"<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>"

Example 1 DESIRED AFTER:

"<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>"

I know there is a character class in regular expressions that you can use (\s), but I'm not sure how to do the replace, and I'm also worried it will replace whitespace in the actual XML data, which it shouldn't.

Any ideas?

Thanks


Question by:djcheeky
 
LVL 20

Expert Comment

by:informaniac
ID: 22738901
Here is a JavaScript function, if it helps:

function Trim(str){return str.replace(/^\s+|\s+$/g,''); }  
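
Note that without the `m` flag, `^` and `$` only match at the very start and end of the whole string, so the function above trims the string as a whole rather than each line. A minimal per-line sketch in JavaScript, assuming only spaces and tabs should be stripped (the name `trimEachLine` is made up for illustration):

```javascript
// Hypothetical helper: strips spaces and tabs at the start and end of every line.
// [ \t]+ is used instead of \s+ so the line breaks themselves are never consumed.
// The m flag makes ^ and $ match at each line boundary, not just at the string ends.
function trimEachLine(str) {
  return str.replace(/^[ \t]+|[ \t]+$/gm, '');
}

const sample = '  <a>\n\t<b>  x  </b>  \n</a> ';
console.log(JSON.stringify(trimEachLine(sample)));
```

Whitespace inside an element, such as the spaces around `x`, is left alone because it is never adjacent to a line boundary.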
 

Author Comment

by:djcheeky
ID: 22739034
Hi - I tried that but didn't seem to have any luck:

Regex ws = new Regex(@"/^\s+|\s+$/g");
ws.Replace(myString,"");
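
A likely reason this has no effect: the `/.../g` wrapper is JavaScript literal syntax, so when it is placed inside a pattern string the slashes and the `g` are treated as literal characters to match. In addition, `Replace` returns a new string, and its result is discarded here. The same mistake can be reproduced in JavaScript itself, which is sketched below (the variable names are illustrative only):

```javascript
const s = '  hello  ';

// Delimiters and the g flag written inside the pattern string become literal characters,
// so this pattern looks for an actual "/" and "/g" in the text and never matches.
const wrong = new RegExp('/^\\s+|\\s+$/g');
const unchanged = s.replace(wrong, '');

// Pattern and flags passed separately.
const right = new RegExp('^\\s+|\\s+$', 'g');
// replace returns a new string; s itself is not modified, so the result must be kept.
const trimmed = s.replace(right, '');

console.log(JSON.stringify(unchanged));
console.log(JSON.stringify(trimmed));
```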
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740960
You just replace any spaces between \n and <

This should work:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;

print $string;
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740975
Oops, I didn't realise it's C# regex. This is Perl, but it should be something similar in C#.
 

Author Comment

by:djcheeky
ID: 22741053
Hi - wouldn't that just strip the front indentation out? I need it to do the same for trailing spaces / tabs / whitespace etc.

Thanks
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741205
Yes, that results in:

<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>

To remove trailing whitespace you can use:

$string =~ s/>\s+\n</>\n</g;

Oh, and if any of the strings have any of those characters before the first tag, e.g.

"   <message>...."

you can do this too:

$string =~ s/^\s+<//;

or the last tag:

$string =~ s/\s+$<//;
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741246
Sorry, the last two examples should be:

$string =~ s/^\s+</</;

$string =~ s/>\s+$/>/;
 

Author Comment

by:djcheeky
ID: 22755576
Hi kmcghee

Those two did not work.
 
LVL 1

Accepted Solution

by:
kmcghee earned 500 total points
ID: 22755767
How have you used them?

You need to apply all of them:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;     # remove leading spaces/tabs etc.
$string =~ s/>\s+\n</>\n</g;   # remove trailing spaces/tabs etc. (the "<" is part of the match, so put it back)
$string =~ s/^\s+</</;         # remove spaces/tabs at beginning of string.
$string =~ s/>\s+$/>/;         # remove spaces/tabs at end of string.

print $string;
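
For readers not using Perl, a sketch along the same lines in JavaScript might look like this. The function name `stripIndentation` is hypothetical, and the approach assumes, as the Perl version does, that every line starts and ends with a tag:

```javascript
// Hypothetical JavaScript port of the four substitutions.
function stripIndentation(xml) {
  return xml
    .replace(/\n\s+</g, '\n<')    // whitespace between a line break and the next tag
    .replace(/>\s+\n</g, '>\n<')  // whitespace between a tag and the line break before the next tag
    .replace(/^\s+</, '<')        // whitespace before the first tag
    .replace(/>\s+$/, '>');       // whitespace after the last tag
}

const doc = ' <m>\n  <a> x </a>  \n</m>\n';
console.log(JSON.stringify(stripIndentation(doc)));
```

In the second replacement the `<` is consumed by the match, so it is written back in the replacement text.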
 

Author Comment

by:djcheeky
ID: 22755864
Hi Mcghee!

OK, it took a bit of messing around, but I managed to get it to work:

Regex precedingWS = new Regex(@"\n\s+<");
myString = precedingWS.Replace(myString , "\n<");
Regex trailingWS = new Regex(@"\s+\n");
myString  = trailingWS.Replace(myString , "\n");

But your example nudged me in the right direction - so thanks.
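
One thing to be aware of with these two patterns: `\s` also matches `\r` and `\n`, so `\s+\n` swallows blank lines and turns `\r\n` line endings into `\n`. If that matters, restricting the class to spaces and tabs is one option. The comparison below is a JavaScript sketch rather than the C# from this thread, and the sample text is made up:

```javascript
const text = '<a>  \r\n\r\n  <b/>\r\n</a>';

// Port of the two patterns above: \s also matches \r and \n.
const greedy = text
  .replace(/\n\s+</g, '\n<')
  .replace(/\s+\n/g, '\n');

// Variant restricted to spaces and tabs; the line breaks are captured and written back untouched.
const narrow = text
  .replace(/\n[ \t]+</g, '\n<')
  .replace(/[ \t]+(\r?\n)/g, '$1');

console.log(JSON.stringify(greedy));
console.log(JSON.stringify(narrow));
```

The first result has the blank line removed and the line endings normalised to `\n`; the second keeps both as they were.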
