Solved

Using RegEx to remove Whitespace / Tabs / Spaces / Carriage Returns etc.

Posted on 2008-10-17
3,353 Views
Last Modified: 2012-05-05
I need to remove whitespace (tabs, spaces, carriage returns, etc.) from a string of text, as in the before/after example below. The regex should only remove it from the outside (leading AND trailing) of each line, and not within the data itself, much like removing indentation in a text editor.

Example 1 BEFORE:

"<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>"

Example 1 DESIRED AFTER:

"<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>"

I know there is a character class in regular expressions for this (\s), but I'm not sure how to do the replace, and I'm also worried it will replace whitespace in the actual XML data, which it shouldn't.

Any ideas?

Thanks
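A minimal sketch of one way to keep the replace away from the data: work line by line, so the anchors ^ and $ can only ever sit at the edges of a single line. It is written in JavaScript (the language of the first reply below) rather than C#; trimLines is a hypothetical helper name, and it assumes lines end in \n, optionally preceded by \r:

```javascript
// Hypothetical helper: strip leading and trailing spaces/tabs from every line,
// leaving whitespace inside a line untouched.
function trimLines(text) {
  return text
    .split("\n") // split on LF; a CRLF line keeps its trailing "\r"
    .map((line) =>
      // Without the m flag, ^ and $ are the edges of this single line.
      // (?=\r?$) stops before a trailing "\r", so CRLF endings survive.
      line.replace(/^[ \t]+|[ \t]+(?=\r?$)/g, "")
    )
    .join("\n");
}

const before = [
  "<message>",
  "  <UNB>",
  "      <S001>",
  "            <E0001>     {UNB0001}     </E0001>   ",
  "            <E0002>{UNB0002}</E0002>",
  "      </S001>",
  "  </UNB>",
  "</message>",
].join("\n");

console.log(trimLines(before));
```

The five spaces on each side of {UNB0001} are kept because they are neither at the start nor at the end of their line.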


Question by:djcheeky
10 Comments
 
LVL 20

Expert Comment

by:informaniac
ID: 22738901
Here is a JavaScript function, if it helps:

function Trim(str){return str.replace(/^\s+|\s+$/g,''); }  
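Note that this Trim only strips the start and end of the whole string: without the m flag, ^ and $ do not match at line breaks. Adding m alone is not quite enough either, because \s also matches \n, so a match can run across a blank line and join two lines. A small JavaScript comparison (the input strings and the two arrow-function names are made up for illustration):

```javascript
function Trim(str) { return str.replace(/^\s+|\s+$/g, ""); }

// Same pattern with the m flag: ^ and $ now also match at each line break.
const trimEachLineLoose = (str) => str.replace(/^\s+|\s+$/gm, "");

// [ \t] instead of \s: only spaces and tabs are removed, never the newlines.
const trimEachLine = (str) => str.replace(/^[ \t]+|[ \t]+$/gm, "");

console.log(JSON.stringify(Trim("  a  \n  b  ")));              // "a  \n  b"
console.log(JSON.stringify(trimEachLineLoose("  a  \n  b  "))); // "a\nb"
console.log(JSON.stringify(trimEachLineLoose("a\n\n  b")));     // "ab"
console.log(JSON.stringify(trimEachLine("a\n\n  b")));          // "a\n\nb"
```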
 

Author Comment

by:djcheeky
ID: 22739034
Hi, I tried that in C# but didn't seem to have any luck:

Regex ws = new Regex(@"/^\s+|\s+$/g");
ws.Replace(myString,"");
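A likely reason this had no effect: the slashes and the trailing g in /^\s+|\s+$/g are JavaScript regex-literal syntax, not part of the pattern, so inside a .NET pattern string the / and g are matched as literal characters and the expression never matches. The result of Replace is also discarded; strings are immutable, so it has to be assigned back. In C# that would be something like myString = Regex.Replace(myString, @"^[ \t]+|[ \t]+$", "", RegexOptions.Multiline); (an untested suggestion, assuming \n line endings). The same two pitfalls, reproduced in JavaScript with a made-up input:

```javascript
const input = "   <message>   ";

// Pattern text copied with the literal's delimiters: "/" and "g" become
// ordinary characters in the pattern, so it can never match this input.
const withDelimiters = new RegExp("/^\\s+|\\s+$/g");
console.log(JSON.stringify(input.replace(withDelimiters, ""))); // "   <message>   "

// Delimiters removed; the g flag is passed separately.
const pattern = new RegExp("^\\s+|\\s+$", "g");

// replace() returns a new string; calling it without assigning changes nothing.
input.replace(pattern, "");
console.log(JSON.stringify(input)); // "   <message>   "

const trimmed = input.replace(pattern, "");
console.log(JSON.stringify(trimmed)); // "<message>"
```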
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740960
You just need to replace any whitespace between a newline (\n) and the next <.

This should work:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;

print $string;
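For comparison, the same substitution in JavaScript, using String.prototype.replace with a global regex in place of Perl's s///g. Because \s also matches newlines, blank lines between tags are swallowed too, while an indented line that starts with text rather than < is left alone. A sketch with a made-up input (stripIndentBeforeTags is a hypothetical name):

```javascript
// Equivalent of: $string =~ s/\n\s+</\n</g;
// Replace a newline plus any run of whitespace before the next "<" with "\n<".
function stripIndentBeforeTags(str) {
  return str.replace(/\n\s+</g, "\n<");
}

const sample = "<a>\n  <b>\n\n    <c>\n  text";
console.log(JSON.stringify(stripIndentBeforeTags(sample)));
// "<a>\n<b>\n<c>\n  text": the blank line is gone, "  text" keeps its indent
```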
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740975
Oops, I didn't realise it's a C# regex; this is Perl. There should be something similar in C#.
 

Author Comment

by:djcheeky
ID: 22741053
Hi, wouldn't that just strip the leading indentation? I need it to do the same for trailing spaces / tabs / whitespace, etc.

Thanks
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741205
Yes, that results in:

<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>

To remove trailing whitespace, you can use:

$string =~ s/>\s+\n/>\n/g;

Oh, and if any of the strings have any of those characters before the first tag, e.g.

"   <message>...."

You can do this too:

$string =~ s/^\s+<//;

or after the last tag:

$string =~ s/\s+$<//;
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741246
Sorry, the last two examples should be:

$string =~ s/^\s+</</;

$string =~ s/>\s+$/>/;
 

Author Comment

by:djcheeky
ID: 22755576
Hi kmcghee

Those two did not work.
 
LVL 1

Accepted Solution

by: kmcghee (earned 500 total points)
ID: 22755767
How have you used them?

You need to apply all of them:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;   # remove leading spaces/tabs etc.
$string =~ s/>\s+\n/>\n/g;   # remove trailing spaces/tabs etc.
$string =~ s/^\s+</</;       # remove spaces/tabs at the beginning of the string
$string =~ s/>\s+$/>/;       # remove spaces/tabs at the end of the string

print $string;
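The four substitutions translate directly into chained replace calls in JavaScript. This sketch writes the trailing-whitespace step as />\s+\n/g, with no < in the pattern, so the opening bracket of the next tag is never consumed; the function name unindentXml and the sample input are illustrative only:

```javascript
// Port of the four Perl substitutions; steps run in the same order.
function unindentXml(str) {
  return str
    .replace(/\n\s+</g, "\n<") // leading spaces/tabs before a tag
    .replace(/>\s+\n/g, ">\n") // trailing spaces/tabs after a tag
    .replace(/^\s+</, "<")     // whitespace at the beginning of the string
    .replace(/>\s+$/, ">");    // whitespace at the end of the string
}

const raw =
  "  <message>  \n" +
  "  <UNB>\n" +
  "      <E0001>     {UNB0001}     </E0001>   \n" +
  "  </UNB>\n" +
  "</message>  \n";

console.log(unindentXml(raw));
```

Because the trailing step requires a > right before the whitespace, the spaces between {UNB0001} and </E0001> are left alone.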
 

Author Comment

by:djcheeky
ID: 22755864
Hi Mcghee!

OK, it took a bit of messing around, but I managed to get it to work in C#:

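using System.Text.RegularExpressions;
// remove indentation: a newline followed by whitespace up to the next tag
Regex precedingWS = new Regex(@"\n\s+<");
myString = precedingWS.Replace(myString, "\n<");
// remove whitespace before each newline; \s also matches \r, so "\r\n" becomes "\n"
Regex trailingWS = new Regex(@"\s+\n");
myString = trailingWS.Replace(myString, "\n");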
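A JavaScript sketch of these two expressions (JavaScript's \s likewise includes \r). Unlike the tag-anchored Perl version, the second pattern trims the end of any line, but because \s also matches \r, it rewrites Windows \r\n line endings as \n. If the endings must survive, a lookahead such as [ \t]+(?=\r?\n) removes only spaces and tabs; that variant also leaves blank lines in place. Both function names and the input are hypothetical:

```javascript
// The two replacements from the C# snippet, in the same order.
function authorVersion(str) {
  return str.replace(/\n\s+</g, "\n<").replace(/\s+\n/g, "\n");
}

// Variant: remove only spaces/tabs before a line break, keeping any "\r".
function keepLineEndings(str) {
  return str.replace(/\n[ \t]+</g, "\n<").replace(/[ \t]+(?=\r?\n)/g, "");
}

const crlf = "<a>  \r\n    <b>x  </b>\r\n";
console.log(JSON.stringify(authorVersion(crlf)));   // "<a>\n<b>x  </b>\n"
console.log(JSON.stringify(keepLineEndings(crlf))); // "<a>\r\n<b>x  </b>\r\n"
```

In both versions the two spaces inside <b>x  </b> are untouched, since no line break follows them directly.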

But your example nudged me in the right direction - so thanks.

Question has a verified solution.
