Solved

Using RegEx to replace Whitespace / Tabs / Spaces / Carriage returns etc.

Posted on 2008-10-17
Last Modified: 2012-05-05
I need to remove whitespace (tabs, spaces, carriage returns) from a string of text, as in the example below. The RegEx should only remove it from the outside (leading AND trailing) of each line of data, and not from within the data itself, sort of like when you remove indentation in a text editor.

Example 1 BEFORE:

"<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>"

Example 1 DESIRED AFTER:

"<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>"

I know there is a character class in regular expressions that you can use (\s), but I'm not sure how to do the replace, and I'm also worried that it would replace whitespace inside the actual XML data, which it shouldn't.

Any ideas?

Thanks


Question by:djcheeky
10 Comments
 
LVL 20

Expert Comment

by:informaniac
ID: 22738901
Here is a JS function for this, if it helps:

function Trim(str){return str.replace(/^\s+|\s+$/g,''); }  
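A quick sketch of what this Trim does to a multi-line string: without the `m` flag, `^` and `$` anchor only to the start and end of the whole string, so indentation on the inner lines is left alone. The `xml` sample value is made up for illustration.

```javascript
// Trim as posted above: strips whitespace only at the very start and end of the string.
function Trim(str) {
  return str.replace(/^\s+|\s+$/g, '');
}

// Hypothetical sample input: outer padding plus an indented inner line.
const xml = "  <a>\n    <b>x</b>  \n</a>\n";

const trimmed = Trim(xml);

// The outer padding is gone, but the inner line keeps its indentation
// and its trailing spaces.
console.log(JSON.stringify(trimmed));
```

So on its own this handles the outside of the whole string, not the outside of each line.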
 

Author Comment

by:djcheeky
ID: 22739034
Hi - I tried that but didn't seem to have any luck:

Regex ws = new Regex(@"/^\s+|\s+$/g");
ws.Replace(myString,"");
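Two likely reasons this attempt had no effect: the .NET Regex constructor takes the bare pattern, so the JavaScript-style `/` delimiters and the trailing `g` become literal characters that never match, and Replace returns a new string instead of changing myString in place. The sketch below reproduces both pitfalls in JavaScript, where a pattern passed as a string behaves the same way; the variable names are illustrative only.

```javascript
const input = "  <a/>  ";

// Pitfall 1: inside a pattern string, "/" and "g" are literal characters,
// not delimiters or flags, so this pattern can never match.
const withDelimiters = new RegExp("/^\\s+|\\s+$/g");
const unchanged = input.replace(withDelimiters, "");

// Pitfall 2: replace returns a new string; the original is never modified.
const bare = new RegExp("^\\s+|\\s+$", "g");
input.replace(bare, "");                 // result discarded, input still padded
const fixed = input.replace(bare, "");   // result kept

console.log(JSON.stringify(unchanged)); // same as input
console.log(JSON.stringify(input));     // still padded
console.log(JSON.stringify(fixed));     // padding removed
```

In C# terms that would mean passing the pattern without the slashes and assigning the return value back, for example myString = ws.Replace(myString, "").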
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740960
You just replace any spaces between \n and <

This should work:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;

print $string;
 
LVL 1

Expert Comment

by:kmcghee
ID: 22740975
Oops, I didn't realise it's a C# regex. This is Perl, but it should be something similar in C#.
 

Author Comment

by:djcheeky
ID: 22741053
Hi - wouldn't that just strip the front indentation out? I need it to do the same for trailing spaces / tabs / whitespace etc.

Thanks
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741205
yes that results in:

<message>
<UNB>
<S001>
<E0001>     {UNB0001}     </E0001>
<E0002>{UNB0002}</E0002>
</S001>
<S002>
<E0004>{UNB0004}     </E0004>
</S002>
</UNB>
</message>

To remove trailing whitespace you can use (the replacement keeps the "<" that the pattern consumes):

$string =~ s/>\s+\n</>\n</g;

Oh, and if any of the strings have whitespace before the first tag, e.g.

"   <message>...."

you can do this too:

$string =~ s/^\s+<//;

or the last tag:

$string =~ s/\s+$<//;
 
LVL 1

Expert Comment

by:kmcghee
ID: 22741246
sorry the last 2 examples should be:

$string =~ s/^\s+</</;

$string =~ s/>\s+$/>/;
 

Author Comment

by:djcheeky
ID: 22755576
Hi kmcghee

Those two did not work.
 
LVL 1

Accepted Solution

by:
kmcghee earned 500 total points
ID: 22755767
How have you used them?

you need to do them all:

#!/usr/bin/perl

my $string = "<message>
  <UNB>
      <S001>
            <E0001>     {UNB0001}     </E0001>
            <E0002>{UNB0002}</E0002>
      </S001>
      <S002>
            <E0004>{UNB0004}     </E0004>
      </S002>
  </UNB>
</message>";

$string =~ s/\n\s+</\n</g;     # remove leading spaces/tabs etc. on each line
$string =~ s/>\s+\n</>\n</g;   # remove trailing spaces/tabs etc. (keep the following "<")
$string =~ s/^\s+</</;         # remove spaces/tabs at beginning of string
$string =~ s/>\s+$/>/;         # remove spaces/tabs at end of string

print $string;
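For comparison, here is a rough JavaScript port of the same four substitutions, as a sketch only. The trailing-whitespace step is written so that the replacement keeps the `<` consumed by the pattern. The function name stripOuterWhitespace and the shortened sample input (with extra padding added so every step has something to do) are assumptions for illustration.

```javascript
// Shortened, hypothetical version of the example, padded at several line ends.
const input = [
  "   <message>",
  "  <UNB>   ",
  "      <S001>",
  "            <E0001>     {UNB0001}     </E0001>",
  "      </S001>\t",
  "  </UNB>",
  "</message>  ",
].join("\n");

function stripOuterWhitespace(s) {
  return s
    .replace(/\n\s+</g, "\n<")    // leading whitespace on each line after the first
    .replace(/>\s+\n</g, ">\n<")  // trailing whitespace before a line break
    .replace(/^\s+</, "<")        // whitespace before the first tag
    .replace(/>\s+$/, ">");       // whitespace after the last tag
}

const output = stripOuterWhitespace(input);
console.log(output);
```

Because every pattern is anchored on a `<` or `>` next to a line break or string end, the spaces around {UNB0001} inside the element are not touched. Note that \s also matches line breaks, so blank lines between tags would be collapsed.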
 

Author Comment

by:djcheeky
ID: 22755864
Hi Mcghee!

Ok, it took a bit of messing around, but I managed to get it to work:

Regex precedingWS = new Regex(@"\n\s+<");
myString = precedingWS.Replace(myString , "\n<");
Regex trailingWS = new Regex(@"\s+\n");
myString  = trailingWS.Replace(myString , "\n");

But your example nudged me in the right direction - so thanks.
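An alternative that was not used in this thread, sketched here in JavaScript: trim every line in a single pass with the multiline flag. It does not depend on the lines starting or ending with tags. The function name trimEachLine and the sample string are made up for illustration, and the sketch assumes "\n" line breaks.

```javascript
function trimEachLine(s) {
  // With the m flag, ^ and $ match at every line boundary.
  // [ \t] is used instead of \s so the match cannot swallow line breaks,
  // which keeps blank lines intact.
  return s.replace(/^[ \t]+|[ \t]+$/gm, "");
}

// Hypothetical sample: padded lines, a blank line, and spaces inside the data.
const sample = "  <a>  \n\t<b> x </b>\t\n\n  </a>";
const result = trimEachLine(sample);

console.log(JSON.stringify(result));
```

The spaces around x stay, since they are neither at the start nor at the end of a line. .NET has a comparable RegexOptions.Multiline option, though its handling of "\r\n" line endings would need checking before relying on it.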