Solved

Regex Question.

Posted on 2005-05-16
25
Medium Priority
154 Views
Last Modified: 2010-04-23
If I wanted to replace two or more spaces with Regex.Replace, what would I use?

Regex.Replace(str, "\s{2}", " ")?
Question by:addicktz
25 Comments
 
LVL 9

Expert Comment

by:william007
ID: 14016299
 str = Regex.Replace(str, "\s{2}", String.Empty)
 
LVL 9

Expert Comment

by:william007
ID: 14016308
Sorry, I misunderstood your question. Here is the answer:
 str = Regex.Replace(str, "\s{2,}", " ")
 
LVL 9

Expert Comment

by:william007
ID: 14016311
{2,}-->2 and more
{2}-->2 only
{2,3}-->2 to 3
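To see how the three quantifiers behave on the same text, here is a small VB.NET sketch (the module and function names are made up for illustration). It runs each pattern over a string containing runs of two, three and four spaces and replaces every match with a single space.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module QuantifierDemo
    ' Illustrative helper: replaces every match of the pattern with one space.
    Function Squeeze(input As String, pattern As String) As String
        Return Regex.Replace(input, pattern, " ")
    End Function

    Sub Main()
        ' Runs of 2, 3 and 4 spaces between the letters.
        Dim sample As String = "a  b   c    d"
        Console.WriteLine(Squeeze(sample, " {2}"))    ' exactly 2: prints "a b  c  d"
        Console.WriteLine(Squeeze(sample, " {2,}"))   ' 2 or more: prints "a b c d"
        Console.WriteLine(Squeeze(sample, " {2,3}"))  ' 2 to 3: prints "a b c  d"
    End Sub
End Module
```

With {2}, a run of three spaces still leaves two behind: the engine consumes one pair, replaces it, and the leftover single space no longer matches. Only {2,} reduces every run to one space.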
 
LVL 14

Expert Comment

by:Shiju Sasidharan
ID: 14016566
hi

\s matches any white-space character, including space, tab, form feed and so on. In JScript it is equivalent to [ \f\n\r\t\v]; in .NET it also matches Unicode separator characters.
So it is better to use a literal space, like this:
'==========================
           str = Regex.Replace(str, "[ ]{2,}", "")              ' To replace  spaces >= with empty
           str = Regex.Replace(str, "[ ]{2,}", " ")             ' To replace  spaces >= with single blank space

'==========================
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/js56jsgrpregexpsyntax.asp

;-)
Shiju
 
LVL 14

Expert Comment

by:Shiju Sasidharan
ID: 14016571
Correction, I meant:
>> ' To replace spaces >= 2 with empty
>> ' To replace spaces >= 2 with a single blank space
 
LVL 34

Expert Comment

by:Sancler
ID: 14017152
Or even
           str = Regex.Replace(str, " {2,}", "")
and
           str = Regex.Replace(str, " {2,}", " ")

The square brackets are not needed.
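A quick way to check that claim, and to see how a literal space differs from \s, is the sketch below (the module name and sample text are invented). The bracketed and unbracketed patterns give the same result; the \s version also swallows the CR/LF pair between the two lines, because CR and LF are white space too.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module BracketDemo
    Sub Main()
        ' Two lines separated by CR+LF, each containing a run of spaces.
        Dim text As String = "one   two" & vbCrLf & "three    four"

        Dim withClass As String = Regex.Replace(text, "[ ]{2,}", " ")   ' character class holding one space
        Dim withLiteral As String = Regex.Replace(text, " {2,}", " ")  ' bare literal space
        Dim withWhite As String = Regex.Replace(text, "\s{2,}", " ")   ' any white space, including CR and LF

        Console.WriteLine(withClass = withLiteral)   ' True: the brackets change nothing
        Console.WriteLine(withWhite)                 ' one two three four (line break gone)
    End Sub
End Module
```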

Roger
 
LVL 1

Author Comment

by:addicktz
ID: 14022913
OK, well, my problem is that I am using

    Public Sub Log(ByVal logMessage As String, ByVal w As TextWriter)
        w.Write(logMessage)
    End Sub

this sub to log my TcpClient data so that I can parse it when it's done downloading. But I keep getting huge areas of whitespace; at least, that's what it looks like in Notepad. Whatever it is, it's coming in on logMessage, and I need to put the data back together to parse it. I have posted this question before, http://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/VB_DOT_NET/Q_21390993.html , and I am now using the same Log function for another program and having the same problem again. I need to fix the data before I write it to the file. Any ideas?
 
LVL 34

Expert Comment

by:Sancler
ID: 14023182
OK, I've looked at your earlier question.  In the light of that, yes, your original suggestion for getting rid of white space was nearly the right one.  You were right to use \s as you want to get rid of all white space characters and not just spaces themselves, but you needed the quantifier to be {2,} rather than {2} as you wanted replacement of two or more, and not just of two, such characters.  It might also be wise to include the regex option for singleline.  I don't think that is strictly necessary with \s, but it should do no harm.  Putting all that together, what you want is

           str = Regex.Replace(str, "\s{2,}", " ", RegexOptions.Singleline)

It is not quite right, because a tab looks like two or more spaces but would be treated by the regex pattern as one.  So, if you had a single tab between two non-white-space characters it would not be stripped out and replaced by a single space.  If that would cause problems, we can refine the regex pattern to deal with it.  I've left it for now as it's simpler.

Or if you find in parsing that, having reduced all white space blocks to single spaces, some useful markers have been destroyed, we might be able to refine the pattern to overcome that.
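One possible refinement along those lines, offered as a sketch rather than a tested fix: match runs of spaces and tabs only, so that a lone tab is also turned into a space while carriage returns and line feeds are left alone. The helper name CollapseHorizontalWhitespace is hypothetical, and the pattern assumes spaces and tabs are the only horizontal white space in the data.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TabAwareDemo
    ' Hypothetical helper: collapses any run of spaces and/or tabs (including a
    ' single tab) to one space. CR and LF are not in the class, so lines survive.
    Function CollapseHorizontalWhitespace(text As String) As String
        Return Regex.Replace(text, "[ \t]+", " ")
    End Function

    Sub Main()
        Dim text As String = "a" & vbTab & "b   c" & vbCrLf & "d  " & vbTab & " e"
        ' Prints "a b c", then "d e" on the next line.
        Console.WriteLine(CollapseHorizontalWhitespace(text))
    End Sub
End Module
```

Replacing a single space with a single space is a no-op, so "[ \t]+" behaves like "two or more spaces, or any tab" without needing an alternation.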

Roger
 
LVL 1

Author Comment

by:addicktz
ID: 14023268
OK, I have tried that line, and it seems to remove the newline characters, which I need in order to find each new line of data. On top of that, the whitespace I am talking about is still there.
 
LVL 1

Author Comment

by:addicktz
ID: 14023359
Public Sub Log(ByVal logMessage As String, ByVal w As TextWriter)
    Dim lm As String = Regex.Replace(logMessage, "\s{2,}", " ", RegexOptions.Singleline)
    w.Write(lm)
End Sub

is what I used.
 
LVL 34

Expert Comment

by:Sancler
ID: 14025087
Sorry, there's been some confusion between "space" and "white space". I looked at the example in your earlier question and it looked as though it contained a number of line breaks. In fact, on going back to it, I find there were just so many spaces that the line word-wrapped a number of times. So back to " " rather than "\s", and get rid of the Singleline option.

Here's a sub that I have tested.  "e:\test.txt" is what I downloaded from http://www.3nter.net/sample.txt which you referred to in your earlier question.

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        Dim sr As New System.IO.StreamReader("e:\test.txt")
        Dim sw As New System.IO.StreamWriter("e:\test2.txt")
        Dim myText As String = sr.ReadToEnd

        myText = System.Text.RegularExpressions.Regex.Replace(myText, " {2,}", " ")

        sr.Close()
        sw.Write(myText)
        sw.Close()

    End Sub

Here's the first few lines of my result file "e:\test2.txt"

221 subject fields follow
 1444326 Gilbert O'Sullivan>>As Requested>> "1987@Unforgettable-16 Golden Classics@224Kbps-08-Oooh Baby.mp3" yEnc (12/14)
1444327 Gilbert O'Sullivan>>As Requested>> "1987@Unforgettable-16 Golden Classics@224Kbps-08-Oooh Baby.mp3" yEnc (09/14)
1444328 Gilbert O'Sullivan>>As Requested>> "1987@Unforgettable-16 Golden Classics@224Kbps-08-Oooh Baby.mp3" yEnc (10/14)
1444329 Gilbert O'Sullivan>>As Requested>> "1987@Unforgettable-16 Golden Classics@224Kbps-08-Oooh Baby.mp3" yEnc (11/14)

Is that what you want?

I am surprised that the previous approach left the whitespace you are talking about still there. "\s" should have been as effective for that purpose as " ". The only possibility I can immediately think of is that the input is HTML and some of the (apparent) spaces are the entity "&nbsp;". If you try the above and it doesn't work, we can investigate that further.
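If the &nbsp; theory turned out to be right, one way to handle it would be a pre-pass that converts both the HTML entity and the raw non-breaking space character (U+00A0) into ordinary spaces before collapsing runs. This is a hypothetical sketch: NormalizeSpaces is an invented name, and it only helps if the data really contains those characters.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NbspDemo
    ' Hypothetical pre-pass: map the "&nbsp;" entity and the raw U+00A0 character
    ' to ordinary spaces, then collapse runs of two or more spaces to one.
    Function NormalizeSpaces(text As String) As String
        Dim plain As String = text.Replace("&nbsp;", " ").Replace(Convert.ToChar(&HA0), " "c)
        Return Regex.Replace(plain, " {2,}", " ")
    End Function

    Sub Main()
        Console.WriteLine(NormalizeSpaces("1444326&nbsp;&nbsp;&nbsp;Gilbert"))          ' 1444326 Gilbert
        Console.WriteLine(NormalizeSpaces("yEnc" & Convert.ToChar(&HA0) & " (12/14)"))  ' yEnc (12/14)
    End Sub
End Module
```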

Roger
 
LVL 1

Author Comment

by:addicktz
ID: 14033648
Using that regex, the white space is still there.

 
LVL 1

Author Comment

by:addicktz
ID: 14033651
I'm using Notepad with word wrap turned off.
 
LVL 1

Author Comment

by:addicktz
ID: 14033694
OK, I'm sorry, I see what you are saying now. Yes, I want that line to disappear, but for some reason it's not doing it for me.

I'm going to post a new sample:

http://www.3nter.net/newsample.txt

Code:
        Public Sub Log(ByVal logMessage As String, ByVal w As TextWriter)
            Dim lm As String = Regex.Replace(logMessage, " {2,}", "")
            w.Write(lm)
        End Sub

and I tried your Sub

    Public Sub cleanfile(ByVal path As String, ByVal path2 As String)
        Dim sr As New System.IO.StreamReader(path)
        Dim sw As New System.IO.StreamWriter(path2)
        Dim myText As String = sr.ReadToEnd

        myText = System.Text.RegularExpressions.Regex.Replace(myText, " {2,}", " ")

        sr.Close()
        sw.Write(myText)
        sw.Close()

    End Sub
 
LVL 34

Expert Comment

by:Sancler
ID: 14034192
I downloaded the new sample file.

Here are the start and end of my results (those in the middle are similar) with the cleanfile() sub.  I copied and pasted it so any differences between my results and yours could not be the results of mistypes at this end.  The lines will wrap on the EE display, but if you copy and paste from there into notepad with no word wrap, you will see that each record is on one line.

3052795 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part38.rar" (13/34)
3052796 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part38.rar" (14/34)
3052797 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part38.rar" (15/34)
3052798 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part38.rar" (16/34)
3052799 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part38.rar" (17/34)
3052800 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part38.rar" (18/34)
...
3052953 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part43.rar" (01/34)
3052954 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part43.rar" (02/34)
3052955 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part43.rar" (03/34)
3052956 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part43.rar" (04/34)
3052957 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part43.rar" (05/34)

I notice that the regex in your Log() sub is different.  The replacement is an empty string rather than a space.  So I tried it with that, too.  So far as I can see, the results are identical.

Unless I am still misunderstanding what you want, my results look to match it. If they are not what you want, can you please point out, using one of the lines from my results above, precisely what is wrong with it?

If my results are what you want, but identical code is not getting them for you, I am not sure at the moment what to suggest.  But I'll think round the problem until you post again.

Roger
 
LVL 1

Author Comment

by:addicktz
ID: 14037757
The following numbers are the lines in the new sample that have the whitespace. The ones posted above are OK even before the cleanfile sub.

3052822
3052832
3052879
 
LVL 34

Expert Comment

by:Sancler
ID: 14038442
As I said, all lines in my results were similar, but here are the results for those specific lines, copied and pasted from those results. First, with the single space as the replacement:

[...]
3052822 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 e pisodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part39.rar" (06/34)
[...]
3052832 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part39.rar" (16/34)
[...]
3052879 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part40.rar" (29/34)
[...]

Now, with the "" replacement.

[...]
3052822 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part39.rar" (06/34)
[...]
3052832* SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc "THORNR CD 10.part39.rar" (16/34)
[...]
3052879 * SVCDGUY - lloR 'N' kcoR of yrotsiH ehT - 10 hour documentary in 10 episodes - Read NFO - CD 10 of 10 - yEnc"THORNR CD 10.part40.rar" (29/34)
[...]

Are those not what you get?  Or if they are what you get, how do you want them altered?
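The difference between the two sets of results comes down to the replacement string: " " turns each run of spaces into one space, while "" deletes the run entirely, so the text on either side of it is joined. The sketch below shows this on a made-up record modelled on record 3052832 above; the module name is invented.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplacementDemo
    Sub Main()
        ' Invented record with a three-space gap after the number.
        Dim record As String = "3052832   * SVCDGUY"
        Console.WriteLine(Regex.Replace(record, " {2,}", " "))  ' 3052832 * SVCDGUY
        Console.WriteLine(Regex.Replace(record, " {2,}", ""))   ' 3052832* SVCDGUY
    End Sub
End Module
```

Single spaces are untouched by both patterns, which is why only the places that had a run of spaces differ between the two result sets.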

Roger
 
LVL 1

Author Comment

by:addicktz
ID: 14049840
Those are not the results I get. I am not sure why; I have posted my code.
 
LVL 1

Author Comment

by:addicktz
ID: 14049879
I'm not sure if you noticed, but in Firefox those lines look identical. Do you see the whitespace in the lines before you run cleanfile?
 
LVL 1

Author Comment

by:addicktz
ID: 14049901
OK, I just used this line:

        myText = System.Text.RegularExpressions.Regex.Replace(myText, " ", "")

and it did not affect the whitespace I am referring to.

        myText = System.Text.RegularExpressions.Regex.Replace(myText, "\s", "")

did not remove the whitespace either.
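When neither a literal space nor \s touches the gap, the gap is probably made of characters that are not white space at all. A simple way to find out is to count the control characters (plus the non-breaking space) actually present in the text. The sketch below is hypothetical: CountSuspectChars is an invented name, and in real use you would pass it the raw logMessage, or the result of File.ReadAllText on the saved log, instead of the sample string.

```vbnet
Imports System
Imports System.Collections.Generic

Module CharCensus
    ' Hypothetical helper: counts every control character (codes 0-31 and 127)
    ' plus the non-breaking space (160), keyed by character code.
    Function CountSuspectChars(text As String) As SortedDictionary(Of Integer, Integer)
        Dim counts As New SortedDictionary(Of Integer, Integer)()
        For Each ch As Char In text
            Dim code As Integer = Convert.ToInt32(ch)
            If code < 32 OrElse code = 127 OrElse code = 160 Then
                Dim seen As Integer = 0
                counts.TryGetValue(code, seen)   ' seen is 0 if this code is new
                counts(code) = seen + 1
            End If
        Next
        Return counts
    End Function

    Sub Main()
        ' Invented sample: three NUL characters, which may display as a gap in an editor.
        Dim sample As String = "1444326" & New String(Convert.ToChar(0), 3) & "Gilbert"
        For Each pair As KeyValuePair(Of Integer, Integer) In CountSuspectChars(sample)
            Console.WriteLine("code {0}: {1} occurrence(s)", pair.Key, pair.Value)   ' code 0: 3 occurrence(s)
        Next
    End Sub
End Module
```

On a real log, codes 10 and 13 will show up as ordinary line breaks; anything else in the list is a candidate for the "whitespace" that the patterns above cannot see.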

 
LVL 1

Author Comment

by:addicktz
ID: 14050136
Yet another sample to try:

http://www.3nter.net/newersample.txt

I would really like to get this done in the Log sub if possible.

 
LVL 1

Author Comment

by:addicktz
ID: 14050137
The code worked with a smaller file; I had been using larger files than the sample I posted.
 
LVL 34

Accepted Solution

by:Sancler (earned 2000 total points)
ID: 14051132
We seem to be in different time zones. Those last postings were all while I was fast asleep.

I don't have Firefox, but your saying that it shows some results differently gave me (I hope!) the clue that I needed. What I had been doing with your sample files, both the old and the new, was bringing them up in IE and saving them from there. But your comment made me realize that IE may be altering them in some way, so I used your link to "Save Target As ...", and the new file then, whilst it did not crash the procedure, produced results with all the white space still there.

So I ran a check on precisely what characters were in the raw file and (as by then I half-expected) found non-printing characters that are not white space: namely Chr(0). I have to say I don't understand why that should cause the problem, but the answer, if I have now correctly identified the whole problem, is to change those to "" or spaces before running the original regex. I don't think it really matters which. It should be possible to include both operations in one regex pattern, but doing two sets of replacements does not take long and it is much simpler to see what is happening. Here's a revision to your code.

    Public Sub Log(ByVal logMessage As String, ByVal w As System.IO.TextWriter)
        logMessage = Regex.Replace(logMessage, Chr(0), "")
        Dim lm As String = Regex.Replace(logMessage, " {2,}", "")
        w.Write(lm)
    End Sub

I adapted my test program so as to use precisely the same approach as you - that is, passing in the string rather than a file name and passing in a text writer for the output - and what is above is a precise copy and paste from a version that is now working for me.

Just as a final check, I will mention that the input file length was 655,250 bytes and the output file length was 143,360 bytes.  Both values read from "Properties" in Windows Explorer.

Here's hoping you get the same results.
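On the point about doing both operations in one regex pattern, here is one way it could be done, as a sketch. It matches each run of spaces and NULs as a whole and uses a MatchEvaluator to decide what to leave behind, so that it produces the same result as the two separate Replace calls in the Log() revision. The function names are invented, and the two-pass version is included only so the two can be compared.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NulCleanup
    ' Same logic as the two Replace calls in Log(): remove NULs, then delete runs of 2+ spaces.
    Function CleanTwoPass(text As String) As String
        Dim noNuls As String = Regex.Replace(text, "\x00", "")
        Return Regex.Replace(noNuls, " {2,}", "")
    End Function

    ' Single-pass sketch. After NUL removal, each maximal run of spaces and NULs
    ' becomes a run of as many spaces as it contains, so keep one space if the run
    ' holds exactly one space, and delete the run otherwise.
    Function CleanOnePass(text As String) As String
        Return Regex.Replace(text, "[ \x00]+",
            Function(m As Match) As String
                Dim spaces As Integer = 0
                For Each c As Char In m.Value
                    If c = " "c Then spaces += 1
                Next
                Return If(spaces = 1, " ", "")
            End Function)
    End Function

    Sub Main()
        Dim nul As String = Convert.ToChar(0).ToString()
        Dim samples() As String = {
            "a" & nul & nul & "b",
            "a " & nul & "b",
            "a " & nul & " b",
            "3052832   * SVCDGUY"
        }
        For Each s As String In samples
            Console.WriteLine(CleanOnePass(s) = CleanTwoPass(s))   ' True for each sample
        Next
    End Sub
End Module
```

The single pass saves one scan of the string, but the two-pass form is easier to read, which is a fair reason to keep it.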

Roger
 
LVL 34

Expert Comment

by:Sancler
ID: 14051654
This is not strictly relevant to your problem, but it has thrown up some fascinating questions about IE's behaviour.  

You asked whether I could "see the whitespaces in the lines before you run cleanfile?" On the shorter, original files the answer was yes, however I got hold of them and however I displayed them. On the longer, later file the answer was "no" if I opened it in IE by clicking on the link, but "yes" if I downloaded it separately by "Save Target As ...". If, when I had it open in IE, I then saved it and opened the saved file in Notepad, there was no visible white space. But if I went to the cached version (in my case in C:\Documents and Settings\roger\Local Settings\Temporary Internet Files) and opened it in Notepad from there, I got the full version, including all the white space. I've experimented with various methods of getting and displaying the files, making sure that the cache was clear so that there would definitely be a new download, in case I'd earlier done something I hadn't remembered. But what I have described above always seems to be the pattern. So it looks as if, for some reason and only for longer files, IE must "in the background" be doing something like the job you were looking for.

Roger
 
LVL 1

Author Comment

by:addicktz
ID: 14052330
The Log sub works like a charm. Thank you very much for your time and effort.