LD147 asked:

Cleaning up text file with Visual Basic .net

I have a text file which contains the following:

<start>
-------------------------------------------------------
ID Num: BF00000000  Readings: 0000  Records: 000
Interval: 1 hour
------------------------------------------------------
Date & Time, Sample, Packet, ID, Reading, Status
------------------------------------------------------
ID Num: BF00000000  Readings: 0001  Records: 000
Interval: 1 hour
------------------------------------------------------
Date & Time, Sample, Packet, ID, Reading, Status
------------------------------------------------------
ID Num: CA07268139  Readings: 0004  Records: 001
Interval: 1 hour
------------------------------------------------------
Date & Time, Sample, Packet, ID, Reading, Status
2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063
2010-02-08 11:02, 0002, 001, 0007268139, 00000972, 063
2010-02-08 12:02, 0003, 001, 0007268139, 00000972, 063
2010-02-08 13:02, 0004, 001, 0007268139, 00000972, 063
2010-02-08 14:02, 0005, 001, 0007268139, 00000000, 000
2010-02-08 15:02, 0006, 001, 0007268139, 00000000, 000
2010-02-08 16:02, 0007, 001, 0007268139, 00000000, 000
2010-02-08 17:02, 0008, 001, 0007268139, 00000000, 000
<end>

Basically, any line that begins with a date needs to be kept.  All other information has to be discarded, so I will end up with a text file that looks like this:

2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063
2010-02-08 11:02, 0002, 001, 0007268139, 00000972, 063
2010-02-08 12:02, 0003, 001, 0007268139, 00000972, 063
2010-02-08 13:02, 0004, 001, 0007268139, 00000972, 063
2010-02-08 14:02, 0005, 001, 0007268139, 00000000, 000
2010-02-08 15:02, 0006, 001, 0007268139, 00000000, 000
2010-02-08 16:02, 0007, 001, 0007268139, 00000000, 000
2010-02-08 17:02, 0008, 001, 0007268139, 00000000, 000

What's the best way to do this?  I tried removing the junk lines with the attached code but it doesn't seem to work.  This is while reading the file line by line.  I've only included what I deem to be the relevant code (that does the cleaning).  I don't need to keep the header lines either.  Thanks a bunch!


If Microsoft.VisualBasic.Left(ioLine, 7) = "ID Num:" Then
    ioLine = ""
End If
If Microsoft.VisualBasic.Left(ioLine, 3) = "---" Then
    ioLine = ""
End If
If Microsoft.VisualBasic.Left(ioLine, 4) = "Date" Then
    ioLine = ""
End If


hes

Can you just look only for the year 2010?

If Microsoft.VisualBasic.Left(ioLine, 4) = "2010" Then
    ' write it back out
End If

LD147 (ASKER)

Well, until December 31, I can look just for 2010. Next year will be different ;) I guess I could do something like If Microsoft.VisualBasic.Left(ioLine, 4) = "2010" Or Microsoft.VisualBasic.Left(ioLine, 4) = "2011" Or Microsoft.VisualBasic.Left(ioLine, 4) = "2012", etc., and just put a few years on, but I'm sure there's a more elegant way to do it.

ASKER CERTIFIED SOLUTION
13598
This solution is only available to members.
Microsoft.VisualBasic.Left(ioLine, 4) = Date.Now.Year.ToString
That would only work if the program processes the file in the same year the readings were logged.
Based on the sample file I didn't think it did, but maybe it does.
What I mean is that if the program runs on January 1st over the previous day's data (December 31st), then checking for the current year won't work.
If that is the case and you don't want to just check for numeric, then you could check for the current or previous year:
Mid(ioLine, 1, 4) = Now.Year.ToString OrElse Mid(ioLine, 1, 4) = (Now.Year - 1).ToString
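
A check on the year alone, or on the first four characters being numeric, can be tightened by testing whether the line really begins with a timestamp. The following is only a minimal sketch, assuming every data row starts with the `yyyy-MM-dd HH:mm` layout seen in the sample file. `StartsWithTimestamp` is a hypothetical helper name, not something posted in this thread.

```vbnet
Imports System
Imports System.Globalization

Module DateLineCheck
    ' Hypothetical helper (not from the thread): returns True when the line
    ' starts with a valid timestamp such as "2010-02-08 10:02".
    ' Assumes the "yyyy-MM-dd HH:mm" layout shown in the sample file.
    Function StartsWithTimestamp(line As String) As Boolean
        If line Is Nothing OrElse line.Length < 16 Then Return False
        Dim parsed As DateTime
        Return DateTime.TryParseExact(line.Substring(0, 16), "yyyy-MM-dd HH:mm",
                                      CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, parsed)
    End Function

    Sub Main()
        ' A data row, a header row, and a separator row from the sample file.
        Console.WriteLine(StartsWithTimestamp("2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063")) ' True
        Console.WriteLine(StartsWithTimestamp("ID Num: BF00000000  Readings: 0000  Records: 000"))       ' False
        Console.WriteLine(StartsWithTimestamp("------------------------------------------------------")) ' False
    End Sub
End Module
```

Because the check does not depend on the current year, it behaves the same whether the file is processed on the day it was logged or years later.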
LD147 (ASKER)

That seems to work wonderfully well!  Thanks a ton.
LD147 (ASKER)

13598: your solution from earlier works well. Only one thing: I always end up with a row of hyphens at the top (it seems to be the very first row in the file), but I'll figure out how to remove them. Thanks :)
Without the code it would be hard to help. Maybe you can step through your code and see where, how and why the first line is being written.
LD147 (ASKER)

The other dotted lines are removed, no problem, just not the first line. I always end up with this:

------------------------------------------------------
2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063
2010-02-08 11:02, 0002, 001, 0007268139, 00000972, 063
2010-02-08 12:02, 0003, 001, 0007268139, 00000972, 063
2010-02-08 13:02, 0004, 001, 0007268139, 00000972, 063
2010-02-08 14:02, 0005, 001, 0007268139, 00000000, 000
2010-02-08 15:02, 0006, 001, 0007268139, 00000000, 000
2010-02-08 16:02, 0007, 001, 0007268139, 00000000, 000
2010-02-08 17:02, 0008, 001, 0007268139, 00000000, 000

Not a biggie, although if you come up with a solution before me, feel free to post it ;)
' Load log file, clean, and show...
        Dim ioFile As New StreamReader("C:\probe\LOG_FILE.CSV")
        Dim ioLine As String ' Going to hold one line at a time
        Dim ioLines As String ' Going to hold whole file
        ioLine = ioFile.ReadLine
        ioLines = ioLine

        While Not ioLine = ""
            ioLine = ioFile.ReadLine

            If IsNumeric(Mid(ioLine, 1, 4)) Then ' only keep lines beginning with numbers (ie, the date)
                ioLines = ioLines & vbCrLf & ioLine
            End If

        End While

        txtMain.Text = ioLines ' show clean log file in window


Without knowing the rest of your code, I would just make sure things are clear and there is no garbage left. Try this (you can never do too much cleaning). Give your string variables a value of blank in the declaration. Plus, you are never checking your very first line: you read it outside your While loop, copy it straight into ioLines, and then read the next line without analyzing the first one. The version below tests every line, including the first, before adding it:

        ' Needs Imports System.IO at the top of the file for StreamReader
        Dim ioLine As String = ""          ' Going to hold one line at a time
        Dim ioLines As String = ""         ' Going to hold the cleaned file

        Using ioFile As New StreamReader("C:\probe\LOG_FILE.CSV")
            ioLine = ioFile.ReadLine()

            While ioLine IsNot Nothing ' ReadLine returns Nothing at end of file
                If IsNumeric(Mid(ioLine, 1, 4)) Then ' only keep lines beginning with numbers (ie, the date)
                    If ioLines <> "" Then ioLines &= vbCrLf ' no blank row before the first kept line
                    ioLines &= ioLine
                End If
                ioLine = ioFile.ReadLine() ' read the next line only after the current one is checked
            End While
        End Using

        txtMain.Clear()
        txtMain.Text = ioLines ' show clean log file in window
 
LD147 (ASKER)

Many thanks.  It's ok now :)
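
Since the original goal was to end up with a cleaned text file rather than text in a window, the same filtering can also be written straight to disk. This is only a sketch under assumptions: `CleanLog` is a hypothetical helper, the output file name is made up for illustration, and the pattern checks the shape of the timestamp (digits and separators), not whether it is a real calendar date.

```vbnet
Imports System
Imports System.IO
Imports System.Linq
Imports System.Text.RegularExpressions

Module LogCleaner
    ' Matches lines that start with something shaped like "2010-02-08 10:02".
    ' Shape check only: it does not validate the date itself.
    Private ReadOnly DatePrefix As New Regex("^\d{4}-\d{2}-\d{2} \d{2}:\d{2}")

    ' Hypothetical helper (not from the thread): copies only the data rows
    ' from inputPath to outputPath and drops headers and separators.
    Sub CleanLog(inputPath As String, outputPath As String)
        ' ToList() finishes reading before the write starts, so the input
        ' and output paths may be the same file.
        Dim kept = File.ReadLines(inputPath).
                        Where(Function(line) DatePrefix.IsMatch(line)).
                        ToList()
        File.WriteAllLines(outputPath, kept)
    End Sub

    Sub Main()
        ' Input path taken from the thread; the output name is an assumption.
        CleanLog("C:\probe\LOG_FILE.CSV", "C:\probe\LOG_FILE_CLEAN.CSV")
    End Sub
End Module
```

Unlike a loop that stops at the first empty string, this reads to the end of the file, so a stray blank line in the middle of the log does not cut the output short.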