LD147
Cleaning up text file with Visual Basic .net
I have a text file which contains the following:
<start>
-------------------------- ---------- ---------- ---------
ID Num: BF00000000 Readings: 0000 Records: 000
Interval: 1 hour
-------------------------- ---------- ---------- --------
Date & Time, Sample, Packet, ID, Reading, Status
-------------------------- ---------- ---------- --------
ID Num: BF00000000 Readings: 0001 Records: 000
Interval: 1 hour
-------------------------- ---------- ---------- --------
Date & Time, Sample, Packet, ID, Reading, Status
-------------------------- ---------- ---------- --------
ID Num: CA07268139 Readings: 0004 Records: 001
Interval: 1 hour
-------------------------- ---------- ---------- --------
Date & Time, Sample, Packet, ID, Reading, Status
2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063
2010-02-08 11:02, 0002, 001, 0007268139, 00000972, 063
2010-02-08 12:02, 0003, 001, 0007268139, 00000972, 063
2010-02-08 13:02, 0004, 001, 0007268139, 00000972, 063
2010-02-08 14:02, 0005, 001, 0007268139, 00000000, 000
2010-02-08 15:02, 0006, 001, 0007268139, 00000000, 000
2010-02-08 16:02, 0007, 001, 0007268139, 00000000, 000
2010-02-08 17:02, 0008, 001, 0007268139, 00000000, 000
<end>
Basically, any line that begins with a date needs to be kept. All other information has to be discarded, so I will end up with a text file that looks like this:
2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063
2010-02-08 11:02, 0002, 001, 0007268139, 00000972, 063
2010-02-08 12:02, 0003, 001, 0007268139, 00000972, 063
2010-02-08 13:02, 0004, 001, 0007268139, 00000972, 063
2010-02-08 14:02, 0005, 001, 0007268139, 00000000, 000
2010-02-08 15:02, 0006, 001, 0007268139, 00000000, 000
2010-02-08 16:02, 0007, 001, 0007268139, 00000000, 000
2010-02-08 17:02, 0008, 001, 0007268139, 00000000, 000
What's the best way to do this? I tried removing the junk lines with the attached code but it doesn't seem to work. This is while reading the file line by line. I've only included what I deem to be the relevant code (that does the cleaning). I don't need to keep the header lines either. Thanks a bunch!
If Microsoft.VisualBasic.Left(ioLine, 7) = "ID Num:" Then
ioLine = ""
End If
If Microsoft.VisualBasic.Left(ioLine, 3) = "---" Then
ioLine = ""
End If
If Microsoft.VisualBasic.Left(ioLine, 4) = "Date" Then
ioLine = ""
End If
ASKER
Well, until December 31, I can just look for 2010. Next year will be different ;) I guess I could do something like If Microsoft.VisualBasic.Left(ioLine, 4) = "2010" Or Microsoft.VisualBasic.Left(ioLine, 4) = "2011" Or Microsoft.VisualBasic.Left(ioLine, 4) = "2012", etc., and just put a few years on, but I'm sure there's a more elegant way to do it.
Microsoft.VisualBasic.Left(ioLine, 4) = Date.Now.Year.ToString
That would only work if the program runs over the file the same day.
Based on the sample file I didn't think it did but maybe it does.
What I mean is that if the program runs January 1st over the previous day Dec 31st then checking for the current year won't work.
If that is the case and you don't want to just check for numeric, then you could check for current or previous year:
Mid(ioLine, 1, 4) = Now.Year.ToString OrElse Mid(ioLine, 1, 4) = (Now.Year - 1).ToString
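A year-independent alternative is to test whether the line really starts with a timestamp instead of comparing against particular years. The following is only a sketch: the helper name StartsWithTimestamp is hypothetical, and it assumes every data line begins with a "yyyy-MM-dd HH:mm" timestamp exactly as in the sample file.

```vb
Imports System
Imports System.Globalization

Module DateLineCheck
    ' Hypothetical helper (not from the thread): True when the line
    ' starts with a timestamp in the assumed "yyyy-MM-dd HH:mm" layout.
    Function StartsWithTimestamp(line As String) As Boolean
        Const Pattern As String = "yyyy-MM-dd HH:mm"
        ' Lines shorter than the timestamp cannot be data lines.
        If line Is Nothing OrElse line.Length < Pattern.Length Then Return False
        Dim parsed As DateTime
        Return DateTime.TryParseExact(line.Substring(0, Pattern.Length), Pattern,
                                      CultureInfo.InvariantCulture, DateTimeStyles.None, parsed)
    End Function

    Sub Main()
        ' A data line from the sample file: prints True
        Console.WriteLine(StartsWithTimestamp("2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063"))
        ' A header line from the sample file: prints False
        Console.WriteLine(StartsWithTimestamp("ID Num: BF00000000 Readings: 0000 Records: 000"))
    End Sub
End Module
```

Because the check parses the date itself, it does not care which year the file was written in or when the program runs.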
ASKER
That seems to work wonderfully well! Thanks a ton.
ASKER
13598: your solution from earlier works well. Only one thing: I always end up with a row of hyphens at the top (it seems to be the very first row in the file), but I'll figure out how to remove it... thanks :)
Without the code it would be hard to help. Maybe you can step through your code and see where, how and why the first line is being written.
ASKER
The other dotted lines are removed, no problem, just not the first line. I always end up with this:
-------------------------- ---------- ---------- --------
2010-02-08 10:02, 0001, 001, 0007268139, 00000972, 063
2010-02-08 11:02, 0002, 001, 0007268139, 00000972, 063
2010-02-08 12:02, 0003, 001, 0007268139, 00000972, 063
2010-02-08 13:02, 0004, 001, 0007268139, 00000972, 063
2010-02-08 14:02, 0005, 001, 0007268139, 00000000, 000
2010-02-08 15:02, 0006, 001, 0007268139, 00000000, 000
2010-02-08 16:02, 0007, 001, 0007268139, 00000000, 000
2010-02-08 17:02, 0008, 001, 0007268139, 00000000, 000
Not a biggie, although if you come up with a solution before me, feel free to post it ;)
' Load log file, clean, and show...
Dim ioFile As New StreamReader("C:\probe\LOG_FILE.CSV")
Dim ioLine As String ' Going to hold one line at a time
Dim ioLines As String ' Going to hold whole file
ioLine = ioFile.ReadLine
ioLines = ioLine
While Not ioLine = ""
ioLine = ioFile.ReadLine
If IsNumeric(Mid(ioLine, 1, 4)) Then ' only keep lines beginning with numbers (ie, the date)
ioLines = ioLines & vbCrLf & ioLine
End If
End While
txtMain.Text = ioLines ' show clean log file in window
Without seeing the rest of your code, I would first make sure everything starts out clean: give your string variables an empty value in their declarations. The bigger problem is the first line. You read it outside the While loop, copy it into ioLines with ioLines = ioLine before it is ever checked, and then immediately read the next line, so the first line (the row of hyphens) always ends up in the output. Start ioLines empty, test each line at the top of the loop, and read the next line at the bottom:
Dim ioFile As New StreamReader("C:\probe\LOG_FILE.CSV")
Dim ioLine As String = "" ' Going to hold one line at a time
Dim ioLines As String = "" ' Going to hold whole file
ioLine = ioFile.ReadLine
While Not ioLine = ""
If IsNumeric(Mid(ioLine, 1, 4)) Then ' only keep lines beginning with numbers (ie, the date)
If ioLines <> "" Then ioLines = ioLines & vbCrLf ' separator only between kept lines
ioLines = ioLines & ioLine
End If
ioLine = ioFile.ReadLine ' read the next line only after the current one was checked
End While
ioFile.Close()
txtMain.Clear()
txtMain.Text = ioLines ' show clean log file in window
ASKER
Many thanks. It's ok now :)
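For reference, the filtering worked out in this thread can also be written without the manual read loop. This is a sketch under assumptions, not the accepted solution: it assumes .NET Framework 4 or later (for File.ReadLines), the function name KeepDataLines is hypothetical, and the output path is made up for illustration.

```vb
Imports System.Collections.Generic
Imports System.IO

Module CleanLog
    ' Hypothetical helper: returns only the lines whose first four
    ' characters are numeric (the year at the start of a data line).
    Function KeepDataLines(lines As IEnumerable(Of String)) As List(Of String)
        Dim kept As New List(Of String)
        For Each line As String In lines
            If line.Length >= 4 AndAlso IsNumeric(line.Substring(0, 4)) Then
                kept.Add(line)
            End If
        Next
        Return kept
    End Function

    Sub Main()
        Dim inputPath As String = "C:\probe\LOG_FILE.CSV"        ' path used in the thread
        Dim outputPath As String = "C:\probe\LOG_FILE_CLEAN.CSV" ' assumed output name
        ' Every line is checked, including the first one, and blank
        ' lines are skipped instead of ending the loop.
        File.WriteAllLines(outputPath, KeepDataLines(File.ReadLines(inputPath)))
    End Sub
End Module
```

To show the result in the text box instead of writing a file, the kept lines could be joined with String.Join(vbCrLf, kept) and assigned to txtMain.Text.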