Code to split up a dat file into seperate dat files

Hello experts
I have a unique issue and need a method to fix it.
We work with dat files some large, some small.  I have converted one of the dat files to text so you can see an example here: (The brackets are not in the file they are just used here to show the start and end of each line.)
(000001,01020000000000000000000018000000000000000000000000000000000000000000000e00,000000000000,UP-33xx_060104__UP-33xx_060104__)
(042530,63020100000000000000000018000000000000000000000000000000001200000010020e04,000000000000,KIWI PNAPL______KIWI PNAPL______)
(043205,63020100000000000000000018000000000000000000000000000000001200000010020e04,000000000000,MANGO___________MANGO___________)
(043243,63020100000000000000000018000000000000000000000000000000001200000010020e04,000000000000,MINTOPIA________MINTOPIA________)

The first 6 numbers are the key these are product numbers.  I need to break this dat file up into 4 seperate dat files with the following naming convention: UP%_PLUddmmyyyyhhmm.DAT

So in the end
000001,01020000000000000000000018000000000000000000000000000000000000000000000e00,000000000000,UP-33xx_060104__UP-33xx_060104__
is in one file named UP1_PLUddmmyyyyhhmm.DAT
The next line
042530,63020100000000000000000018000000000000000000000000000000001200000010020e04,000000000000,KIWI PNAPL______KIWI PNAPL______
is in one file named UP2_PLUddmmyyyyhhmm.DAT

There is one catch I only need seperate files where the numbers are not continuous.  So if there is something like this
064053,56020100000000000000001518000000000000000000000000000000000800000000000180,000199000000,B-XL-FOUNTAIN___B-XL-FOUNTAIN___
064054,56020100000000000000000918000000000000000000000000000000000800000000000180,000199000000,B-XL NC FOUNTAINB-XL NC FOUNTAIN
These should be saved into one file.  I only need seperate files for ranges that are not contiguous

Right now the process to do this is not very affective so my hope is to have a script or some code to do this for me
Thanks in advance



irishmanjbAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

nepostojeci_emailCommented:
there are numerous ways to do this, but personally I think you should
create the list of the items (because we don't know if the product numbers
are contigous in the file or not), then sort it, and then iterate through them
to make sure which ones are contigous and which ones are not.

which language you use?
irishmanjbAuthor Commented:
We are currently using a perl program but I want to get away from that.
The product ID field is consistent in the dat file it is always 6 digits.  The number range is
000001-999999.
The key is breaking up the ranges into seperate files ie

000001 and 000002 1 file
000005 1 file

I do not have the ability to sort I just get a Dat file and then I split it up using the current program we have.

Thanks

_iskywalker_Commented:
Hi! it would be nice, if you say which tools you have (windows, c compile, etc.)
It would also be nice if you tel us the format exactly but it seems a c code could do that pretty fast. (well a shell code could do that also, but i think it is more complicated)
C++ 11 Fundamentals

This course will introduce you to C++ 11 and teach you about syntax fundamentals.

irishmanjbAuthor Commented:
Thanks for your response
I am not a programmer I am an Analyst and my range is limited to some VBS, batch and scripting.  I may know enough just to get myself in trouble.

As to the format this is one line.  These are binary dat files that are generated from a perl program. The perl program creates one large dat file and that is the format it is give to me in.  

Looking down all of the lines are just consistent lenght (this is converted to text)
205215,56020100000000000000006718000000000000000000000000000000004900000030000a10,000349,RG CRMY BREEZE__

Thanks
_iskywalker_Commented:
So you can use perl?
irishmanjbAuthor Commented:
No just some scripting.  I am working on getting the correct data format so the header information is broken out correctly would that help? I
_iskywalker_Commented:
The problem is i dont know in what language you want the solucion...
irishmanjbAuthor Commented:
Ok sorry,
Any scripted language that is turnkey for me would be fine.  I was hoping that the code could be posted and then I could put into some type of script. I do not have a compiler or and way to take create an executable.

Thanks
nepostojeci_emailCommented:
you can use some Text Editors like UltraEdit32 or similar to perform the regular expression
search and replace on the file, and I think that would be the quickest solution, but since your
file is a binary file, I'm not sure if that is possible.. :(
_iskywalker_Commented:
i didnt get the  I only need seperate files for ranges that are not contiguous, i am almost finished
_iskywalker_Commented:
argh, i wanted to know what should be on the same file, since product number changed, the named changed, almost all was different in your example (quite difficult to see they belong to same group).
mvidasCommented:
jb,

Not sure if you want to use VBS for this (I know you mentioned it earlier), but if you do, paste the following into a .vbs file:

 If WScript.Arguments.Count = 0 Then
  MsgBox "You must send DAT file(s) to this script to parse them. Exiting."
  WScript.Quit
 End If
 Dim FSO, ts, tArr, DatFile(), DateTime, Cnt, strFile, i, CurProd, vPath, fNum
 DateTime = Right("0" & Day(Now), 2) & Right("0" & Month(Now), 2) & Year(Now) & Right("0" & Hour(Now), 2) & Right("0" & Minute(Now), 2)
 Set FSO = CreateObject("scripting.filesystemobject")
 For Each strFile In WScript.Arguments
  Cnt = 0
  ReDim DatFile(Cnt)
  Set ts = FSO.GetFile(strFile)
  vPath = Left(ts.Path, Len(ts.Path) - Len(ts.Name))
  Set ts = FSO.OpenTextFile(strFile)
  Do Until ts.AtEndOfStream
   ReDim Preserve DatFile(Cnt)
   DatFile(Cnt) = ts.ReadLine
   Cnt = Cnt + 1
  Loop
  ts.Close
  TriQuickSortString DatFile
  fNum = 1
  CurProd = ""
  For i = 0 To Cnt - 1
   tArr = Split(DatFile(i), ",")
   If Len(CurProd) = 0 Then
    CurProd = tArr(0)
    Set ts = FSO.CreateTextFile(vPath & "UP" & CStr(fNum) & "_PLU" & DateTime & ".DAT")
   ElseIf Abs(CLng(CurProd) - CLng(tArr(0))) > 1 Then
    ts.Close
    fNum = fNum + 1
    CurProd = tArr(0)
    Set ts = FSO.CreateTextFile(vPath & "UP" & CStr(fNum) & "_PLU" & DateTime & ".DAT")
   Else
    CurProd = tArr(0)
   End If
   ts.WriteLine DatFile(i)
  Next
  ts.Close
 Next
 MsgBox "Done!"
Public Sub TriQuickSortString(ByRef sArray())
 Dim iLB, iUB, i, j, sTemp
 iLB = LBound(sArray)
 iUB = UBound(sArray)
 TriQuickSortString2 sArray, 4, iLB, iUB
 InsertionSortString sArray, iLB, iUB
End Sub
Private Sub TriQuickSortString2(ByRef sArray(), ByVal iSplit, ByVal iMin, ByVal iMax)
 Dim i, j, sTemp
 If (iMax - iMin) > iSplit Then
  i = (iMax + iMin) / 2
  If sArray(iMin) > sArray(i) Then SwapStrings sArray(iMin), sArray(i)
  If sArray(iMin) > sArray(iMax) Then SwapStrings sArray(iMin), sArray(iMax)
  If sArray(i) > sArray(iMax) Then SwapStrings sArray(i), sArray(iMax)
  j = iMax - 1
  SwapStrings sArray(i), sArray(j)
  i = iMin
  sTemp = sArray(j)
  Do
   Do
    i = i + 1
   Loop While sArray(i) < sTemp
   Do
    j = j - 1
   Loop While sArray(j) > sTemp
   If j < i Then Exit Do
   SwapStrings sArray(i), sArray(j)
  Loop
  SwapStrings sArray(i), sArray(iMax - 1)
  TriQuickSortString2 sArray, iSplit, iMin, j
  TriQuickSortString2 sArray, iSplit, i + 1, iMax
 End If
 i = 0
End Sub
Private Sub InsertionSortString(ByRef sArray(), ByVal iMin, ByVal iMax)
 Dim i, j, sTemp
 For i = iMin + 1 To iMax
  sTemp = sArray(i)
  j = i
  Do While j > iMin
   If sArray(j - 1) <= sTemp Then Exit Do
   sArray(j) = sArray(j - 1)
   j = j - 1
  Loop
  sArray(j) = sTemp
 Next 'i
 i = 0
End Sub
Private Sub SwapStrings(ByRef s1, ByRef s2)
 Dim i
 i = s1
 s1 = s2
 s2 = i
End Sub

Then either click/drag your DAT file(s) into the .vbs file, or put the .vbs file in C:\Documents and Settings\irishmanjb\SendTo   folder and then right-click your DAT files and choose Send To then the .vbs file name.

Matt
mvidasCommented:
My apologies, I forgot to set the variables to nothing at the end of each section.

Just before the 'Msgbox "Done!"' line, paste in:
 Set FSO = Nothing
 Set ts = Nothing
 Set tArr = Nothing
 Set DatFile = Nothing
 Set DateTime = Nothing
 Set Cnt = Nothing
 Set strFile = Nothing
 Set i = Nothing
 Set CurProd = Nothing
 Set vPath = Nothing
 Set fNum = Nothing

At the end of TriQuickSortString, paste in:
 Set iLB = Nothing
 Set iUB = Nothing
 Set i = Nothing
 Set j = Nothing
 Set sTemp = Nothing

At the end of TriQuickSortString2 and the end of InsertionSortString, paste in:
 Set i = Nothing
 Set j = Nothing
 Set sTemp = Nothing

And at the end of SwapStrings, paste in:
 Set i = Nothing

Sorry about that
Matt
irishmanjbAuthor Commented:
Matt
 I will try it now thanks.
irishmanjbAuthor Commented:
Matt
Where is the end of SwapStrings,?
Thanks
mvidasCommented:
SwapStrings is the last sub there, you'd change it to:

Private Sub SwapStrings(ByRef s1, ByRef s2)
 Dim i
 i = s1
 s1 = s2
 s2 = i
 Set i = Nothing
End Sub

Though you don't technically need to set the variables to nothing, it is just better for memory that way.
irishmanjbAuthor Commented:
Matt
I get error
Row 28 Character 4 type mismatch clng
mvidasCommented:
Do you have any lines in the file that are not in the format you've explained? The lines you've shown us have all been the same format, are there header lines in there or anything? I wrote it assuming the file was just a data dump and nothing else.  If there are other lines, are they 127 characters long as well?

If not, you could add a new variable (tempStr):
 Dim FSO, ts, tArr, DatFile(), DateTime, Cnt, strFile, i, CurProd, vPath, fNum, tempStr

Then verify that each line is 127 characters long:
  Do Until ts.AtEndOfStream
   tempStr = ts.ReadLine
   If Len(tempStr) = 127 then
    ReDim Preserve DatFile(Cnt)
    DatFile(Cnt) = ts.ReadLine
    Cnt = Cnt + 1
   End If
  Loop

And clear that memory at the end (just before the msgbox) too:
 Set tempStr = Nothing

If they could be 127 characters long, we can verify the pattern using Regular Expressions.  Just let me know and I can add it in.
irishmanjbAuthor Commented:
I am trying to track down the data file documentation now.  This should tell me of any Header Lines.
Please keep in mind that the line I gave you is the text format.
When I use the dat2text program to convert it that is the line I get.

Thanks
irishmanjbAuthor Commented:
Matt
Actually when I convert my DAT file to text and drag it into the script it does exactly what you wrote it to do.  However
the format is not correct.  There must be a header in the Binary that does not get converted to text
irishmanjbAuthor Commented:
I attempted to use one of the files and get an error
"File to download has no header"

I need to figure out what the header is
mvidasCommented:
Hmm, in all honesty I thought it was just a text file with a .dat extension.  VBS/FSO work well with text files, I'll have to play around with it to figure out about your binary files.  Is there any way you can provide a sample file?

_iskywalker_, any luck with your perl version? Isn't there a standalone freeware app that lets you use perl scripts like vbscript?
irishmanjbAuthor Commented:
I would be happy to provide a sample file.

Please advise
mvidasCommented:
If you want to email the file to me (my address is in my profile), I can upload it to my webserver and post a link here for everyone to get it
irishmanjbAuthor Commented:
I sent the whole file and the broken up files.
 Basically the broekn up files are the desired result.  What I need is a new process to get to that result.
The way it is done now is much to time consuming.

Thanks for everyone input
mvidasCommented:
Email:
"Matt
Thanks for taking a look at this.

2 zip files.  The first is the whole file.  How I receive it.
The 2nd is separate files the end result I need.

The process for braking up the whole file into smaller files needs to change.  The process takes to long.
We are hoping to have a simple script or exe to do the job.

Thanks"

Files:
http://ee.hastalavidas.com/Q_21776363/SeperateFiles.zip
http://ee.hastalavidas.com/Q_21776363/Wholefile.zip

I'm going to start to take a look now.  Just to clarify, if _iskywalker_ or someone could come up with an executable for you, that would be ok?
irishmanjbAuthor Commented:
Sure
mvidasCommented:
After playing around and looking at the binary files multiple times, I can see the data in there but a few bytes are different in some of the outputted files that should be the same.  Any method I gave you would just be me guessing at how you wanted it, I wouldn't be able to guarantee it to be correct so I will be stepping away from this question for now.  Perhaps the above method would work if need be, but hopefully someone else can chime in with an answer.  Also, if it helps you or anyone else, the following functions allow you to work with binary files like these from vbscript:

'The follow methods slightly modified from those at motobit.com, specifically these pages
'http://www.motobit.com/tips/detpg_BinASP/
'http://www.motobit.com/tips/detpg_read-write-binary-files/
Sub goahead()
 Dim tStr
 tStr = BinaryToString(ReadBinaryFile("C:\jb\Wholefile\DDspring06.dat"))
 
 'method one
 With CreateObject("scripting.filesystemobject").CreateTextFile("C:\jb\Wholefile\DDspring06-1.dat")
  .Write tStr
  .Close
 End With
 
 'method two
 SaveBinaryDataTextStream "C:\jb\Wholefile\DDspring06-2.dat", Stream_StringToBinary(tStr, "")
End Sub

Function SaveBinaryDataTextStream(FileName, ByteArray)
 Dim FS, TS
 Set FS = CreateObject("Scripting.FileSystemObject")
 Set TS = FS.CreateTextFile(FileName)
 TS.Write BinaryToString(ByteArray)
End Function

Function BinaryToString(Binary)
 Dim i, tStr
 For i = 1 To LenB(Binary)
  tStr = tStr & Chr(AscB(MidB(Binary, i, 1)))
 Next
 BinaryToString = tStr
End Function

Function ReadBinaryFile(FileName)
 Const adTypeBinary = 1
 
 'Create Stream object
 Dim BinaryStream
 Set BinaryStream = CreateObject("ADODB.Stream")
 
 'Specify stream type - we want To get binary data.
 BinaryStream.type = adTypeBinary
 
 'Open the stream
 BinaryStream.Open
 
 'Load the file data from disk To stream object
 BinaryStream.LoadFromFile FileName
 
 'Open the stream And get binary data from the object
 ReadBinaryFile = BinaryStream.Read
End Function

Function Stream_StringToBinary(Text, CharSet)
 Const adTypeText = 2
 Const adTypeBinary = 1
 
 'Create Stream object
 Dim BinaryStream 'As New Stream
 Set BinaryStream = CreateObject("ADODB.Stream")
 
 'Specify stream type - we want To save text/string data.
 BinaryStream.type = adTypeText
 
 'Specify charset For the source text (unicode) data.
 If Len(CharSet) > 0 Then
  BinaryStream.CharSet = CharSet
 Else
  BinaryStream.CharSet = "us-ascii"
 End If
 
 'Open the stream And write text/string data To the object
 BinaryStream.Open
 BinaryStream.WriteText Text
 
 'Change stream type To binary
 BinaryStream.Position = 0
 BinaryStream.type = adTypeBinary
 
 'Ignore first two bytes - sign of
 BinaryStream.Position = 0
 
 'Open the stream And get binary data from the object
 Stream_StringToBinary = BinaryStream.Read
End Function

Sorry I could not give you your answer.
Matt

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
irishmanjbAuthor Commented:
Matt
I am going to try a different approach.  For your efforts I am going to award you the pionts.

Regards
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Programming

From novice to tech pro — start learning today.