Solved

VBSCRIPT Parse Large file into smaller files based on header info

Posted on 2009-04-08
Last Modified: 2012-05-06
I need help creating a script that will read a large file containing multiple records (no CR/LF) and, based on a known header ("1MHG"), create individual files. The records can be long, so I'm not sure using arrays will work.
Example: the file "ALL.TXT" contains 45 records in one long string. They all start with the header "1MHG", so we can find the start of the next record because it will also start with "1MHG". We do not know how a record ends. I need to strip out each record, from one occurrence of "1MHG" up to the next, and write 45 files, named sequentially (file1.txt, file2.txt, etc.). I have the code that opens the file, creates the output, and names the files sequentially. I just can't get the part that parses the string and strips out the records.
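One way to do the parsing without Split(), which discards the delimiter, is to walk the string with InStr() and cut each record out with Mid(). This is a minimal sketch, not the thread's accepted answer; it assumes the whole file has already been read into one string and that "1MHG" never occurs inside a record's data. SplitRecords is a hypothetical helper name.

```vbscript
Option Explicit

' Hypothetical helper: returns an array of records, each one starting with
' strHeader. Any text before the first header is ignored. Assumes the header
' never appears inside a record's data.
Function SplitRecords(strAll, strHeader)
    Dim result(), n, startPos, nextPos
    ReDim result(15)
    n = -1
    startPos = InStr(1, strAll, strHeader, vbBinaryCompare)
    Do While startPos > 0
        ' The next header (if any) marks the end of the current record
        nextPos = InStr(startPos + Len(strHeader), strAll, strHeader, vbBinaryCompare)
        n = n + 1
        ' Grow the array by doubling instead of one element at a time
        If n > UBound(result) Then ReDim Preserve result(2 * UBound(result) + 1)
        If nextPos > 0 Then
            result(n) = Mid(strAll, startPos, nextPos - startPos)
        Else
            ' The last record runs to the end of the string
            result(n) = Mid(strAll, startPos)
        End If
        startPos = nextPos
    Loop
    If n < 0 Then
        SplitRecords = Array()   ' no header found: empty array, UBound = -1
    Else
        ReDim Preserve result(n) ' trim the unused slots
        SplitRecords = result
    End If
End Function

Dim recs, k
recs = SplitRecords("1MHGaaa1MHGbb1MHGc", "1MHG")
For k = 0 To UBound(recs)
    WScript.Echo recs(k)   ' prints 1MHGaaa, then 1MHGbb, then 1MHGc
Next
```

Because Mid() starts at the header's own position, each record keeps its "1MHG" prefix, so nothing has to be re-attached when the files are written.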
Thanks
Question by:mannyms
7 Comments
 

Accepted Solution

by:
kevp75 earned 500 total points
ID: 24106205
Post your code?

Basically, what you'd be looking to do is split the string and then loop through the resulting array. As long as your machine has more than 256 MB of RAM, you should be all set with this method ;)

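' objFile (the open input TextStream), objFSO and strDirectory are assumed to exist already
strVar = Split(objFile.ReadAll, "1MHG")
' Split() always returns an array; element 0 is the text before the first
' "1MHG" (normally empty), so start at 1 and run through UBound
For i = 1 To UBound(strVar)
    Set objOut = objFSO.CreateTextFile(strDirectory & "\file" & i & ".txt", True)
    ' Split() drops the delimiter, so put the header back in front
    objOut.WriteLine "1MHG" & strVar(i)
    objOut.Close
    Set objOut = Nothing
Next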


(This is untested code, and the variables are unknown; please change what is necessary.)
 

Expert Comment

by:kevp75
ID: 24106222
Is the header for the files always on the same line? If so, you can read in the first line to get the value to split on.
 

Author Comment

by:mannyms
ID: 24106813
The file is one loooooong physical record with multiple logical records embedded, but they always start with the string "1MHG". The files can be really large as well: my test file has only 45 records but is 1 MB, and we expect files that could contain hundreds if not thousands of logical records. So doing this efficiently is important.
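For files with thousands of records, one option (a sketch under stated assumptions, not tested against real AL3 data) is to avoid ReadAll entirely and stream the input with TextStream.Read in fixed-size chunks, writing each record as soon as the next "1MHG" shows up in the buffer. Memory then stays at roughly one record plus one chunk. SplitFileStreaming and WriteRecord are hypothetical names, the output pattern file<n>.al3 is borrowed from the author's code, and the chunk size is a parameter you would tune.

```vbscript
Option Explicit
Const ForReading = 1
Const HEADER = "1MHG"

' Hypothetical helper: write one record to <strDirectory>\file<n>.al3.
' Write (not WriteLine) is used so no CR/LF is appended to the record.
Sub WriteRecord(objFSO, strDirectory, n, strRecord)
    Dim objOut
    Set objOut = objFSO.CreateTextFile(strDirectory & "\file" & n & ".al3", True)
    objOut.Write strRecord
    objOut.Close
End Sub

' Reads strInputFile lngChunk characters at a time and writes each complete
' record once the following header is in the buffer. Returns the record count.
' Assumes the header never occurs inside a record's data.
Function SplitFileStreaming(objFSO, strInputFile, strDirectory, lngChunk)
    Dim objIn, strBuf, startPos, nextPos, count_r
    Set objIn = objFSO.OpenTextFile(strInputFile, ForReading)
    strBuf = ""
    count_r = 0
    Do While Not objIn.AtEndOfStream
        strBuf = strBuf & objIn.Read(lngChunk)
        ' Flush every record whose end (the next header) is already buffered;
        ' a header split across two chunks is found once the next chunk arrives
        Do
            startPos = InStr(1, strBuf, HEADER, vbBinaryCompare)
            If startPos = 0 Then Exit Do
            nextPos = InStr(startPos + Len(HEADER), strBuf, HEADER, vbBinaryCompare)
            If nextPos = 0 Then Exit Do
            count_r = count_r + 1
            WriteRecord objFSO, strDirectory, count_r, Mid(strBuf, startPos, nextPos - startPos)
            ' Keep only the text from the next header onward
            strBuf = Mid(strBuf, nextPos)
        Loop
    Loop
    objIn.Close
    ' Whatever follows the last header is the final record
    startPos = InStr(1, strBuf, HEADER, vbBinaryCompare)
    If startPos > 0 Then
        count_r = count_r + 1
        WriteRecord objFSO, strDirectory, count_r, Mid(strBuf, startPos)
    End If
    SplitFileStreaming = count_r
End Function
```

A call such as `n = SplitFileStreaming(objFSO, "c:\all.al3", strDirectory, 65536)` would take the place of the ReadAll/Split step. Unlike the Split() approach, each record keeps its "1MHG" prefix.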
 

Expert Comment

by:kevp75
ID: 24107313
Can you post your existing code? Let's see if we can modify it to handle this.

Take out (or x out) anything incriminating :)
 

Author Comment

by:mannyms
ID: 24108119
This is working well except that the "1MHG" is being stripped out of the array.

Dim objFSO, objInputFile, objOutputFile, count_r, i, strVar
Dim strDirectory, strInputFile, strOutputFile, tmpDate, tmpTime
Const ForAppending = 8, ForReading = 1, ForWriting = 2
count_r = 1
' VBScript has no Format() function, so build mmddyy and hhmmss by hand
tmpDate = Right("0" & Month(Date), 2) & Right("0" & Day(Date), 2) & Right(Year(Date), 2)
tmpTime = Right("0" & Hour(Time), 2) & Right("0" & Minute(Time), 2) & Right("0" & Second(Time), 2)
'---determine actual input/output locations later---
strInputFile = "c:\all.al3"
strDirectory = "c:\AL3_" & tmpDate & tmpTime
'---------------------------------------------------
Set objFSO = CreateObject("Scripting.FileSystemObject")
 
If Not objFSO.FolderExists(strDirectory) Then
   objFSO.CreateFolder strDirectory
End If
 
Set objInputFile = objFSO.OpenTextFile(strInputFile, ForReading)
strVar = Split(objInputFile.ReadAll, "1MHG")
objInputFile.Close
Set objInputFile = Nothing
 
' strVar(0) is the text before the first "1MHG" (normally empty), so start at 1
For i = 1 To UBound(strVar)
    strOutputFile = "\file" & count_r & ".al3"
    ' The True (overwrite) argument deletes and recreates an existing file of the same name
    Set objOutputFile = objFSO.CreateTextFile(strDirectory & strOutputFile, True)
    objOutputFile.WriteLine strVar(i)
    objOutputFile.Close
    Set objOutputFile = Nothing
    count_r = count_r + 1
Next
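The header disappears because Split() consumes the delimiter, so "1MHG" has to be put back in front of each piece. A minimal sketch of that fix, written as a standalone function so it can be checked on its own; RecordsFromSplit is a hypothetical name.

```vbscript
Option Explicit

' Hypothetical helper: split on strHeader, then re-attach the header that
' Split() removed. Element 0 of the split (text before the first header,
' normally "") is skipped.
Function RecordsFromSplit(strAll, strHeader)
    Dim parts, recs(), i
    parts = Split(strAll, strHeader)
    If UBound(parts) < 1 Then
        RecordsFromSplit = Array()   ' no complete record found
        Exit Function
    End If
    ReDim recs(UBound(parts) - 1)
    For i = 1 To UBound(parts)
        recs(i - 1) = strHeader & parts(i)
    Next
    RecordsFromSplit = recs
End Function

WScript.Echo Join(RecordsFromSplit("1MHGab1MHGcd", "1MHG"), "|")   ' prints 1MHGab|1MHGcd
```

In the loop above, the equivalent one-line change is `objOutputFile.WriteLine "1MHG" & strVar(i)`.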


 

Author Comment

by:mannyms
ID: 24108390
I just got an update to the spec (I will create a separate post if needed):
Within each record in the array, starting at position 69 for 6 bytes, I must replace the existing data (almost always spaces) with "AWF   ". Positions 1-68 and positions 72 through the end (variable length) are written as-is.

Example
(See code snippet for fixed length font)
1MHG161...IBM732PROFORM.....IBM9084212NJ................APPPAC............107X11
1MHG161...IBM732PROFORM.....IBM9084212NJ................APPPAC......AFW...107X11
--------------------------------------------------------------------^^^---------
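The replacement itself is a Left() + new value + Mid() concatenation. The post is not fully consistent (6 bytes of "AWF   " versus "AFW" at positions 69-71 with 72 onward unchanged in the example), so this sketch takes the field width from the length of the value passed in: use "AWF   " for the 6-byte reading or "AFW" for the example's 3-byte reading. OverlayField is a hypothetical name, and padding short records with spaces is an assumption.

```vbscript
Option Explicit

' Hypothetical helper: overwrite the characters starting at 1-based position
' lngStart with strValue; the field width is Len(strValue).
Function OverlayField(ByVal strRecord, lngStart, strValue)
    Dim lngEnd
    lngEnd = lngStart + Len(strValue) - 1
    ' Assumption: a record too short to contain the field is padded with spaces
    If Len(strRecord) < lngEnd Then
        strRecord = strRecord & Space(lngEnd - Len(strRecord))
    End If
    ' Keep 1..lngStart-1, insert the new value, keep lngEnd+1..end unchanged
    OverlayField = Left(strRecord, lngStart - 1) & strValue & Mid(strRecord, lngEnd + 1)
End Function

Dim rec
rec = "1MHG" & String(64, ".") & "......107X11"
WScript.Echo OverlayField(rec, 69, "AFW")
```

In the author's loop this would become `objOutputFile.WriteLine OverlayField("1MHG" & strVar(i), 69, "AWF   ")`, which also restores the header that Split() removed before counting positions.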


 

Author Closing Comment

by:mannyms
ID: 31568181
Provided the direction needed to complete. Excellent contributor.

Question has a verified solution.
