elwayisgod (United States of America)

asked on

Another splice up of a bunch of files

DOS runs out of memory, so I was going to see if there is an easy VB way. The source files are 2GB each.

I have 10 source files that are !-delimited. The first row is a header row. The first column in each file is the Year column. It appears as: "2012"!"January"!

I need a script that will read every .txt file in the directory and create new files based on the column 1 value. Each new file will contain only the rows for a single year. For example, all rows that begin with "2012" will go to a new file named the same as the original but prefixed with '2012'. So if the original file name was 'file.txt', the new file name would be '2012file.txt', and so on. I need the header row in each new file too.
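For comparison, here is a minimal sketch of the requested split in Python (a swapped-in language; nothing in this thread uses Python, and this is not the accepted answer). It assumes the first field is the double-quoted year, that rows end in a newline, and that the outputs go next to the originals. It streams one line at a time, so memory use stays flat even for a 2GB file. The function names are hypothetical.

```python
import glob
import os
import sys


def year_of(line, delimiter=b"!"):
    """Return the first field of a raw row with surrounding quotes removed.

    For b'"2012"!"January"!...' this returns '2012'.
    """
    first_field = line.split(delimiter, 1)[0]
    return first_field.strip().strip(b'"').decode("ascii", "replace")


def split_by_year(source_path, delimiter=b"!"):
    """Stream one source file into '<year><original name>' files.

    Rows are read one at a time, so memory use does not grow with the
    input size. One output handle stays open per distinct year, and each
    output file starts with the source file's header row.
    """
    folder, name = os.path.split(source_path)
    outputs = {}
    try:
        # Binary mode: no decoding cost and the original line endings are kept.
        with open(source_path, "rb") as src:
            header = src.readline()
            for line in src:
                if not line.strip():
                    continue  # skip blank lines, e.g. a trailing empty line
                year = year_of(line, delimiter)
                out = outputs.get(year)
                if out is None:
                    out = open(os.path.join(folder, year + name), "wb")
                    out.write(header)
                    outputs[year] = out
                out.write(line)
    finally:
        for out in outputs.values():
            out.close()
    return [os.path.join(folder, year + name) for year in sorted(outputs)]


def split_directory(directory):
    """Split every *.txt file in `directory`; return all paths written."""
    # Take the file list up front so outputs created during this run are not re-read.
    sources = sorted(glob.glob(os.path.join(directory, "*.txt")))
    written = []
    for path in sources:
        written.extend(split_by_year(path))
    return written


if __name__ == "__main__" and len(sys.argv) > 1:
    split_directory(sys.argv[1])
```

Run it once per folder (for example `python split_years.py D:\data`, with a hypothetical script name and path). The generated files also end in .txt, so a second run on the same folder would split them again unless they are moved elsewhere first.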

-Mystique-

Perhaps the information at one of the below links will help you.

vbscript to split very large text files

Function Creates Multidimensional Arrays from Delimited Text Files
 This VBScript user-defined function can help streamline many text-based processes

Split text file searching for specific string of text and saving in multiple directories

Working With Arrays in VBScript

Scripts to manage Text Files

Topics for Writing or Appending to a File with VBScript

SciTE: A free source code editing component for Win32 and GTK+, including VB & VBScript.
SciTE is a SCIntilla based Text Editor.
Text editing in SciTE works similarly to most Macintosh or Windows editors with the added feature of automatic syntax styling.

HxD - Freeware Hex Editor and Disk Editor
•Available as a portable and installable edition
•RAM-Editor
  ◦To edit the main memory
  ◦Memory sections are tagged with data-folds
•Disk-Editor (Hard disks, floppy disks, ZIP-disks, USB flash drives, CDs, ...)
  ◦RAW reading and writing of disks and drives
  ◦for Win9x, WinNT and higher
•Instant opening regardless of file-size
  ◦Up to 8GB, opening and editing is very fast
•Liberal but safe file sharing with other programs
•Flexible and fast searching/replacing for several data types
  ◦Data types: text (including Unicode), hex-values, integers and floats
  ◦Search direction: Forward, Backwards, All (starting from the beginning)
•File compare (simple)
•View data in Ansi, DOS, EBCDIC and Macintosh character sets
•Checksum-Generator: Checksum, CRCs, Custom CRC, SHA-1, SHA-512, MD5, ...
•Exporting of data to several formats
  ◦Source code (Pascal, C, Java, C#, VB.NET)
  ◦Formatted output (plain text, HTML, Richtext, TeX)
  ◦Hex files (Intel HEX, Motorola S-record)
•Insertion of byte patterns
•File tools
  ◦File shredder for safe file deletion
  ◦Splitting or concatenating of files
•Basic data analysis (statistics)
  ◦Graphical representation of the byte/character distribution
  ◦Helps to identify the data type of a selection
•Byte grouping
  ◦1, 2, 4, 8 or 16 bytes packed together into one column
•"Hex only" or "text only"-modes
•Progress-window for lengthy operations
  ◦Shows the remaining time
  ◦Button to cancel
•Modified data is highlighted
•Unlimited undo
•"Find updates..."-function
•Easy to use and modern interface
•Goto address
•Overwrite or insert mode
•Cut, copy, paste insert, paste write
•Clipboard support for other hex editors
  ◦Visual Studio/Visual C++, WinHex, HexWorkshop, ...
•Bookmarks
  ◦Ctrl+Shift+Number (0-9) sets a bookmark
  ◦Ctrl+Number (0-9) goes to a bookmark
•Navigating to nibbles with Ctrl+Left or Ctrl+Right
•Flicker free display and fast drawing
aikimark
Does year data appear in more than one of these files?  That is, might we find 2012 lines in more than one file?
elwayisgod


Yes. There's no telling where the 2012 rows are.
So, solutions need to check for an existing target (year) file and only append the line data, rather than including the header.
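If the goal is one combined file per year shared by all ten sources (the reading in the reply above), the check can be done when each year's output is first opened. This is a hedged Python sketch: the output name `<year>.txt`, the separate `out_dir` folder, and the function names are all hypothetical, and it assumes every source file has the same header.

```python
import glob
import os


def append_by_year(source_path, out_dir, delimiter=b"!"):
    """Append each data row of source_path to <out_dir>/<year>.txt.

    The header is written only when the year file does not exist yet, so
    rows for the same year from several source files share one header.
    """
    handles = {}
    try:
        with open(source_path, "rb") as src:
            header = src.readline()
            for line in src:
                if not line.strip():
                    continue  # ignore blank lines
                year = line.split(delimiter, 1)[0].strip().strip(b'"').decode("ascii", "replace")
                out = handles.get(year)
                if out is None:
                    target = os.path.join(out_dir, year + ".txt")
                    is_new = not os.path.exists(target)
                    out = open(target, "ab")  # append: keep rows from earlier sources
                    if is_new:
                        out.write(header)
                    handles[year] = out
                out.write(line)
    finally:
        for out in handles.values():
            out.close()


def append_directory(source_dir, out_dir):
    """Process every *.txt in source_dir; out_dir must be a different folder."""
    os.makedirs(out_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(source_dir, "*.txt"))):
        append_by_year(path, out_dir)
```

Because the outputs are opened in append mode, running it twice duplicates every row; clear `out_dir` before a rerun.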

Will the target files ever exceed 2GB?
I doubt it, but it's hard to say.
If you can work it into your process, you would simplify all this processing by eliminating the header row from the front of each data file.  You would only need to append the data to the appropriate year file.  If you needed the header at a later time, you could use a simple copy command or convert the header file into a schema.ini file.
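The headerless approach can be sketched in Python as two steps: save the header once, and later rebuild a file with the header in front by streaming both parts into a new file. This does the same job as a copy command along the lines of `copy /b header.txt + 2012.txt out.txt`. The file and function names here are hypothetical, and the schema.ini route is not shown.

```python
import shutil


def save_header(source_path, header_path):
    """Copy only the first (header) line of source_path into header_path."""
    with open(source_path, "rb") as src, open(header_path, "wb") as out:
        out.write(src.readline())


def prepend_header(header_path, data_path, out_path, chunk_size=1024 * 1024):
    """Write header_path followed by data_path into out_path.

    Both inputs are copied in chunk_size pieces, so a multi-gigabyte
    data file is never loaded into memory at once.
    """
    with open(out_path, "wb") as out:
        for part in (header_path, data_path):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out, chunk_size)
```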
Is PowerShell an option here? The average speed I got with my test data is about 1MB per second. That works out to roughly 35 minutes per 2GB file (about 2,048 MB at 1 MB/s is about 2,048 seconds).
It's Windows Server 2003. Is PowerShell built into the OS?
No, you will have to download it. All necessary info can be found at .
What extra software do you have on your server?  For instance, if you have Syncsort, you should get much better performance due to its I/O optimization.
Nothing that I know of. We are possibly looking at trying Cygwin, but we're not sure. Would Linux do this file manipulation faster than a DOS batch file?
aikimark


It runs in two seconds but produces a blank file. No idea what's happening.
Did you run this in the same directory as the big data file?
Did you substitute your big file name in place of Q_27976301_Data.txt?

Note: if the big file name contains spaces, you need to put the file name in quotes.
It would be helpful if you posted the first three lines in the file.
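Opening a 2GB file in an editor just to copy three lines can be slow. This small Python sketch (hypothetical function name) reads only the requested number of lines and stops, so the rest of the file is never read.

```python
import itertools
import sys


def first_lines(path, count=3):
    """Return the first `count` raw lines of a file without reading the rest."""
    with open(path, "rb") as f:
        return list(itertools.islice(f, count))


if __name__ == "__main__" and len(sys.argv) > 1:
    for raw in first_lines(sys.argv[1]):
        # latin-1 maps every byte to a character, so printing never fails.
        print(raw.decode("latin-1").rstrip("\r\n"))
```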
Back tomorrow to try and resolve.
Did that (http:#a38722808) really work for you? It did not for me.

Did that command extract the 2012 rows?
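One way to check an extracted file is to count its rows by the value of the first column. If anything other than 2012 shows up, the extraction picked up extra rows. This is a Python sketch under the same assumptions as the question (header on line 1, double-quoted year in the first !-delimited field); the function name is hypothetical.

```python
from collections import Counter


def count_years(path, delimiter=b"!", skip_header=True):
    """Count data rows per first-column value (surrounding quotes removed)."""
    counts = Counter()
    with open(path, "rb") as f:
        if skip_header:
            f.readline()
        for line in f:
            if line.strip():
                year = line.split(delimiter, 1)[0].strip().strip(b'"')
                counts[year.decode("ascii", "replace")] += 1
    return counts
```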
It did, but it got more than I bargained for. My question wasn't fully qualified with all the requirements, so I just want to move on.