elwayisgod

Another splice up of a bunch of files

A DOS batch approach runs out of memory, so I wanted to see if there is an easy VB way.  The source files are 2GB each.

I have 10 source files that are ! delimited.  The first row is a header row.  The first column in each file is the Year column.  It appears as:  "2012"!"January"!

I need a script that will read every .txt file in the directory and create new files based on the column 1 value.  Each new file will contain only the rows for a single year.  For example, all rows that begin with "2012" will go into a new file named the same as the original but prefixed with '2012'.  So if the original file name was 'file.txt', the new filename would be '2012file.txt', and so on.  I need the header row in each new file too.

Thanks
VB Script, Visual Basic Classic


8/22/2022 - Mon
-Mystique-

Perhaps the information at one of the below links will help you.

vbscript to split very large text files
http://stackoverflow.com/questions/4606367/vbscript-to-split-very-large-text-files

Function Creates Multidimensional Arrays from Delimited Text Files
 This VBScript user-defined function can help streamline many text-based processes
http://www.windowsitpro.com/article/user-defined-function-udf/function-creates-multidimensional-arrays-from-delimited-text-files

Split text fiel searching for specific string of text and saving in mutltiple directories
http://stackoverflow.com/questions/12255167/split-text-fiel-searching-for-specific-string-of-text-and-saving-in-mutltiple-di

Working With Arrays in VBScript
http://www.aspfree.com/c/a/windows-scripting/working-with-arrays-in-vbscript/

Scripts to manage Text Files
http://www.activexperts.com/activmonitor/windowsmanagement/adminscripts/other/textfiles/

Topics for Writing or Appending to a File with VBScript
http://www.computerperformance.co.uk/vbscript/vbscript_file_opentextfile.htm

SciTE: A free source code editing component for Win32 and GTK+, including VB & VBScript.
SciTE is a SCIntilla based Text Editor
http://www.scintilla.org/index.html
http://www.scintilla.org/ScintillaDownload.html
Text editing in SciTE works similarly to most Macintosh or Windows editors with the added feature of automatic syntax styling.
http://www.scintilla.org/SciTEDoc.html

HxD - Freeware Hex Editor and Disk Editor
http://mh-nexus.de/en/hxd/
Features
•Available as a portable and installable edition
•RAM-Editor
   - To edit the main memory
   - Memory sections are tagged with data-folds
•Disk-Editor (Hard disks, floppy disks, ZIP-disks, USB flash drives, CDs, ...)
   - RAW reading and writing of disks and drives
   - for Win9x, WinNT and higher
•Instant opening regardless of file-size
   - Up to 8GB, opening and editing is very fast
•Liberal but safe file sharing with other programs
•Flexible and fast searching/replacing for several data types
   - Data types: text (including Unicode), hex-values, integers and floats
   - Search direction: Forward, Backwards, All (starting from the beginning)
•File compare (simple)
•View data in Ansi, DOS, EBCDIC and Macintosh character sets
•Checksum-Generator: Checksum, CRCs, Custom CRC, SHA-1, SHA-512, MD5, ...
•Exporting of data to several formats
   - Source code (Pascal, C, Java, C#, VB.NET)
   - Formatted output (plain text, HTML, Richtext, TeX)
   - Hex files (Intel HEX, Motorola S-record)
•Insertion of byte patterns
•File tools
   - File shredder for safe file deletion
   - Splitting or concatenating of files
•Basic data analysis (statistics)
   - Graphical representation of the byte/character distribution
   - Helps to identify the data type of a selection
•Byte grouping
   - 1, 2, 4, 8 or 16 bytes packed together into one column
•"Hex only" or "text only"-modes
•Progress-window for lengthy operations
   - Shows the remaining time
   - Button to cancel
•Modified data is highlighted
•Unlimited undo
•"Find updates..."-function
•Easy to use and modern interface
•Goto address
•Printing
•Overwrite or insert mode
•Cut, copy, paste insert, paste write
•Clipboard support for other hex editors
   - Visual Studio/Visual C++, WinHex, HexWorkshop, ...
•Bookmarks
   - Ctrl+Shift+Number (0-9) sets a bookmark
   - Ctrl+Number (0-9) goes to a bookmark
•Navigating to nibbles with Ctrl+Left or Ctrl+Right
•Flicker free display and fast drawing
aikimark

Does year data appear in more than one of these files?  That is, might we find 2012 lines in more than one file?
elwayisgod

ASKER
Yes.  There is no telling where the 2012 rows are.
aikimark

So, solutions need to check for an existing target (year) file and only append the line data, rather than including the header.

Will the target files ever exceed 2GB?
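
For illustration only, here is a rough sketch of that check-and-append logic. It is written in Python rather than VBScript, and everything in it is an assumption rather than a tested solution for these files: the function name split_by_year, the <year>.txt output naming, and the premise that the year is plain ASCII digits in the first !-delimited column. It streams one line at a time, so a 2GB source is never held in memory, and it writes the header only when a year file does not exist yet or is empty.

```python
import os

def split_by_year(src_path, out_dir, delimiter=b"!"):
    """Append each data row of src_path to out_dir/<year>.txt (hypothetical naming)."""
    handles = {}  # year -> open output file for this run
    with open(src_path, "rb") as src:
        header = src.readline()
        if not header.endswith(b"\n"):
            header += b"\n"
        try:
            for line in src:  # streams line by line, never the whole file
                if not line.strip():
                    continue  # skip blank lines
                if not line.endswith(b"\n"):
                    line += b"\n"  # last row may lack a newline
                # First column looks like "2012"; drop the surrounding quotes.
                year = line.split(delimiter, 1)[0].strip().strip(b'"').decode("ascii")
                out = handles.get(year)
                if out is None:
                    target = os.path.join(out_dir, year + ".txt")
                    is_new = not os.path.exists(target) or os.path.getsize(target) == 0
                    out = open(target, "ab")
                    if is_new:
                        out.write(header)  # header only for a brand-new year file
                    handles[year] = out
                out.write(line)
        finally:
            for out in handles.values():
                out.close()
    return sorted(handles)
```

Calling split_by_year once per source file, all with the same out_dir, would accumulate each year's rows in a single file with one header at the top. If the output must instead be named after each source file (2012file.txt and so on), the existence check is unnecessary because every target file is new.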
elwayisgod

ASKER
I doubt it but hard to say.
aikimark

If you can work it into your process, you would simplify all this processing by eliminating the header row from the front of each data file.  You would only need to append the data to the appropriate year file.  If you needed the header at a later time, you could use a simple copy command or convert the header file into a schema.ini file.
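
A minimal sketch of the header-file idea, again in Python and purely illustrative: save_header and with_header are made-up names, and the sketch assumes every source file shares the same one-line header. The header is saved once, the year files then hold data rows only, and the header is glued back on only when a complete file is needed, much like a binary copy of header + data into a new file.

```python
import shutil

def save_header(src_path, header_path):
    """Copy only the first line of src_path into header_path."""
    with open(src_path, "rb") as src, open(header_path, "wb") as dst:
        dst.write(src.readline())

def with_header(header_path, data_path, out_path):
    """Write header followed by data to out_path, streaming in chunks."""
    with open(out_path, "wb") as out:
        for part in (header_path, data_path):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)  # chunked copy, safe for large files
```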
Qlemo

Is PowerShell an option here? The average speed I got with my test data is about 1MB per second. That is 35 minutes per 2GB file.
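
As a quick sanity check on that estimate, here is a tiny Python calculation. The helper name estimate_minutes is made up for illustration, and it assumes 1GB = 1024MB and a constant throughput.

```python
def estimate_minutes(size_gb, mb_per_second):
    """Rough processing time in minutes at a constant throughput."""
    return size_gb * 1024 / mb_per_second / 60

per_file = estimate_minutes(2, 1.0)   # one 2GB file at about 1MB per second
all_files = per_file * 10             # the ten source files, processed one after another
print(round(per_file), "minutes per file,", round(all_files / 60, 1), "hours in total")
```

At that rate one file takes roughly 34 minutes, in line with the 35 minute figure, and all ten files would take between five and six hours.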
elwayisgod

ASKER
It's Windows Server 2003.  Is PowerShell built into the OS?
Qlemo

No, you will have to download it. All necessary info can be found at http://support.microsoft.com/kb/968929 .
aikimark

What extra software do you have on your server?  For instance, if you have Syncsort, you should get much better performance due to its I/O optimization.
elwayisgod

ASKER
Nothing that I know of.  We are possibly looking to try Cygwin, but I'm not sure.  I'm thinking Linux tools could do this file manipulation faster than a DOS batch file?
ASKER CERTIFIED SOLUTION
aikimark

elwayisgod

ASKER
Aikimark,

It runs in two seconds but produces a blank file.... No idea what's happening.
aikimark

Did you run this in the same directory as the big data file?
Did you substitute your big file name in place of Q_27976301_Data.txt ?

Note: if the big file name contains spaces, you need to put the file name in quotes.
aikimark

It would be helpful if you posted the first three lines in the file.
elwayisgod

ASKER
Back tomorrow to try and resolve.
Qlemo

Did that (http:#a38722808) really work for you? It did not for me.
aikimark

@elwayisgod

Did that command extract the 2012 rows?
elwayisgod

ASKER
It did, but I got more than I bargained for.  My question wasn't fully qualified with all the requirements, so I just want to move on.