A DOS batch approach runs out of memory, so I wanted to see whether there is an easy VB way. The source files are 2GB each.
I have 10 source files that are !-delimited. The first row is a header row. The first column in each file is the Year column. A row begins like this: "2012"!"January"!
I need a script that reads every .txt file in the directory and creates new files based on the column 1 value. Each new file should contain only the rows for one year. For example, all rows that begin with "2012" go into a new file with the same name as the original, prefixed with '2012'. So if the original file name is 'file.txt', the new file name would be '2012file.txt', and so on. I need the header row in each new file too.
Thanks
VB Script, Visual Basic Classic
Last Comment
elwayisgod
8/22/2022 - Mon
-Mystique-
Perhaps the information at one of the links below will help you.
HxD - Freeware Hex Editor and Disk Editor http://mh-nexus.de/en/hxd/
Features
•Available as a portable and installable edition
•RAM-Editor
  - To edit the main memory
  - Memory sections are tagged with data-folds
•Disk-Editor (Hard disks, floppy disks, ZIP-disks, USB flash drives, CDs, ...)
  - RAW reading and writing of disks and drives
  - for Win9x, WinNT and higher
•Instant opening regardless of file-size
  - Up to 8GB, opening and editing is very fast
•Liberal but safe file sharing with other programs
•Flexible and fast searching/replacing for several data types
  - Data types: text (including Unicode), hex-values, integers and floats
  - Search direction: Forward, Backwards, All (starting from the beginning)
•File compare (simple)
•View data in Ansi, DOS, EBCDIC and Macintosh character sets
•Checksum-Generator: Checksum, CRCs, Custom CRC, SHA-1, SHA-512, MD5, ...
•Exporting of data to several formats
  - Source code (Pascal, C, Java, C#, VB.NET)
  - Formatted output (plain text, HTML, Richtext, TeX)
  - Hex files (Intel HEX, Motorola S-record)
•Insertion of byte patterns
•File tools
  - File shredder for safe file deletion
  - Splitting or concatenating of files
•Basic data analysis (statistics)
  - Graphical representation of the byte/character distribution
  - Helps to identify the data type of a selection
•Byte grouping
  - 1, 2, 4, 8 or 16 bytes packed together into one column
•"Hex only" or "text only"-modes
•Progress-window for lengthy operations
  - Shows the remaining time
  - Button to cancel
•Modified data is highlighted
•Unlimited undo
•"Find updates..."-function
•Easy to use and modern interface
•Goto address
•Printing
•Overwrite or insert mode
•Cut, copy, paste insert, paste write
•Clipboard support for other hex editors
  - Visual Studio/Visual C++, WinHex, HexWorkshop, ...
•Bookmarks
  - Ctrl+Shift+Number (0-9) sets a bookmark
  - Ctrl+Number (0-9) goes to a bookmark
•Navigating to nibbles with Ctrl+Left or Ctrl+Right
•Flicker free display and fast drawing
aikimark
Does year data appear in more than one of these files? That is, might we find 2012 lines in more than one file?
elwayisgod
ASKER
Yes. There is no telling which files the 2012 rows are in.
aikimark
So, solutions need to check for an existing target (year) file and only append the line data, rather than including the header.
Will the target files ever exceed 2GB?
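A minimal sketch of that check-and-append logic, written in Python rather than VBScript because no script has been posted in this thread. It assumes the layout described in the question (a '!' delimiter, a quoted year in column 1, one header row) and uses the year + original file name convention for output. The function names `split_file_by_year` and `split_directory`, the `out_dir` parameter, and the latin-1 encoding are assumptions for illustration, not anything confirmed by the asker.

```python
import glob
import os


def split_file_by_year(src_path, out_dir, delimiter="!"):
    """Stream src_path one line at a time into <year><original name> files.

    The header is written only when a target file is new or empty, so
    re-running against an existing target appends rows without a second header.
    Returns the sorted list of years found.
    """
    base = os.path.basename(src_path)
    writers = {}  # year -> open output handle
    try:
        # latin-1 maps every byte to a character, so unknown encodings round-trip.
        # newline="" keeps the original line endings untouched.
        with open(src_path, "r", encoding="latin-1", newline="") as src:
            header = src.readline()
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                # Column 1 looks like "2012"; drop the surrounding quotes.
                year = line.split(delimiter, 1)[0].strip().strip('"')
                out = writers.get(year)
                if out is None:
                    out_path = os.path.join(out_dir, year + base)
                    is_new = (not os.path.exists(out_path)
                              or os.path.getsize(out_path) == 0)
                    out = open(out_path, "a", encoding="latin-1", newline="")
                    if is_new:
                        out.write(header)
                    writers[year] = out
                out.write(line)
    finally:
        for out in writers.values():
            out.close()
    return sorted(writers)


def split_directory(source_dir, out_dir):
    """Run split_file_by_year on every .txt file in source_dir.

    out_dir should be a different directory, otherwise the generated
    .txt files would be picked up as sources on the next run.
    """
    os.makedirs(out_dir, exist_ok=True)
    results = {}
    for path in sorted(glob.glob(os.path.join(source_dir, "*.txt"))):
        results[os.path.basename(path)] = split_file_by_year(path, out_dir)
    return results
```

Only one line is held in memory at a time, so the 2GB source size should not matter; the cost is one open file handle per distinct year.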
elwayisgod
ASKER
I doubt it but hard to say.
aikimark
If you can work it into your process, you would simplify all this processing by eliminating the header row from the front of each data file. You would only need to append the data to the appropriate year file. If you needed the header at a later time, you could use a simple copy command or convert the header file into a schema.ini file.
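A small sketch of the header handling suggested here, again in Python and only as an illustration: the header row is saved once to its own file, the year files hold data rows only, and the header is re-attached by plain concatenation when it is needed. The function names `save_header` and `attach_header` and all file paths are hypothetical.

```python
import shutil


def save_header(src_path, header_path):
    """Copy only the first line (the header row) of src_path to header_path."""
    with open(src_path, "rb") as src, open(header_path, "wb") as dst:
        dst.write(src.readline())


def attach_header(header_path, data_path, dest_path):
    """Write dest_path as header file + data file, streamed in chunks."""
    with open(dest_path, "wb") as dest:
        for part in (header_path, data_path):
            with open(part, "rb") as src:
                # copyfileobj copies in fixed-size chunks, so a multi-GB
                # data file is never loaded into memory at once.
                shutil.copyfileobj(src, dest)
```

Working in binary mode leaves line endings and encoding exactly as they are in the source files.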
aikimark
What extra software do you have on your server? For instance, if you have Syncsort, you should get much better performance due to its I/O optimization.
elwayisgod
ASKER
Nothing that I know of. We are possibly going to try Cygwin, but I'm not sure. I'm thinking Linux could do this file manipulation faster than a DOS batch file?
aikimark
Did you run this in the same directory as the big data file?
Did you substitute your big file name in place of Q_27976301_Data.txt?
Note: if the big file name contains spaces, you need to put the file name in quotes.
aikimark
It would be helpful if you posted the first three lines in the file.
elwayisgod
ASKER
I'll be back tomorrow to try to resolve this.
Qlemo
Did that (http:#a38722808) really work for you? It did not for me.
aikimark
@elwayisgod
Did that command extract the 2012 rows?
elwayisgod
ASKER
It did, but I got more than I bargained for. My question didn't fully state all the requirements, so I just want to move on.
vbscript to split very large text files
http://stackoverflow.com/questions/4606367/vbscript-to-split-very-large-text-files
Function Creates Multidimensional Arrays from Delimited Text Files
This VBScript user-defined function can help streamline many text-based processes
http://www.windowsitpro.com/article/user-defined-function-udf/function-creates-multidimensional-arrays-from-delimited-text-files
Split text fiel searching for specific string of text and saving in mutltiple directories
http://stackoverflow.com/questions/12255167/split-text-fiel-searching-for-specific-string-of-text-and-saving-in-mutltiple-di
Working With Arrays in VBScript
http://www.aspfree.com/c/a/windows-scripting/working-with-arrays-in-vbscript/
Scripts to manage Text Files
http://www.activexperts.com/activmonitor/windowsmanagement/adminscripts/other/textfiles/
Topics for Writing or Appending to a File with VBScript
http://www.computerperformance.co.uk/vbscript/vbscript_file_opentextfile.htm
SciTE: A free source code editing component for Win32 and GTK+, including VB & VBScript.
SciTE is a SCIntilla based Text Editor
http://www.scintilla.org/index.html
http://www.scintilla.org/ScintillaDownload.html
Text editing in SciTE works similarly to most Macintosh or Windows editors with the added feature of automatic syntax styling.
http://www.scintilla.org/SciTEDoc.html
HxD - Freeware Hex Editor and Disk Editor
http://mh-nexus.de/en/hxd/