Solved

VB Script to mimic this DOS Batch Script

Posted on 2012-12-25
36
387 Views
Last Modified: 2012-12-27
I have a DOS script working great.  Problem is it takes 45 min to churn through files.  Some are pretty big.  Was wondering if there could be and equivalent VB script that could possibly churn through this faster?

@echo off
setlocal EnableDelayedExpansion

REM get a file name and the YEAR column
for /F "tokens=1,2 delims=|" %%A in (filelist.txt) do call :Extract %%A %%B

exit /b

:Extract FileName YearColNo
set filename=%1
set colno=%2

@echo Processing file %filename%

REM Get the header line
set head=
for /F "usebackq delims=" %%F in ("%filename%") do if not defined head set head=%%F

REM Construct RegEx for matching the correct year column
set pat=
for /L %%L in (2,1,%colno%) do set "pat=!pat![^^|]*|"
set "pat=^^!pat!2004^|"

> 2004_%filename%     echo,!head!
>>2004_%filename%     findstr /R "%pat%" %filename%
REM Head will be added automatically
REM > Non_2004_%filename% findstr /V /R "%pat%" %filename%
0
Comment
Question by:elwayisgod
  • 24
  • 12
36 Comments
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38721202
I'm not sure but reading the files once instead of twice should make it a lot faster. I found an earlier version and changed that to read the files line by line. Disk caching should make that fast enough versus reading the file in memory. Splitting a very large string (when reading the whole file at once) into an array may also not be very fast.

Here's my version:
Option Explicit

' note the year is a string
Const C_YEAR = "2004" ' "2012"

Const C_DATA_DIR = ".\" ' "C:\Data\"

Const C_FILELIST = "filelist.txt"

Const C_SEP = "|"


Dim objFSO, objFilelist, arrFields, strFilename, intYearColumnNumber, objFile, blnFirst, strLine, objFileOut1, objFileOut2

Set objFSO = CreateObject("Scripting.FileSystemObject")

' get file name and year column number for each file to be processed
Set objFilelist = objFSO.OpenTextFile(C_FILELIST, 1, False)
While Not objFilelist.AtEndOfStream
	arrFields = Split(objFilelist.ReadLine, C_SEP)
	strFilename = arrFields(0)
	intYearColumnNumber = CInt(arrFields(1))
	Set objFile = objFSO.OpenTextFile(C_DATA_DIR & strFilename, 1, False)
	blnFirst = True
	While Not objFile.AtEndOfStream
		strLine = objFile.ReadLine
		If blnFirst Then
			Set objFileOut1 = objFSO.CreateTextFile(C_DATA_DIR & C_YEAR & "_" & strFilename, True)
			Set objFileOut2 = objFSO.CreateTextFile(C_DATA_DIR & "Non_" & C_YEAR & "_" & strFilename, True)
			objFileOut1.WriteLine strLine
			objFileOut2.WriteLine strLine
			blnFirst = False
		Else
			arrFields = Split(strLine, C_SEP)
			If UBound(arrFields) >= intYearColumnNumber-1 Then
				If arrFields(intYearColumnNumber-1) = C_YEAR Then
					objFileOut1.WriteLine strLine
				Else
					objFileOut2.WriteLine strLine
				End If
			End If
		End If
	Wend
	If Not blnFirst Then
		objFileOut1.Close
		Set objFileOut1 = Nothing
		objFileOut2.Close
		Set objFileOut2 = Nothing
	End If
	objFile.Close
	Set objFile = Nothing
Wend

objFilelist.Close
Set objFilelist = Nothing

Set objFSO = Nothing

Open in new window

Note: I tested with files found also in an earlier question ("2012" files). Also I saw in another question that you wanted to use a separate data directory so prepared for that.
0
 

Author Comment

by:elwayisgod
ID: 38722685
OK.  So instead of writing the rows to the 'Non_' filename how about it's just skipped saving time.  It only then just writes the lines it finds thus saving time?  If file exists already it overwrites it too.  Can the script Query the years column and automatically loop through all the years that exist in source file?  That way I don't have to create 17 of these and run them back to back to back etc....  Just trying to think of things to tighten it up...

Or how about it 'removes' the lines from original file as it goes.  Thus then next year runs, there are less and less rows as it goes through each year?

Thanks for help.... Performance is such a key with this application.
0
 

Author Comment

by:elwayisgod
ID: 38722702
So it starts with a large file but as each year gets stripped out it gets smaller and smaller thus each subsequent year should process faster?
0
 

Author Comment

by:elwayisgod
ID: 38722705
Or can is just read the line and put that line in the appropriate new file.  Then every single line gets read once and is placed into new file once.  When a row finds another year it just appends to target file as it goes?   I don't know what to suggest.  Just trying to throw out ideas that might speed it up is all
0
 

Author Comment

by:elwayisgod
ID: 38722713
Just to clarify.  The 'filelist.txt' file contains the filename and the column number of the Year.  It can be delimited anyway.  Currently it's comma delimited.  Goal is to create separate files by Year for each file in 'Filelist.txt' and include the header row from original file.  It can be appended at the very end.  Doesn't matter to me whatever is fastest.  So at end of day I would have 'x' files that all together would have same rows as original file.  Just need a faster way to process this according to client :(
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38723013
Yes I think the idea to create and append files as unique years are found is the best. I'll have a look at that. Because writing back to the file all the data from other years would take very long (with big files) until one or more years with a lot of data have been taken out.
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38723083
Here's a new version:
Option Explicit

Const C_DATA_DIR = ".\" ' "C:\Data\"

Const C_FILELIST = "filelist.txt"

Const C_SEP = "|"
Const C_SEP_FILELIST = ","


Dim objFSO, objFilelist, arrFields, strFilename, intYearColumnNumber, objFile, blnFirst, strLine, strHeader, strYear, objFileOut1, objFileOut2, colFiles

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set colFiles = CreateObject("Scripting.Dictionary")

' get file name and year column number for each file to be processed
Set objFilelist = objFSO.OpenTextFile(C_FILELIST, 1, False)
While Not objFilelist.AtEndOfStream
	arrFields = Split(objFilelist.ReadLine, C_SEP_FILELIST)
	strFilename = arrFields(0)
	intYearColumnNumber = CInt(arrFields(1))
	Set objFile = objFSO.OpenTextFile(C_DATA_DIR & strFilename, 1, False)
	blnFirst = True
	While Not objFile.AtEndOfStream
		strLine = objFile.ReadLine
		If blnFirst Then
			strHeader = strLine
			blnFirst = False
		Else
			arrFields = Split(strLine, C_SEP)
			strYear = "Non"
			If UBound(arrFields) >= intYearColumnNumber-1 Then
				If IsNumeric(arrFields(intYearColumnNumber-1)) Then
					strYear = arrFields(intYearColumnNumber-1)
				End If
			End If
			If Not colFiles.Exists(strYear) Then
				colFiles.Add strYear, objFSO.CreateTextFile(C_DATA_DIR & strYear & "_" & strFilename, True)
				colFiles(strYear).WriteLine strHeader
			End If
			colFiles(strYear).WriteLine strLine
		End If
	Wend
	objFile.Close
	Set objFile = Nothing
	For Each strYear In colFiles
		colFiles(strYear).Close
		Set colFiles(strYear) = Nothing
	Next
	colFiles.RemoveAll
Wend

objFilelist.Close
Set objFilelist = Nothing

Set objFSO = Nothing
Set colFiles = Nothing

Open in new window

0
 

Author Comment

by:elwayisgod
ID: 38723860
Robert,

I get:

D:\AppFiles\SDS\SDS_Retail_R3\Data\Working\DataFiles\Spliced\VB\splice3.vbs(21,
2) Microsoft VBScript runtime error: Subscript out of range: '[number: 1]'
0
 

Author Comment

by:elwayisgod
ID: 38723873
Nevermind.  My filelist is pipe delimted in this directory.
0
 

Author Comment

by:elwayisgod
ID: 38723917
OK.  So this seems to be working.  There is a year actually named 'Other Yr'.  Which is ok as these rows end up in the 'Non' file by themselves.  

So, if I only wanted to extract  2011, 2012 and 2013 rows, can that be done?  Maybe have a line in script where I can define which years?  In 2017 they will need to add 2016, 2017 and 2018.  But not yet.
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38723989
How's the speed? Any use testing with the original code, reading the file into memory at once and then splitting it? Probably mainly depends on the size of your biggest file.

Yes, you could add a check on line 34:
If CheckYear(CInt(arrFields(intYearColumnNumber-1))) Then strYear = arrFields(intYearColumnNumber-1)

Open in new window

and for example at the bottom of the script add:
Function CheckYear(intYear)
	Select Case intYear
		Case 2011, 2012, 2013:
			CheckYear = True
		Case Else
			CheckYear = False
	End Select
End Function

Open in new window

That last function is easy to extend when needed or even based on the current year (if that's the logic you're after).
0
 

Author Comment

by:elwayisgod
ID: 38724018
I was just getting to the speed thing.  I'm running original dos code now against a set of data files.  I put a start time and end time in the log.  How do I do this for this script?  I want to know how long it takes for every year first.  If it takes 10 min vs 40 min, I'm good.  I will run against same set of data files.  I'm calling it like:

cscript splice3.vbs > splice3.log
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38724061
If you call it like that it should be possible to just use a line like this a number of times, obviously at least at the start and at the end:
WScript.Echo Now()

Open in new window

You could test one specific year easily by adjusting the CheckYear function now. If you want to take out the 'non' year processing that would be a bit more work...
0
 

Author Comment

by:elwayisgod
ID: 38724102
OK. I ran it but it's not returning all the years for each file.  For example on file 'file.txt' it returns 2008_file.txt.  However it's not picking up 2004 which exists in the source file.
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38724118
Earlier it ran fine, so is this after adding the CheckYear function? If so, what does it look like now? The version I posted was for 2011/12/13 so maybe a typo there? If not using this new functionality, can you post the file because like I said, it was running fine before?
0
 

Author Comment

by:elwayisgod
ID: 38724131
The years in each column aren't in order necessarily.  Not sure if that matters or not.  The lines it did pick up for 2012 are correct though.  do you want a sample file?
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38724144
No I meant the code file. Order doesn't matter. New files are created as needed and kept open until the end of the input file.
0
 

Author Comment

by:elwayisgod
ID: 38724177
OK.  So here's the issue.  The dos batch returns just the header row even if it doesn't find a year.  Which I actually need.  So in 2004, no rows were found but it created a file with just the header row.  Reason I need this is because the script that uses these files is hardcoded and every year for each file must be there or it will fail.   Is that possible?

Speed is phenomenal.  Blows through everything in 5 minutes.

Sorry I was looking at the wrong directory earlier.
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 

Author Comment

by:elwayisgod
ID: 38724187
Non Year is ok.  I just need it to be 'OtherYr_' prefix instead.
0
 

Author Comment

by:elwayisgod
ID: 38724196
This is really encouraging.  This smokes by the .bat file.. wow.....  just a few more tweaks and I think we are there
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38724199
Ah, I see. Yeah, this behaviour changed when the split changed (at the time I thought for the better) to make files for each year as necessary. So then I guess it would be needed to define a range of years you want to have and create those files (with header) up front even when no data is found, that part of the code should be easy to adjust. We could use an array so the later addition of years is still easy. I'll be back...
0
 

Author Comment

by:elwayisgod
ID: 38724208
OK. Sorry.  That one slipped by me.
0
 

Author Comment

by:elwayisgod
ID: 38724239
Also,

The script will reside in  C:\Automation\Scripts

The data files will reside in C:\Automation\Data

The resulting files need to reside in C:\Automation\Data also.

The resulting files should overwrite any existing files that exist.  Very important.  Maybe delete all files in C:\Automation\Data that begin with 2004 to 2017 or OtherYr first?
0
 

Author Comment

by:elwayisgod
ID: 38724241
I can make these separate questions so more points can be rewarded.  This is getting crazy.
0
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 500 total points
ID: 38724278
How about this:
Option Explicit

Const C_DATA_DIR = ".\" ' "C:\Data\"

Const C_FILELIST = "filelist.txt"

Const C_SEP = "|"
Const C_SEP_FILELIST = ","

WScript.Echo "start - " & Now()

Dim objFSO, objFilelist, arrFields, strFilename, intYearColumnNumber, objFile, blnFirst, strLine, strHeader, strYear, objFileOut1, objFileOut2, colFiles, arrYears

arrYears = Array(2011, 2012, 2013)

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set colFiles = CreateObject("Scripting.Dictionary")

' get file name and year column number for each file to be processed
Set objFilelist = objFSO.OpenTextFile(C_FILELIST, 1, False)
While Not objFilelist.AtEndOfStream
	arrFields = Split(objFilelist.ReadLine, C_SEP_FILELIST)
	strFilename = arrFields(0)
	intYearColumnNumber = CInt(arrFields(1))
WScript.Echo "file " & strFilename & " - " & Now()
	Set objFile = objFSO.OpenTextFile(C_DATA_DIR & strFilename, 1, False)
	blnFirst = True
	While Not objFile.AtEndOfStream
		strLine = objFile.ReadLine
		If blnFirst Then
			strHeader = strLine
			blnFirst = False
			' create all files up front (except "OtherYr"...)
			Dim intYearCnt
			For intYearCnt = 0 To Ubound(arrYears)
				strYear = CStr(arrYears(intYearCnt)) ' using strings as Dictionary keys!
				If Not colFiles.Exists(strYear) Then
					colFiles.Add strYear, objFSO.CreateTextFile(C_DATA_DIR & strYear & "_" & strFilename, True)
					colFiles(strYear).WriteLine strHeader
				End If
			Next
			' if you want to pre-create "OtherYr" as well, you could do it here: uncomment the next 5 lines
			'strYear = "OtherYr"
			'If Not colFiles.Exists(strYear) Then
			'	colFiles.Add strYear, objFSO.CreateTextFile(C_DATA_DIR & strYear & "_" & strFilename, True)
			'	colFiles(strYear).WriteLine strHeader
			'End If
		Else
			arrFields = Split(strLine, C_SEP)
			strYear = "OtherYr"
			If UBound(arrFields) >= intYearColumnNumber-1 Then
				If IsNumeric(arrFields(intYearColumnNumber-1)) Then
					If CheckYear(CInt(arrFields(intYearColumnNumber-1))) Then strYear = arrFields(intYearColumnNumber-1)
				End If
			End If
			If Not colFiles.Exists(strYear) Then
				colFiles.Add strYear, objFSO.CreateTextFile(C_DATA_DIR & strYear & "_" & strFilename, True)
				colFiles(strYear).WriteLine strHeader
			End If
			colFiles(strYear).WriteLine strLine
		End If
	Wend
	objFile.Close
	Set objFile = Nothing
	For Each strYear In colFiles
		colFiles(strYear).Close
		Set colFiles(strYear) = Nothing
	Next
	colFiles.RemoveAll
Wend

objFilelist.Close
Set objFilelist = Nothing

Set objFSO = Nothing
Set colFiles = Nothing

WScript.Echo "done - " & Now()

Function CheckYear(intYear)
	Dim blnResult, intYearCounter
	blnResult = False
	For intYearCounter = 0 To UBound(arrYears)
		If arrYears(intYearCounter) = intYear Then
			blnResult = True
			Exit For
		End If
	Next
	CheckYear = blnResult
End Function

Open in new window

The directory is already configurable. Don't worry about points. Deleting files maybe better from a batch file, where you call cscript from?
0
 

Author Comment

by:elwayisgod
ID: 38724414
OK. So I'm trying to run for all years in source files.  All years are:

2004 to 2017  and  "Other Year".

Array line is:  

arrYears = Array("Other Yr", 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 )

I'm still getting 'OtherYr' files.  That shouldn't be should it?
0
 

Author Comment

by:elwayisgod
ID: 38724426
One of my file results is empty and is named 'Other Yr_Filename.txt'.  There is another file named 'OtherYr_Filename.txt' that has all the 'Other Yr' rows in it.  Not sure why.....
0
 

Author Comment

by:elwayisgod
ID: 38724440
I just saw the OtherYr part you put in.  I uncommentd the 5 lines and am running again.
0
 

Author Comment

by:elwayisgod
ID: 38724498
OK. The resulting files are PERFECT!!!   So to fire off within another .vbs file wiill this work:

Print Time() & " - Splitting data files into individual years for load"
intErr = objShell.Run("\Scripts\splice_years.vbs", 0, False)
0
 

Author Comment

by:elwayisgod
ID: 38724682
This won't run either:

      Print Time() & " - Splitting data files into individual years for load"
         objShell.Run "D:\Scripts\splice_years.vbs"
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38724691
I was out for a bit, reading your comments just now.

I see you found the commented out bit. And assuming you took out that first entry from the Array you posted earlier, as those should all be numbers.

I'm not sure what that last post means. You don't need another .vbs file, or are you running all scripts from one other script for testing purposes? In that case post the whole code because I don't know what objShell is (I can guess but what if that is actually the problem itself)...
0
 

Author Comment

by:elwayisgod
ID: 38724775
line 254, it won't execute.  Also can the script 'splice_years.vbs' reside in a different directory than the data files it needs to splice up?  I need that to work too.  Data directory is:

D:\AppFiles\SDS\SDS_Retail_R3\Data
update.vbs
0
 

Author Comment

by:elwayisgod
ID: 38724790
Script directory is:

D:\AppFiles\SDS\SDS_Retail_R3\Scripts
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38724904
Oh wow, that was a bit more than I expected... ;-)

Well at least it's clear now that the Print function is defined, because when you posted the bit of code earlier I couldn't tell what that did.

It's a common problem that when you call a Sub (or a function without using the return value) you have to leave out the parentheses. So either
objShell.Run """C:\Windows\System32\cscript.exe"" ""D:\AppFiles\SDS\SDS_Retail_R3\Scripts\splice_years.vbs"""

Open in new window

or
intErr = objShell.Run("""C:\Windows\System32\cscript.exe"" ""D:\AppFiles\SDS\SDS_Retail_R3\Scripts\splice_years.vbs""")

Open in new window

It will usually work when there is only 1 parameter because then it's just as if you're surrounding that value with parentheses instead of using them to 'bind' the parameter(s) to the function call.

The reason I go into this is that you probably need to add the parameters
, 0, True

Open in new window

back in unless the process can run in the background (same applies to line 259). The 'True' means the calling process waits for the called process to end. This is obviously necessary if another process (or the calling process) needs to work with the output generated by the previous call. Have a look here for more info: http://technet.microsoft.com/en-us/library/ee156605.aspx

The data directory can be configured in "splice_years.vbs":
Const C_DATA_DIR = "D:\AppFiles\SDS\SDS_Retail_R3\Data\"

Open in new window

0
 

Author Comment

by:elwayisgod
ID: 38724938
I think it can be false as this takes around 5 minutes.  The other process that runs takes 20 min or so to complete before it uses this output.  That is why i needed it faster so it wouldn't slow down the process as a whole.

So this maybe:

         objShell.Run("""C:\Windows\System32\cscript.exe"" ""D:\AppFiles\SDS\SDS_Retail_R3\Scripts\splice_years.vbs""", 0, False)
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38724956
In that case don't use parentheses.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Have you ever wanted to restrict the users input in a textbox to numbers, and while doing that make sure that they can't 'cheat' by pasting in non-numeric text? Of course you can do that with code you write yourself but it's tedious and error-prone …
Deploying a Microsoft Access application in a Citrix environment is not difficult but takes a few steps. However, Citrix system people are often of little help, as they typically know next to nothing about Access. The script provided here will take …
Get people started with the process of using Access VBA to control Excel using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Excel. Using automation, an Access application can laun…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now