zhshqzyc asked:

Merge files with VBS

The original question is here
All the files have the same header (the first line). I want to combine the content of all the files, but with only one header.
ReneGe's code works, but I am worried about its speed.

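@ECHO OFF
SET Output=Output.txt

REM Write the shared header to the output (called per file; each call rewrites the same header)
FOR /F %%A IN ('dir /b chr*.txt') DO CALL :GetHeader "%%~fA"
REM Append every file's contents, skipping its first line
FOR /F %%A IN ('dir /b chr*.txt') DO FOR /F "usebackq skip=1 delims=" %%B IN ("%%A") DO ECHO %%B>>"%Output%"
EXIT /b

:GetHeader
REM Overwrite the output with the first line of the given file, then return
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)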

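For comparison, here is a minimal Python sketch of the same idea (keep the first file's header, skip line 1 of every later file). It reads and writes each file in a single pass instead of running one `ECHO ... >>` per line, which reopens the output file for every line and is a likely reason the batch version is slow. The function name `merge_with_single_header` and the demo file names are illustrative assumptions, not code from this thread.

```python
import tempfile
from pathlib import Path


def merge_with_single_header(inputs, output):
    """Concatenate text files, keeping only the first file's header line."""
    with open(output, "w") as out:
        for index, path in enumerate(inputs):
            with open(path) as src:
                for line_no, line in enumerate(src):
                    if line_no == 0 and index > 0:
                        continue  # skip the repeated header of later files
                    # add a line break if the file's last line lacks one,
                    # so consecutive files never run together
                    out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    # Hypothetical demo data: three small files that share one header
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for n in (1, 2, 3):
            (folder / f"chr{n}.txt").write_text(f"CHR\tSNP\n{n}\trs{n}\n")
        inputs = [folder / f"chr{n}.txt" for n in (1, 2, 3)]
        merged = folder / "Output.txt"
        merge_with_single_header(inputs, merged)
        print(merged.read_text(), end="")
```

The input list is passed in explicitly, so the caller controls the file order.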

huacat's code:
copy chr1.assoc QT.assoc
REM Append files 2-23; "header" stands for the actual header text
for /L %%A in (2,1,23) do (
  type chr%%A.assoc | find /v "header" >> qt.assoc
)

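huacat's approach works differently: it copies the first file verbatim, then `find /v` drops every line that contains the search text, not just line 1. A minimal Python sketch of those semantics (the name `append_without_header` is hypothetical); note that a data row that happened to contain the header text would be dropped too.

```python
import tempfile
from pathlib import Path


def append_without_header(first, others, output, header_text):
    """Mimic `copy first output` then `type f | find /v "header_text" >> output`."""
    with open(output, "w") as out:
        with open(first) as src:
            out.write(src.read())  # like copy: first file verbatim, header included
        for path in others:
            with open(path) as src:
                for line in src:
                    # like find /v (case-sensitive): drop lines containing the text
                    if header_text not in line:
                        out.write(line)


if __name__ == "__main__":
    # Hypothetical demo data
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for n in (1, 2, 3):
            (folder / f"chr{n}.assoc").write_text(f"CHR SNP\n{n} rs{n}\n")
        out_path = folder / "QT.assoc"
        append_without_header(
            folder / "chr1.assoc",
            [folder / f"chr{n}.assoc" for n in (2, 3)],
            out_path,
            "CHR SNP",
        )
        print(out_path.read_text(), end="")
```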

I tried it, but it only merged two files, and I am not sure why.
ReneGe:
How long does it take to run my script?
zhshqzyc:

Almost 30 minutes have passed, and it has processed roughly 8210 KB, about 5%.
RobSampson:
Hi, here's a VBScript that should do the job. It also shows which file it's currently processing, so at least you know it's doing something.


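' Specify the folder that contains the files
strDir = "C:\Files"
' Specify the extension of the files to be read
strExt = ".txt"
' Specify the output file, which should be in a separate folder
' (otherwise it would be read as one of the input files)
strOutput = "C:\Output.txt"

' If started by double-clicking (wscript.exe), relaunch under cscript.exe
' so the progress messages go to a console window, then quit this copy
If LCase(Right(Wscript.FullName, 11)) = "wscript.exe" Then
    strPath = Wscript.ScriptFullName
    strCommand = "%comspec% /c cscript  """ & strPath & """"
    Set objShell = CreateObject("Wscript.Shell")
    objShell.Run strCommand, 1, True
    WScript.Quit
End If

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFSO.CreateTextFile(strOutput, True)
blnHeaderWritten = False
For Each objFile In objFSO.GetFolder(strDir).Files
	If Right(LCase(objFile.Name), Len(strExt)) = LCase(strExt) Then
		WScript.Echo "Processing " & objFile.Name & "..."
		Set objInput = objFSO.OpenTextFile(objFile.Path, 1, False)
		If Not objInput.AtEndOfStream Then
			If blnHeaderWritten = False Then
				' First file: keep everything, including the header
				blnHeaderWritten = True
			Else
				' Later files: skip the header line
				objInput.SkipLine
			End If
			If Not objInput.AtEndOfStream Then
				strText = objInput.ReadAll
				' Add a line break only if the text lacks one, so no blank rows appear
				If Right(strText, 1) <> vbLf Then strText = strText & vbCrLf
				objOutput.Write strText
			End If
		End If
		objInput.Close
	End If
Next
objOutput.Close
MsgBox "Done. Please see " & strOutput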

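On the blank-row question that comes up later in the thread: `ReadAll` returns the file's text including its final line break, and `WriteLine` then adds another one, so every file that ends with a line break is followed by an empty row. A minimal Python sketch contrasting that behaviour with a guard that adds a line break only when one is missing (both function names are hypothetical):

```python
def join_like_writeline(chunks):
    """What `WriteLine objInput.ReadAll` does: always append another newline."""
    return "".join(text + "\n" for text in chunks)


def join_chunks(chunks):
    """Concatenate file contents, adding a newline only where one is missing."""
    parts = []
    for text in chunks:
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    files = ["H\n1\n", "2\n"]  # hypothetical file contents, header already removed from the 2nd
    print(repr(join_like_writeline(files)))  # 'H\n1\n\n2\n\n'
    print(repr(join_chunks(files)))          # 'H\n1\n2\n'
```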

Now, how long does Rob's script take to run?
Yes, that would certainly be interesting. I don't expect it to take very long, but it all depends on the number of files and the size of each.

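copy chr1.assoc QT.assoc
REM Append files 2-23, dropping the header line (use TABs between the words if the header is TAB-separated)
for /L %%A in (2,1,23) do type chr%%A.assoc | find /v "CHR      SNP      N_MISS" >> qt.assoc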

I put the above code in a .bat file, created 23 files to test it, and it ran correctly.
Please remember that the whitespace after CHR must be a TAB character if your header uses TABs to separate columns.
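Why the separator matters: `find` does a plain substring match, so the search text must contain exactly the same whitespace as the header line. A tiny Python illustration, assuming a TAB-separated header (`contains_header` is a hypothetical helper):

```python
def contains_header(line, header_text):
    """Substring test, like `find`: True if header_text occurs verbatim in line."""
    return header_text in line


if __name__ == "__main__":
    header_line = "CHR\tSNP\tN_MISS\n"  # assumed TAB-separated header
    print(contains_header(header_line, "CHR SNP N_MISS"))    # False: spaces, not TABs
    print(contains_header(header_line, "CHR\tSNP\tN_MISS"))  # True
```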
I am new to VBS. Do I need to install anything to run a VBS file?
Copy Rob's script into a text file with a .vbs extension.
Then just run it by double-clicking it.
One more thing: my copy order is


However, apart from huacat's code, the other scripts produce this order:


I hope that somebody can modify it.
Based on huacat's script, are your files always going to be numbered from 2 to 23, or will that change?
Yes, they are always numbered from 1 to 23.
Regardless of the file order, have you had a chance to compare the execution speed of my batch file and Rob's VBS?

Could you please change your script so it reads the files from 1 to 23, like: for /L %%A in (1,1,23) DO ...

Your code is fast; it only took 5 minutes. But the files are copied in a seemingly random order.

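A likely cause of the order problem: `dir /b` and `GetFolder(...).Files` return names in the file system's order, which on NTFS is typically alphabetical, so chr10 to chr19 come before chr2. Looping over the numbers explicitly, as in `for /L %%A in (1,1,23)`, avoids this. The Python sketch below reproduces the text ordering and shows a numeric ("natural") sort key as an alternative; `natural_key` is a hypothetical helper name.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers, so chr2 < chr10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


if __name__ == "__main__":
    names = [f"chr{n}.txt" for n in range(1, 24)]
    text_order = sorted(names)  # plain string sort, similar to an alphabetical listing
    print(text_order[:4])       # ['chr1.txt', 'chr10.txt', 'chr11.txt', 'chr12.txt']
    print(sorted(text_order, key=natural_key) == names)  # True
```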

Thanks for your help.

Maybe your code is the fastest, but I have to enter the header manually.
Okay. Many thanks.
I am not sure: could you combine your code with huacat's?
First get the header with your code, then use find /v, etc.
I doubt that combining my code with huacat's code would be faster than Rob's script.

That is because huacat's script requires more processing than mine. And since you saw a major improvement with Rob's script, the conclusion is obvious.

So, tell us, how did Rob's script perform?

Rob's code is great!

Hi Zhshqzyc,

If you don't want to enter the header manually, it's easy:

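REM Read the first line of chr1.assoc into %header%, then jump out of the loop
for /f "delims=" %%i in (chr1.assoc) do (set header=%%i)&(goto :next)
:next
copy chr1.assoc QT.assoc
REM Append files 2-23, dropping any line that contains the header
for /L %%A in (2,1,23) do type chr%%A.assoc | find /v "%header%" >> qt.assoc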

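The `for /f ... goto :next` line reads only the first line of chr1.assoc and then leaves the loop. Here is the same header detection as a minimal Python sketch (the names `read_header` and `drop_header_lines` are hypothetical). Unlike `find /v`, it drops a line only when the whole line equals the header, so a data row that merely contains the header text is kept.

```python
import tempfile
from pathlib import Path


def read_header(path):
    """Return the first line of a file without its line break."""
    with open(path) as src:
        return src.readline().rstrip("\r\n")


def drop_header_lines(lines, header):
    """Yield only the lines that are not an exact copy of the header."""
    for line in lines:
        if line.rstrip("\r\n") != header:
            yield line


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "chr1.assoc"
        first.write_text("CHR\tSNP\n1\trs1\n")  # hypothetical sample file
        header = read_header(first)
        rows = ["CHR\tSNP\n", "2\trs2\n", "2\trs2 CHR\tSNP\n"]
        print(list(drop_header_lines(rows, header)))
```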

Speed of light

(Please make sure you split points with all, and according to their contribution/effort.)


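@ECHO OFF
SET Output=Output.txt

REM Store the header of chr1.txt and write it to the output file
FOR /F %%A IN ('dir /b chr1.txt') DO CALL :GetHeader "%%~fA"
REM Append files 1-23 in numeric order, dropping the header line
FOR /L %%A IN (1,1,23) DO IF EXIST chr%%A.txt (
	ECHO EXTRACTING: "chr%%A.txt"
	REM /C: makes FINDSTR match the whole header as one literal string
	FINDSTR /V /L /C:"%Header%" "chr%%A.txt">>"%Output%"
)
EXIT /b

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	SET Header=%%A
	ECHO %%A>"%Output%"
	EXIT /b
)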

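One caution about `FINDSTR -v "%Header%"` without `/C:`: FINDSTR treats each space-separated word of the search string as a separate search string, so with a space-separated header it drops every line containing any one of those words. The Python sketch below simulates the two behaviours on hypothetical sample lines; it ignores FINDSTR's regular-expression handling, which makes no difference for a header made of plain words like this one.

```python
def findstr_v_words(lines, search):
    """Simulate FINDSTR /V "a b c": each space-separated word is its own
    search string, and a line matching ANY of them is dropped."""
    words = [w for w in search.split(" ") if w]
    return [line for line in lines if not any(w in line for w in words)]


def findstr_v_literal(lines, search):
    """Simulate FINDSTR /V /L /C:"a b c": the whole string is one literal."""
    return [line for line in lines if search not in line]


if __name__ == "__main__":
    # Hypothetical sample: a space-separated header and two data rows
    sample = ["CHR SNP N_MISS\n", "1 rs1 0\n", "1 rs2_SNP 3\n"]
    print(findstr_v_words(sample, "CHR SNP N_MISS"))    # ['1 rs1 0\n']
    print(findstr_v_literal(sample, "CHR SNP N_MISS"))  # ['1 rs1 0\n', '1 rs2_SNP 3\n']
```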


I uploaded three large files to SkyDrive.
I tested them with your code but had no luck: only two files were merged.
Thanks for your help.
Found that too.

I also found that the character strings in the header do not appear anywhere else in the files.

Also, since the header is always the same, I simply replaced the FINDSTR search string with a single header element.

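@echo off
SET Output=Output.txt
REM Read the header (first line) of chr1.lmiss and write it once
for /f "delims=" %%i in (chr1.lmiss) do (set header=%%i)&(goto :next)
:next
ECHO %Header%>"%Output%"
REM Append files 1-23 in order, dropping every line that contains "F_MISS"
for /L %%A in (1,1,23) do IF EXIST "chr%%A.lmiss" (
	ECHO EXTRACTING: "chr%%A.lmiss"
	FINDSTR -v "F_MISS" "chr%%A.lmiss">>"%Output%"
)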

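Filtering on `F_MISS` alone is safe only as long as that token never appears in a data row, which matches what ReneGe observed for these files. A minimal Python check of that assumption (`lines_containing` is a hypothetical helper): for each input file it should report only line 1.

```python
import tempfile
from pathlib import Path


def lines_containing(path, token):
    """Return 1-based numbers of lines that contain token (the lines FINDSTR -v would drop)."""
    with open(path) as src:
        return [n for n, line in enumerate(src, start=1) if token in line]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "chr1.lmiss"
        # Hypothetical sample data
        sample.write_text(" CHR  SNP  N_MISS  N_GENO  F_MISS\n   1  rs1       0     100       0\n")
        print(lines_containing(sample, "F_MISS"))  # [1]
```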

Yes. By the way, does Rob's code have a small bug? It seems to replace the header with a blank row.
There is a blank row between file 1 and file 2.
Have you tried my last script version?
I tried it, it is great!
Glad I could help.
Thanks for the grade. The blank row between file1 and file2 is most likely caused by a blank line at the end of file1.

Otherwise, removing it would require more string processing, which would slow the script down on such large files.
