zhshqzyc

Merge files with VBS

The original question is here
The files all have the same header (the first line). I want the content of all the files, but with only one header.
ReneGe's code is okay, but I am worried about its speed.
@ECHO OFF

SET Output=Output.txt

FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.txt') DO FOR /F "usebackq Skip=1 delims=" %%B IN ("%%A") DO ECHO %%B>>"%Output%"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)


huacat's code
copy chr1.assoc QT.assoc
for /L %%A in (2,1,23) do (
  type chr%%A.assoc | find /v "header" >> qt.assoc
)


I tried it, but it only merged two files. I am not sure why.
Thanks.
ReneGe

How long does my script take to run?
zhshqzyc

ASKER

After almost 30 minutes it had processed roughly 8210 KB, about 5%.
RobSampson
Hi, here's a VBS that should do the job, and also show you which file it's currently processing, so at least you know it's doing something.

Regards,

Rob.
' Specify the folder that contains the files
strDir = "C:\Files"
' Specify the extension of the files to be read
strExt = ".txt"
' Specify the output file, which should be in a separate folder
strOutput = "C:\Output.txt"

If LCase(Right(Wscript.FullName, 11)) = "wscript.exe" Then
    strPath = Wscript.ScriptFullName
    strCommand = "%comspec% /c cscript  """ & strPath & """"
    Set objShell = CreateObject("Wscript.Shell")
    objShell.Run(strCommand), 1, True
    Wscript.Quit
End If

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFSO.CreateTextFile(strOutput, True)
blnHeaderWritten = False
For Each objFile In objFSO.GetFolder(strDir).Files
	If Right(LCase(objFile.Name), Len(strExt)) = LCase(strExt) Then
		WScript.Echo "Processing " & objFile.Name & "..."
		Set objInput = objFSO.OpenTextFile(objFile.Path, 1, False)
		If blnHeaderWritten = False Then
			objOutput.WriteLine objInput.ReadAll
			blnHeaderWritten = True
		Else
			objInput.SkipLine
			objOutput.WriteLine objInput.ReadAll
		End If
		objInput.Close
	End If
Next
objOutput.Close
MsgBox "Done. Please see " & strOutput


Now, how long does Rob's script take to run?
Yes, that would certainly be interesting. I can't really see that it would be much different... it all depends on the number of files and the size of each.

Rob.
copy chr1.assoc QT.assoc  
for /L %%A in (2,1,23) do type chr%%A.assoc | find /v "CHR      SNP      N_MISS" >> qt.assoc

I put the above code into a .bat file, created 23 files to test it, and it ran correctly.
Please remember that the character after CHR must be a TAB character if your header uses TAB to separate columns.
I am new to VBS. Do I need to install anything to run VBS?
Copy Rob's script into a text file with a .vbs extension.
Then just run it by double-clicking on it.
One more thing is that my copy order is  
chr1,chr2,chr3,..chr9,chr10,...chr20,chr21,chr22,chr23


However, except for huacat's code, the order produced by the other scripts is
chr10,...chr2,chr3,...chr9


I hope that somebody can modify it.
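
One way to force the chr1, chr2, ... chr23 order in a VBS merge is to build each file name from a counter instead of enumerating the folder, whose listing order is not numeric. The following is only a minimal sketch under assumptions, not the accepted solution from this thread: the folder, base name, extension and output path are placeholder values, and the read/write logic simply mirrors Rob's script above.

```vbscript
Option Explicit

' Placeholder settings (assumptions): adjust to your own folder and file names.
Const FOLDER = "C:\Files"
Const BASE_NAME = "chr"
Const EXTENSION = ".txt"
Const OUTPUT_FILE = "C:\Output.txt"
Const FIRST_INDEX = 1
Const LAST_INDEX = 23

Dim objFSO, objOutput, objInput, strPath, i, blnHeaderWritten

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objOutput = objFSO.CreateTextFile(OUTPUT_FILE, True)
blnHeaderWritten = False

' Counting from 1 to 23 fixes the order, independent of the directory listing.
For i = FIRST_INDEX To LAST_INDEX
    strPath = objFSO.BuildPath(FOLDER, BASE_NAME & i & EXTENSION)
    If objFSO.FileExists(strPath) Then
        WScript.Echo "Processing " & strPath & "..."
        Set objInput = objFSO.OpenTextFile(strPath, 1, False)
        ' Once a header has been written, skip the first line of later files.
        If blnHeaderWritten Then
            If Not objInput.AtEndOfStream Then objInput.SkipLine
        End If
        ' Guard against empty files, where ReadAll would raise an error.
        If Not objInput.AtEndOfStream Then
            objOutput.WriteLine objInput.ReadAll
            blnHeaderWritten = True
        End If
        objInput.Close
    End If
Next

objOutput.Close
WScript.Echo "Done. Please see " & OUTPUT_FILE
```

Running it with cscript from a command prompt keeps the progress messages in the console; under wscript each WScript.Echo would appear as a separate pop-up.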
Based on huacat's script, are your files always going to be numbered from 2 to 23, or will that change?
Yes, it is always from 1 to 23.
@zhshqzyc
Regardless of the files order, have you had the chance to compare execution speed between my batch file and Rob's VBS?

@RobSampson
Could you please change your script so it reads files from 1 to 23 like:[for /L %%A in (1,1,23) DO ...]

Yes.
@RobSampson
Your code is fast, it only took 5 minutes. But the copying order is wild.
10,4,8,5,11,9,12,6,7,13,23,18,14,...



@ReneGe,
Thanks for your help.

@huacat,
Maybe your code is the fastest, but I need to enter the header manually.
ASKER CERTIFIED SOLUTION (RobSampson): not included in this copy of the thread.
Okay. Many thanks.
@ReneGe:
Could you combine your code with huacat's?
First get the header with your code, then use find /v, etc.
I doubt that combining my code with huacat's would improve performance compared with Rob's script.

That is because huacat's script requires more processing than my script. And since you saw a major improvement with Rob's script, my conclusion is obvious.

So, tell us, how did Rob's script perform?

Cheers,
Rene
Rob's code is great!

Hi Zhshqzyc,

If you don't want to type in the header, it's easy:

for /f "delims=" %%i in (chr1.assoc) do (set header=%%i)&(goto :next)
:next
copy chr1.assoc QT.assoc  
for /L %%A in (2,1,23) do type chr%%A.assoc | find /v "%header%" >> qt.assoc


Speed of light

(Please make sure you split points with all, and according to their contribution/effort.)

@ECHO OFF

SET Output=Output.txt

FOR /F %%A IN ('dir /b chr1.txt') DO Call :GetHeader "%%~fA"
FOR /L %%A IN (1,1,23) DO IF EXIST chr%%A.txt (
	ECHO EXTRACTING: "chr%%A.txt"
	FINDSTR -v "%Header%" "chr%%A.txt">>"%Output%"
)
pause
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	SET Header=%%A
	ECHO %%A>"%Output%"
	EXIT /b
)


SOLUTION: not included in this copy of the thread.
@huacat

I uploaded three large files to SkyDrive.
I tested them with your code but had no luck: only two files were merged.
Thanks for the help.
SOLUTION: not included in this copy of the thread.
Thanks.
Found that too.

I also found that the character strings in the header do not appear anywhere else in the files.

Also, since the header is always the same, I just replaced the FINDSTR search string with a single header element.

 
@echo off
SET Output=Output.txt
for /f "delims=" %%i in (chr1.lmiss) do (set header=%%i)&(goto :next)
:next
ECHO %Header%>"%Output%"
for /L %%A in (1,1,23) do IF EXIST "chr%%A.lmiss" (
	ECHO EXTRACTING: "chr%%A.lmiss"
	FINDSTR -v "F_MISS" "chr%%A.lmiss">>"%Output%"
)
PAUSE


Yes. By the way, does Rob's code have a small error? It seems to replace the header with a blank row.
There is a blank row between file 1 and file 2.
Have you tried my last script version?
@ReneGe:
I tried it, it is great!
Regards.
Glad I could help.
Thanks for the grade. The blank row between file1 and file2 is most likely because there is a blank line at the end of file1.

Otherwise, to be able to remove that, we'd need to do some more string processing, which would slow it down over such large files...

Rob.
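
Another possible explanation for the blank row, offered as an assumption rather than something confirmed in this thread: ReadAll returns the file's final line break, and WriteLine then adds a second one, so any input that ends with a line break would leave an empty row after it. Under that assumption, checking only the last character may be enough, without heavier string processing. AppendRemainder is a hypothetical helper name used for illustration.

```vbscript
Option Explicit

' Hypothetical helper: copies the rest of an open input TextStream to the
' output and adds a line break only if the copied text does not end with one.
Sub AppendRemainder(objInput, objOutput)
    Dim strText, strLast

    ' Nothing left to copy (for example, an empty file or a header-only file).
    If objInput.AtEndOfStream Then Exit Sub

    strText = objInput.ReadAll
    objOutput.Write strText

    ' Only the final character is inspected, so the extra cost is small.
    strLast = Right(strText, 1)
    If strLast <> vbLf And strLast <> vbCr Then
        objOutput.WriteLine
    End If
End Sub
```

In Rob's script, each "objOutput.WriteLine objInput.ReadAll" line could then become "AppendRemainder objInput, objOutput", with the Sub pasted at the end of the file.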