Merge files with a DOS batch file

Hi, I have 23 files that all have the same header (the first line). I want to merge them and keep only one header.
Each file is about 20 MB. The commands I used were:
copy chr1.assoc QT.assoc
for /L %%A in (2,1,23) do more +1 chr%%A.assoc >> QT.assoc


The problem is that it runs for a very long time and never finishes. Even appending the second file produced no response after 12 hours. I ran the batch file on a server that has 32 GB of memory.
What is wrong?

Thanks for help.
zhshqzyc Asked:
 
huacat Commented:
Hi guys,

Please use type | find to filter the header out of each file, either into a temporary file first and then concatenating everything with copy +, or appending directly as below. I tested it with a 10 MB file and it's fast.

copy chr1.assoc QT.assoc
for /L %%A in (2,1,23) do (
  type chr%%A.assoc | find /v "header" >> qt.assoc
)

Replace "header" with the header text found on the first line of each file.
The rest of the file content must not contain that text, otherwise those lines will be filtered out as well.
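
The header text does not have to be typed in by hand. A minimal sketch, assuming the files are named chr1.assoc to chr23.assoc as in the question, and that the header is a single line shorter than about 1,000 characters with no double quotes in it: set /p reads the first line of chr1.assoc into a variable (the name header is only illustrative), and find /v then drops every line that contains it.

```batch
@echo off
setlocal
rem Sketch only: assumes chr1.assoc .. chr23.assoc are in the current folder
rem and that the header line contains no double quotes.

rem Read the first line of chr1.assoc into the variable "header".
set "header="
set /p "header=" < chr1.assoc

rem Start the output with the complete first file, header included.
copy /y chr1.assoc QT.assoc > nul

rem Append the remaining files, dropping every line that contains the header.
for /L %%A in (2,1,23) do (
    find /v "%header%" < chr%%A.assoc >> QT.assoc
)
endlocal
```

If chr1.assoc does not end with a line break, the first appended line will be glued onto its last line, so it is worth checking the end of that file first.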
0
 
ReneGe Commented:
Try this

 
copy chr1.assoc QT.assoc
FOR /L %%A IN (2,1,23) DO CALL :ReadFile %%A

EXIT

:ReadFile
FOR /F "delims=" %%A IN ('type chr%~1.assoc') DO (
	ECHO %%A>>QT.assoc
	exit /b
)


0
 
knightEknight Commented:
At 20 MB each, this script should not take that long. What happens if you test it with just one file, sending the output to the console?

copy chr1.assoc QT.assoc
for /L %%A in (2,1, 2 ) do more +1 chr%%A.assoc


0

 
zhshqzyc (Author) Commented:
Did you remove the header in the remaining files?
0
 
knightEknight Commented:
On line 1, use copy /y instead of just copy.
0
 
zhshqzyc (Author) Commented:
copy chr1.assoc QT.assoc


Copying the first file works and takes less than one second. BUT
for /L %%A in (2,1, 2 ) do more +1 chr%%A.assoc


No response at all.
0
 
zhshqzyc (Author) Commented:
I may have found the problem. I tested the code with the attached files:
copy chr1.txt QT.txt
for /L %%A in (2,1,3) do more +1 chr%%A.txt >> QT.txt
pause


The merged result was:
header
test1	1test2   2
test3 3


The expected result is:
header
test1	1
test2   2
test3 3


chr1.txt
chr2.txt
chr3.txt
0
 
Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Nice puzzle :) - I created this batch file for you:

:: create file list
dir /b file*.txt >fl.txt

:: get first file - used for creating header
for /f %%f in (fl.txt) do (
  set fname=%%f
  goto HDR 	
)

:HDR
:: get header from first file
for /f %%f in (%fname%) do (
  echo %%f
  goto APD
) > output.txt

:APD
:: append content of files to output
for /f %%f in (fl.txt) do (
   more +1 %%f >> output.txt
)



Put your file pattern on line 2. I used file*.txt for my 3 test files: file1.txt, file2.txt and file3.txt.

I tested with these 3 files:

::file1.txt
header
11
12
13

::file2.txt
header
21
22
23

::file3.txt
header
31
32
33

Output of batch file is this:

::output.txt
header
11
12
13
21
22
23
31
32
33
0
 
zhshqzyc (Author) Commented:
The problem is the speed. Appending the files is very, very slow. I may consider writing some .NET code to parse the files. That might speed it up.
0
 
ReneGe Commented:
I did not get any feedback on my script.
0
 
Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Why write code?

You could install Cygwin. With a simple shell script you get performance figures like this:

20 files of 17.5 MB each, merged as you describe, in about 20 seconds:

$ date ; sh ./ccat.sh ; date
Sat May  7 23:02:37 WEDT 2011
Sat May  7 23:02:58 WEDT 2011

(ccat.sh is a simple shell script I wrote)
0
 
zhshqzyc (Author) Commented:
@ReneGe
Your code does not work; it produces the wrong result.

@gerwinjansen
I can't install Cygwin on the server because I lack the permissions. I guess the more command is what costs the time, and that is why it is slow.
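
One possible explanation for the hang, offered as an assumption rather than something confirmed in this thread: more is widely reported to pause for a keypress after roughly 65,000 lines even when its output is redirected, which would look exactly like "no response". A minimal sketch for checking whether an input file exceeds that size (chr2.assoc is taken from the question; the threshold is approximate):

```batch
@echo off
rem Count the lines in one input file. find /c /v "" counts every line,
rem because the empty search string never matches.
find /c /v "" < chr2.assoc
```

If the count is well above 65,000, a filter that never pauses, such as find /v, is probably a safer way to drop the header than more +1.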
0
 
Bill Prew Commented:
Would a VBS solution be acceptable? I suspect we could get a faster solution there.

~bp
0
 
zhshqzyc (Author) Commented:
Okay. Thanks, the header in the files is a sentence rather than a word. Is that okay?
0
 
huacat Commented:
A sentence is also OK in the batch file. Please note:
If the sentence contains special characters, I'm afraid they have to be escaped.
E.g. if the sentence includes |, <, > and so on, these characters should be preceded by a ^ character.

Another issue:
If you run the command on the command line, it's difficult to type a TAB character.
So I recommend writing a batch file and putting the commands in it; there the TAB character is easy to use.
0
 
BillDL Commented:
Just another method to consider if the others don't do what you need.

Assuming that the following are all true:

1. All 23 files are in one folder
2. There are no other files in that folder
3. Their contents do not contain any fancy characters that cannot be read in as plain text and redirected to another text file

then the attached batch file should work in less than 12 hours if you just change the four SET= lines at the top to reflect the directory paths you want to use ;)

The comments and echo lines for screen feedback take up more room than the actual working code.

 
@echo off
SetLocal EnableDelayedExpansion

REM Replace paths in next 4 lines to reflect your paths.

set BaseDir=%~dp0
set BaseDir=%BaseDir:~0,-1%
set TempList=_FileList.txt
set OutFile=%BaseDir%\Concatenated.txt

echo Processing all TXT files in folder:
echo %BaseDir%
echo.

REM Create a list of all TXT files in BaseDir folder.
REM Note - this will overwrite any existing list file.
dir /on /b "%BaseDir%\*.txt" | find /i /v "%TempList%">"%TempList%"

REM Get the name of a single file to extract first line from
for /f %%A in ('type "%TempList%"') do set LastFile=%%~fA

REM Read the file in reverse and store top line in HDR variable
set /a i = -1
for /f "delims=" %%B in ('type "!LastFile!"') do (
    set /a i += 1
    set Line[!i!]=%%B
)
for /l %%C in (%i% -%i% 0) do set HDR=!Line[%%C]!

echo.
echo Creating %OutFile% with common header ...
echo.
echo.

REM Write String stored in HDR variable to new TXT file
echo !HDR!>"%OutFile%"

REM Read through all files in File Listing and Append
REM below the existing header in the new text file.
REM If you don't want a double-dashed separator between
REM the contents of each file appended, then remove the
REM 3rd and 4th lines below.
for /f "tokens=* delims=" %%D in ('type "%TempList%"') do (
    echo Writing contents of "%%D" to new file ...
    echo. >>"%OutFile%"
    echo ========================================================================>>"%OutFile%"
    type "%%~fD">>"%OutFile%"
)

REM Remove the File List
if exist "%TempList%" del "%TempList%" > nul

echo.
echo ===================================================
echo.
echo Finished processing.
echo.
echo Concatenated file is:
echo %OutFile%
echo.
echo Press any key to quit ... 
pause > nul


0
 
Gerwin Jansen, EE MVE (Topic Advisor) Commented:
@huacat: Yes, type | find is faster :)

@zhshqzyc: You can combine my batch file with huacat's type | find command.

change line 20
from:
   more +1 %%f >> output.txt
to:
   type %%f | find /v "%header%" >> output.txt

add a line after line 13
  set header=%%f

In my test it takes about 10 s per 17.5 MB file, so the total time would be around 4 minutes. Take note of huacat's remarks about special characters in the header line of your files.
0
 
zhshqzyc (Author) Commented:
@gerwinjansen,

Could you post the entire code so it is clear?
Also, can you add code to delete the file fl.txt after the job is done?
0
 
Gerwin Jansen, EE MVE (Topic Advisor) Commented:
Here it is, let me know if it works on your end.

:: create file list
dir /b test*.txt >fl.txt

:: get first file - used for creating header
for /f %%f in (fl.txt) do (
  set fname=%%f
  goto HDR 	
)

:HDR
:: get header from first file
for /f %%f in (%fname%) do (
  echo %%f
  set header=%%f
  goto APD
) > output.txt

:APD
:: append content of files to output
  for /f %%f in (fl.txt) do (
  type %%f | find /v "%header%" >> output.txt
)

del /q fl.txt


0
 
zhshqzyc (Author) Commented:
Thanks for your effort, but it is incorrect. The header is:
 CHR         SNP   N_MISS   N_GENO   F_MISS


The separators are white space. Using the above code, I only got this as the header:
CHR


Also, the program crashed after copying the first file. That is, the first file was copied successfully except for the header, and appending the second file failed (nothing was appended before the crash).
0
 
ReneGe Commented:
Based on gerwinjansen's version

@echo off

SET Output=Output.txt
IF EXIST "%Output%" DEL "%Output%"

FOR /F %%A IN ('dir /b test*.txt') DO Call :GetHeader "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>>Output.txt
	EXIT /b
)


0
 
ReneGe Commented:
Correction
 
@echo off

SET Output=Output.txt
IF EXIST "%Output%" DEL "%Output%"

FOR /F %%A IN ('dir /b test*.txt') DO Call :GetHeader "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)


0
 
ReneGe Commented:
@zhshqzyc:

Reading your comment "07/05/11 11:20 AM, ID: 35712604",

I see that you actually want the second line to be sent to the output. Right?
Also, I see the word "header". Is this the literal word you want there, or does it stand for a common header line found in all files?

The following will read the second line and put the word "HEADER" in your output file.

@echo off

SET Output=Output.txt

ECHO HEADER>"%Output%"

FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq skip=1 delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)


0
 
zhshqzyc (Author) Commented:
@ReneGe,

Yes, but the header is not always the word "HEADER". I hope the code can read it instead of my having to set it manually.
0
 
ReneGe Commented:
Do they all have the same header?
0
 
ReneGe Commented:
Please give examples
0
 
zhshqzyc (Author) Commented:
Please see the attached.
chr.zip
0
 
ReneGe Commented:
So I see they all have the same header

@echo off

SET Output=Output.txt

FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetFirstLines "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)

:GetFirstLines
FOR /F "usebackq Skip=1 delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)


0
 
Gerwin Jansen, EE MVE (Topic Advisor) Commented:
I believe a small modification to my script will make it work: add "delims=;" to the for command under :HDR. Just be sure the ; delimiter does not occur in the header line.

Like this:

:HDR
:: get header from first file
for /f "delims=;" %%f in (%fname%) do (
  echo %%f
  set header=%%f
  goto APD
) > output.txt



My test shows a complete header in the output file.
0
 
ReneGe Commented:
Even if my script resolves your issue, please split points with all contributing experts.
0
 
zhshqzyc (Author) Commented:
Thanks for your input. It is still wrong, though. I am going to give up and split the points among everybody for your kind help.
Let me explain:
The sample file
chr1.lmiss:
 CHR         SNP   N_MISS   N_GENO   F_MISS
   1   rs4030303        0     2020        0
   1    rs940550        0     2020        0
   1   rs6594028        0     2020        0
   1  rs10458597       20     2020 0.009901
   1   rs9701055     1805     2020   0.8936
   1  rs12565286      562     2020   0.2782
   1  rs11804171      562     2020   0.2782
   1   rs2977670     1992     2020   0.9861


chr2.lmiss
 CHR          SNP   N_MISS   N_GENO   F_MISS
   2   rs11127467       62     2020  0.03069
   2   rs10193286       62     2020  0.03069
   2    rs4632379        7     2020 0.003465
   2    rs7595668       62     2020  0.03069
   2   rs10195681       62     2020  0.03069
   2   rs13386112       62     2020  0.03069
   2    rs7594188       62     2020  0.03069
   2    rs7594567       62     2020  0.03069
   2    rs6548217       10     2020  0.00495


I use the merge code:
@echo off

SET Output=QT.lmiss

FOR /F %%A IN ('dir /b chr*.lmiss') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.lmiss') DO Call :GetFirstLines "%%~fA"
pause

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)

:GetFirstLines
FOR /F "usebackq Skip=1 delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)


The result is:
 CHR         SNP   N_MISS   N_GENO   F_MISS
  10   rs12218882       39     2020  0.01931
   4   rs4690249      193     2020  0.09554
   8   rs13276385      381     2020   0.1886
   5   rs10045830        3     2020 0.001485
  11   rs11605246        1     2020 0.000495
   9   rs2811026      506     2020   0.2505
  12   rs2003280       64     2020  0.03168
   6   rs7754266       13     2020 0.006436
   7   rs7457923      272     2020   0.1347
  13   rs2821685     2020     2020        1
  23   rs5939319        4     1999 0.002001
  18   rs7235612        0     2020        0
  14   rs2713521     2020     2020        1
  22   rs11089130     2020     2020        1
  19   rs7247199       10     2020  0.00495
  15   rs12443141     1950     2020   0.9653
   1   rs4030303        0     2020        0
  21   rs885550     2020     2020        1
   2   rs11127467       62     2020  0.03069
  16   rs3743872      163     2020  0.08069
  20   rs4814683       19     2020 0.009406
  17   rs17054921        3     2020 0.001485
   3   rs9756992        2     2020 0.0009901


I have 23 files, and only one line was extracted from each of them, so the result is wrong.
But it doesn't matter; I will try to use C# code to create a batch file.
I appreciate your help, guys.
0
 
zhshqzyc (Author) Commented:
@gerwinjansen:
The code is still not working, but never mind. Thanks for the help.
0
 
zhshqzyc (Author) Commented:
THANKS!!!
0
 
Bill Prew Commented:
@zhshqzyc

Since you never answered me I assumed a VBS solution was not desired.

~bp
0
 
ReneGe Commented:
This is confusing.

So you want to have the content of all your files, but with only one header. Correct?

 
@ECHO OFF

SET Output=Output.txt

FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.txt') DO FOR /F "usebackq Skip=1 delims=" %%B IN ("%%A") DO ECHO %%B>>"%Output%"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)
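
Echoing one line at a time with >> reopens the output file for every line, which tends to be slow on 20 MB inputs. A sketch of the same idea with the output redirected once around the whole loop, assuming files named chr1.txt to chr23.txt (adjust the names and the count to your data; the label :WriteHeader is only illustrative). Two properties of for /f apply here just as in the script above: it skips empty lines, and with default options it skips lines that begin with a semicolon.

```batch
@echo off
setlocal
set "Output=Output.txt"

rem Header: write the first line of chr1.txt, then return from the subroutine.
call :WriteHeader "chr1.txt"

rem Body: open the output once for the whole loop instead of once per line.
>> "%Output%" (
    for /L %%N in (1,1,23) do (
        for /f "usebackq skip=1 delims=" %%L in ("chr%%N.txt") do echo(%%L
    )
)
exit /b

:WriteHeader
for /f "usebackq delims=" %%L in (%1) do (
    > "%Output%" echo(%%L
    exit /b
)
exit /b
```

Using for /L keeps the files in numeric order (1, 2, ... 23), whereas a dir /b listing may return them in a different order.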


0
 
zhshqzyc (Author) Commented:
@bp.
VBS is welcome, but I have already assigned the points and I am not familiar with it. Sorry, I forgot to answer your question. Do you mind if I open a new thread?
0
 
zhshqzyc (Author) Commented:
Opened a new thread at Merge files
0
 
huacat Commented:
copy chr1.assoc QT.assoc  
for /L %%A in (2,1,23) do type chr%%A.assoc | find /v "CHR      SNP      N_MISS" >> qt.assoc

I put the above code in a .bat file, created 23 files to test it, and it ran correctly.
Please remember that the character after CHR must be a TAB character if your header uses TABs to separate columns.
0
Question has a verified solution.
