Merge files with dos batch

Posted on 2011-05-07
Last Modified: 2012-08-14
Hi, I have 23 files that all have the same header (the first line). I want to merge them and keep only one header.
Each file is about 20 MB. The commands I used were:
copy chr1.assoc QT.assoc
for /L %%A in (2,1,23) do more +1 chr%%A.assoc >> QT.assoc

The problem is that it takes a very long time and still does not finish. Even appending the second file got no response after 12 hours. I run the batch file on a server that has 32 GB of memory.
What is wrong?

Thanks for help.
Question by:zhshqzyc
38 Comments
 
LVL 10

Expert Comment

by:ReneGe
ID: 35712022
Try this

 
copy chr1.assoc QT.assoc
FOR /L %%A IN (2,1,23) DO CALL :ReadFile %%A

EXIT

:ReadFile
FOR /F "delims=" %%A IN ('type chr%~1.assoc') DO (
	ECHO %%A>>QT.assoc
	exit /b
)

 
LVL 33

Expert Comment

by:knightEknight
ID: 35712048
At 20MB each, this script should not take that long.  What happens if you test it with just one file to the console?

copy chr1.assoc QT.assoc
for /L %%A in (2,1, 2 ) do more +1 chr%%A.assoc

 

Author Comment

by:zhshqzyc
ID: 35712049
Did you remove the header in the remaining files?
 
LVL 33

Expert Comment

by:knightEknight
ID: 35712054
on line 1: use copy/y instead of just copy
 

Author Comment

by:zhshqzyc
ID: 35712129
copy chr1.assoc QT.assoc

Copying the first file is okay and takes less than one second. BUT
for /L %%A in (2,1, 2 ) do more +1 chr%%A.assoc

No response at all.
 

Author Comment

by:zhshqzyc
ID: 35712604
I may have found the problem. I tested the code below with the attached files:
copy chr1.txt QT.txt
for /L %%A in (2,1,3) do more +1 chr%%A.txt >> QT.txt
pause

The merge result became
header
test1	1test2   2
test3 3

The expected one should be
header
test1	1
test2   2
test3 3

chr1.txt
chr2.txt
chr3.txt
 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 35713167
Nice puzzle :) - I created this batch file for you:

:: create file list
dir /b file*.txt >fl.txt

:: get first file - used for creating header
for /f %%f in (fl.txt) do (
  set fname=%%f
  goto HDR 	
)

:HDR
:: get header from first file
for /f %%f in (%fname%) do (
  echo %%f
  goto APD
) > output.txt

:APD
:: append content of files to output
for /f %%f in (fl.txt) do (
   more +1 %%f >> output.txt
)


Put your file pattern on line 2, I used file*.txt for my 3 test files file1.txt, file2.txt and file3.txt

I tested with these 3 files:

::file1.txt
header
11
12
13

::file2.txt
header
21
22
23

::file3.txt
header
31
32
33

Output of batch file is this:

::output.txt
header
11
12
13
21
22
23
31
32
33
 

Author Comment

by:zhshqzyc
ID: 35713241
The issue is the speed. Appending the files is very, very slow. I may consider writing .NET code to parse the files. That might be faster.
 
LVL 10

Expert Comment

by:ReneGe
ID: 35713447
Did not get any feedback about my script.
 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 35713455
Why write code?

You could install Cygwin. Using a simple shell script, I get performance figures like this:

20 files of 17.5 MB each, merged as you describe, in about 20 seconds:

$ date ; sh ./ccat.sh ; date
Sat May  7 23:02:37 WEDT 2011
Sat May  7 23:02:58 WEDT 2011

(ccat.sh is a simple shell script I wrote)
 

Author Comment

by:zhshqzyc
ID: 35713818
@ReneGe
Your code does not work; it produces the wrong result.

@gerwinjansen
I can't install Cygwin on the server because I lack the permissions. I suspect the more command is what takes the time, which is why it is slow.
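
One possible explanation for the apparent hang, offered as an assumption rather than something confirmed in this thread: more is widely reported to stop and wait for a keypress after roughly 65,000 lines, even when its output is redirected to a file, so on a 20 MB input it may be waiting rather than working. A quick way to see whether the inputs are in that range is to count their lines. The file name chr2.assoc is simply the second file from the original question.

```batch
@echo off
rem Count the lines in one input file.
rem The empty search string with /v matches every line; /c prints only the count.
find /c /v "" chr2.assoc
```

If the count is far above 65,000, a plain filter such as find /v or findstr /v avoids the pager entirely.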
 
LVL 59

Expert Comment

by:Bill Prew
ID: 35714099
Would a VBS solution be acceptable? I suspect we could get a faster solution there.

~bp
 
LVL 7

Accepted Solution

by:
huacat earned 500 total points
ID: 35714134
Hi guys,

Please use the type | find command to filter each file to a temporary file first, then use the copy + command to concatenate all the files. I tested with a 10 MB file and it is fast. The code here takes a shortcut and appends the filtered output directly:

copy chr1.assoc QT.assoc
for /L %%A in (2,1,23) do (
  type chr%%A.assoc | find /v "header" >> qt.assoc
)

"header" stands for the header text on the first line of each file.
The rest of the file must not contain that text, otherwise those lines will be filtered out as well.
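
The same idea can be sketched without typing the header by hand. This is a minimal sketch, not a tested drop-in. It assumes the files are named chr1.assoc to chr23.assoc as in the question, that each header is shorter than about 1,000 characters (the limit of set /p), and that the header contains no !, " or \ characters, which would need extra handling. The variable names Output and hdr are chosen here for illustration. Each file's own first line is read into hdr, and every line beginning with that text is dropped.

```batch
@echo off
setlocal EnableDelayedExpansion

set "Output=QT.assoc"

rem Start the output with the first file, header included.
copy /y chr1.assoc "%Output%" >nul

for /L %%A in (2,1,23) do (
    rem Read the first line of this file into hdr.
    set "hdr="
    set /p "hdr=" < "chr%%A.assoc"

    if defined hdr (
        rem findstr flags: /v keeps non-matching lines, /b anchors the match
        rem at the start of the line, /l and /c: treat hdr as one literal string.
        findstr /v /b /l /c:"!hdr!" "chr%%A.assoc" >> "%Output%"
    ) else (
        rem Empty first line: nothing to filter on, so append the file as-is.
        type "chr%%A.assoc" >> "%Output%"
    )
)

endlocal
```

Reading the header from each file, rather than once from the first, matters when headers are not byte-identical: the sample chr1.lmiss and chr2.lmiss headers posted later in this thread differ in spacing. Any data line that happens to begin with the header text would also be dropped.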
 

Author Comment

by:zhshqzyc
ID: 35714189
Okay. Thanks, the header in the files is a sentence rather than a word. Is that okay?
 
LVL 7

Expert Comment

by:huacat
ID: 35714360
A sentence is also OK in the batch file. Please note:
If the sentence contains special keywords or characters, I'm afraid we have to use an escape character.
E.g. if the sentence includes |, <, > and so on, these characters should be preceded by the ^ character.

Another issue:
If you run the command at the command line, it is difficult to type the TAB character.
So I recommend writing a batch file and putting the commands into it, where the TAB character can be used easily.
 
LVL 39

Assisted Solution

by:BillDL
BillDL earned 500 total points
ID: 35714493
Just another method to consider if the others don't do what you need.

Assuming that the following are all true:

1. All 23 files are in one folder
2. There are no other files in that folder
3. Their contents do not contain any fancy characters that cannot be read in as plain text and redirected to another text file

then the attached batch file should work in less than 12 hours if you just change the four SET= lines at the top to reflect the directory paths you want to use ;)

The comments and echo lines for screen feedback take up more room than the actual working code.

 
@echo off
SetLocal EnableDelayedExpansion

REM Replace paths in next 4 lines to reflect your paths.

set BaseDir=%~dp0
set BaseDir=%BaseDir:~0,-1%
set TempList=_FileList.txt
set OutFile=%BaseDir%\Concatenated.txt

echo Processing all TXT files in folder:
echo %BaseDir%
echo.

REM Create a list of all TXT files in BaseDir folder.
REM Note - this will overwrite any existing list file.
dir /on /b "%BaseDir%\*.txt" | find /i /v "%TempList%">"%TempList%"

REM Get the name of a single file to extract first line from
for /f %%A in ('type "%TempList%"') do set LastFile=%%~fA

REM Read the file in reverse and store top line in HDR variable
set /a i = -1
for /f "delims=" %%B in ('type "!LastFile!"') do (
    set /a i += 1
    set Line[!i!]=%%B
)
for /l %%C in (%i% -%i% 0) do set HDR=!Line[%%C]!

echo.
echo Creating %OutFile% with common header ...
echo.
echo.

REM Write String stored in HDR variable to new TXT file
echo !HDR!>"%OutFile%"

REM Read through all files in File Listing and Append
REM below the existing header in the new text file.
REM If you don't want a double-dashed separator between
REM the contents of each file appended, then remove the
REM 3rd and 4th lines below.
for /f "tokens=* delims=" %%D in ('type "%TempList%"') do (
    echo Writing contents of "%%D" to new file ...
    echo. >>"%OutFile%"
    echo ========================================================================>>"%OutFile%"
    type "%%~fD">>"%OutFile%"
)

REM Remove the File List
if exist "%TempList%" del "%TempList%" > nul

echo.
echo ===================================================
echo.
echo Finished processing.
echo.
echo Concatenated file is:
echo %OutFile%
echo.
echo Press any key to quit ... 
pause > nul

 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 35714567
@huacat: Yes, type | find /v is faster :)

@zhshqzyc: You can combine my batch file with huacat's type | find /v command.

Change line 20
from:
   more +1 %%f >> output.txt
to:
   type %%f | find /v "%header%" >> output.txt

Add a line after line 13:
  set header=%%f

I tested it; it takes about 10 s per 17.5 MB file, so the total time would be around 4 minutes. Take note of huacat's remarks about special characters in the header line of your files.
 

Author Comment

by:zhshqzyc
ID: 35715346
@gerwinjansen,

Could you post the entire code so that it is clear?
Also, can you add code to delete the file fl.txt after the job is done?
 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 35715826
Here it is, let me know if it works on your end.

:: create file list
dir /b test*.txt >fl.txt

:: get first file - used for creating header
for /f %%f in (fl.txt) do (
  set fname=%%f
  goto HDR 	
)

:HDR
:: get header from first file
for /f %%f in (%fname%) do (
  echo %%f
  set header=%%f
  goto APD
) > output.txt

:APD
:: append content of files to output
  for /f %%f in (fl.txt) do (
  type %%f | find /v "%header%" >> output.txt
)

del /q fl.txt

 

Author Comment

by:zhshqzyc
ID: 35716155
Thanks for your effort, but it is incorrect. The header is
 CHR         SNP   N_MISS   N_GENO   F_MISS

The separators are white space. Using the above code, I only got the header as
CHR

Also, the program crashed after copying the first file. That is, copying the first file succeeded except for the header, and appending the second file failed (nothing was appended before the crash).
 
LVL 10

Assisted Solution

by:ReneGe
ReneGe earned 500 total points
ID: 35716249
Based on gerwinjansen's version

@echo off

SET Output=Output.txt
IF EXIST "%Output%" DEL "%Output%"

FOR /F %%A IN ('dir /b test*.txt') DO Call :GetHeader "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>>Output.txt
	EXIT /b
)

 
LVL 10

Expert Comment

by:ReneGe
ID: 35716256
Correction
 
@echo off

SET Output=Output.txt
IF EXIST "%Output%" DEL "%Output%"

FOR /F %%A IN ('dir /b test*.txt') DO Call :GetHeader "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)

 
LVL 10

Expert Comment

by:ReneGe
ID: 35716275
@zhshqzyc:

So from reading your comment "07/05/11 11:20 AM, ID: 35712604"

I see that you actually want the second line to be sent to the output. Right?
Also, I see the word Header. Is this the literal word you want there, or does it represent a common header line found in all the files?

The following will read the second line and put the word "HEADER" in your output file.

@echo off

SET Output=Output.txt

ECHO HEADER>"%Output%"

FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq skip=1 delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)

 

Author Comment

by:zhshqzyc
ID: 35716325
@ReneGe,

Yes, but the header is not always the word "HEADER". I hope the code can read it from the files instead of my setting it manually.
 
LVL 10

Expert Comment

by:ReneGe
ID: 35716442
Do they all have the same header?
 
LVL 10

Expert Comment

by:ReneGe
ID: 35716447
Please give examples
 

Author Comment

by:zhshqzyc
ID: 35716470
Please see the attached.
chr.zip
 
LVL 10

Expert Comment

by:ReneGe
ID: 35716576
So I see they all have the same header

@echo off

SET Output=Output.txt

FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetFirstLines "%%~fA"
EXIT

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)

:GetFirstLines
FOR /F "usebackq Skip=1 delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)

 
LVL 38

Assisted Solution

by:Gerwin Jansen, EE MVE
Gerwin Jansen, EE MVE earned 500 total points
ID: 35716580
I believe a small modification to my script will make it work. Just add a 'delims=;' to the HDR for command. Just be sure the ; delimiter does not exist in the header line.

Like this:

:HDR
:: get header from first file
for /f "delims=;" %%f in (%fname%) do (
  echo %%f
  set header=%%f
  goto APD
) > output.txt


My test shows a complete header in the output file.
 
LVL 10

Expert Comment

by:ReneGe
ID: 35716582
Even if my script resolves your issue, please split points with all contributing experts.
 

Author Comment

by:zhshqzyc
ID: 35716619
Thanks for your input. It is still wrong, though. I am going to give up and split the points among everybody for your kind help.
Let me explain.
The sample file
chr1.lmiss:
 CHR         SNP   N_MISS   N_GENO   F_MISS
   1   rs4030303        0     2020        0
   1    rs940550        0     2020        0
   1   rs6594028        0     2020        0
   1  rs10458597       20     2020 0.009901
   1   rs9701055     1805     2020   0.8936
   1  rs12565286      562     2020   0.2782
   1  rs11804171      562     2020   0.2782
   1   rs2977670     1992     2020   0.9861

chr2.lmiss
 CHR          SNP   N_MISS   N_GENO   F_MISS
   2   rs11127467       62     2020  0.03069
   2   rs10193286       62     2020  0.03069
   2    rs4632379        7     2020 0.003465
   2    rs7595668       62     2020  0.03069
   2   rs10195681       62     2020  0.03069
   2   rs13386112       62     2020  0.03069
   2    rs7594188       62     2020  0.03069
   2    rs7594567       62     2020  0.03069
   2    rs6548217       10     2020  0.00495

I use the merge code:
@echo off

SET Output=QT.lmiss

FOR /F %%A IN ('dir /b chr*.lmiss') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.lmiss') DO Call :GetFirstLines "%%~fA"
pause

:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)

:GetFirstLines
FOR /F "usebackq Skip=1 delims=" %%A IN ("%~1") DO (
	ECHO %%A>>"%Output%"
	EXIT /b
)

The result is:
 CHR         SNP   N_MISS   N_GENO   F_MISS
  10   rs12218882       39     2020  0.01931
   4   rs4690249      193     2020  0.09554
   8   rs13276385      381     2020   0.1886
   5   rs10045830        3     2020 0.001485
  11   rs11605246        1     2020 0.000495
   9   rs2811026      506     2020   0.2505
  12   rs2003280       64     2020  0.03168
   6   rs7754266       13     2020 0.006436
   7   rs7457923      272     2020   0.1347
  13   rs2821685     2020     2020        1
  23   rs5939319        4     1999 0.002001
  18   rs7235612        0     2020        0
  14   rs2713521     2020     2020        1
  22   rs11089130     2020     2020        1
  19   rs7247199       10     2020  0.00495
  15   rs12443141     1950     2020   0.9653
   1   rs4030303        0     2020        0
  21   rs885550     2020     2020        1
   2   rs11127467       62     2020  0.03069
  16   rs3743872      163     2020  0.08069
  20   rs4814683       19     2020 0.009406
  17   rs17054921        3     2020 0.001485
   3   rs9756992        2     2020 0.0009901

I have 23 files, and only one line was extracted from each file, so the result is wrong.
But it doesn't matter; I will try using C# code to create a batch file.
I appreciate your help, guys.
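
The one-line-per-file result is consistent with the EXIT /b inside each FOR /F body: it returns from the subroutine after the first iteration, so :GetFirstLines writes a single line and stops. A minimal sketch of a corrected loop follows, assuming the same chr*.lmiss inputs and QT.lmiss output. The label :Append and the variable First are names introduced here for illustration, and the script has not been run against the real data.

```batch
@echo off
setlocal

set "Output=QT.lmiss"
set "First=1"
if exist "%Output%" del "%Output%"

rem Process the files in name order, one call per file.
for /f "delims=" %%F in ('dir /b /on chr*.lmiss') do call :Append "%%~fF"

endlocal
exit /b

:Append
if defined First (
    rem First file: keep everything, header included.
    type "%~1" > "%Output%"
    set "First="
) else (
    rem Later files: skip line 1 and write the rest with one redirect.
    (for /f "usebackq skip=1 delims=" %%L in ("%~1") do echo(%%L) >> "%Output%"
)
exit /b
```

Some caveats: FOR /F skips empty lines and lines beginning with a semicolon, dir /on sorts by name so chr10 comes before chr2, and a line-by-line loop may still take minutes on 20 MB files.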
 

Author Comment

by:zhshqzyc
ID: 35716636
@gerwinjansen:
The code is still not working, never mind it. Thanks for help.
 

Author Closing Comment

by:zhshqzyc
ID: 35716646
THANKS!!!
 
LVL 59

Expert Comment

by:Bill Prew
ID: 35716650
@zhshqzyc

Since you never answered me I assumed a VBS solution was not desired.

~bp
 
LVL 10

Expert Comment

by:ReneGe
ID: 35716700
This is confusing.

So you want to have the content of all your files, but with only one header. Correct?

 
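@ECHO OFF

SET Output=Output.txt

REM Each call overwrites the output with that file's first line, so one header remains.
FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
REM Append every line after the first from each file.
FOR /F %%A IN ('dir /b chr*.txt') DO FOR /F "usebackq Skip=1 delims=" %%B IN ("%%A") DO ECHO %%B>>"%Output%"
EXIT

:GetHeader
REM EXIT /b leaves the subroutine as soon as the first line has been written.
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
	ECHO %%A>"%Output%"
	EXIT /b
)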

 

Author Comment

by:zhshqzyc
ID: 35716701
@bp.
VBS is welcome, but I have already assigned the points and I am not familiar with it. Sorry, I forgot to answer your question. Do you mind if I open a new thread?
 

Author Comment

by:zhshqzyc
ID: 35716771
Opened a new thread at Merge files
 
LVL 7

Expert Comment

by:huacat
ID: 35717233
copy chr1.assoc QT.assoc  
for /L %%A in (2,1,23) do type (%%A).assoc | find /v "CHR      SNP      N_MISS" >> qt.assoc

I put the above code in a .bat file, created 23 files to test it, and it ran correctly.
Please remember that the character after CHR must be a TAB character if your header uses TABs to separate columns.