zhshqzyc
Merge files with dos batch
Hi, I have 23 files that have the same header (the first line). I want to merge them together and keep only one header.
Each file is about 20 MB. The commands I used were:
copy chr1.assoc QT.assoc
for /L %%A in (2,1,23) do more +1 chr%%A.assoc >> QT.assoc
The problem is that it ran for a long time and still did not finish. Even appending the second file got no response after 12 hours. I ran the batch on a server that has 32 GB of memory. What is wrong?
Thanks for help.
At 20MB each, this script should not take that long. What happens if you test it with just one file to the console?
copy chr1.assoc QT.assoc
for /L %%A in (2,1, 2 ) do more +1 chr%%A.assoc
ASKER
Did you remove the header in the remaining files?
on line 1: use copy/y instead of just copy
ASKER
copy chr1.assoc QT.assoc
Copying the first file is okay and takes less than one second. But
for /L %%A in (2,1, 2 ) do more +1 chr%%A.assoc
gives no response at all.
ASKER
Nice puzzle :) - I created this batch file for you:
Put your file pattern on line 2, I used file*.txt for my 3 test files file1.txt, file2.txt and file3.txt
I tested with these 3 files:
::file1.txt
header
11
12
13
::file2.txt
header
21
22
23
::file3.txt
header
31
32
33
Output of batch file is this:
::output.txt
header
11
12
13
21
22
23
31
32
33
:: create file list
dir /b file*.txt >fl.txt

:: get first file - used for creating header
for /f %%f in (fl.txt) do (
set fname=%%f
goto HDR
)

:HDR
:: get header from first file
for /f %%f in (%fname%) do (
echo %%f
goto APD
) > output.txt

:APD
:: append content of files to output
for /f %%f in (fl.txt) do (
more +1 %%f >> output.txt
)
ASKER
The problem is the speed. Appending the files is very, very slow. I may consider writing some .NET code to parse the files. It might speed things up.
Did not get any feedback about my script.
Why write code?
You could install Cygwin - using a simple shell script get performance figures like this:
20 files, 17.5Mb each merged (like you describe) in about 20 seconds:
$ date ; sh ./ccat.sh ; date
Sat May 7 23:02:37 WEDT 2011
Sat May 7 23:02:58 WEDT 2011
(ccat.sh is a simple shell script I wrote)
ASKER
@ReneGe
Your code does not work; it gives the wrong result.
@gerwinjansen
I can't install Cygwin on the server because I don't have permission. I guess the more command itself is what takes the time, which is why it is slow.
Would a VBS solution be acceptable? I suspect we could get a faster solution there.
~bp
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Okay. Thanks, the header in the files is a sentence rather than a word. Is that okay?
A sentence is also OK in the batch file. Please note:
If the sentence contains keywords or special characters, I'm afraid we have to use an escape character.
For example, if the sentence includes |, <, > and so on, these characters should be preceded by the ^ character.
Another issue:
If you run the command on the command line, it's difficult to type the TAB character.
So I recommend you write a batch file and put these commands into it, where the TAB character can be used easily.
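A minimal sketch of the escaping rule described above. The header text used here is made up for illustration and is not taken from the asker's files. As far as cmd.exe parsing goes, the caret is needed only when the special characters appear outside double quotes; a quoted search string is passed through literally.

```batch
@echo off
rem Unquoted: |, < and > must each be preceded by ^, otherwise cmd.exe
rem treats them as pipe and redirection operators.
echo CHR ^| SNP ^> N_MISS

rem Quoted: the same characters are taken literally, so no carets are needed.
rem (echo prints the quotes as part of the text.)
echo "CHR | SNP > N_MISS"
```

The first line prints CHR | SNP > N_MISS; the second prints the same text with the surrounding quotes.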
SOLUTION
@huacat: Yes, type /f is faster :)
@zhshqzyc: You can combine my batchfile with huacat's type /f command.
change line 20
from:
more +1 %%f >> output.txt
to:
type %%f | find /v "%header%" >> output.txt
add a line after line 13
set header=%%f
I tested, it takes about 10s per 17.5Mb file. Total time would be around 4 minutes. Take note of huacat's remarks about special characters in the header line of your files.
ASKER
@gerwinjansen,
Could you post the entire code so it is clear?
Also, can you add code to delete the file fl.txt after the job is done?
Here it is, let me know if it works on your end.
:: create file list
dir /b test*.txt >fl.txt
:: get first file - used for creating header
for /f %%f in (fl.txt) do (
set fname=%%f
goto HDR
)
:HDR
:: get header from first file
for /f %%f in (%fname%) do (
echo %%f
set header=%%f
goto APD
) > output.txt
:APD
:: append content of files to output
for /f %%f in (fl.txt) do (
type %%f | find /v "%header%" >> output.txt
)
del /q fl.txt
ASKER
Thanks for your effort, but it is incorrect. The header is
CHR SNP N_MISS N_GENO F_MISS
The separators are white space. Using the above code, I only got the header as CHR.
Also, the program crashed after copying the first file: the first file was copied successfully except for the header, and appending the second file failed (nothing was appended, then it crashed).
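The truncated header is consistent with how for /f tokenizes its input: by default it splits each line on spaces and tabs and puts only the first token in the loop variable. A minimal sketch of the difference follows, assuming a file named chr1.lmiss (a placeholder name) whose first line contains spaces. It illustrates the "delims=" option only and is not a full merge script.

```batch
@echo off
rem Placeholder file name; any file whose first line contains spaces will do.
set "fname=chr1.lmiss"

rem Default delimiters (space and tab): %%f holds only the first token, e.g. CHR.
for /f "usebackq" %%f in ("%fname%") do (
    echo First token: %%f
    goto WholeLine
)

:WholeLine
rem "delims=" turns tokenizing off, so %%f holds the entire line.
for /f "usebackq delims=" %%f in ("%fname%") do (
    set "header=%%f"
    goto Done
)

:Done
echo Whole header: %header%
```

With the sample header from the post, the first echo would show only CHR, while the second would show the full line.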
SOLUTION
Correction
@echo off
SET Output=Output.txt
IF EXIST "%Output%" DEL "%Output%"
FOR /F %%A IN ('dir /b test*.txt') DO Call :GetHeader "%%~fA"
EXIT
:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
ECHO %%A>>"%Output%"
EXIT /b
)
@zhshqzyc:
So from reading your comment "07/05/11 11:20 AM, ID: 35712604",
I see that you actually want the second line to be sent to the output. Right?
Also, I see the word Header. Is this the word you want to have there, or does it represent a common header line found in all files?
The following will read the second line and put the word "header" in your output file.
@echo off
SET Output=Output.txt
ECHO HEADER>"%Output%"
FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
EXIT
:GetHeader
FOR /F "usebackq skip=1 delims=" %%A IN ("%~1") DO (
ECHO %%A>>"%Output%"
EXIT /b
)
ASKER
@ReneGe,
Yes, but the header is not always the word "HEADER". I hope that it can be read by the code instead of being set manually.
Do they all have the same header?
Please give examples
ASKER
Please see the attached.
chr.zip
So I see they all have the same header
@echo off
SET Output=Output.txt
FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetFirstLines "%%~fA"
EXIT
:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
ECHO %%A>"%Output%"
EXIT /b
)
:GetFirstLines
FOR /F "usebackq Skip=1 delims=" %%A IN ("%~1") DO (
ECHO %%A>>"%Output%"
EXIT /b
)
SOLUTION
Even if my script resolves your issue, please split points with all contributing experts.
ASKER
Thanks for your input. It is still wrong, though. I am going to give up and split the points among everybody for your kind help.
Let me explain:
The sample file
chr1.lmiss:
CHR SNP N_MISS N_GENO F_MISS
1 rs4030303 0 2020 0
1 rs940550 0 2020 0
1 rs6594028 0 2020 0
1 rs10458597 20 2020 0.009901
1 rs9701055 1805 2020 0.8936
1 rs12565286 562 2020 0.2782
1 rs11804171 562 2020 0.2782
1 rs2977670 1992 2020 0.9861
chr2.lmiss:
CHR SNP N_MISS N_GENO F_MISS
2 rs11127467 62 2020 0.03069
2 rs10193286 62 2020 0.03069
2 rs4632379 7 2020 0.003465
2 rs7595668 62 2020 0.03069
2 rs10195681 62 2020 0.03069
2 rs13386112 62 2020 0.03069
2 rs7594188 62 2020 0.03069
2 rs7594567 62 2020 0.03069
2 rs6548217 10 2020 0.00495
I use the merge code:
@echo off
SET Output=QT.lmiss
FOR /F %%A IN ('dir /b chr*.lmiss') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.lmiss') DO Call :GetFirstLines "%%~fA"
pause
:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
ECHO %%A>"%Output%"
EXIT /b
)
:GetFirstLines
FOR /F "usebackq Skip=1 delims=" %%A IN ("%~1") DO (
ECHO %%A>>"%Output%"
EXIT /b
)
The result is:
CHR SNP N_MISS N_GENO F_MISS
10 rs12218882 39 2020 0.01931
4 rs4690249 193 2020 0.09554
8 rs13276385 381 2020 0.1886
5 rs10045830 3 2020 0.001485
11 rs11605246 1 2020 0.000495
9 rs2811026 506 2020 0.2505
12 rs2003280 64 2020 0.03168
6 rs7754266 13 2020 0.006436
7 rs7457923 272 2020 0.1347
13 rs2821685 2020 2020 1
23 rs5939319 4 1999 0.002001
18 rs7235612 0 2020 0
14 rs2713521 2020 2020 1
22 rs11089130 2020 2020 1
19 rs7247199 10 2020 0.00495
15 rs12443141 1950 2020 0.9653
1 rs4030303 0 2020 0
21 rs885550 2020 2020 1
2 rs11127467 62 2020 0.03069
16 rs3743872 163 2020 0.08069
20 rs4814683 19 2020 0.009406
17 rs17054921 3 2020 0.001485
3 rs9756992 2 2020 0.0009901
I have 23 files, and only one line was extracted from each file, so it is wrong. But it doesn't matter; I will try to use C# code to create a batch file.
I appreciate your help, guys.
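The one-line-per-file result matches what the merge code above does: the EXIT /b inside the :GetFirstLines loop returns after the first data line has been echoed. A hedged sketch of the same approach with that early exit removed is shown here. The label names and the GOTO after the header call are illustrative choices, not something posted in the thread, and the file names are assumed to match chr*.lmiss as in the asker's post.

```batch
@echo off
SET Output=QT.lmiss

rem Header: take the first line of the first matching file only.
FOR /F "delims=" %%A IN ('dir /b chr*.lmiss') DO (
    CALL :GetHeader "%%~fA"
    GOTO Append
)

:Append
rem Data: every line after the first, from every matching file.
FOR /F "delims=" %%A IN ('dir /b chr*.lmiss') DO CALL :GetDataLines "%%~fA"
EXIT /b

:GetHeader
rem Leaving after the first line is intended here.
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
    ECHO %%A>"%Output%"
    EXIT /b
)
EXIT /b

:GetDataLines
rem No EXIT /b inside the loop, so all remaining lines are appended.
FOR /F "usebackq skip=1 delims=" %%A IN ("%~1") DO ECHO %%A>>"%Output%"
EXIT /b
```

Note that for /f skips empty lines, and that echoing line by line is likely to be slow on 20 MB files. The files are also appended in whatever order dir /b returns them, which is not necessarily numeric.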
ASKER
@gerwinjansen:
The code is still not working, never mind it. Thanks for help.
ASKER
THANKS!!!
@zhshqzyc
Since you never answered me I assumed a VBS solution was not desired.
~bp
This is confusing.
So you want to have the content of all your files, but with only one header. Correct?
@ECHO OFF
SET Output=Output.txt
FOR /F %%A IN ('dir /b chr*.txt') DO Call :GetHeader "%%~fA"
FOR /F %%A IN ('dir /b chr*.txt') DO FOR /F "usebackq Skip=1 delims=" %%B IN ("%%A") DO ECHO %%B>>"%Output%"
EXIT
:GetHeader
FOR /F "usebackq delims=" %%A IN ("%~1") DO (
ECHO %%A>"%Output%"
EXIT /b
)
ASKER
@bp.
VBS is welcome, but I have already assigned the points and I am not familiar with it. Sorry, I forgot to answer your question. Do you mind if I open a new thread?
ASKER
Opened a new thread at Merge files
copy chr1.assoc QT.assoc
for /L %%A in (2,1,23) do type (%%A).assoc | find /v "CHR SNP N_MISS" >> qt.assoc
I put the above code into a .bat file, created 23 files to test it, and it ran correctly.
Please remember, the character after CHR must be a TAB character if your header uses TAB to separate columns.
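One way to sidestep the TAB question is to filter on a single column name instead of several columns joined by separators. The sketch below is a variation on the command above, not the poster's exact code. It assumes the files are named chr1.assoc through chr23.assoc, as in the original question, and that the text N_MISS appears in the header line but never in a data line.

```batch
@echo off
rem Assumptions: files are chr1.assoc ... chr23.assoc, and N_MISS occurs
rem only in the header line of each file.

rem The first file is copied whole, so its header becomes the only header.
copy /y chr1.assoc QT.assoc >nul

rem For the remaining files, drop every line that contains N_MISS.
rem Input is redirected so that find does not print a file-name banner.
for /L %%A in (2,1,23) do (
    find /v "N_MISS" < chr%%A.assoc >> QT.assoc
)
```

If the header uses different column names, the search string would need to be changed to a word that is unique to the header line.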