LuckyLucks
asked on
Looking up files in folder A against folder B using lookup text file
Hi EEE,
I have two folders on my hard drive, A and B. I want to compare all files in B to A. Any file that's in B and not in A should be copied into a third folder, C. All of this is to be done in a batch file. I want to keep A, B and C as variables, so set A=C:/A for example.
One little twist is that I cannot compare all of B/*.* to A/*.* using the filenames as they are. For that I need a lookup file.
So, under B, a file called 1.doc is the same as myfile1.doc in A.
B/1.doc = A/myfile1.doc
The lookup file has this information in the following format:
"B:\1.doc","myfile1.doc",< some string>,<some string>
How can I do this? Part of the code is placed below but needs to be augmented for the lookup file and correct picking of filename in A.
@echo off
setlocal enabledelayedexpansion
set SourceA=Z:\A
set SourceB=Z:\B
set Target=Z:\Archive
if exist "%Target%" del "%Target%"
for %%a in ("%SourceB%\*.*") do (
    echo Processing '%%~nxa' ...
    if not exist "%SourceA%\%%~nxa" (
        copy "%%a" "%Target%"
    )
)
echo ... done.
ASKER
oBDA: here are my responses:
1. The lookup file contains 200K records and is approx 30 MB.
2. The look up file name in the lookup file is only to be used to match.
3. I expect there will be stuff in B that has no entry in the lookup file. These are not matched to A. They are the odd ones out that we will stick into Archive.
ASKER
Also in response to:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?
The first column (before the comma) is the full path, including the filename, of the file that will be matched against.
The second column is the file name only. It will exist in SourceA.
ASKER
obDA,
What does the set Lookup[%%~nxa]=%%~nxb accomplish?
Is Lookup a pre-defined function, or a function you have created somewhere and not pasted into the solution?
The code as is isn't working; all I get is a black screen.
The line "for /f "tokens=1,2 delims=," %%a in ('type "%LookupFile%"') do set Lookup[%%~nxa]=%%~nxb" fills an array "Lookup" with the file name B used as index and the file name A as content. With 200k entries, this might be a bit too much for the shell environment.
So if the lookup file contains all files that are potentially copied, there's basically no need to compare the folders, just use B as copy source if the file isn't found in A.
Try this with a shortened version of the lookup file first for testing. When you start it with the full list, it might take quite some time before you see any output.
In lines 15 and 16, you can pick whether the file in C should have the original name from folder B, or the file name it should have according to the lookup table. Just comment out the one you don't want (currently it keeps the name from folder B).
@echo off
setlocal enabledelayedexpansion
set SourceA=D:\Temp\A
set SourceB=D:\Temp\B
set Target=D:\Temp\C
set LookupFile=D:\Temp\lookup.csv
set LogFile=D:\Temp\lookup.log
if exist "%LogFile%" del "%LogFile%"
set /a Line = 0
for /f "usebackq tokens=1,2 delims=," %%f in ("%LookupFile%") do (
    set /a Line += 1
    echo [!Line!] Processing '%%~nxf' --^> '%%~nxg' ...
    if not exist "%SourceA%\%%~nxg" (
        REM *** The first of the following two lines will copy the file with its original name to C, the second with the name from the lookup file.
        set TargetFileName=%%~nxf
        REM set TargetFileName=%%~nxg
        copy "%SourceB%\%%~nxf" "%Target%\!TargetFileName!"
        if errorlevel 1 (
            echo Error: could not copy '%SourceB%\%%~nxf'.
            >>"%LogFile%" echo Error,"%SourceB%\%%~nxf","!TargetFileName!"
        )
    )
)
echo ... done.
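For readers wondering about the Lookup[...] notation mentioned earlier: batch has no real arrays, and Lookup is not a built-in function. Each Lookup[name] is just an ordinary environment variable whose name happens to contain brackets. The following is a minimal sketch of that idea with made-up sample data (1.doc, 2.doc and 3.doc are hypothetical names, not taken from the real lookup file):

```bat
@echo off
setlocal enabledelayedexpansion

REM Hypothetical sample data; a real script would fill these from the lookup file.
set "Lookup[1.doc]=myfile1.doc"
set "Lookup[2.doc]=myfile2.doc"

REM For each B-side name, print the mapped A-side name if one was stored.
for %%n in (1.doc 2.doc 3.doc) do (
    if defined Lookup[%%n] (
        echo %%n maps to !Lookup[%%n]!
    ) else (
        echo %%n has no lookup entry
    )
)
endlocal
```

Here 3.doc has no matching variable, so it falls through to the else branch. Because every entry becomes a separate environment variable, a lookup file with 200K lines creates 200K variables, which is why this approach can get slow.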
ASKER
Oh during testing, I found a small requirements change. Under B, we also have a subfolder called subB that can also contain files to match against.
Hence, two examples are:
1) Look for file 1.pdf of B:/1.pdf in A:/myfile1.pdf
The lookup file contains the entry:
"B\1.pdf","myfile1.pdf"
2) Look for file 2.pdf of B:/subB/2.pdf in A:/myfile2.pdf
"B\subB\2.pdf","myfile2.pdf"
Anything in B: or B:\subB that doesn't exist in A: (and A doesn't have subfolders) is to be put in the Archive folder.
ASKER
And the lookup file doesn't contain the universe of all files. It is a subset of B.
It maps those to different filenames in A.
One way to do this would be:
All files in B that are not in the first column of the lookup file are definitely not in A. Archive those.
Is this more doable? I expect about 5-10K files in B that are not present as a first entry in the lookup file.
What OS are you running this on?
Can you open a new command prompt (don't double-click the script, open a command prompt, cd into the folder where you saved it, and enter the script name) and run this short script (may take some time), and then check environment variables, especially for the very first and very last file name in the lookup file?
@echo off
set LookupFile=D:\Temp\lookup.csv
for /f "usebackq tokens=1,2 delims=," %%a in ("%LookupFile%") do set Lookup[%%~nxa]=%%~nxb
After the prompt returns (hopefully), please enter
set Lookup[1.pdf]
set Lookup[x.pdf]
where 1.pdf is the name (no path, no quotes) of the B file in the first line of the lookup file, and x.pdf the name of the B file in the last line.
The output should look like this:
Lookup[1.pdf]=myfile1.pdf
Lookup[x.pdf]=myfilex.pdf
ASKER
This is still running and taking a long time... Is my other option plausible?
I have removed the subdirectories in B to make our solution easy. Everything is in B at the first level.
All files in B that are not in the first column of the lookup file are definitely not in A. Archive those.
So: 1. Write the names of all files in B out to a text file.
2. Write the first entry of each lookup line (remove double quotes, strip the path Z:\B\ to leave only the filename) out to a second text file.
3. Compare 1 to 2 and output all names that don't exist in 2 to a third text file called archive.txt.
4. Iterate through archive.txt and copy all those files into Z:\Archive.
If you can help me write this, I think we are done.
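The four steps above could be sketched in batch roughly as follows. This is only a sketch under stated assumptions, not a tested solution for a 200K-line lookup file: the lookup file location and the three temp file names are made up for illustration, all files are assumed to sit directly in B, and the first lookup column is assumed to contain no commas. The comparison in step 3 is handed to findstr, which reads its search strings from a file (/g:), matches whole lines literally and case-insensitively (/x /l /i), and prints only the non-matching lines (/v).

```bat
@echo off
setlocal

REM All paths below are assumptions for illustration; adjust before use.
set "SourceB=Z:\B"
set "Target=Z:\Archive"
set "LookupFile=Z:\lookup.csv"
set "AllB=%TEMP%\all_b.txt"
set "Mapped=%TEMP%\mapped.txt"
set "ToArchive=%TEMP%\archive.txt"

REM Step 1: list the names of all files directly in B (directories excluded).
dir /b /a-d "%SourceB%" > "%AllB%"

REM Step 2: write name and extension of the first lookup column, quotes and path removed.
>"%Mapped%" (for /f "usebackq tokens=1 delims=," %%f in ("%LookupFile%") do echo %%~nxf)

REM Step 3: keep the names from step 1 that equal no line written in step 2.
findstr /v /x /l /i /g:"%Mapped%" "%AllB%" > "%ToArchive%"

REM Step 4: copy every remaining file into the archive folder.
if not exist "%Target%" md "%Target%"
for /f "usebackq delims=" %%f in ("%ToArchive%") do copy "%SourceB%\%%f" "%Target%\" >nul

echo ... done.
endlocal
```

Trying this with a shortened lookup file first is advisable; how findstr behaves with a search-string file of this size has not been verified here. Case-insensitive matching is used because Windows file names are not case-sensitive.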
The main problem is neither the subfolder nor the approach; the main problem is the sheer number of comparisons, and that batch doesn't have proper ways to handle memory or file access. So even though this is certainly solvable in batch (though you might need quite some patience), would you mind a PowerShell script in this case?
ASKER
Yes, sure, but I am not familiar with PowerShell (looking it up now), so please let me know the steps as if it were PowerShell for Dummies, if you can. Thanks!
ASKER
In response to your query, the script finished running and returned the environment variables as expected for both the first and last values.
ASKER
I will award points to your first response (by oBdA, posted on 2014-03-19 at 07:03:12).
I will wait to see if the batch returns, and if any questions arise about that code, I will ask here before closing. I will move the PowerShell part to a new question.
And about how large (number of lines) is the lookup file?
Edit:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?