LuckyLucks


Looking up files in folder A against folder B using lookup text file

Hi EEE,
 
I have two folders, A and B, on my hard drive. I want to compare all files in B to those in A. Any file that's in B and not in A shall be copied into a third folder, C. All of this is to be done in a batch file. I want to keep A, B and C as variables, for example set A=C:\A.

One little twist is that I cannot compare all of B/*.* to A/*.* using the file names as they are. For that I need a lookup file.

So, under B, a file called 1.doc is the same as myfile1.doc in A.
B/1.doc = A/myfile1.doc
The lookup file has this information in the following format:

"B:\1.doc","myfile1.doc",<some string>,<some string>
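As a cross-check of this format, here is a minimal sketch in Python (not batch, used only so the parsing rule is easy to test). It splits one lookup line into the B file name and the A file name. The function name parse_lookup_line is hypothetical. The sketch assumes the first field is a quoted Windows path, the second is a bare file name, and any further fields can be ignored.

```python
import csv
import ntpath  # applies Windows path rules on any OS


def parse_lookup_line(line):
    """Split one lookup line into (file name in B, file name in A).

    Assumes the layout "B:\\1.doc","myfile1.doc",<some string>,...:
    field 1 is a (quoted) Windows path, field 2 a bare file name.
    Fields after the second are ignored.
    """
    fields = next(csv.reader([line]))
    return ntpath.basename(fields[0]), fields[1]


print(parse_lookup_line(r'"B:\1.doc","myfile1.doc",foo,bar'))
# ('1.doc', 'myfile1.doc')
```

One design note: the csv module keeps a comma that sits inside a quoted path as part of that field. A plain split on every comma would break such a path in two.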

How can I do this? Part of the code is below, but it needs to be extended to use the lookup file and pick the correct file name in A.

@echo off
setlocal enabledelayedexpansion
set SourceA=Z:\A
set SourceB=Z:\B
set Target=Z:\Archive

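REM Create the target folder if it is missing, otherwise empty it without prompting.
if not exist "%Target%" (md "%Target%") else (del /q "%Target%\*")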

for %%a in ("%SourceB%\*.*") do (
      echo Processing '%%~nxa' ...
      if not exist "%SourceA%\%%~nxa" (
            copy "%%a" "%Target%"
      )
)
echo ... done.
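Below is one way the loop above could be extended with the lookup file, sketched in Python rather than batch to show the logic. It loads the lookup once into a dictionary, then checks each file in B under its mapped A name. The names load_lookup and copy_unmatched are hypothetical. The sketch also assumes:
- matching is case-insensitive, as on Windows;
- the lookup file is UTF-8;
- a B file with no lookup entry is checked under its own name, as in the original loop.

```python
import csv
import ntpath
import os
import shutil


def load_lookup(lookup_file):
    """Read the lookup CSV into {B file name (lower-case): A file name}."""
    lookup = {}
    with open(lookup_file, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) >= 2:  # skip blank or malformed lines
                lookup[ntpath.basename(row[0]).lower()] = row[1]
    return lookup


def copy_unmatched(source_a, source_b, target, lookup):
    """Copy files from source_b to target when their A counterpart is missing.

    A B file with a lookup entry is looked for in source_a under its
    mapped name; one without an entry is looked for under its own name.
    Returns the names copied, in sorted order.
    """
    os.makedirs(target, exist_ok=True)
    a_names = {name.lower() for name in os.listdir(source_a)}
    copied = []
    for name in sorted(os.listdir(source_b)):
        path = os.path.join(source_b, name)
        if not os.path.isfile(path):
            continue  # skip subfolders; only files are compared
        expected = lookup.get(name.lower(), name)
        if expected.lower() not in a_names:
            shutil.copy2(path, os.path.join(target, name))
            copied.append(name)
    return copied
```

The asker later specifies that B files with no lookup entry should always be archived. In that case, copy whenever name.lower() is not a key of lookup, instead of falling back to the file's own name.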
oBdA

Do you only need the lookup file, or both the lookup file and comparison by file names for the rest that's not in the lookup file?
And about how large (number of lines) is the lookup file?

Edit:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?
ASKER CERTIFIED SOLUTION
oBdA

This solution is only available to members.

ASKER

oBdA: here are my responses:

1. The lookup file contains 200K records and is approximately 30 MB.
2. Only the file names in the lookup file are to be used for matching.
3. I expect there will be files in B that have no entry in the lookup file. These are not matched against A; they are the odd ones out that we will put into Archive.
Also in response to:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?

The first column (before the first comma) is the full path, including the file name, of the file that will be matched against.

The second column is the file name only. It will exist in SourceA.
oBdA,
what does the line "set Lookup[%%~nxa]=%%~nxb" accomplish?
Is Lookup a predefined function, or a function you have created somewhere and not pasted into the solution?


The code as is isn't working; all I get is a black screen.
The line "for /f "tokens=1,2 delims=," %%a in ('type "%LookupFile%"') do set Lookup[%%~nxa]=%%~nxb" is not a function call. For every line in the lookup file, it sets an environment variable named Lookup[<file name in B>] whose value is the file name in A. Batch has no real arrays, so variables named like this act as a lookup table. With 200K entries, this might be a bit too much for the shell environment.
So if the lookup file contains all files that could potentially be copied, there is no real need to compare the folders. Just go through the lookup file and copy the file from B if its counterpart isn't found in A.
Try this with a shortened version of the lookup file first for testing. When you start it with the full list, it might take quite some time before you see any output.
In lines 15 and 16, you can choose whether the file in C keeps its original name from folder B or gets the file name from the lookup table. Just comment out the one you don't want (currently it keeps the name from folder B).
@echo off
setlocal enabledelayedexpansion
set SourceA=D:\Temp\A
set SourceB=D:\Temp\B
set Target=D:\Temp\C
set LookupFile=D:\Temp\lookup.csv
set LogFile=D:\Temp\lookup.log
if exist "%LogFile%" del "%LogFile%"
set /a Line = 0
for /f "usebackq tokens=1,2 delims=," %%f in ("%LookupFile%") do (
	set /a Line += 1
	echo [!Line!] Processing '%%~nxf' --^> '%%~nxg' ...
	if not exist "%SourceA%\%%~nxg" (
		REM *** The first of the following two lines will copy the file with its original name to C, the second with the name from the lookup file.
		set TargetFileName=%%~nxf
		REM set TargetFileName=%%~nxg
		copy "%SourceB%\%%~nxf" "%Target%\!TargetFileName!"
		if errorlevel 1 (
			echo Error: could not copy '%SourceB%\%%~nxf'.
			>>"%LogFile%" echo Error,"%SourceB%\%%~nxf","!TargetFileName!"
		)
	)
)
echo ... done.

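For comparison, the same lookup-driven loop can be sketched in Python (not batch). It reads the lookup file one row at a time and needs no in-memory table. The function name archive_from_lookup and the keep_b_name switch are hypothetical stand-ins for the two "set TargetFileName" lines. As in the batch version, a row whose B file cannot be copied is logged rather than stopping the run. Unlike Windows, the os.path.exists check is case-sensitive on Linux or macOS.

```python
import csv
import ntpath
import os
import shutil


def archive_from_lookup(lookup_file, source_a, source_b, target, log_file,
                        keep_b_name=True):
    """Walk the lookup file and copy B files whose A counterpart is missing.

    keep_b_name=True mirrors 'set TargetFileName=%%~nxf' (original B name),
    False mirrors 'set TargetFileName=%%~nxg' (name from the lookup file).
    Copy failures, e.g. a B file that does not exist, are written to
    log_file as Error,"<source>","<target name>" lines.
    """
    os.makedirs(target, exist_ok=True)
    copied = []
    with open(lookup_file, newline="", encoding="utf-8") as f, \
            open(log_file, "w", encoding="utf-8") as log:
        for line_no, row in enumerate(csv.reader(f), start=1):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            b_name, a_name = ntpath.basename(row[0]), row[1]
            print(f"[{line_no}] Processing '{b_name}' --> '{a_name}' ...")
            if os.path.exists(os.path.join(source_a, a_name)):
                continue  # counterpart already in A, nothing to archive
            target_name = b_name if keep_b_name else a_name
            src = os.path.join(source_b, b_name)
            try:
                shutil.copy2(src, os.path.join(target, target_name))
                copied.append(target_name)
            except OSError:
                print(f"Error: could not copy '{src}'.")
                log.write(f'Error,"{src}","{target_name}"\n')
    return copied
```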

Oh, during testing I found a small change in the requirements: under B, there is also a subfolder called subB that can contain files to match against.

Hence, two examples are:
1) Look for file 1.pdf (B:\1.pdf) in A as A:\myfile1.pdf.

The lookup file contains the entry:
"B\1.pdf","myfile1.pdf"

2) Look for file 2.pdf (B:\subB\2.pdf) in A as A:\myfile2.pdf.
"B\subB\2.pdf","myfile2.pdf"


Anything in B: or B:\subB that doesn't exist in A: (and A doesn't have subfolders) is to be put into the Archive folder.
And the lookup file doesn't contain every file; it covers a subset of B.
It maps those files to different file names in A.

One way to do this would be:

All files in B that are not in the first column entry in the lookup file are definitely not in A. Archive those.

Is this more doable? I expect about 5-10K files in B that are not present in the first column of the lookup file.
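The rule above amounts to a set difference: any B file with no entry in the first column of the lookup file gets archived. Below is a minimal Python sketch (not batch). The function name unmapped_b_files is hypothetical. The sketch assumes:
- B may still contain subfolders such as subB, so it is walked recursively;
- a file counts as listed when its name matches the file-name part of a first-column entry, ignoring case and folder (so same-named files in B and subB are treated alike);
- the lookup file is UTF-8.

```python
import csv
import ntpath
import os


def unmapped_b_files(source_b, lookup_file):
    """Return sorted full paths of files under source_b with no lookup entry.

    Subfolders (e.g. subB) are included. A file is 'listed' when its name
    matches, ignoring case, the file-name part of a first-column entry.
    """
    with open(lookup_file, newline="", encoding="utf-8") as f:
        listed = {ntpath.basename(row[0]).lower()
                  for row in csv.reader(f) if row}
    unmapped = []
    for folder, _subfolders, files in os.walk(source_b):
        for name in files:
            if name.lower() not in listed:
                unmapped.append(os.path.join(folder, name))
    return sorted(unmapped)
```

The set is built once from the 200K-line file. After that, each B file is checked with a single hash lookup instead of a rescan of the whole list.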
What OS are you running this on?
Can you open a new command prompt (don't double-click the script, open a command prompt, cd into the folder where you saved it, and enter the script name) and run this short script (may take some time), and then check environment variables, especially for the very first and very last file name in the lookup file?
@echo off
set LookupFile=D:\Temp\lookup.csv
for /f "usebackq tokens=1,2 delims=," %%a in ("%LookupFile%") do set Lookup[%%~nxa]=%%~nxb


After the prompt returns (hopefully), please enter
set Lookup[1.pdf]
set Lookup[x.pdf]


where 1.pdf is the name (no path, no quotes) of the B file in the first line of the lookup file, and x.pdf the name of the B file in the last line.
The output should look like this:
Lookup[1.pdf]=myfile1.pdf
Lookup[x.pdf]=myfilex.pdf


This is still running and taking a long time. Is my other option feasible?

I have removed the subfolders in B to keep our solution simple. Everything is now at the first level of B.

All files in B that are not in the first column entry in the lookup file are definitely not in A. Archive those.

So:
1. List all files in B in a text file.
2. Write the first column of the lookup file (double quotes removed, path Z:\B\ stripped to leave only the file name) to a second text file.
3. Compare 1 to 2 and write every name that doesn't exist in 2 to a third text file called archive.txt.
4. Iterate through archive.txt and copy all listed files into Z:\Archive.
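The four steps map almost one-to-one onto the Python sketch below (Python, not batch). The function name archive_via_lists and the intermediate file names b_files.txt and lookup_names.txt are hypothetical; archive.txt is the name from step 3. The sketch assumes B has no subfolders, as stated above, that the lookup file is UTF-8, and that names compare case-insensitively. In step 3, list 2 is loaded into a set rather than searched line by line.

```python
import csv
import ntpath
import os
import shutil


def archive_via_lists(source_b, lookup_file, archive_dir, work_dir):
    """Run steps 1-4 and return the names copied to archive_dir."""
    # Step 1: every file directly in B, one name per line.
    b_list = os.path.join(work_dir, "b_files.txt")
    b_names = sorted(n for n in os.listdir(source_b)
                     if os.path.isfile(os.path.join(source_b, n)))
    with open(b_list, "w", encoding="utf-8") as out:
        out.writelines(n + "\n" for n in b_names)

    # Step 2: first lookup column; csv drops the quotes, basename the path.
    lookup_list = os.path.join(work_dir, "lookup_names.txt")
    with open(lookup_file, newline="", encoding="utf-8") as src, \
            open(lookup_list, "w", encoding="utf-8") as out:
        for row in csv.reader(src):
            if row:
                out.write(ntpath.basename(row[0]) + "\n")

    # Step 3: names in list 1 that are missing from list 2 -> archive.txt.
    with open(lookup_list, encoding="utf-8") as f:
        known = {line.rstrip("\n").lower() for line in f}
    with open(b_list, encoding="utf-8") as f:
        to_archive = [line.rstrip("\n") for line in f
                      if line.rstrip("\n").lower() not in known]
    archive_txt = os.path.join(work_dir, "archive.txt")
    with open(archive_txt, "w", encoding="utf-8") as out:
        out.writelines(n + "\n" for n in to_archive)

    # Step 4: copy each file listed in archive.txt into the archive folder.
    os.makedirs(archive_dir, exist_ok=True)
    with open(archive_txt, encoding="utf-8") as f:
        for line in f:
            shutil.copy2(os.path.join(source_b, line.rstrip("\n")), archive_dir)
    return to_archive
```

For example, archive_via_lists(r"Z:\B", r"Z:\lookup.csv", r"Z:\Archive", r"Z:\Temp") would leave the three lists in Z:\Temp for inspection. Here lookup.csv and Z:\Temp are placeholder paths.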

If you can help me write this I think we are done.
The main problem is neither the subfolder nor the approach; it is the sheer number of comparisons, and the fact that batch has no proper way to handle memory or file access. So although this is certainly solvable in batch (you might need quite some patience), would you mind a PowerShell script in this case?
Yes, sure, but I'm not familiar with PowerShell (looking it up now), so please explain the steps as if it were PowerShell for Dummies, if you can. Thanks!
In response to your query: the script finished running and returned the expected environment variable values for both the first and the last entry.
I will award points to your first response (by oBdA, posted on 2014-03-19 at 07:03:12).

I will wait to see whether the batch script returns; if any questions come up about that code, I will ask here before closing. I will move the PowerShell part to a new question.