Solved

Looking up files in folder A against folder B using lookup text file

Posted on 2014-03-18
Last Modified: 2014-03-22
Hi EEE,
 
I have two folders on my hard drive, A and B. I want to compare all files in B to A. Any file that's in B and not in A should be copied into a third folder, C. All of this is to be done in a batch file. I want to keep A, B and C as variables, e.g. set A=C:\A.

One little twist is that I cannot compare B\*.* to A\*.* using the filenames as they are. For that I need a lookup file.

So, under B, a file called 1.doc is the same as myfile1.doc in A.
B/1.doc = A/myfile1.doc
The lookup file has this information in the following format:

"B:\1.doc","myfile1.doc",<some string>,<some string>

How can I do this? Part of the code is placed below but needs to be augmented for the lookup file and correct picking of filename in A.

@echo off
setlocal enabledelayedexpansion
set SourceA=Z:\A
set SourceB=Z:\B
set Target=Z:\Archive

if exist "%Target%" del "%Target%"

for %%a in ("%SourceB%\*.*") do (
      echo Processing '%%~nxa' ...
      if not exist "%SourceA%\%%~nxa" (
            copy "%%a" "%Target%"
      )
)
echo ... done.
Question by:LuckyLucks
14 Comments
 

Expert Comment

by:oBdA
Do you only need the lookup file, or both the lookup file and comparison by file names for the rest that's not in the lookup file?
And about how large (number of lines) is the lookup file?

Edit:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?

Accepted Solution

by:oBdA (earned 500 total points)
Assuming this is running on Vista or later (or the lookup list is no longer than about 30 KB), try the following. If a file name from B is defined in the lookup file, the referenced file name is used as the name to look for in A. If not, a regular file name comparison is still done for each file in B.
@echo off
setlocal enabledelayedexpansion
set SourceA=D:\Temp\A
set SourceB=D:\Temp\B
set Target=D:\Temp\C
set LookupFile=D:\Temp\lookup.csv
REM Create the target folder first; otherwise "copy" would create a single file named C.
if not exist "%Target%\" md "%Target%"
REM Build the lookup table: Lookup[file name in B]=file name in A
for /f "tokens=1,2 delims=," %%a in ('type "%LookupFile%"') do set "Lookup[%%~nxa]=%%~nxb"
for %%a in ("%SourceB%\*.*") do (
	if defined Lookup[%%~nxa] (
		set LookFor=!Lookup[%%~nxa]!
		echo Processing '%%~nxa' ^(--^> !LookFor!^) ...
	) else (
		set LookFor=%%~nxa
		echo Processing '%%~nxa' ...
	)
	if not exist "%SourceA%\!LookFor!" (
		copy "%%a" "%Target%"
	)
)
echo ... done.
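
For readers who would rather avoid the environment-size concerns of batch, here is a minimal sketch of the same logic in Python (not batch, and not part of the original answer). The function names `load_lookup` and `copy_missing` are hypothetical, and the sketch assumes the lookup file is plain CSV with the B path in column 1 and the A file name in column 2, as described in the question. Keys are lowercased to approximate Windows' case-insensitive file names.

```python
import csv
import ntpath
import os
import shutil


def load_lookup(lookup_file):
    """Map each B file name (column 1, path stripped) to its A file name (column 2)."""
    lookup = {}
    with open(lookup_file, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) >= 2:
                # ntpath splits on backslashes even when run on a non-Windows system.
                lookup[ntpath.basename(row[0]).lower()] = ntpath.basename(row[1])
    return lookup


def copy_missing(source_a, source_b, target, lookup_file):
    """Copy every file in source_b whose counterpart is absent from source_a."""
    lookup = load_lookup(lookup_file)
    os.makedirs(target, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(source_b)):
        path_b = os.path.join(source_b, name)
        if not os.path.isfile(path_b):
            continue
        # Fall back to the file's own name when the lookup has no entry for it.
        look_for = lookup.get(name.lower(), name)
        if not os.path.exists(os.path.join(source_a, look_for)):
            shutil.copy2(path_b, target)
            copied.append(name)
    return copied
```

A call such as `copy_missing(r"D:\Temp\A", r"D:\Temp\B", r"D:\Temp\C", r"D:\Temp\lookup.csv")` would mirror the variables used in the batch script above. A dictionary holds 200K entries comfortably, which is the part the batch environment struggles with.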

 

Author Comment

by:LuckyLucks
oBDA: here are my responses:

1. The lookup file contains 200K records and is approx 30 MB.
2. The look up file name in the lookup file is only to be used to match.
3. I expect there will be stuff in B that has no entry in the lookup file. These are not matched to A. They are the odd ones out that we will stick into Archive.
 

Author Comment

by:LuckyLucks
Also in response to:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?

The first column (before the comma) is the full path, including the file name, of the file that will be matched against.

The second column is the file name only. It will exist in SourceA.
 

Author Comment

by:LuckyLucks
oBdA,
 What does the set Lookup[%%~nxa]=%%~nxb accomplish?
 Is Lookup a predefined function, or a function you have created somewhere and not pasted into the solution?

The code as is isn't working; all I get is a black screen.
 

Expert Comment

by:oBdA
The line "for /f "tokens=1,2 delims=," %%a in ('type "%LookupFile%"') do set Lookup[%%~nxa]=%%~nxb" fills an array "Lookup", with the file name from B used as index and the file name from A as content. With 200k entries, this might be a bit too much for the shell environment.
So if the lookup file contains all files that are potentially copied, there's basically no need to compare the folders; just walk the lookup file and use B as the copy source whenever the file isn't found in A.
Try this with a shortened version of the lookup file first for testing. When you start it with the full list, it might take quite some time before you see any output.
In lines 15 and 16 (the two "set TargetFileName" lines), you can pick whether the file in C should have the original name from folder B, or the file name it should have according to the lookup table. Just comment out the one you don't want (currently it keeps the name from folder B).
@echo off
setlocal enabledelayedexpansion
set SourceA=D:\Temp\A
set SourceB=D:\Temp\B
set Target=D:\Temp\C
set LookupFile=D:\Temp\lookup.csv
set LogFile=D:\Temp\lookup.log
if exist "%LogFile%" del "%LogFile%"
set /a Line = 0
for /f "usebackq tokens=1,2 delims=," %%f in ("%LookupFile%") do (
	set /a Line += 1
	echo [!Line!] Processing '%%~nxf' --^> '%%~nxg' ...
	if not exist "%SourceA%\%%~nxg" (
		REM *** The first of the following two lines will copy the file with its original name to C, the second with the name from the lookup file.
		set TargetFileName=%%~nxf
		REM set TargetFileName=%%~nxg
		copy "%SourceB%\%%~nxf" "%Target%\!TargetFileName!"
		if errorlevel 1 (
			echo Error: could not copy '%SourceB%\%%~nxf'.
			>>"%LogFile%" echo Error,"%SourceB%\%%~nxf","!TargetFileName!"
		)
	)
)
echo ... done.
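
The lookup-driven variant can also be sketched in Python (again a swap from batch, offered as an illustration rather than the author's method). The function name `copy_from_lookup` and the `keep_b_name` switch are hypothetical; the switch plays the role of the two alternative `set TargetFileName` lines. Folder A is listed once into a set, so each of the 200K rows costs a memory lookup instead of a disk access.

```python
import csv
import ntpath
import os
import shutil


def copy_from_lookup(source_a, source_b, target, lookup_file, log_file, keep_b_name=True):
    """Walk the lookup rows; copy the B file when its A counterpart is missing."""
    # One directory scan; lowercased to approximate case-insensitive Windows names.
    names_in_a = {name.lower() for name in os.listdir(source_a)}
    os.makedirs(target, exist_ok=True)
    copied = 0
    with open(lookup_file, newline="") as rows, open(log_file, "w", newline="") as log:
        errors = csv.writer(log)
        for row in csv.reader(rows):
            if len(row) < 2:
                continue
            name_b, name_a = ntpath.basename(row[0]), ntpath.basename(row[1])
            if name_a.lower() in names_in_a:
                continue  # already present in A, nothing to do
            target_name = name_b if keep_b_name else name_a
            try:
                shutil.copy2(os.path.join(source_b, name_b), os.path.join(target, target_name))
                copied += 1
            except OSError as exc:
                # Same idea as the batch log: record the failure and keep going.
                errors.writerow(["Error", name_b, target_name, str(exc)])
    return copied
```

Because the lookup file is streamed row by row, nothing but the set of A's file names is held in memory.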

 

Author Comment

by:LuckyLucks
During testing, I found a small requirements change. Under B, we also have a subfolder called subB that can also contain files to match against.

Hence, two examples are:
1) Look for file 1.pdf (B:\1.pdf) as myfile1.pdf in A.

The lookup file contains the entry:
"B\1.pdf","myfile1.pdf"

2) Look for file 2.pdf (B:\subB\2.pdf) as myfile2.pdf in A. The lookup file contains the entry:
"B\subB\2.pdf","myfile2.pdf"


Anything in B or B\subB that doesn't exist in A (A has no subfolders) is to be put in the Archive folder.
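
One way to handle the subfolder case is to key the lookup on the path relative to B instead of the bare file name. The Python sketch below (not batch) shows only that keying step. The helper names `relative_key` and `files_below` are hypothetical, and the sketch assumes every first-column entry starts with the same root prefix (for example `B` or `Z:\B`), which the thread does not confirm.

```python
import ntpath
import os


def relative_key(lookup_path, lookup_root):
    """Return the part of a lookup path below lookup_root, lowercased, backslash-separated."""
    path = ntpath.normpath(lookup_path).lower()
    root = ntpath.normpath(lookup_root).lower().rstrip("\\") + "\\"
    # Paths outside the assumed root are returned unchanged (lowercased).
    return path[len(root):] if path.startswith(root) else path


def files_below(source_b):
    """Yield (key, full_path) for every file under source_b, subfolders included."""
    for folder, _subfolders, names in os.walk(source_b):
        for name in names:
            full_path = os.path.join(folder, name)
            # Normalise to backslashes so keys match those built from the lookup file.
            key = os.path.relpath(full_path, source_b).replace(os.sep, "\\").lower()
            yield key, full_path
```

With these helpers, `relative_key("B\\subB\\2.pdf", "B")` and the key that `files_below` produces for `subB\2.pdf` are the same string, so a dictionary built from the lookup file can be queried for files at any depth.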
 

Author Comment

by:LuckyLucks
And the lookup file doesn't contain the universe of all files. It is a subset of B.
It maps those to different filenames in A.

One way to do this would be:

All files in B that are not in the first column of the lookup file are definitely not in A. Archive those.

Is this more doable? I expect about 5-10K files in B that are not present as the first entry in the lookup file.
 

Expert Comment

by:oBdA
What OS are you running this on?
Can you open a new command prompt (don't double-click the script, open a command prompt, cd into the folder where you saved it, and enter the script name) and run this short script (may take some time), and then check environment variables, especially for the very first and very last file name in the lookup file?
@echo off
set LookupFile=D:\Temp\lookup.csv
for /f "usebackq tokens=1,2 delims=," %%a in ("%LookupFile%") do set Lookup[%%~nxa]=%%~nxb


After the prompt returns (hopefully), please enter
set Lookup[1.pdf]
set Lookup[x.pdf]


where 1.pdf is the name (no path, no quotes) of the B file in the first line in the lookup file, and x the name of the B file in the last line.
The output should look like this:
Lookup[1.pdf]=myfile1.pdf
Lookup[x.pdf]=myfilex.pdf
 

Author Comment

by:LuckyLucks
This is still running and taking a long time... is my other option plausible?

I have removed the subdirectories in B to make our solution easier. Everything in B is at the first level.

All files in B that are not in the first column of the lookup file are definitely not in A. Archive those.

So:
1. Write the names of all files in B to a text file.
2. Write the first entry of each line in the lookup file (double quotes removed, path Z:\B\ stripped to leave only the file name) to a second text file.
3. Compare 1 to 2 and write every name that doesn't exist in 2 to a third text file called archive.txt.
4. Iterate through archive.txt and copy all listed files into Z:\Archive.

If you can help me write this, I think we are done.
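
The four steps above can be sketched as a set difference. This is a Python illustration rather than batch, the function names `names_in_lookup` and `archive_unlisted` are hypothetical, and it assumes all files sit directly in B as stated in this comment. The intermediate text files from steps 1 and 2 are replaced by in-memory sets; only archive.txt is written.

```python
import csv
import ntpath
import os
import shutil


def names_in_lookup(lookup_file):
    """Step 2: first-column file names, with quotes and path removed."""
    with open(lookup_file, newline="") as handle:
        return {ntpath.basename(row[0]).lower() for row in csv.reader(handle) if row}


def archive_unlisted(source_b, lookup_file, archive, list_file):
    """Steps 1, 3 and 4: list B, subtract the lookup names, record and copy the rest."""
    listed = names_in_lookup(lookup_file)
    unlisted = sorted(
        name for name in os.listdir(source_b)
        if os.path.isfile(os.path.join(source_b, name)) and name.lower() not in listed
    )
    # Step 3: keep a record of what is about to be archived.
    with open(list_file, "w") as handle:
        handle.writelines(name + "\n" for name in unlisted)
    # Step 4: copy each unlisted file into the archive folder.
    os.makedirs(archive, exist_ok=True)
    for name in unlisted:
        shutil.copy2(os.path.join(source_b, name), archive)
    return unlisted
```

Note that this only archives files missing from the lookup file. Files that are listed but whose counterpart is absent from A would still need the comparison against A.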

Expert Comment

by:oBdA
The main problem is neither the subfolder nor the approach; the main problem is the sheer number of comparisons, and that batch doesn't have proper ways to handle memory or file access. So even though this is certainly solvable in batch (though you might need quite some patience), would you mind a PowerShell script in this case?
 

Author Comment

by:LuckyLucks
Yes, sure, but since I'm not familiar with PowerShell (looking it up now), please let me know the steps, PowerShell-for-Dummies style, if you can. Thanks!
 

Author Comment

by:LuckyLucks
In response to your query: the script finished running, and the environment variables were returned as expected for both the first and the last value.
 

Author Comment

by:LuckyLucks
I will award points to your first response (by oBdA, posted on 2014-03-19 at 07:03:12).

I will wait to see if the batch file finishes, and if any questions arise about that code, I will ask here before closing. I will move the PowerShell part to a new question.