Solved

Looking up files in folder A against folder B using lookup text file

Posted on 2014-03-18
Last Modified: 2014-03-22
Hi EEE,
 
I have two folders on my hard drive, A and B. I want to compare all files in B to A. Any file that's in B and not in A should be copied into a third folder, C. All of this is to be done in a batch file. I want to keep A, B and C as variables, so set A=C:\A for example.

One little twist is that I can not compare all of B/*.* to A/*.* using the filenames as is. For that I need a lookup file.

So, under B, a file called 1.doc is the same as myfile1.doc in A.
B/1.doc = A/myfile1.doc
The lookup file has this information in the following format:

"B:\1.doc","myfile1.doc",<some string>,<some string>

How can I do this? Part of the code is placed below but needs to be augmented for the lookup file and correct picking of filename in A.

@echo off
setlocal enabledelayedexpansion
set SourceA=Z:\A
set SourceB=Z:\B
set Target=Z:\Archive

if exist "%Target%" del "%Target%"

for %%a in ("%SourceB%\*.*") do (
      echo Processing '%%~nxa' ...
      if not exist "%SourceA%\%%~nxa" (
            copy "%%a" "%Target%"
      )
)
echo ... done.
Question by:LuckyLucks
14 Comments
 
LVL 85

Expert Comment

by:oBdA
ID: 39939206
Do you only need the lookup file, or both the lookup file and comparison by file names for the rest that's not in the lookup file?
And about how large (number of lines) is the lookup file?

Edit:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?
 
LVL 85

Accepted Solution

by:
oBdA earned 500 total points
ID: 39939276
Assuming this is running on Vista or later (or the lookup list is no longer than about 30 KB), try the following. If a file name is defined in the lookup file, the referenced file name will be used as the target file name to compare. If not, a regular file name compare is still done for each file in B.
@echo off
setlocal enabledelayedexpansion
set SourceA=D:\Temp\A
set SourceB=D:\Temp\B
set Target=D:\Temp\C
set LookupFile=D:\Temp\lookup.csv
for /f "tokens=1,2 delims=," %%a in ('type "%LookupFile%"') do set Lookup[%%~nxa]=%%~nxb
for %%a in ("%SourceB%\*.*") do (
	if defined Lookup[%%~nxa] (
		set LookFor=!Lookup[%%~nxa]!
		echo Processing '%%~nxa' ^(--^> !LookFor!^) ...
	) else (
		set LookFor=%%~nxa
		echo Processing '%%~nxa' ...
	)
	if not exist "%SourceA%\!LookFor!" (
		copy "%%a" "%Target%"
	)
)
echo ... done.

 

Author Comment

by:LuckyLucks
ID: 39939887
oBDA: here are my responses:

1. The lookup file contains 200K records and is approx 30 MB.
2. The look up file name in the lookup file is only to be used to match.
3. I expect there will be stuff in B that has no entry in the lookup file. These are not matched to A. They are the odd ones out that we will stick into Archive.

Author Comment

by:LuckyLucks
ID: 39940113
Also in response to:
And what's the exact format of the lookup file? Does the first column ("B:\1.doc") contain the full path to the document, or just the relative path (the B folder in this case)?

The first column (before the comma) is the full path, including filename, of the file that will be matched against.

The second column is the file name only. It will exist in SourceA.
 

Author Comment

by:LuckyLucks
ID: 39940157
oBdA,
 what does the set Lookup[%%~nxa]=%%~nxb accomplish?
 Is Lookup a pre-defined function or a function you have created somewhere and not pasted into the solution?


The code as is isn't working; all I get is a black screen.
 
LVL 85

Expert Comment

by:oBdA
ID: 39940362
The line "for /f "tokens=1,2 delims=," %%a in ('type "%LookupFile%"') do set Lookup[%%~nxa]=%%~nxb" fills an array "Lookup" with the file name B used as index and the file name A as content. With 200k entries, this might be a bit too much for the shell environment.
So if the lookup file contains all files that are potentially copied, there's basically no need to compare the folders, just use B as copy source if the file isn't found in A.
Try this with a shortened version of the lookup file first for testing. When you start it with the full list, it might take quite some time before you see any output.
In lines 15 and 16, you can pick whether the file in C should have the original name from folder B, or the file name it should have according to the lookup table. Just comment out the one you don't want (currently it keeps the name from folder B).
@echo off
setlocal enabledelayedexpansion
set SourceA=D:\Temp\A
set SourceB=D:\Temp\B
set Target=D:\Temp\C
set LookupFile=D:\Temp\lookup.csv
set LogFile=D:\Temp\lookup.log
if exist "%LogFile%" del "%LogFile%"
set /a Line = 0
for /f "usebackq tokens=1,2 delims=," %%f in ("%LookupFile%") do (
	set /a Line += 1
	echo [!Line!] Processing '%%~nxf' --^> '%%~nxg' ...
	if not exist "%SourceA%\%%~nxg" (
		REM *** The first of the following two lines will copy the file with its original name to C, the second with the name from the lookup file.
		set TargetFileName=%%~nxf
		REM set TargetFileName=%%~nxg
		copy "%SourceB%\%%~nxf" "%Target%\!TargetFileName!"
		if errorlevel 1 (
			echo Error: could not copy '%SourceB%\%%~nxf'.
			>>"%LogFile%" echo Error,"%SourceB%\%%~nxf","!TargetFileName!"
		)
	)
)
echo ... done.

 

Author Comment

by:LuckyLucks
ID: 39940652
Oh during testing, I found a small requirements change. Under B, we also have a subfolder called subB that can also contain files to match against.

Hence, two examples are :
1) look for file 1.pdf of B:/1.pdf in A:/myfile1.pdf  

lookup file contained the entry:
"B\1.pdf","myfile1.pdf"

2) look for file 2.pdf of B:/subB/2.pdf in A/myfile2.pdf
"B\subB\2.pdf","myfile2.pdf"


Anything in B: or B:\subB that doesn't exist in A: (and A doesn't have subfolders) is to be stuck in folder Archive.
 

Author Comment

by:LuckyLucks
ID: 39940732
And the lookup file doesn't contain the universe of all files. It is a subset of B.
It maps those to different filenames in A.

One way to do this would be:

All files in B that are not in the first column entry in the lookup file are definitely not in A. Archive those.

Is this more doable? I expect about 5-10K files in B that are not present as the first entry in the lookup file.
 
LVL 85

Expert Comment

by:oBdA
ID: 39941098
What OS are you running this on?
Can you open a new command prompt (don't double-click the script, open a command prompt, cd into the folder where you saved it, and enter the script name) and run this short script (may take some time), and then check environment variables, especially for the very first and very last file name in the lookup file?
@echo off
set LookupFile=D:\Temp\lookup.csv
for /f "usebackq tokens=1,2 delims=," %%a in ("%LookupFile%") do set Lookup[%%~nxa]=%%~nxb


After the prompt returns (hopefully), please enter
set Lookup[1.pdf]
set Lookup[x.pdf]


where 1.pdf is the name (no path, no quotes) of the B file in the first line in the lookup file, and x the name of the B file in the last line.
The output should look like this:
Lookup[1.pdf]=myfile1.pdf
Lookup[x.pdf]=myfilex.pdf

 

Author Comment

by:LuckyLucks
ID: 39943043
This is still running and taking a long time...is my other option plausible?

I have removed subdirs in B to make our solution easy. Everything is in B at first level.

All files in B that are not in the first column entry in the lookup file are definitely not in A. Archive those.

So:
1. Copy the names of all files in B out to a text file.
2. Copy the first entry of each line in the lookup file (remove double quotes, strip path Z:\B\ to leave only the filename) out to a text file.
3. Compare 1 to 2 and output all that don't exist to a third text file called archive.txt.
4. Iterate through this archive.txt and copy all files into Z:\Archive.

If you can help me write this I think we are done.
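
The four steps above can be sketched in batch roughly as follows. This is only a sketch under stated assumptions, not code posted in the thread: the paths, the list file names under %TEMP%, and the choice of dir /b and findstr /g: for the comparison are all assumptions. It also assumes that no path in the first lookup column contains a comma, and that every file listed in the lookup file really exists in A (the simplification proposed above). How findstr performs with around 200K search strings has not been verified, so try it on a shortened lookup file first.

```batch
@echo off
setlocal
REM All paths and list file names below are assumptions; adjust as needed.
set "SourceB=Z:\B"
set "Target=Z:\Archive"
set "LookupFile=Z:\lookup.csv"
set "ListB=%TEMP%\files_in_B.txt"
set "ListLookup=%TEMP%\files_in_lookup.txt"
set "ArchiveList=%TEMP%\archive.txt"

REM Step 1: write the names of all files (not folders) in B to a text file.
dir /b /a-d "%SourceB%" > "%ListB%"

REM Step 2: take the first column of each lookup line; %%~nxf strips the
REM surrounding quotes and the path, leaving only the file name.
(for /f "usebackq tokens=1 delims=," %%f in ("%LookupFile%") do echo %%~nxf) > "%ListLookup%"

REM Step 3: keep only the lines of ListB that match no line of ListLookup.
REM /v = print non-matching lines, /x = whole-line match, /i = ignore case,
REM /l = literal search strings, /g: = read the search strings from a file.
findstr /v /x /i /l /g:"%ListLookup%" "%ListB%" > "%ArchiveList%"

REM Step 4: copy every file listed in archive.txt into the archive folder.
if not exist "%Target%\" md "%Target%"
for /f "usebackq delims=" %%f in ("%ArchiveList%") do copy "%SourceB%\%%f" "%Target%\" >nul
echo ... done.
```

Because the comparison is done once by findstr rather than once per file inside a for loop, nothing has to be stored in environment variables. The /i switch is used because Windows file names are not case sensitive.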
 
LVL 85

Expert Comment

by:oBdA
ID: 39943122
The main problem is neither the subfolder nor the approach; the main problem is the sheer number of comparisons, and that batch doesn't have proper ways to handle memory or file access. So even though this is certainly solvable in batch (though you might need quite some patience), would you mind a PowerShell script in this case?
 

Author Comment

by:LuckyLucks
ID: 39943477
Yes, sure, but not being familiar with PowerShell (looking it up now), please let me know the steps assuming it's PowerShell for Dummies, if you can. Thanks!
 

Author Comment

by:LuckyLucks
ID: 39943530
In response to your query, the script finished running and returned the environment variables as expected for both the first and last values.
 

Author Comment

by:LuckyLucks
ID: 39943590
I will award points to your first response (by oBdA, posted on 2014-03-19 at 07:03:12).

I will wait to see if the batch returns and if any questions arise in that code, I will ask here before closing....Will move the PowerShell to a new question.