
Solved

bash script - Part 1a - mod to compare (diff) files in different folders

Posted on 2014-08-05
Medium Priority
377 Views
Last Modified: 2014-08-12
Ref: http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_28489050.html
From this previous question, I got the following code, which generally works well. When I run it on BASE and TEST folders that have one file each, where the two file sizes differ, I get no results. I believe the same problem occurs with, say, two files each when no BASE file has the same size as any TEST file. Could you tweak this to handle that case?

I also get these messages:
grep: same1.txt: No such file or directory
grep: same2.txt: No such file or directory
# set paths
BASE=/path/to/base
TEST=/path/to/test

# start from empty match lists on every run (the loop below appends)
rm -f same1.txt same2.txt

# get 2 file lists (size name); NR > 1 skips the "total" line of ls -l
ls -lS "$BASE" | awk 'NR > 1 {print $5 " " $9}' > base.txt
ls -lS "$TEST" | awk 'NR > 1 {print $5 " " $9}' > test.txt

# loop through BASE, find same file sizes in TEST
while read -r line
do
  s1=$(echo "$line" | awk '{print $1}')
  # anchor the size at the start of the line so that 12 does not match 1234
  if grep -q "^$s1 " test.txt
  then
    echo "$line" >> same1.txt
    grep "^$s1 " test.txt >> same2.txt
  fi
done < base.txt

# create files with different sizes (whole-line, fixed-string matches)
grep -v -x -F -f same1.txt base.txt > diff1.txt
grep -v -x -F -f same2.txt test.txt > diff2.txt

# do the diffs (the lists hold bare names, so add the folder back)
echo "Diffing files with same size"
paste same1.txt same2.txt | while read -r line
do
  s1=$(echo "$line" | awk '{print $2}')
  s2=$(echo "$line" | awk '{print $4}')
  diff "$BASE/$s1" "$TEST/$s2"
done

echo "Diffing files with different size"
paste diff1.txt diff2.txt | while read -r line
do
  s1=$(echo "$line" | awk '{print $2}')
  s2=$(echo "$line" | awk '{print $4}')
  diff "$BASE/$s1" "$TEST/$s2"
done
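The two grep errors follow from how the loop works: same1.txt and same2.txt are only created by the >> redirections inside the if, so when no BASE size appears in TEST, neither file ever exists. A minimal reproduction might look like the sketch below. It assumes GNU coreutils and a throwaway folder from mktemp -d; a.txt and b.txt are made-up names, and the lists are built the same way as above, skipping the "total" line.

```bash
#!/usr/bin/env bash
# Minimal reproduction of the "no matching sizes" case in a scratch folder.
# a.txt and b.txt are made-up file names.
export LC_ALL=C                    # keep the ls -l date columns predictable
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
mkdir base test
printf 'abc\n' > base/a.txt        # 4 bytes
printf 'abcdef\n' > test/b.txt     # 7 bytes, so no size matches

# same (size name) lists as the script builds; NR > 1 drops the "total" line
ls -lS base | awk 'NR > 1 {print $5 " " $9}' > base.txt
ls -lS test | awk 'NR > 1 {print $5 " " $9}' > test.txt

while read -r line
do
  s1=${line%% *}                   # size is the first field
  if grep -q "^$s1 " test.txt
  then
    echo "$line" >> same1.txt      # the only place same1.txt is created
  fi
done < base.txt

if [ -e same1.txt ]; then
  echo "same1.txt exists"
else
  echo "same1.txt missing: grep -f same1.txt would fail here"
fi
```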


Question by:phoffric
 
LVL 38

Accepted Solution

by: Gerwin Jansen, EE MVE (earned 2000 total points)
ID: 40242457
If we check whether same1.txt exists (and is not empty) before we create the diff files, then you don't get the 2 grep errors:

# check if any files with same size
if [ -s same1.txt ]
then
      # create files with different sizes
      grep -v -f same1.txt base.txt > diff1.txt
      grep -v -f same2.txt test.txt > diff2.txt

      # do the diffs (the lists hold bare names, so add the folder back)
      echo "Diffing files with same size"
      paste same1.txt same2.txt | while read -r line
      do
            s1=$(echo "$line" | awk '{print $2}')
            s2=$(echo "$line" | awk '{print $4}')
            diff "$BASE/$s1" "$TEST/$s2"
      done
fi

And if we don't have a non-empty 'same1.txt' (or same2.txt), then we just compare the full base and test lists:

# choose which lists hold the different-sized files
if [ ! -s same1.txt ]
then
      # no same-size pairs: pair the full lists
      f1=base.txt
      f2=test.txt
else
      f1=diff1.txt
      f2=diff2.txt
fi

echo "Diffing files with different size"
paste "$f1" "$f2" | while read -r line
do
      s1=$(echo "$line" | awk '{print $2}')
      s2=$(echo "$line" | awk '{print $4}')
      diff "$BASE/$s1" "$TEST/$s2"
done
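The fallback branch can be checked on scratch data. The sketch below is a test harness, not part of the accepted answer: it assumes GNU coreutils and mktemp -d, uses a made-up a.txt of different sizes in BASE and TEST, runs the same-size pass plus the guard, and confirms the pair is still diffed. It passes the folder prefix to diff because base.txt and test.txt hold bare file names.

```bash
#!/usr/bin/env bash
# Exercise the fallback branch: one BASE and one TEST file, different sizes.
# The folder layout and a.txt contents are made up for the test.
export LC_ALL=C                    # keep the ls -l date columns predictable
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1
BASE=$work/base
TEST=$work/test
mkdir "$BASE" "$TEST"
printf 'line one\n' > "$BASE/a.txt"               # 9 bytes
printf 'line one\nline two\n' > "$TEST/a.txt"     # 18 bytes

ls -lS "$BASE" | awk 'NR > 1 {print $5 " " $9}' > base.txt
ls -lS "$TEST" | awk 'NR > 1 {print $5 " " $9}' > test.txt

# same-size pass: finds nothing here, so same1.txt is never created
while read -r line
do
  s1=${line%% *}
  if grep -q "^$s1 " test.txt
  then
    echo "$line" >> same1.txt
    grep "^$s1 " test.txt >> same2.txt
  fi
done < base.txt

# the guard from the answer: fall back to the full lists
if [ ! -s same1.txt ]
then
  f1=base.txt
  f2=test.txt
else
  f1=diff1.txt
  f2=diff2.txt
fi

# paste gives "size name<TAB>size name"; read splits it into 4 fields
paste "$f1" "$f2" | while read -r _ s1 _ s2
do
  diff "$BASE/$s1" "$TEST/$s2"
done > diffs.out
cat diffs.out
```

Reading the pasted line with read -r _ s1 _ s2 replaces the two awk calls per line; like the original awk version, it assumes file names without spaces.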
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 40242557
Perhaps you need to re-jig the script so that it does something more general than working from 2 lists. I don't have time to code anything right now, but my approach would be:
- Have 2 lists of files, sorted by size (as now)
- Work through the files in one of the lists individually
- If there's an equal-size file in the 2nd list, compare against it and you are done
- OTHERWISE:
  - locate the next-smaller file and count the number of lines in the diff
  - locate the next-larger file and count the number of lines in the diff
  - if the above 2 steps find only one file (e.g. no smaller file), report the comparison against that file
  - otherwise, report the smaller diff (or maybe both, depending ...)
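The steps above can be sketched in bash roughly as follows. This is an illustration, not Duncan's code: it lists files with GNU find -printf instead of parsing ls -l, treats the candidate whose diff output has fewer lines as the closer match (preferring the smaller file on a tie), and reports a single file rather than both. The helper names list_by_size, diff_lines and closest_diff and the sample files are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the closest-size matching idea. Assumptions: GNU find (-printf),
# file names without newlines, and "closest" = fewest lines of diff output.
# list_by_size, diff_lines and closest_diff are hypothetical helper names.

# "size name" for each regular file directly in $1, smallest first
list_by_size() {
  find "$1" -maxdepth 1 -type f -printf '%s %f\n' | sort -n
}

# number of lines diff prints for two files (0 means identical)
diff_lines() {
  diff "$1" "$2" | awk 'END {print NR}'
}

closest_diff() {
  local bdir=$1 tdir=$2 tlist bsize bname same smaller larger ns nl
  tlist=$(list_by_size "$tdir")
  list_by_size "$bdir" | while read -r bsize bname
  do
    # 1. an equal-size file in TEST is used directly
    same=$(printf '%s\n' "$tlist" |
      awk -v s="$bsize" '$1 == s {sub(/^[0-9]+ /, ""); print; exit}')
    if [ -n "$same" ]; then
      printf '%s -> %s (same size)\n' "$bname" "$same"
      continue
    fi
    # 2. otherwise find the next-smaller and next-larger TEST files
    smaller=$(printf '%s\n' "$tlist" |
      awk -v s="$bsize" '$1 < s {n = $0} END {sub(/^[0-9]+ /, "", n); print n}')
    larger=$(printf '%s\n' "$tlist" |
      awk -v s="$bsize" '$1 > s {sub(/^[0-9]+ /, ""); print; exit}')
    # 3. report the only candidate, or the one with the shorter diff
    if [ -z "$smaller" ] && [ -z "$larger" ]; then
      printf '%s -> no candidate\n' "$bname"
    elif [ -z "$larger" ]; then
      printf '%s -> %s (only smaller)\n' "$bname" "$smaller"
    elif [ -z "$smaller" ]; then
      printf '%s -> %s (only larger)\n' "$bname" "$larger"
    else
      ns=$(diff_lines "$bdir/$bname" "$tdir/$smaller")
      nl=$(diff_lines "$bdir/$bname" "$tdir/$larger")
      if [ "$ns" -le "$nl" ]; then
        printf '%s -> %s (%s diff lines)\n' "$bname" "$smaller" "$ns"
      else
        printf '%s -> %s (%s diff lines)\n' "$bname" "$larger" "$nl"
      fi
    fi
  done
}

# usage on scratch folders with made-up files
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/base" "$work/test"
printf '12345\n' > "$work/base/b.txt"                     # 6 bytes
printf 'one\ntwo\nthree\n' > "$work/base/a.txt"           # 14 bytes
printf 'abcde\n' > "$work/test/c.txt"                     # 6 bytes
printf 'one\ntwo\n' > "$work/test/small.txt"              # 8 bytes
printf 'uno\ndos\ntres\ncuatro\n' > "$work/test/big.txt"  # 20 bytes
closest_diff "$work/base" "$work/test"
# prints:
# b.txt -> c.txt (same size)
# a.txt -> small.txt (2 diff lines)
```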
 
LVL 32

Author Closing Comment

by:phoffric
ID: 40243714
I reviewed your code, and I get the gist of it. I will try to implement it in the next week. Thanks again.

 
LVL 32

Author Comment

by:phoffric
ID: 40243725
@Duncan Roe,
Sorry about the title confusion. The titles now put the Part number early for easier visibility. I believe your comment belongs in Part 2:
   http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_28490563.html

This Part 1a was intended to be just a relatively simple update to the original question, handling the specific case where all the file sizes in BASE and TEST differ.
 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 40244561
>> Thanks again.
No problem. If you can post some (redacted) samples next week, we can do some testing for you.
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 40244971
OK - re-posted. Doesn't look so pretty though :-/
Phoffric, could you possibly tar up sample TEST & BASE directories and post as a file attachment? (assuming they're not confidential).
Thanks ... Duncan.
 
LVL 32

Author Comment

by:phoffric
ID: 40245013
Yeah, unfortunately, I am not allowed to present actual files. I would have to generate by hand some look-alikes, which I will be more than pleased to do.
 
LVL 35

Expert Comment

by:Duncan Roe
ID: 40245753
If you would be so kind as to spend the time to do so, that would be terrific. You could make them differ in a way that mirrors how your production files do, which the rest of us can only guess at.
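If hand-making the look-alikes is tedious, a small script could build them instead. The sketch below is only an illustration: the BASE and TEST folder names follow the thread, but same.cfg, grown.log and their contents are invented placeholders, chosen to cover one same-size pair and one different-size pair before packing both folders with tar.

```bash
#!/usr/bin/env bash
# Build hypothetical look-alike BASE and TEST folders and pack them up.
# File names and contents are placeholders, not the real data.
set -e
out=$(mktemp -d)
mkdir "$out/BASE" "$out/TEST"

# pair 1: same size, one byte changed
printf 'value=1\n' > "$out/BASE/same.cfg"
printf 'value=2\n' > "$out/TEST/same.cfg"

# pair 2: TEST gained a line, so the sizes differ
printf 'alpha\nbeta\n' > "$out/BASE/grown.log"
printf 'alpha\nbeta\ngamma\n' > "$out/TEST/grown.log"

# one attachment holding both folders
tar -czf "$out/samples.tar.gz" -C "$out" BASE TEST
echo "$out/samples.tar.gz"
```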
 
LVL 32

Author Comment

by:phoffric
ID: 40254908
Worked this weekend so no time. Will have free time soon.
 
LVL 38

Expert Comment

by:Gerwin Jansen, EE MVE
ID: 40255418
No problem. Just open a new question when needed.

Question has a verified solution.