phoffric

bash script - Part 1a - mod to compare (diff) files in different folders

Ref: https://www.experts-exchange.com/questions/28489050/bash-script-modifications-to-compare-diff-files-in-different-folders.html
From this previous question, I got the following code, which generally works well. However, when I run it on BASE and TEST folders that contain one file each, and the two files differ in size, I get no results. I believe the same problem occurs with, say, two files in each folder when no two files have the same size. Could you tweak the script to handle this case?

The script also prints these error messages:
grep: same1.txt: No such file or directory
grep: same2.txt: No such file or directory
# set paths
BASE=/path/to/base
TEST=/path/to/test

# get 2 file lists (size name)
ls -lS $BASE | awk '{print $5 " " $9}' > base.txt
ls -lS $TEST | awk '{print $5 " " $9}' > test.txt

# loop through BASE, find same file sizes in TEST
cat base.txt | while read line
do
  s1=$(echo $line | awk '{print $1}')
  if grep -q $s1 test.txt
  then
    echo $line >> same1.txt
    grep $s1 test.txt >> same2.txt
  fi;
done;

# create files with different sizes
grep -v -f same1.txt base.txt > diff1.txt
grep -v -f same2.txt test.txt > diff2.txt

# do the diffs
echo "Diffing files with same size"
paste same1.txt same2.txt | while read line
do
  s1=$(echo $line | awk '{print $2}')
  s2=$(echo $line | awk '{print $4}')
  diff $s1 $s2
done;

echo "Diffing files with different size"
paste diff1.txt diff2.txt | while read line
do
  s1=$(echo $line | awk '{print $2}')
  s2=$(echo $line | awk '{print $4}')
  diff $s1 $s2
done;
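
The grep messages appear because same1.txt and same2.txt are only ever created by the `>>` redirections inside the `if` branch. When no sizes match, the files never exist and `grep -f` has nothing to read. The following is a minimal sketch of one possible fix, not the accepted solution: it creates the two files empty before the loop. The function name `split_by_size` and its output-directory argument are hypothetical, and the sketch assumes GNU grep, where an empty pattern file matches nothing, so `grep -v -f` passes every line through.

```shell
#!/usr/bin/env bash
# Sketch only: split two "size name" lists into same-size and
# different-size parts. split_by_size and its arguments are hypothetical.
split_by_size() {
  local base_list="$1" test_list="$2" out="$3" size name

  # Create (or truncate) the result files first, so they exist even
  # when no sizes match and the later grep -f calls can read them.
  : > "$out/same1.txt"
  : > "$out/same2.txt"

  while read -r size name; do
    # Skip blank entries, such as the one produced by the "total" line of ls -l.
    [ -n "$size" ] || continue
    # Match the size as the whole first field, not as a substring.
    if grep -q "^$size " "$test_list"; then
      echo "$size $name" >> "$out/same1.txt"
      grep "^$size " "$test_list" >> "$out/same2.txt"
    fi
  done < "$base_list"

  # -x -F: each pattern line must equal a whole input line, literally.
  # grep exits with 1 when it selects no lines; that is not an error here.
  grep -v -x -F -f "$out/same1.txt" "$base_list" > "$out/diff1.txt" || [ $? -eq 1 ]
  grep -v -x -F -f "$out/same2.txt" "$test_list" > "$out/diff2.txt" || [ $? -eq 1 ]
}
```

With base.txt and test.txt built as in the script above, this could be called as `split_by_size base.txt test.txt .` and the two diff loops would then run unchanged.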


ASKER CERTIFIED SOLUTION
Gerwin Jansen
This solution is only available to members.
Perhaps you need to re-jig the script so as to do something more general than work from 2 lists. I don't have time to code anything right now but my approach would be:
Have 2 lists of files, sorted by size (as now)
Work through files in one of the lists individually
If there's an equal size file in 2nd list, compare against it and you are done
OTHERWISE
locate next-smaller file and count # lines in diff
locate next-larger file and count # lines in diff
if the above 2 steps only find one file (e.g. no smaller file), report comparison against that file
Otherwise report the smaller diff (or maybe both, depending ...)
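
The approach outlined above could be sketched roughly as follows. This is an illustration under stated assumptions, not code posted in the thread: the function names `list_by_size`, `diff_lines` and `compare_dirs` are hypothetical, only the first equal-size file is compared, and "smaller diff" is taken to mean fewer lines of `diff` output.

```shell
#!/usr/bin/env bash
# Sketch of the nearest-size matching idea. All function names are hypothetical.

# Print "size name" for each regular file in a directory, smallest first.
list_by_size() {
  local dir="$1" f
  for f in "$dir"/*; do
    if [ -f "$f" ]; then
      printf '%s %s\n' "$(wc -c < "$f" | tr -d ' ')" "${f##*/}"
    fi
  done | sort -n
}

# Count the lines of diff output for two files (0 means identical).
# diff exits non-zero when files differ, so its status is ignored here.
diff_lines() {
  { diff "$1" "$2" || true; } | wc -l | tr -d ' '
}

# For each file in the first directory, report a comparison against the
# equal-size file in the second, or else against the closer of its neighbours.
compare_dirs() {
  local bdir="$1" tdir="$2" tlist size name tsize tname
  local equal smaller larger cand n best best_n
  tlist=$(mktemp)
  list_by_size "$tdir" > "$tlist"

  list_by_size "$bdir" | while read -r size name; do
    equal='' smaller='' larger=''
    # The list is sorted, so the last smaller entry and the first larger
    # entry are the nearest neighbours in size.
    while read -r tsize tname; do
      if [ "$tsize" -lt "$size" ]; then smaller=$tname
      elif [ "$tsize" -eq "$size" ]; then equal=$tname; break
      else larger=$tname; break
      fi
    done < "$tlist"

    if [ -n "$equal" ]; then
      echo "$name vs $equal (same size): $(diff_lines "$bdir/$name" "$tdir/$equal") diff lines"
      continue
    fi

    # No equal-size file: keep whichever neighbour gives the shorter diff.
    best='' best_n=''
    for cand in "$smaller" "$larger"; do
      [ -n "$cand" ] || continue
      n=$(diff_lines "$bdir/$name" "$tdir/$cand")
      if [ -z "$best" ] || [ "$n" -lt "$best_n" ]; then
        best=$cand best_n=$n
      fi
    done

    if [ -n "$best" ]; then
      echo "$name vs $best (nearest size): $best_n diff lines"
    else
      echo "$name: no file in $tdir to compare against"
    fi
  done
  rm -f "$tlist"
}
```

It would be invoked as `compare_dirs "$BASE" "$TEST"`. Reporting only the line count keeps the output short; printing the full diff for the chosen pair would be a small change inside the loop.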
phoffric

ASKER

I reviewed your code, and I get the gist of it. I will try to implement it in the next week. Thanks again.
@Duncan Roe,
Sorry about the title confusion. The titles now put the part number near the front for easier visibility. I believe your comment belongs in Part 2:
   https://www.experts-exchange.com/questions/28490563/bash-script-Part-2-to-compare-diff-files-in-different-folders.html

This Part 1a was intended to be just a relatively easier update to the original question to handle the specific case where all the file sizes in BASE and TEST were different.
>> Thanks again.
No problem. If you can post some (redacted) samples next week, we can do some testing for you.
OK - re-posted. Doesn't look so pretty though :-/
Phoffric, could you possibly tar up sample TEST & BASE directories and post as a file attachment? (assuming they're not confidential).
Thanks ... Duncan.
Yeah, unfortunately, I am not allowed to present actual files. I would have to generate by hand some look-alikes, which I will be more than pleased to do.
If you would be so kind as to spend the time to do so, that would be terrific. You could make them differ in a way that mirrors how your production files do, which the rest of us can only guess at.
Worked this weekend so no time. Will have free time soon.
No problem. Just open a new question when needed.