Jason_Sutiono

Comparing 2 files and displaying the difference

Hi Experts,

I'm trying to compare 2 sets of files to get the difference in another file using perl.

I have written the code but it takes forever to run as the input file is quite large.

Being new to perl, what I have written is structured inefficiently.

The code is attached in the file.

Here are my input files:

stkturnsalesv3.csv (file1)
11-APR-2012|002J011|3223|1|001036|W035|S
26-MAR-2012|0020L36|3264|1|0020L36|W007|S
10-APR-2012|0020L36|3264|1|0020L36|W007|S
02-APR-2012|002J011|3223|1|002J011|W007|S

stkturnsohv3.csv (file2)
TEL|002N2D0|S|3544|1|0|0|72|002N2D0|W007| |S
TPN|002N2D0|S|3430|6|0|0|83.3333|002N2D0|W007| |S
TRH|002N2D0|S|3528|0|9|0|72|002N2D0|W007| |S
TWG|002N2D0|S|3732|0|7|0|72|002N2D0|W007| |S

Basically, I am trying to find out the difference between file 1 and file 2 based on column 2 and column 3 of file 1.

In other words, if column 2 and column 3 in file 1 do not match column 2 and column 4 in file 2, the difference is written to an output file in the following format:

Final Output
--------------------------------------------------
0020L36|3264|0|0|2|0|0
002J011|3223|0|0|2|0|0
---------------------------------------------------
(Col2 from file 1)|(Col3 from file 1)|Default 0|Default 0|Quantity Added|Default 0|Default 0

Note that only data from file 1 is displayed.

Also, if column 2 and column 3 match across several lines of file 1, I need to add up the quantity (column 4) instead of repeating the lines. For example, these 2 lines:
26-MAR-2012|0020L36|3264|1|0020L36|W007|S
10-APR-2012|0020L36|3264|1|0020L36|W007|S

Hence, the output would be:
0020L36|3264|0|0|2|0|0

Instead of:
0020L36|3264|0|0|1|0|0
0020L36|3264|0|0|1|0|0

Thank you in advance!!

Looking forward to the responses.

This is not homework, btw.

Regards,

Jason
#!/usr/bin/perl -s
$f1 = 'stkturnsalesv3.csv';
open FILE1, "$f1" or die "Could not open $f1: $!\n";
$f2= 'stkturnsohv3.csv';
open FILE2, "$f2" or die "Could not open $f2: $!\n";
$outfile = 'test3.csv';
my @outlines;
my @line;
my %a;
my %qty;
my @temparray;
  
open(INFO, ">$outfile") or die "$outfile $!";
foreach (<FILE1>) {

my @col = split /\|/;

$y = 0;
$outer_text = $col[1].$col[2];
$qty{$col[1]}{$col[2]} += $col[3];

seek(FILE2,0,0);
foreach (<FILE2>) {

my @colb = split /\|/;
$inner_text = $colb[1].$colb[3];
if($outer_text eq $inner_text) {
$y = 1;
last;
}
}

if($y != 1) {

push(@temparray, "$outer_text|$col[1]|$col[2]|$col[3]\n");

}
}

my %temparrayqty;

for (@temparray){
my($col1e,$col2e,$col3e,$col4e)=split(/\|/);
	$temparrayqty{$col2e}{$col3e} += $col4e;

	}
		foreach $stockcode (sort keys %temparrayqty) {
		foreach $warehouse (keys %{$temparrayqty{$stockcode}})
		{
			my $c = "0|0|";
			my $d = "0|0";
			print INFO "$stockcode|";
			print INFO "$warehouse|";
			print INFO "$c";
			print INFO "$temparrayqty{$stockcode}{$warehouse}|";
			print INFO "$d", "\n";
		}

	}

close INFO;
close FILE1;
close FILE2;
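
The nested loop above rereads all of file 2 for every line of file 1, which is the likely reason it slows down on large input. A minimal sketch of one common alternative follows: index the file 2 keys in a hash once, then make a single pass over file 1. It is not taken from the accepted solution. The sub name `unmatched_totals` and the in-memory arrays are assumptions made for illustration; with real files the lines would come from `while (my $line = <$fh>)` loops instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns one output
# line per (col2, col3) key of file 1 that has no (col2, col4) match in file 2.
sub unmatched_totals {
    my ($sales_lines, $soh_lines) = @_;

    # Index file 2 once by "col2|col4" so each lookup is a hash access.
    my %seen;
    for my $line (@$soh_lines) {
        my @col = split /\|/, $line;
        next unless defined $col[3];    # skip short or blank lines
        $seen{"$col[1]|$col[3]"} = 1;
    }

    # Single pass over file 1: sum column 4 per unmatched "col2|col3" key.
    my %qty;
    for my $line (@$sales_lines) {
        my @col = split /\|/, $line;
        next unless defined $col[3];
        my $key = "$col[1]|$col[2]";
        $qty{$key} += $col[3] unless $seen{$key};
    }

    # Same layout as the requested output: key|0|0|quantity|0|0
    return map { "$_|0|0|$qty{$_}|0|0" } sort keys %qty;
}

# Sample data copied from the question.
my @sales = (
    '11-APR-2012|002J011|3223|1|001036|W035|S',
    '26-MAR-2012|0020L36|3264|1|0020L36|W007|S',
    '10-APR-2012|0020L36|3264|1|0020L36|W007|S',
    '02-APR-2012|002J011|3223|1|002J011|W007|S',
);
my @soh = (
    'TEL|002N2D0|S|3544|1|0|0|72|002N2D0|W007| |S',
    'TPN|002N2D0|S|3430|6|0|0|83.3333|002N2D0|W007| |S',
);

print "$_\n" for unmatched_totals(\@sales, \@soh);
```

Because the key is built with `|`, which can never appear inside a field after splitting on `|`, two different column pairs cannot collide the way plain concatenation (`$col[1].$col[2]`) could.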


ASKER CERTIFIED SOLUTION
ozo
This solution is only available to members.
I know you're not specifically looking for a batch file solution; however, I couldn't resist the temptation to write one.

Please confirm whether this is of any use to you and whether it returns the results you expect it to:

Copy and paste the code into Notepad and save it as say, 'FILEDIFF.BAT' in the same folder where your two files are. Then fire up a DOS session, navigate to where your files are and start the batch file like this:

    FILEDIFF

NOTE: Don't forget to change 'FILE1' and 'FILE2' in lines 5 and 8 to the names of your own files.
@echo off
setlocal enabledelayedexpansion
for /f "tokens=1 delims==" %%A in ('2^>nul set diff[') do set "%%A="

for /f "tokens=2,3,4 usebackq delims=|" %%A in ("FILE1") do (
  set "flag="
  if not defined diff[%%A][%%B] set diff[%%A][%%B]=0
  for /f "tokens=2,4 usebackq delims=|" %%a in ("FILE2") do if "%%A" equ "%%a" if "%%B" equ "%%b" set flag=1
  if not defined flag set /a diff[%%A][%%B]+=%%C
  if !diff[%%A][%%B]! equ 0 set "diff[%%A][%%B]="
)

echo --------------------------------------------------
for /f "tokens=2-4 delims=[]=" %%A in ('set diff[') do echo %%A^|%%B^|0^|0^|%%C^|0^|0
echo --------------------------------------------------


NOTE: If you replace the words 'FILE1' and 'FILE2' in lines 5 and 8 in the above batch file code with '%~1' and '%~2' respectively, then you can start the batch file passing both filenames as parameters like this:

    FILEDIFF file1 file2

(You will have to put double quotes around file1 and file2 if the filenames contain spaces.)

See the modified code below (I've done it for you so don't worry):
@echo off
setlocal enabledelayedexpansion
for /f "tokens=1 delims==" %%A in ('2^>nul set diff[') do set "%%A="

for /f "tokens=2,3,4 usebackq delims=|" %%A in ("%~1") do (
  set "flag="
  if not defined diff[%%A][%%B] set diff[%%A][%%B]=0
  for /f "tokens=2,4 usebackq delims=|" %%a in ("%~2") do if "%%A" equ "%%a" if "%%B" equ "%%b" set flag=1
  if not defined flag set /a diff[%%A][%%B]+=%%C
  if !diff[%%A][%%B]! equ 0 set "diff[%%A][%%B]="
)

echo --------------------------------------------------
for /f "tokens=2-4 delims=[]=" %%A in ('set diff[') do echo %%A^|%%B^|0^|0^|%%C^|0^|0
echo --------------------------------------------------


Jason_Sutiono

ASKER

Thanks ozo. You're the legend! That did the trick!!
Thanks Paultomasi. I do use batch files occasionally, so this will come in handy someday. I can't seem to give points to assisted solutions now :(