# PHP csv compare, but ignore one or more columns

I have a script that was written before I started, and it works well. However, there are two or more columns I'd like to ignore when doing the comparisons. Right now I use sed or cut to remove the columns manually and then do the compare/diff, but I'd like the PHP script itself to ignore the pesky date columns.

I want to ignore columns 3 and 8.

File -1.csv
col-1, col-2, col-3, col-4, col5, col-6, col-7, col-8
asdf,asdf,1234,asdf,asdf,asdf,bbbb,1111
lkjh,lkjh,5432,lkjh,lkjh,lkjh,lkjh,2222

File -2.csv
col-1, col-2, col-3, col-4, col5, col-6, col-7,col-8
asdf,asdf,4321,asdf,asdf,asdf,aaaa,3333
lkjh,lkjh,2345,lkjh,lkjh,lkjh,lkjh,4444

<?php
if (empty($argv[1]) or !is_file($argv[1])) {
    exit("You must provide a file to sort: php csv-compare.php \\path\\to\\file1.csv \\path\\to\\file2.csv\n");
} else {
    $filename1 = $argv[1];
}
if (empty($argv[2]) or !is_file($argv[2])) {
    exit("You must provide a file to sort: php csv-compare.php \\path\\to\\file1.csv \\path\\to\\file2.csv\n");
} else {
    $filename2 = $argv[2];
}
// Read both files into arrays of lines.
$file1 = file($filename1);
$file2 = file($filename2);
// Lines found in both files, and lines found in only one of them.
$matches = array_intersect($file1, $file2);
$diffs   = array_merge(array_diff($file1, $file2), array_diff($file2, $file1));
// Write the matching lines.
if (($handle = fopen("csv-compare.matches.csv", "w")) !== FALSE) {
    foreach ($matches as $line) {
        fwrite($handle, $line);
    }
    fclose($handle);
}
// Write the header row followed by the differing lines.
if (($handle = fopen("csv-compare.diffs.csv", "w")) !== FALSE) {
    fwrite($handle, $file1[0]);
    foreach ($diffs as $line) {
        fwrite($handle, $line);
    }
    fclose($handle);
}
?>

Commented:
Write a script that rewrites file1 and file2 without the columns, and pass the newly created files to your compare script.

Just read file1 line by line. Explode each line on the comma so you have an array, then rewrite the CSV without the parts you don't want.
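The suggestion above (split each line into fields, drop the unwanted ones, write the rest back out) could be sketched roughly as follows. `stripColumns()` and its parameters are hypothetical names used only for illustration, column numbers are zero-based, and the comma separator and double-quote enclosure passed to `fgetcsv()` and `fputcsv()` are assumptions about the data.

```php
<?php
// Hypothetical helper: copy a CSV file, leaving out the given zero-based columns.
// Returns the number of rows written.
function stripColumns(string $inPath, string $outPath, array $ignore): int
{
    $in  = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException("Unable to open $inPath or $outPath");
    }

    $rows = 0;
    // Read one row at a time so the whole file is never held in memory.
    while (($fields = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        // Drop the ignored positions, then renumber the remaining fields.
        $kept = array_values(array_diff_key($fields, array_flip($ignore)));
        fputcsv($out, $kept, ',', '"', '\\');
        $rows++;
    }

    fclose($in);
    fclose($out);
    return $rows;
}
```

For the sample files in the question, a call such as `stripColumns('File-1.csv', 'File-1.stripped.csv', [2, 7])` would drop the third and eighth columns; both file names here are placeholders.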

kind regards
Security Samurai (Author) Commented:
I'm already doing that with cut and sed; I want the PHP script to ignore a column or two without rewriting the input files beforehand. I also can't write "hello wrold" (<--see), so I won't be able to rewrite or write a script in PHP. I was hoping it was a simple exercise for an expert here.
-rich
Commented:
See if this helps.
<?php // demo/temp_richrumble.php
/**
 * http://www.experts-exchange.com/questions/28758260/PHP-csv-compare-but-ignore-one-or-more-columns.html
 */
error_reporting(E_ALL);
echo '<pre>';

// WHAT COLUMNS DO WE IGNORE NUMBERED RELATIVE TO ZERO
$cols = [ 2, 7 ];

// GET THE REDACTED FILES INTO ARRAYS
$arr1 = getMyCSV('temp_richrumble1.csv', $cols);
$arr2 = getMyCSV('temp_richrumble2.csv', $cols);

// LOOK FOR MATCHES OR DIFFS
$match = $diffs = [];
foreach ($arr1 as $key => $csv1)
{
    // NULL IF THE SECOND FILE HAS FEWER ROWS
    $csv2 = $arr2[$key] ?? null;
    if ($csv1 == $csv2)
    {
        $match[$key] = $csv1;
    }
    else
    {
        $diffs[$key] = $csv2;
    }
}

// SHOW THE "ORIGINALS"
echo PHP_EOL;
print_r($arr1);
echo PHP_EOL;
print_r($arr2);
echo PHP_EOL;

// SHOW THE WORK PRODUCTS
echo PHP_EOL . 'MATCH' . PHP_EOL;
print_r($match);
echo PHP_EOL . 'DIFFS' . PHP_EOL;
print_r($diffs);

function getMyCSV($file, array $cols)
{
    $farr = [];
    $fpr = fopen($file, 'r');
    if (!$fpr) trigger_error("Unable to open $file", E_USER_ERROR);
    while (!feof($fpr))
    {
        $line = fgetcsv($fpr);
        if (empty($line)) continue;
        foreach ($cols as $col)
        {
            unset($line[$col]);
        }
        $farr[] = $line;
    }

    return array_values($farr);
}
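Note that the script above compares rows by position (row N against row N), whereas the original script in the question uses `array_intersect()` and `array_diff()`, which match lines regardless of where they appear. A minimal sketch that keeps that set-style comparison while ignoring columns is shown below. `rowKey()` and `compareIgnoring()` are hypothetical names, column numbers are zero-based, and the sketch assumes that no field spans several lines and that the `"\x1F"` separator never occurs in the data. Like the original script, it holds both files in memory, and rows that are identical after the ignored columns are removed collapse into a single entry.

```php
<?php
// Hypothetical helper: build a comparison key from the columns that are NOT ignored.
function rowKey(array $fields, array $ignore): string
{
    $kept = array_diff_key($fields, array_flip($ignore));
    return implode("\x1F", $kept); // assumes "\x1F" never appears in a field
}

// Hypothetical helper: map each comparison key to the original, unmodified line.
function indexByKey(string $path, array $ignore): array
{
    $index = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $fields = str_getcsv($line, ',', '"', '\\');
        $index[rowKey($fields, $ignore)] = $line;
    }
    return $index;
}

// Compare two CSV files as sets of rows, ignoring the given zero-based columns.
function compareIgnoring(string $path1, string $path2, array $ignore): array
{
    $index1 = indexByKey($path1, $ignore);
    $index2 = indexByKey($path2, $ignore);

    return [
        // Rows whose retained columns appear in both files (text taken from file 1).
        'matches' => array_values(array_intersect_key($index1, $index2)),
        // Rows found in only one file: file 1 first, then file 2.
        'diffs'   => array_merge(
            array_values(array_diff_key($index1, $index2)),
            array_values(array_diff_key($index2, $index1))
        ),
    ];
}
```

The returned lines are the original ones, ignored columns included, so they can be written to the match and diff files with `fwrite()` in the same way the original script does.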


Security Samurai (Author) Commented:
That does work on small CSVs, but when I compare my smallest files (32 MB each) I go over 2 GB of RAM! I kept raising the memory limit in php.ini (256, 512, 1024, 2048) and stopped there. With the code I posted, I can compare 700 MB CSVs (that's after removing the ever-changing date columns).
The solution does work, but the output is horrible :)
I've attached two mock CSVs (from mockaroo.com) and used 1, 5, 6 as the columns to ignore; diffs is blank (awesome!). I'm going to work on finding the size limits for a few, hang in there for a minute!
-rich
csv-1.csv
csv-2.csv
Commented:
Wow, if we had known you were dealing with multi-megabyte files, we would have designed the application completely differently. Here's what you want to do...

Process each file by reading the rows one at a time. Remove the fields you do not want in the comparison, or replace these fields with constants. Use fgetcsv() to read and fputcsv() to write. As you compare each line, write the rows to either the match or the diffs output file. The objective is to process each file completely, one row at a time, and thereby avoid bringing the entire file into an array in memory. PHP arrays and objects can be 20X larger in memory than the data they contain, so for big-data applications it's necessary to step away from ideas about arrays and just use unit-record processing.
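The row-at-a-time approach described above might look something like the following sketch. `maskColumns()` and `streamCompare()` are hypothetical names, column numbers are zero-based, and the sketch assumes that row N of the first file corresponds to row N of the second, which is a stricter assumption than the order-independent `array_intersect()` comparison in the original script. Only one row from each file is held in memory at any time.

```php
<?php
// Hypothetical helper: replace the ignored (zero-based) columns with a constant
// so they cannot affect the comparison.
function maskColumns(array $fields, array $ignore): array
{
    foreach ($ignore as $col) {
        if (array_key_exists($col, $fields)) {
            $fields[$col] = '';
        }
    }
    return $fields;
}

// Hypothetical function: compare two CSV files row by row, ASSUMING that row N
// of one file corresponds to row N of the other. Returns the row-pair counts.
function streamCompare(string $path1, string $path2, array $ignore,
                       string $matchPath, string $diffPath): array
{
    $handles = [
        'in1'   => fopen($path1, 'r'),
        'in2'   => fopen($path2, 'r'),
        'match' => fopen($matchPath, 'w'),
        'diff'  => fopen($diffPath, 'w'),
    ];
    if (in_array(false, $handles, true)) {
        throw new RuntimeException('Unable to open one of the files');
    }

    $counts = ['match' => 0, 'diff' => 0];
    while (true) {
        $row1 = fgetcsv($handles['in1'], 0, ',', '"', '\\');
        $row2 = fgetcsv($handles['in2'], 0, ',', '"', '\\');
        if ($row1 === false && $row2 === false) {
            break; // both files are exhausted
        }
        if ($row1 !== false && $row2 !== false
            && maskColumns($row1, $ignore) === maskColumns($row2, $ignore)) {
            // Same content once the ignored columns are masked.
            fputcsv($handles['match'], $row1, ',', '"', '\\');
            $counts['match']++;
        } else {
            // Different content, or one file has run out of rows.
            if ($row1 !== false) fputcsv($handles['diff'], $row1, ',', '"', '\\');
            if ($row2 !== false) fputcsv($handles['diff'], $row2, ',', '"', '\\');
            $counts['diff']++;
        }
    }

    foreach ($handles as $handle) {
        fclose($handle);
    }
    return $counts;
}
```

If the rows are not in the same order in both files, sorting the files beforehand or loading them into a database would be needed before a positional comparison like this gives meaningful results.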

There may be other designs that use a database, but that would require a bit of a research project.  It might make sense to hire a professional programmer who could get access to your live data set.
Security Samurai (Author) Commented:
Thanks!