Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

perl csv compare columns

Posted on 2015-02-05
5
Medium Priority
?
284 Views
Last Modified: 2015-02-10
I have two csv files as below
File1:
Col1
ABC
KJH
KEW

File2:
Col1       col2
ABC         12
DSA         46
KJH          22
KEW         66
ADF         78

Now how can I write a perl script for file2 that should only include values from file1 and exclude remaining rows from col
From above example I want to update the File2 as below
File2:
Col1       col2
ABC         12
KJH          22
KEW        66

If you observe "ABC, KJH, KEW" are the only 3 values that are in File1 so we should keep only those entries in File2.
So how can I achieve this using perl

Thanks,
0
Comment
Question by:shragi
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
5 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 40591878
perl -ane 'if( 1..eof&&($^I=".bak") ){ ++$v{$F[0]} }elsif( $v{$F[0]} ){ print; }' File1 File2
0
 
LVL 27

Accepted Solution

by:
wilcoxon earned 2000 total points
ID: 40592901
Or, in script form...
use strict;
use warnings;
use Tie::File; # not strictly required but I like it *MUCH* better for editing files
# put the values from File1 into hash %keep
open IN, 'File1' or die "could not open File1: $!";
my %keep = map { chomp; $_ => 1 } <IN>;
close IN;
# Tie::File lets us edit the file as if it was a simple list
tie my @lines, 'Tie::File', 'File2' or die "could not edit File2: $!";
# only keep the lines that match hash %keep
@lines = grep { m{^(\w+)} and exists $keep{$1} } @lines

Open in new window

0
 
LVL 84

Expert Comment

by:ozo
ID: 40593426
@lines = grep seems to defeat the Tie::File feature of not loading the file into memory, without taking advantage of any of its other features
chomp and m{^(\w+)} may not always have consistent concepts of what is a value
The exists test would be better than the truth test if one of the values can be "0"
0
 
LVL 27

Expert Comment

by:wilcoxon
ID: 40594218
I would expect the grep (under Tie::File) to only load pieces of the file into memory at a time (although I haven't tested on a very large file).  I'm usually doing more complicated editing of files but I pretty much just always use Tie::File when editing a file (or when reading a file if it seems to make more sense).

I don't follow your comment about "chomp and m{^(\w+)} may not always have consistent concepts of what is a value.  Can you elaborate?

I thought exists was always (slightly) more efficient than the truth test for hashes?
0
 
LVL 84

Expert Comment

by:ozo
ID: 40602416
If you had values like AB.C, or if there was whitespace at the end of the File1 lines, then the values found by line 6 and line 11 may not match.

I have not tested the relative efficiency of exists and truth, but it seems reasonable to expect exists to be more efficient.
exists can also be more correct than truth in the case where an existing value may be false.
0

Featured Post

Ask an Anonymous Question!

Don't feel intimidated by what you don't know. Ask your question anonymously. It's easy! Learn more and upgrade.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this post we will learn how to make Android Gesture Tutorial and give different functionality whenever a user Touch or Scroll android screen.
Although it can be difficult to imagine, someday your child will have a career of his or her own. He or she will likely start a family, buy a home and start having their own children. So, while being a kid is still extremely important, it’s also …
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
Suggested Courses

609 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question