Solved

Using Tie::File to open a file, grab a string, and search for it in a different file

Posted on 2009-06-28
Last Modified: 2012-05-07
In Perl, how can I use Tie::File to open file1, read the first line, and search for that string in file2? If it is found, print "found string"; if not, print "Not found". Then continue with the second line of file1, searching for it in file2, and so on until there are no more lines in file1.
I don't want to read a whole file into memory, because both of these files might get very large.
Question by:warrior32
2 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 250 total points
ID: 24733127
  "memory"
       This is an upper limit on the amount of memory that "Tie::File" will
       consume at any time while managing the file.  This is used for two
       things: managing the read cache and managing the deferred write buffer.

       Records read in from the file are cached, to avoid having to re-read
       them repeatedly.  If you read the same record twice, the first time it
       will be stored in memory, and the second time it will be fetched from
       the read cache.  The amount of data in the read cache will not exceed
       the value you specified for "memory".  If "Tie::File" wants to cache a
       new record, but the read cache is full, it will make room by expiring
       the least-recently visited records from the read cache.

       The default memory limit is 2Mib.  You can adjust the maximum read
       cache size by supplying the "memory" option.  The argument is the
       desired cache size, in bytes.

               # I have a lot of memory, so use a large cache to speed up access
               tie @array, 'Tie::File', $file, memory => 20_000_000;

       Setting the memory limit to 0 will inhibit caching; records will be
       fetched from disk every time you examine them.

       The "memory" value is not an absolute or exact limit on the memory
       used.  "Tie::File" objects contains some structures besides the read
       cache and the deferred write buffer, whose sizes are not charged
       against "memory".

       The cache itself consumes about 310 bytes per cached record, so if your
       file has many short records, you may want to decrease the cache memory
       limit, or else the cache overhead may exceed the size of the cached
       data.
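
As a rough illustration of the "memory" option quoted above, here is a minimal sketch of the search from the question done with Tie::File. The helper name search_with_tie_file and its $cache_bytes parameter are made up for this example; it assumes a plain substring match is wanted and ties both files read-only. Note that, as the quoted text says, Tie::File keeps some bookkeeping that is not charged against "memory", so this is not a hard cap.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: returns one report line per line of $needles_path.
sub search_with_tie_file {
    my ($needles_path, $haystack_path, $cache_bytes) = @_;
    $cache_bytes = 2_000_000 unless defined $cache_bytes;

    # Read-only ties; "memory" limits the read cache of each tied file.
    tie my @needles, 'Tie::File', $needles_path,
        mode => O_RDONLY, memory => $cache_bytes
        or die "Could not tie $needles_path: $!\n";
    tie my @haystack, 'Tie::File', $haystack_path,
        mode => O_RDONLY, memory => $cache_bytes
        or die "Could not tie $haystack_path: $!\n";

    my @report;
    for my $i (0 .. $#needles) {
        my $needle = $needles[$i];    # records are chomped by default
        my $found  = 0;
        for my $j (0 .. $#haystack) {
            # index() does a literal substring search (no regex involved)
            if (index($haystack[$j], $needle) >= 0) {
                $found = 1;
                last;
            }
        }
        push @report, $found ? "$needle: found string" : "$needle: Not found";
    }

    untie @needles;
    untie @haystack;
    return @report;
}

# Run as: perl script.pl file1.txt file2.txt
if (@ARGV == 2) {
    print "$_\n" for search_with_tie_file($ARGV[0], $ARGV[1]);
}
```

This still rescans file2 once per line of file1, so it is no faster than a plain read loop; it only shows where the "memory" option fits.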

But there seems to be no reason to read file1 into memory at all.
file2 is the file that would be useful to read into memory, so that you can index into a hash instead of searching a file. But if you don't have enough memory for that, you may prefer to use Tie::Hash, or perhaps DBI.
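
A minimal sketch of the hash approach, assuming file2 does fit in memory and that "found" means a line of file1 is identical to a whole line of file2 (a hash lookup cannot find substrings). The helper names load_lines and report_matches are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: index every line of a file in a hash.
sub load_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open $path: $!\n";
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1;
    }
    close($fh);
    return \%seen;
}

# Hypothetical helper: stream a file line by line and look each line up
# in the hash, so the first file is never held in memory as a whole.
sub report_matches {
    my ($path, $seen) = @_;
    open(my $fh, '<', $path) or die "Could not open $path: $!\n";
    my @report;
    while (my $line = <$fh>) {
        chomp $line;
        push @report, exists $seen->{$line}
            ? "$line: found string"
            : "$line: Not found";
    }
    close($fh);
    return @report;
}

# Run as: perl script.pl file1.txt file2.txt
if (@ARGV == 2) {
    print "$_\n" for report_matches($ARGV[0], load_lines($ARGV[1]));
}
```

Each file is read only once this way, instead of rescanning file2 for every line of file1.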
 
LVL 39

Expert Comment

by:Adam314
ID: 24787570
I don't see any need for Tie::File here. You can just read file1 one line at a time and search file2 for each string.
use strict;
use warnings;

open(my $fh1, '<', 'file1.txt') or die "Could not open file1: $!\n";
open(my $fh2, '<', 'file2.txt') or die "Could not open file2: $!\n";
while(my $s1=<$fh1>) {
	chomp $s1;
	seek $fh2,0,0;	# rewind file2 before searching for the next string
	my $found=0;
	while(<$fh2>) {
		# \Q...\E matches $s1 as a literal string, not as a regex
		next unless /\Q$s1\E/;
		print "$s1: found string\n";
		$found=1;
		last;
	}
	print "$s1: Not found\n" unless $found;
}
close($fh1);
close($fh2);



Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I have been pestered over the years to produce and distribute regular data extracts, and often the request have explicitly requested the data be emailed as an Excel attachement; specifically Excel, as it appears: CSV files confuse (no Red or Green h…
In the distant past (last year) I hacked together a little toy that would allow a couple of Manager types to query, preview, and extract data from a number of MongoDB instances, to their tool of choice: Excel (http://dilbert.com/strips/comic/2007-08…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question