fabiano petrone

best way of fetching some records

Hello,
Please find attached a list of records coming from various databases.
What is, in your opinion, the best way to extract only the records of the form:

***BEGIN OF RECORD***
TITLE: Interacting with computers
PACKAGE: Elsevier SD Freedom Collection:Full Text
***END OF RECORD***

(i.e. with only the TITLE and PACKAGE fields)?

I guess a Perl script would do, but other scripting languages are fine too.

Thanks a lot for your help,
fabiano
records.txt
wilcoxon

You could do this with a simple regex.  Here is Perl code if you want only records with TITLE and PACKAGE (and nothing else).
# assume $string contains the record you want to check
unless ($string =~ m{\*+BEGIN OF RECORD\*+\s+TITLE: ([^\n]+)\s*PACKAGE: ([^\n]+)\s*\*+END OF RECORD\*+}ms) {
    next; # or return or something else to skip processing
}
# $1 is the TITLE value, $2 is the PACKAGE value
my ($title, $pkg) = ($1, $2);
# do whatever you want with the values


Let me know if you need more help.  If you do, please provide more details (for example: are the files plain text, how big are they, and what do you want to do with the extracted values?).
fabiano petrone
ASKER

Hello wilcoxon,
First of all, thanks a lot for your reply.
I've tried the following code, but something still goes wrong. Can you help me?

#c:\perl\bin\perl.exe
use strict;
my $in_file = "records.txt";
my $out_file = "results.txt";

open INFILE, "< $in_file" or die "Can't open $in_file $!";
open OUTFILE, "> $out_file" or die "Can't open $out_file $!";

while (<INFILE>) {
# assume $in_file contains the record you want to check
unless ($in_file =~ m{\*+BEGIN OF RECORD\*+\s+TITLE: ([^\n]+)\s*PACKAGE: ([^\n]+)\s*\*+END OF RECORD\*}ms) {
    next; # or return or something else to skip processing
}
my ($title, $pkg) = ($1, $2);
# do whatever you want with the values
  print OUTFILE $title, $pkg, "\n";
}


Thanks again,
Fabiano
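
Two things stand out in the attempt above: the regex is applied to $in_file, which holds the file name "records.txt" rather than the file's contents, and while (<INFILE>) reads one line at a time, so a pattern spanning four lines cannot match a single line. The following is only a minimal sketch of one way around both, not the accepted solution: it matches against the whole text at once. The helper name extract_title_package and the in-memory sample are illustrative assumptions, and \R and \h assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns [title, package] pairs
# for every record made of exactly a TITLE line followed by a PACKAGE line.
sub extract_title_package {
    my ($text) = @_;
    my @found;
    while ($text =~ m{
            ^ \*+BEGIN\ OF\ RECORD\*+ \h* \R   # opening marker on its own line
            TITLE:\ ([^\r\n]+?)       \h* \R   # TITLE must come next
            PACKAGE:\ ([^\r\n]+?)     \h* \R   # PACKAGE must follow immediately
            \*+END\ OF\ RECORD\*+              # closing marker
        }gmx) {
        push @found, [$1, $2];
    }
    return @found;
}

# Sample built from two records quoted in this thread; only the first qualifies.
my $sample = join "\n",
    '***BEGIN OF RECORD***',
    'TITLE: Interacting with computers',
    'PACKAGE: Elsevier SD Freedom Collection:Full Text',
    '***END OF RECORD***',
    '',
    '***BEGIN OF RECORD***',
    'TITLE: Publishers weekly',
    'AVAILABILITY: Available from 1997.',
    'PACKAGE: EBSCOhost Business Source Complete:Full Text',
    '***END OF RECORD***',
    '';

for my $rec (extract_title_package($sample)) {
    my ($title, $pkg) = @$rec;
    print "$title | $pkg\n";
}
```

To run this on a real file, the whole file would first be read into a string, for example with my $text = do { local $/; <$fh> }; and then passed to the helper. Slurping keeps the code short, but it holds the entire file in memory.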
ASKER CERTIFIED SOLUTION
wilcoxon

This solution is only available to members.
Hi fabianope65,

Here are the first 17 lines of your file:
***BEGIN OF RECORD***
TITLE: Publishers weekly
AVAILABILITY: Available from 1997. 
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***


***BEGIN OF RECORD***
TITLE: Communications of the ACM
AVAILABILITY: Available from 1958 volume: 1 issue: 1. 
PACKAGE: ACM Digital Library:Full Text
***END OF RECORD***


AVAILABILITY: Available from 1999. 
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***



What do you want done with records which don't start with a "***BEGIN OF RECORD***" marker?  See the last part of the above extract, for an example.

Thanks.
tel2
Hi again fabianope65,

Assuming records.txt has Windows style line terminators (i.e. CR+LF), I think this Perl script should work for you:
{
  local $/ = "\r\n\r\n";  # Temporarily change the record terminator to CR+LF+CR+LF (similar to paragraph mode)
  open INFILE, "<records.txt" or die "Can't open records.txt: $!";
  while (<INFILE>)
  {
    # Capture TITLE and PACKAGE; skip the record if either is missing.
    # Note: the .*? gaps also accept other fields between TITLE and PACKAGE.
    next unless my ($title, $pkg) = $_ =~ /\*\*\*BEGIN OF RECORD\*\*\*.*?\sTITLE: (.+?)\n.*?PACKAGE: (.+?)\n.*?\*\*\*END OF RECORD\*\*\*/s;
    # Do whatever you want with $title & $pkg
  }
  close INFILE;
}


Or if records.txt has UNIX style line terminators, change:
    local $/ = "\r\n\r\n";  # ...
to:
    local $/ = '';

Both options should work with any size records.txt file.

And if you can answer the question in my previous post sometime, that would be good.
Also, what OS are you running?
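
A minimal sketch of the paragraph-mode idea from the post above, tightened so that only records with exactly a TITLE line followed by a PACKAGE line are kept (the .*? gaps in the regex above also accept records with fields such as AVAILABILITY in between). The helper name title_package_records and the in-memory sample are illustrative, not from the thread. It assumes the lines reach Perl LF-terminated, which is normally the case on Windows when the file is opened in the default text mode, since CR+LF is translated to LF on read; \h assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads blank-line-separated
# records from an open filehandle and keeps only TITLE + PACKAGE records.
sub title_package_records {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: a record ends at one or more blank lines
    my @found;
    while (my $rec = <$fh>) {
        # Anchored at both ends, so no other field can sit inside the record.
        next unless $rec =~ m{
            \A \*+BEGIN\ OF\ RECORD\*+ \h* \n
            TITLE:\ ([^\n]+?)          \h* \n
            PACKAGE:\ ([^\n]+?)        \h* \n
            \*+END\ OF\ RECORD\*+      \s* \z
        }x;
        push @found, [$1, $2];
    }
    return @found;
}

# Sample modelled on the extract quoted earlier in the thread, including
# the fragment that lacks a BEGIN marker.
my $data = <<'END_DATA';
***BEGIN OF RECORD***
TITLE: Publishers weekly
AVAILABILITY: Available from 1997.
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***


***BEGIN OF RECORD***
TITLE: Interacting with computers
PACKAGE: Elsevier SD Freedom Collection:Full Text
***END OF RECORD***


AVAILABILITY: Available from 1999.
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***
END_DATA

# An in-memory filehandle stands in for records.txt here.
open my $data_fh, '<', \$data or die "Can't open in-memory data: $!";
print join(' | ', @$_), "\n" for title_package_records($data_fh);
close $data_fh;
```

Because each record is read and tested on its own, memory use stays small however large the file is, and a fragment without a BEGIN marker is simply skipped.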
Hi, I'm now on Windows 7 with ActivePerl 5.16.3.1604.
Thanks,
fabiano
Hello,
Thanks a lot to both of you. The wilcoxon script works perfectly. Thanks again, fabiano