Solved

best way of fetching some records

Posted on 2014-10-16
Last Modified: 2014-10-20
Hello,
Please find attached a list of records drawn from various databases.
What is, in your opinion, the best way to extract only the records of the form:

***BEGIN OF RECORD***
TITLE: Interacting with computers
PACKAGE: Elsevier SD Freedom Collection:Full Text
***END OF RECORD***

(i.e., with only the TITLE and PACKAGE fields)?

I guess a Perl script, but other scripting languages would be fine too.

Thanks a lot for your help,
fabiano
records.txt
Question by:fabianope65
7 Comments
 
LVL 26

Expert Comment

by:wilcoxon
ID: 40384231
You could do this with a simple regex.  Here is Perl code if you want only records with TITLE and PACKAGE (and nothing else).
# assume $string contains the record you want to check
unless ($string =~ m{\*+BEGIN OF RECORD\*+\s+TITLE: ([^\n]+)\s*PACKAGE: ([^\n]+)\s*\*+END OF RECORD\*}ms) {
    next; # or return or something else to skip processing
}
my ($title, $pkg) = ($1, $2);
# do whatever you want with the values


Let me know if you need more help.  If you do, please provide more details (such as whether the files are plain text, how big they are, and what you want to do with the extracted values).
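As a quick check of the pattern above, here is a minimal sketch that wraps it in a helper and tries it on two sample records. The helper name `parse_record` and the sample strings are made up for illustration, and the sketch assumes LF line endings (with CR+LF input the captured values would keep a trailing CR).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (title, package) when
# $record holds a TITLE line directly followed by a PACKAGE line, or an
# empty list otherwise.
sub parse_record {
    my ($record) = @_;
    return $record =~ m{\*+BEGIN OF RECORD\*+\s+TITLE: ([^\n]+)\s*PACKAGE: ([^\n]+)\s*\*+END OF RECORD\*+}
        ? ($1, $2)
        : ();
}

# Sample data, assumed to use LF line endings.
my $good = join "\n",
    '***BEGIN OF RECORD***',
    'TITLE: Interacting with computers',
    'PACKAGE: Elsevier SD Freedom Collection:Full Text',
    '***END OF RECORD***', '';

my $bad = join "\n",
    '***BEGIN OF RECORD***',
    'TITLE: Publishers weekly',
    'AVAILABILITY: Available from 1997.',
    'PACKAGE: EBSCOhost Business Source Complete:Full Text',
    '***END OF RECORD***', '';

for my $record ($good, $bad) {
    # An empty list means the record has extra or missing fields: skip it.
    my ($title, $pkg) = parse_record($record) or next;
    print "$title | $pkg\n";
}
```

Only the first sample is printed: the AVAILABILITY line in the second one sits between TITLE and PACKAGE, and `[^\n]+` cannot cross a line break to skip over it.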
 

Author Comment

by:fabianope65
ID: 40384795
Hello, Wilcoxon
First of all, thanks a lot for your reply.
I've tried the following code, but something still goes wrong. Can you help me?

#c:\perl\bin\perl.exe
use strict;
my $in_file = "records.txt";
my $out_file = "results.txt";

open INFILE, "< $in_file" or die "Can't open $in_file $!";
open OUTFILE, "> $out_file" or die "Can't open $out_file $!";

while (<INFILE>) {
# assume $in_file contains the record you want to check
unless ($in_file =~ m{\*+BEGIN OF RECORD\*+\s+TITLE: ([^\n]+)\s*PACKAGE: ([^\n]+)\s*\*+END OF RECORD\*}ms) {
    next; # or return or something else to skip processing
}
my ($title, $pkg) = ($1, $2);
# do whatever you want with the values
  print OUTFILE $title, $pkg, "\n";
}


Thanks again,
Fabiano
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 40387489
Here's one way to do it that will work regardless of the size of the file.  If the files are always smallish, I'd maybe look at using File::Slurp instead.
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;
tie my @file, 'Tie::File', 'records.txt', mode => O_RDONLY or die "could not tie records.txt: $!";
open OUT, '>', 'results.txt' or die "could not write results.txt: $!";
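# a while loop is used so that changing $i really does skip lines
my $i = 0;
while ($i <= $#file - 3) {
    if ($file[$i] =~ m{\*+BEGIN OF RECORD\*+}
        and my ($ttl) = $file[$i+1] =~ m{^\s*TITLE:\s*(.+)}
        and my ($pkg) = $file[$i+2] =~ m{^\s*PACKAGE:\s*(.+)}
        and $file[$i+3] =~ m{\*+END OF RECORD\*+}) {
        print OUT $ttl, '  ', $pkg, "\n";
        $i += 4;    # jump past the four-line record just written
    } else {
        $i++;       # no match here: try the next line
    }
}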
close OUT;
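
For the small-file case mentioned above, here is a rough sketch of the slurp approach using only core Perl (File::Slurp's `read_file` would do the same job as the `slurp` helper). The earlier attempt tested the multi-line pattern against the file name in `$in_file` while reading one line at a time, so it could never match; reading the whole text first lets one global match see complete records. The names `slurp` and `extract_pairs` and the sample text are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string (core Perl idiom).
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "could not read $path: $!";
    local $/;    # undef record separator: a single read returns everything
    return scalar <$fh>;
}

# Hypothetical helper: return [title, package] pairs for records that
# contain only a TITLE line and a PACKAGE line.
sub extract_pairs {
    my ($text) = @_;
    my $re = qr{
        ^\*+BEGIN\ OF\ RECORD\*+ \s*\n
        \s*TITLE:   \s* ([^\r\n]+?) \s*\n
        \s*PACKAGE: \s* ([^\r\n]+?) \s*\n
        \s*\*+END\ OF\ RECORD\*+
    }mx;
    my @pairs;
    while ($text =~ /$re/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# In-memory sample so the sketch runs without records.txt;
# with a real file, use: extract_pairs(slurp('records.txt'))
my $sample = <<'END_SAMPLE';
***BEGIN OF RECORD***
TITLE: Publishers weekly
AVAILABILITY: Available from 1997.
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***

***BEGIN OF RECORD***
TITLE: Interacting with computers
PACKAGE: Elsevier SD Freedom Collection:Full Text
***END OF RECORD***
END_SAMPLE

print join('  ', @$_), "\n" for extract_pairs($sample);
```

The trade-off is memory: the whole file is held as one string, which is fine for small inputs but is the reason the Tie::File version above scales better.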


 
LVL 11

Expert Comment

by:tel2
ID: 40391072
Hi fabianope65,

Here are the first 17 lines of your file:
***BEGIN OF RECORD***
TITLE: Publishers weekly
AVAILABILITY: Available from 1997. 
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***


***BEGIN OF RECORD***
TITLE: Communications of the ACM
AVAILABILITY: Available from 1958 volume: 1 issue: 1. 
PACKAGE: ACM Digital Library:Full Text
***END OF RECORD***


AVAILABILITY: Available from 1999. 
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***



What do you want done with records which don't start with a "***BEGIN OF RECORD***" marker?  See the last part of the above extract, for an example.

Thanks.
tel2
 
LVL 11

Expert Comment

by:tel2
ID: 40391333
Hi again fabianope65,

Assuming records.txt has Windows style line terminators (i.e. CR+LF), I think this Perl script should work for you:
{
  local $/ = "\r\n\r\n";  # Temporarily change the record terminator to CR+LF+CR+LF (i.e. paragraph mode)
  open INFILE, "<records.txt" or die "Can't open records.txt: $!";
  while (<INFILE>)
  {
    # Match only records whose TITLE line is immediately followed by the PACKAGE line
    next unless my ($title, $pkg) = /\*\*\*BEGIN OF RECORD\*\*\*\s+TITLE: (.+?)\s*\nPACKAGE: (.+?)\s*\n\*\*\*END OF RECORD\*\*\*/;
    # Do whatever you want with $title & $pkg
  }
}


Or if records.txt has UNIX style line terminators, change:
    local $/ = "\r\n\r\n";  # ...
to:
    local $/ = '';

Both options should work with any size records.txt file.
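
To make the paragraph-mode idea concrete, here is a small self-contained sketch that reads from an in-memory filehandle, so it runs without records.txt. It assumes records are separated by at least one blank line, as in the extract above, and that line endings arrive as LF. Windows builds of Perl normally translate CR+LF to LF when reading in text mode, so `''` is used here; elsewhere, opening a CR+LF file with the `:crlf` layer is one option. The name `read_simple_records` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records from an open
# filehandle and return [title, package] pairs for two-field records.
sub read_simple_records {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: blank line(s) end each record
    my @pairs;
    while (my $record = <$fh>) {
        # Fragments without a BEGIN marker, or with extra fields, fail here.
        next unless $record =~ m{
            \A \s* \*+BEGIN\ OF\ RECORD\*+ \s*\n
            TITLE:   \s* ([^\n]+?) \s*\n
            PACKAGE: \s* ([^\n]+?) \s*\n
            \*+END\ OF\ RECORD\*+
        }x;
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Sample input held in a string; a real script would open records.txt.
my $sample = <<'END_SAMPLE';
***BEGIN OF RECORD***
TITLE: Communications of the ACM
AVAILABILITY: Available from 1958 volume: 1 issue: 1.
PACKAGE: ACM Digital Library:Full Text
***END OF RECORD***


AVAILABILITY: Available from 1999.
PACKAGE: EBSCOhost Business Source Complete:Full Text
***END OF RECORD***

***BEGIN OF RECORD***
TITLE: Interacting with computers
PACKAGE: Elsevier SD Freedom Collection:Full Text
***END OF RECORD***
END_SAMPLE

open my $in, '<', \$sample or die "could not open in-memory sample: $!";
print "$_->[0]  $_->[1]\n" for read_simple_records($in);
close $in;
```

Because each read returns one whole record, only a single record is in memory at a time, which is why this approach copes with large files.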

And if you can answer the question in my previous post sometime, that would be good.
Also, what OS are you running?
 

Author Comment

by:fabianope65
ID: 40391490
Hi, I'm currently on Windows 7 with ActivePerl 5.16.3.1604.
thanks
fabiano
 

Author Closing Comment

by:fabianope65
ID: 40391605
Hello,
Thanks a lot to both of you. The wilcoxon script works perfectly. Thanks again, Fabiano