Solved

Extract data from a file using perl

Posted on 2011-03-19
Last Modified: 2012-05-11
Hi,
I want to extract some data from "file1.txt"

file1.txt :
20.03.2011 00:35:42 xxxxxxxxxxxxxxxxxxxx  yyyy yyyyyy: yyyyyyyy
20.03.2011 00:35:42 xxxxxxxxxxxxxxxxxxxx  aaaa aa a : bb bb bb b b
20.03.2011 00:35:42 xxxxxxxxxxxxxxxxxxxx  zzzzzzzz zzz : zzzzzz zzz

19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  zzzzzzzzzzzz : zzzzzz zzzz
19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  zzzzzzzzzzzz : zzzzzz zzzz
19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  aaaa aa a : bb bb bb b b
19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  zzzzzzzzzzzz : zzzzzz zzzz
19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  ccc cc c : sss ss

My output should be:

20.03.2011 00:35:42  aaaa aa a : bb bb bb b b
19.03.2011 13:40:55  aaaa aa a : bb bb bb b b
19.03.2011 13:40:55  ccc cc c : sss ss

The values shown as b's and s's may change.
I need a Perl script.
Question by:pravink22
12 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 35174279
perl -lane 'next unless $F[6] eq ":"; $F[2]=""; print "@F"' file1.txt
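
For readers unfamiliar with the -lane switches, here is a rough long-hand sketch of what the one-liner does. The sub name pick_line and the inline sample lines are illustrative assumptions, not part of ozo's answer; the split on whitespace mirrors what -a does to fill @F.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the reformatted line, or nothing to skip it.
sub pick_line {
    my ($line) = @_;
    my @F = split ' ', $line;                 # same whitespace split as -a
    return unless defined $F[6] && $F[6] eq ':';
    $F[2] = '';                               # blank out the xxxx... column
    return "@F";                              # rejoin fields with single spaces
}

# Sample lines copied from file1.txt in the question.
my @sample = (
    '20.03.2011 00:35:42 xxxxxxxxxxxxxxxxxxxx  yyyy yyyyyy: yyyyyyyy',
    '20.03.2011 00:35:42 xxxxxxxxxxxxxxxxxxxx  aaaa aa a : bb bb bb b b',
    '20.03.2011 00:35:42 xxxxxxxxxxxxxxxxxxxx  zzzzzzzz zzz : zzzzzz zzz',
    '19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  ccc cc c : sss ss',
);

for my $line (@sample) {
    my $out = pick_line($line);
    print "$out\n" if defined $out;
}
```

Note that this approach selects lines by position: it keeps any line whose seventh field is a lone colon, so it relies on the wanted labels always being three words long.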
 

Author Comment

by:pravink22
ID: 35174294
This is my script,
********************************************
#!/usr/bin/perl
print "Enter the file name:";
$input = <STDIN>;
open (INFILE, "<$input");
open(OUTFILE, ">>out.csv");
@array = <INFILE>;
$count = 0;
while ($count <= $#array) {
if ( @array($count) = aaaa aa a) || (@array($count) = ccc cc c) {
print "date time aaaa aa a : ?????????"}
$count++
}
close(INFILE);
close(OUTFILE);
*****************************************

Can you guide me from here?
 
LVL 84

Expert Comment

by:ozo
ID: 35174322
#!/usr/bin/perl
print "Enter the file name:";
$input = <STDIN>;
open (INFILE, "<$input");
open(OUTFILE, ">>out.csv");
while( <INFILE> ){
 if( /aaaa aa a(.*)/ || /ccc cc c(.*)/ ){
     print "date time aaaa aa a : $1\n";
 }
}
 

Author Comment

by:pravink22
ID: 35174331
Hi ozo,

I tried it, but out.csv is empty (0 KB).

Please find the attachments
file1.txt
test.pl.txt
out.csv
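
A likely explanation for the empty file: the script above calls print without naming a filehandle, so the matches go to the terminal (STDOUT) and nothing is ever written to OUTFILE. The sketch below shows the difference using an in-memory filehandle so it runs without touching disk; the sub name write_matches and the sample lines are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: print each matching line to the filehandle passed in.
sub write_matches {
    my ($fh, @lines) = @_;
    for my $line (@lines) {
        # Naming the filehandle is what sends output to the file
        # instead of STDOUT.
        print {$fh} $line if $line =~ /aaaa aa a|ccc cc c/;
    }
}

my $buffer = '';
open(my $out, '>', \$buffer) or die "Cannot open in-memory file: $!";
write_matches($out, "x aaaa aa a : 1\n", "x zzzz : 2\n");
close($out);

print "captured: $buffer";
```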
 
LVL 84

Expert Comment

by:ozo
ID: 35174404
What do you want out.csv to contain?
 

Author Comment

by:pravink22
ID: 35174406
My output should be:

20.03.2011 00:35:42  aaaa aa a : bb bb bb b b
19.03.2011 13:40:55  aaaa aa a : bb bb bb b b
19.03.2011 13:40:55  ccc cc c : sss ss

 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 35174412
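#!/usr/bin/perl
use strict;
use warnings;

print "Enter the file name:";
chomp(my $input = <STDIN>);    # drop the newline typed after the name
open(INFILE, '<', $input) or die "Cannot open $input: $!";
open(OUTFILE, '>>', 'out.csv') or die "Cannot open out.csv: $!";
while( <INFILE> ){
 # $1 = date and time (the first two fields), $2 = wanted label and its value
 if( /(\S+\s+\S+).*?(aaaa aa a.*)/ || /(\S+\s+\S+).*?(ccc cc c.*)/ ){
     print OUTFILE "$1 $2\n";
 }
}
close(INFILE);
close(OUTFILE);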
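
Since the question says the values may change but the labels stay fixed, the same matching can be sketched as a small reusable sub that takes the labels as a list. The name extract and the way labels are passed in are assumptions for illustration; the regex has the same shape as in the accepted answer, with the two labels folded into one alternation.

```perl
use strict;
use warnings;

# Hypothetical helper: return "date time label : value", or nothing
# when the line contains none of the wanted labels.
sub extract {
    my ($line, @labels) = @_;
    # quotemeta keeps any regex metacharacters in a label literal.
    my $alt = join '|', map { quotemeta } @labels;
    return "$1 $2" if $line =~ /(\S+\s+\S+).*?((?:$alt).*)/;
    return;
}

my @wanted = ('aaaa aa a', 'ccc cc c');
my @sample = (
    "19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  zzzzzzzzzzzz : zzzzzz zzzz\n",
    "19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  aaaa aa a : bb bb bb b b\n",
    "19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  ccc cc c : sss ss\n",
);

for my $line (@sample) {
    my $out = extract($line, @wanted);
    print "$out\n" if defined $out;
}
```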
 

Author Comment

by:pravink22
ID: 35174415
Its working Great !!! :-)
 

Author Comment

by:pravink22
ID: 35174908
How can I get output like this?

date             Time        aaaa aa a       ccc cc c
20.03.2011 00:35:42  bb bb bb b b
19.03.2011 13:40:55  bb bb bb b b
19.03.2011 13:40:55                         sss ss


 
LVL 31

Expert Comment

by:farzanj
ID: 35175034
Try this
my $file = 'file1.txt';

open(INFO,"<$file") or die "Could not open the file";
my @lines = <INFO>;
close(INFO);
my $filter = "yyyyy|zzzzz|^$";
my @filter = grep(! /$filter/, @lines);
print "date             Time        aaaa aa a       ccc cc c";
print @filter;

 
LVL 31

Expert Comment

by:farzanj
ID: 35175040
Sorry, there was a slight problem in the last one. Please ignore it and use this instead:
 
my $file = 'file1.txt';

open(INFO, '<', $file) or die "Could not open the file: $!";
my @lines = <INFO>;
close(INFO);
# Drop the yyyy/zzzz lines and blank lines; everything else is kept whole.
my $filter = 'yyyyy|zzzzz|^$';
my @filter = grep(! /$filter/, @lines);
print "date             Time        aaaa aa a       ccc cc c\n";
print @filter;
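
The filter above keeps each line whole, so it does not place the values under separate headings. One possible way to get the column layout is sketched below. It rests on assumptions the thread never confirms: that the two labels are fixed and known in advance, that each value belongs under its own label's column, and that fixed column widths are acceptable. The names format_row and header_row are made up for illustration.

```perl
use strict;
use warnings;

my @labels = ('aaaa aa a', 'ccc cc c');    # assumed fixed column headings

# Lay out date, time and one cell per label in fixed-width columns.
sub layout {
    my ($date, $time, @cells) = @_;
    my $row = sprintf('%-10s %-8s ', $date, $time)
            . join(' ', map { sprintf '%-14s', $_ } @cells);
    $row =~ s/\s+$//;                       # trim padding at the end of the row
    return $row;
}

sub header_row { return layout('date', 'Time', @labels) }

# Hypothetical helper: return the formatted row, or nothing if no label matches.
sub format_row {
    my ($line) = @_;
    for my $i (0 .. $#labels) {
        my $label = quotemeta $labels[$i];
        if ($line =~ /^(\S+)\s+(\S+).*?$label\s*:\s*(.*)/) {
            my @cells = ('') x @labels;     # empty cell for every column
            $cells[$i] = $3;                # value goes under its own label
            return layout($1, $2, @cells);
        }
    }
    return;
}

print header_row(), "\n";
for my $line (
    '20.03.2011 00:35:42 xxxxxxxxxxxxxxxxxxxx  aaaa aa a : bb bb bb b b',
    '19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  zzzzzzzzzzzz : zzzzzz zzzz',
    '19.03.2011 13:40:55 xxxxxxxxxxxxxxxxxxxx  ccc cc c : sss ss',
) {
    my $row = format_row($line);
    print "$row\n" if defined $row;
}
```

If the real data can contain other labels, the rule for turning them into headings would need to be spelled out first.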

 
LVL 11

Expert Comment

by:tel2
ID: 35202539
Hi pravink22,

The above message says:

"Notice: pravink22 has requested that this question be closed by accepting pravink22's comment #35174415 (0 points) as the solution for the following reason:
Its working Great !!! :-)
..."

If it works great, pravink22, then I suggest you allocate points to the expert who made it work great.

I see that *after* you said it works great (so I guess your original request had been satisfied), you changed your requirements to a format which has got some of the data ("aaaa aa a       ccc cc c") in the heading line.  This is a problem, because:
  1. An expert has already gone to the trouble of writing code to satisfy your original requirements, but that expert has not been given any points for that effort.
  2. If you keep changing your requirements, it just adds more work to the experts, who would probably not have spent any time on this question if they knew, up front, that the requirements were going to change so much.
  3. Your latest requirements don't look very logical, since they contain part of one of the detail records in the heading line.  You have not explained the rules for working out what data to put in the heading line, so an expert may be able to make it work for the test data you've provided, but it may not work for some other data, which will mean that, once again, you will not be satisfied, and will probably ask for more changes.  I'm guessing that this is the main reason you've had no recent comments from experts.
  4. Even if you had not changed your requirements, you haven't responded to farzanj's latest post.  Yes, I know he hasn't tested his code, and it doesn't meet your first or latest requirements, but you should have told him that, and given farzanj a chance to fix it.

I suggest you award points to the expert who has provided a solution which met your original requirements, and if you still require the new format, explain the logic of it clearly in a new question.  Failing that, you might be hearing from a (real) moderator soon.