Solved

Join, Sequence Number, and Two Outputs

Posted on 2000-04-04
Last Modified: 2010-03-05
You have an input file with over a million lines like this:

0998|1999|1000|ATL|SEA|H|USAIR|2725|3002|0845


The first output file:

Assigns a sequence number to each unique line, combines the first three fields into one, uses only the last two digits of field 2 (defaulting to 99 if field 2 is blank), puts a "_" after field 2, and also prints field 7 (USAIR in the example).

For example, the line above would become:

00000001|099899_1000|USAIR
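As a worked illustration of the mapping just described, here is a minimal Perl sketch that builds one first-file record from one input line. The sub name `first_record` is hypothetical, and the sketch assumes "blank" means an empty field 2. It tests that with `length`, because a plain `||` default would also replace a literal `0`.

```perl
use strict;
use warnings;

# Hypothetical helper: build one first-file record from one input line.
# $seq is the already formatted sequence number, e.g. '00000001'.
sub first_record {
    my ($seq, $line) = @_;
    chomp $line;
    my @f = split /\|/, $line, -1;      # -1 keeps empty trailing fields
    # Last two digits of field 2, or '99' when field 2 is empty.
    # length() is used instead of || so a literal "0" is not treated as blank.
    my $yy = length $f[1] ? substr($f[1], -2) : '99';
    # field 1 + two digits + "_" + field 3, then field 7 (array index 6)
    return join '|', $seq, "$f[0]${yy}_$f[2]", $f[6];
}

my $rec = first_record('00000001',
    "0998|1999|1000|ATL|SEA|H|USAIR|2725|3002|0845\n");
print "$rec\n";    # prints 00000001|099899_1000|USAIR
```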


The second output file assigns a sequence number to each unique combination of fields 4 through 10.

For example, the output would be:

00000001|ATL|SEA|H|USAIR|2725|3002|0845


How would you write this in Perl?

Any help is appreciated.

Thanks
Question by:tomatocans

Author Comment

by:tomatocans
ID: 2686042
Adjusted points from 25 to 50

Expert Comment

by:ozo
ID: 2686066
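$sequence='00000001';   # 8-digit string; Perl's magic ++ keeps the leading zeros
while( <> ){
  @field=split/\|/;
  # sequence | field1 . last two digits of field2 (99 if empty) . "_" . field3 | field7
  print join'|',$sequence++,$field[0].substr($field[1]||99,-2)."_$field[2]","$field[6]\n";
}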

Accepted Solution

by:
PC_User321 earned 50 total points
ID: 2686768
First file (based on ozo's post):

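$sequence='00000001';   # 8 digits, starting at 1 as in the example output
while( <> ){
  # Skip any line that has already been seen (the whole line is the key)
  unless (defined($CheckDup{$_})) {
    $CheckDup{$_} = 1;
    @field=split/\|/;
    print join'|',$sequence++,$field[0].substr($field[1]||99,-2)."_$field[2]","$field[6]\n";
  }
}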

Second file:
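$sequence='00000001';   # 8 digits, starting at 1 as in the example output
while( <> ){
  $Line = $_;
  $Line =~ s/^(.*?\|){3}//;   # drop fields 1-3, leaving fields 4-10
  # Number each distinct combination of fields 4-10 only once
  unless (defined($CheckDup{$Line})) {
    $CheckDup{$Line} = 1;
    print $sequence++ . "|$Line";
  }
}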
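The question asks for two output files, and the scripts above read the million-line input twice. Below is a minimal single-pass sketch that writes both files from one read. The sub name `split_outputs`, the command-line usage, and the file names are hypothetical; it keeps the accepted answer's reading that the first file numbers each distinct whole line. Like those scripts, it holds every distinct key in a hash, so memory grows with the number of unique lines.

```perl
use strict;
use warnings;

# Hypothetical single-pass variant: read the input once and write both
# outputs. The caller supplies the input and the two output filehandles.
sub split_outputs {
    my ($in, $out1, $out2) = @_;
    my ($seq1, $seq2) = ('00000001', '00000001');   # 8-digit string counters
    my (%seen_line, %seen_tail);
    while (my $line = <$in>) {
        chomp $line;
        my @f = split /\|/, $line, -1;
        # First output: one record per distinct input line.
        unless ($seen_line{$line}++) {
            my $yy = length $f[1] ? substr($f[1], -2) : '99';
            print {$out1} join('|', $seq1++, "$f[0]${yy}_$f[2]", $f[6]), "\n";
        }
        # Second output: one record per distinct combination of fields 4-10.
        my $tail = join '|', @f[3 .. 9];
        print {$out2} join('|', $seq2++, $tail), "\n" unless $seen_tail{$tail}++;
    }
    return;
}

# Usage (file names are placeholders):
#   perl split.pl first.txt second.txt < input.txt
if (@ARGV == 2) {
    open my $o1, '>', $ARGV[0] or die "Cannot write $ARGV[0]: $!";
    open my $o2, '>', $ARGV[1] or die "Cannot write $ARGV[1]: $!";
    split_outputs(\*STDIN, $o1, $o2);
    close $o1 or die "Cannot close $ARGV[0]: $!";
    close $o2 or die "Cannot close $ARGV[1]: $!";
}
```

Passing filehandles in, rather than opening files inside the sub, lets the same code be exercised against in-memory handles (`open my $fh, '<', \$string`).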


Expert Comment

by:PC_User321
ID: 2687917
BTW tomatocans, many of your questions remain ungraded.
I suggest you go through all your questions and award points to the people who gave correct answers or comments; if no one is correct, delete the question or add more information.

Expert Comment

by:PC_User321
ID: 2687943
My solutions can be streamlined slightly:

In each script (the key is $_ in the first one and $Line in the second), the two lines of the form
  unless (defined($CheckDup{$Line})) {
    $CheckDup{$Line} = 1;
can be replaced with the single line
  unless (++$CheckDup{$Line} > 1) {
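The `++$CheckDup{$Line} > 1` test works because the hash counts how often each key has been seen, so the body runs only on the first occurrence. A closely related and common Perl spelling uses post-increment instead (inside a loop, `next if $CheckDup{$Line}++;`). A small sketch of that idiom, with a hypothetical sub name `unique_in_order`:

```perl
use strict;
use warnings;

# The "seen hash" idiom: $seen{$_}++ returns the previous count, which is
# undef (false) the first time a key appears, so each key passes only once.
sub unique_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

print join(',', unique_in_order(qw(a b a c b))), "\n";   # prints a,b,c
```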
   

Author Comment

by:tomatocans
ID: 2688142
Thanks