# Join, Sequence Number, and Two Outputs

Posted on 2000-04-04
You have an input file with over a million lines like so:

0998|1999|1000|ATL|SEA|H|USAIR|2725|3002|0845

The first output file:

Assigns a sequence number to each line that is unique, combines the first three fields into one, only uses the last two digits of field 2, defaults field 2 to 99 if it is blank, puts a "_" after field 2, and also prints out field 7 (USAIR in the line above).

For example, the above line would be:

00000001|099899_1000|USAIR

The second output file assigns a sequence number if fields 4,5,6,7,8,9,10 are unique.

For example, the output would be:

00000001|ATL|SEA|H|USAIR|2725|3002|0845

How would you write this in Perl?

Any help appreciated.

Thanks
Question by:tomatocans

Expert Comment

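    # Start at 00000001 (eight digits) to match the expected output;
    # ++ on a digit string keeps the leading zeros.
    $sequence = '00000001';
    while (<>) {
        @field = split /\|/;
        # Fields 1-3 combined (field 2 cut to two digits, 99 if blank), then field 7
        print join '|', $sequence++, $field[0] . substr($field[1] || 99, -2) . "_$field[2]", "$field[6]\n";
    }
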
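The snippet above leans on two Perl behaviors that are easy to miss: `++` applied to a string of digits increments it as a string and keeps the leading zeros, and `||` falls back to 99 when field 2 is empty. A minimal sketch of both follows; the helper name `build_key` is hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): builds the combined key
# from fields 1-3, e.g. ("0998", "1999", "1000") -> "099899_1000".
sub build_key {
    my ($f1, $f2, $f3) = @_;
    # An empty field 2 is false, so || substitutes 99 before substr runs.
    return $f1 . substr($f2 || 99, -2) . "_$f3";
}

my $sequence = '00000001';   # a string that is never used as a number
my $first    = $sequence++;  # $first is "00000001", $sequence is now "00000002"

print "$first|",    build_key('0998', '1999', '1000'), "\n";  # 00000001|099899_1000
print "$sequence|", build_key('0998', '',     '1000'), "\n";  # 00000002|099899_1000
```

Note that the string increment only applies while the variable has been used purely as a string; doing arithmetic on `$sequence` would turn it into a plain number and drop the zeros.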
First file (based on ozo's post):

    # Eight digits, to match the expected 00000001
    $sequence = '00000001';
    while (<>) {
        # Skip any line that has already been seen
        unless (defined($CheckDup{$_})) {
            $CheckDup{$_} = 1;
            @field = split /\|/;
            print join '|', $sequence++, $field[0] . substr($field[1] || 99, -2) . "_$field[2]", "$field[6]\n";
        }
    }

Second file:

    # Eight digits, to match the expected 00000001
    $sequence = '00000001';
    while (<>) {
        $Line = $_;
        # Drop the first three fields so only fields 4-10 remain
        $Line =~ s/^(.*?\|){3}//;
        unless (defined($CheckDup{$Line})) {
            $CheckDup{$Line} = 1;
            print $sequence++ . "|$Line";
        }
    }

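Since the question asks for two output files from one large input, the two scripts above could also be merged so the input is read only once. The following is a rough sketch, not a tested production script: the sub name `process` and the in-memory filehandles are assumptions made for illustration, and it assumes every line has exactly ten fields. In real use the three handles would be opened on actual files.

```perl
use strict;
use warnings;

# Hypothetical sub: reads pipe-delimited lines from $in and writes both
# outputs in a single pass. Like the scripts above, the first output
# skips exact duplicate lines.
sub process {
    my ($in, $out1, $out2) = @_;
    my ($seq1, $seq2) = ('00000001', '00000001');
    my (%seen_line, %seen_tail);
    while (my $line = <$in>) {
        chomp $line;
        my @field = split /\|/, $line, -1;   # -1 keeps trailing empty fields
        unless ($seen_line{$line}++) {
            my $key = $field[0] . substr($field[1] || 99, -2) . "_$field[2]";
            print {$out1} join('|', $seq1++, $key, $field[6]), "\n";
        }
        my $tail = join '|', @field[3 .. 9];  # fields 4-10
        unless ($seen_tail{$tail}++) {
            print {$out2} join('|', $seq2++, $tail), "\n";
        }
    }
}

# Demo on two sample lines, using in-memory filehandles instead of files.
my $input = "0998|1999|1000|ATL|SEA|H|USAIR|2725|3002|0845\n"
          . "0999||1001|ATL|SEA|H|USAIR|2725|3002|0845\n";
open my $in,   '<', \$input         or die $!;
open my $out1, '>', \my $first_out  or die $!;
open my $out2, '>', \my $second_out or die $!;
process($in, $out1, $out2);
close $_ for $in, $out1, $out2;
print $first_out, $second_out;
```

The second sample line has a blank field 2, so its key becomes `099999_1001`, and because its fields 4-10 repeat the first line, it adds nothing to the second output. With over a million lines, both hashes hold one entry per distinct key, so memory use grows with the number of unique lines.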

Expert Comment

My solutions can be streamlined slightly:

In each script, the two lines of the form

    unless (defined($CheckDup{$Line})) {
        $CheckDup{$Line} = 1;

can be replaced with

    unless (++$CheckDup{$Line} > 1) {
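
To see why the one-line form behaves the same, here is a small stand-alone sketch with made-up input lines (the `@kept` array is only there for the demonstration):

```perl
use strict;
use warnings;

my %CheckDup;
my @kept;
for my $Line ("a\n", "b\n", "a\n") {
    # First sighting: the count goes from undef to 1, so the test passes.
    # Later sightings: the count is 2 or more, so the line is skipped.
    unless (++$CheckDup{$Line} > 1) {
        push @kept, $Line;
    }
}
print @kept;   # prints "a" and "b", one per line
```

A commonly seen equivalent is `unless ($CheckDup{$Line}++) {`, which tests the old count (false the first time) and then increments it.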

Author Comment

Thanks
