Join, Sequence Number, and Two Outputs

Posted on 2000-04-04
You have an input file with over a million lines like this:

0998|1999|1000|ATL|SEA|H|USAIR|2725|3002|0845

The first output file:

Assigns a sequence number to each line that is unique, combines the first three fields into one, uses only the last two digits of field 2 (defaulting to 99 if field 2 is blank), puts a "_" after field 2, and also prints field 7 (USAIR in the example).

For example, the line above would become:

00000001|099899_1000|USAIR

The second output file assigns a sequence number to each line whose combination of fields 4, 5, 6, 7, 8, 9 and 10 is unique.

For example, the output would be:

00000001|ATL|SEA|H|USAIR|2725|3002|0845

How would you write this in Perl?

Any help appreciated.

Thanks
Question by:tomatocans

Author Comment

ID: 2686042
Adjusted points from 25 to 50

Expert Comment

ID: 2686066
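    $sequence = '00000001';    # a string, so ++ keeps the leading zeros
    while (<>) {
        chomp;
        @field = split /\|/;
        # field 1 . last two digits of field 2 (99 if blank) . "_" . field 3, then field 7
        print join('|', $sequence++, $field[0] . substr($field[1] || 99, -2) . "_$field[2]", $field[6]), "\n";
    }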

Accepted Solution

PC_User321 earned 50 total points
ID: 2686768
First file (based on ozo's post):

    $sequence = '00000001';    # a string, so ++ keeps the leading zeros
    while (<>) {
        # skip any line that has already been seen
        unless (defined($CheckDup{$_})) {
            $CheckDup{$_} = 1;
            @field = split /\|/;
            print join '|', $sequence++, $field[0] . substr($field[1] || 99, -2) . "_$field[2]", "$field[6]\n";
        }
    }

Second file:
    $sequence = '00000001';    # a string, so ++ keeps the leading zeros
    while (<>) {
        $Line = $_;
        $Line =~ s/^(.*?\|){3}//;    # drop the first three fields, leaving fields 4-10
        unless (defined($CheckDup{$Line})) {
            $CheckDup{$Line} = 1;
            print $sequence++ . "|$Line";
        }
    }
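If reading a million-line file twice is a concern, the two scripts above could be folded into a single pass. The following is only a sketch under some assumptions: the subroutine name write_both and the hash names are made up for illustration, every record is assumed to have all ten fields, and the sequence number is formatted with printf "%08d" instead of the string increment used above. It also treats a field 2 that contains only whitespace as blank.

```perl
use strict;
use warnings;

# Hypothetical helper: reads records from $in and writes both reports
# to $out1 and $out2 in a single pass over the input.
sub write_both {
    my ($in, $out1, $out2) = @_;
    my (%seen_line, %seen_tail);
    my ($seq1, $seq2) = (0, 0);

    while (my $line = <$in>) {
        chomp $line;
        my @field = split /\|/, $line, -1;    # -1 keeps trailing empty fields

        # First output: one record per distinct input line.
        unless ($seen_line{$line}++) {
            my $year = $field[1] =~ /\S/ ? substr($field[1], -2) : '99';
            printf {$out1} "%08d|%s%s_%s|%s\n",
                ++$seq1, $field[0], $year, $field[2], $field[6];
        }

        # Second output: one record per distinct combination of fields 4-10.
        my $tail = join '|', @field[3 .. $#field];
        unless ($seen_tail{$tail}++) {
            printf {$out2} "%08d|%s\n", ++$seq2, $tail;
        }
    }
    return;
}
```

To use it, open one read handle on the input file and two write handles on whatever output files you choose, then pass all three to write_both. Both hashes stay in memory for the whole run, so memory use grows with the number of distinct lines.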


Expert Comment

ID: 2687943
My solutions can be streamlined slightly:

In each script, the two lines of the form
    unless (defined($CheckDup{$Line})) {
        $CheckDup{$Line} = 1;

can be replaced with
    unless (++$CheckDup{$Line} > 1) {

(In the first script the hash key is $_ rather than $Line.)
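To see why the one-line form behaves the same way, here is a small standalone sketch. The subroutine name first_occurrences and the sample values are made up for illustration. The pre-increment bumps the counter for a key before the comparison, so the test is false the first time a key is seen and true on every later occurrence.

```perl
use strict;
use warnings;

# Returns the items of a list with repeats dropped, keeping first occurrences.
sub first_occurrences {
    my %CheckDup;
    # ++$CheckDup{$_} is 1 on the first sighting and 2 or more afterwards
    return grep { !(++$CheckDup{$_} > 1) } @_;
}

my @kept = first_occurrences(qw(ATL SEA ATL DEN SEA));
print "@kept\n";    # prints "ATL SEA DEN"
```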


Author Comment

ID: 2688142
Thanks