Solved

Reformatting or rearranging a text file using awk or sed...

Posted on 2005-04-09
Last Modified: 2010-08-05
Hi,

I am trying to reformat or rearrange a packing list (text file) on my AIX unix machine that our database program produces.  I need to do this outside of the database system as I do not currently have the source code.  I assume the program should be written in awk or sed or??  To make this simple I will show an example of what I have and what I need.  

+++++Begin Existing Invoice++++++++++++
Order Number:  12345    Date: 04/08/05
Ship Via: UPS           Terms: VISA

  Item   Description   Qty  Price
1 34567  Modem          1    100.00
2 31266  Router         2    125.00
3 21890  Phone          1    100.00




Entered By: MRM    
+++++End Existing Invoice++++++++++++++

+++++Begin Invoice Like I Want+++++++++
Date: 04/08/05         Ship Via: UPS
Terms: VISA            Entered By: MRM

  Item   Description   Qty  Price
1 34567  Modem          1    100.00
2 31266  Router         2    125.00
3 21890  Phone          1    100.00




Order Number: 12345
++++++End Invoice Like I Want++++++++++

Note that the number of items can be variable but the Entered By line is always on line 12 of the input.  To make things more complicated, this can be a multipage packing list, so I need to account for a form feed (or count the lines and assume the top of the next page at 14 lines) and then make additional pages as required.

Any thoughts on how to do this?

I will throw in an additional 500 points for a sample script..

Thanks,
Mark
Question by:9thTee
 
LVL 21

Accepted Solution

by:
tfewster earned 2000 total points
ID: 13746137
Parse the whole "page" into variables and when you reach the line "Entered By", print the reformatted page:

awk '
{LINE=NR; LINE %= 13}
/Order Number:/{ONO=$3;DATE=$5}
/Ship Via:/{SHIP=$3;TERMS=$5}
LINE == 3 , LINE == 11  {ITEM[LINE]=$0}
/Entered By:/{BY=$3;
             print "Date: " DATE "\tShip Via: " SHIP
             print "Terms: " TERMS "\tEntered By: " BY
             for ( i=3; i <=11; i++ ) { print ITEM[i] }
             print "Order Number: " ONO "\f"
             }
' A.txt

From your example, it actually looks like Page 2 would start at line 13 - If so, change "LINE %= 13" to "LINE %= 12"
 

Author Comment

by:9thTee
ID: 13747368
Hi tfewster,

How do I run it from a $ prompt?  I do not use awk much at all.

Thanks,
Mark

 
LVL 23

Expert Comment

by:brettmjohnson
ID: 13748195
The easiest thing to do would be to write the awk script to a file (myprog.awk in the following example).
The file should contain everything between the single quotes, but not the single quotes themselves.
Then invoke awk with the -f option:

$ awk -f myprog.awk A.txt

 

Author Comment

by:9thTee
ID: 13748397
Ok, what if the 2nd line changes to something like this? :

++++++++++++++
Ship Via: Hickory Truck   Terms: 2% 10th Prox Net 25th  
++++++++++++++

Ship via is a 16 place field and can be multiple words right padded with spaces.
Terms is a 24 place field and can be multiple words right padded with spaces.

Thanks,
Mark
 
LVL 21

Expert Comment

by:tfewster
ID: 13748451
If those character positions are always fixed, just change the line to extract those strings to:
/Ship Via:/{SHIP=substr($0,11,16);TERMS=substr($0,34,24)}
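The fixed-position extraction can be tried on its own against the sample line. This is only a sketch: get_ship_terms is a hypothetical helper name, and the square brackets are added purely to make the padding visible. It assumes the header line is laid out exactly as in the sample above.

```sh
# Extract the fixed-width "Ship Via" and "Terms" fields from one header line.
# get_ship_terms is a hypothetical helper name used only for this illustration.
get_ship_terms() {
  awk '/Ship Via:/ {
    SHIP  = substr($0, 11, 16)   # up to 16 characters starting at column 11
    TERMS = substr($0, 34, 24)   # up to 24 characters starting at column 34
    print "[" SHIP "]"           # brackets show where the padding ends
    print "[" TERMS "]"
  }'
}

printf '%s\n' 'Ship Via: Hickory Truck   Terms: 2% 10th Prox Net 25th  ' | get_ship_terms
```

If a line is shorter than the requested range, substr simply returns the characters that exist, so the second bracketed value can be shorter than 24 characters.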
 

Author Comment

by:9thTee
ID: 13748697
Hi tfewster,

I am closing up this question and starting another that asks other questions about your script.  Please continue here:
 http://www.experts-exchange.com/Operating_Systems/Unix/Q_21383387.html

Thanks for the help.

Mark
 
LVL 21

Expert Comment

by:tfewster
ID: 13749202
Thanks for the points, but there is no need for a separate question for the explanation! In fact, awarding more than 500 points to a question is against the rules.

> Please briefly explain how this script works.

Each line in the awk program takes the form [Optional pattern] {actions to take}
Multiple actions are separated by semicolons or newlines.

{LINE=NR; LINE %= 13} # There's no [pattern] here, as we want to do this for every line. This part deals with multiple pages: set the variable LINE to the number of "records" (NR) read so far (in this case, each line of the input file is a record). "LINE %= 13" then replaces it with the remainder of LINE divided by 13, i.e. the line number on a page: 1 % 13 is 1 (line 1); 14 % 13 is 1 (line 1 of the second page) and so on. Note that the 13th line of each page gives 0 rather than 13.

/Order Number:/{ONO=$3;DATE=$5} # If the string "Order Number:" is found in the record we're processing, capture field 3 and field 5 into variables (appropriately named, and in upper case so we can see the difference between variables & text); the field separator in this case is any whitespace.

/Ship Via:/{SHIP=substr($0,11,16);TERMS=substr($0,34,24)} # If the string "Ship Via:" is found, capture the relevant substring into a variable;  Here, we're doing it by character start position+known length of the text as we can't be sure of the number of fields.  Note that "$0" is the entire record.

LINE == 3 , LINE == 11  {ITEM[LINE]=$0} # For lines 3-11, just save the line in an array ITEM[3], ITEM[4]..ITEM[11]

/Entered By:/{BY=$3;    # Capture "By" into a variable;  And as this is the end of a page, print out the newly formatted page
             print "Date: " DATE "\tShip Via: " SHIP
             print "Terms: " TERMS "\tEntered By: " BY
             for ( i=3; i <=11; i++ ) { print ITEM[i] }  # for i= 3-11, output the items we saved in the ITEM array
             print "Order Number: " ONO "\f"
             }
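
The page-line arithmetic in the first rule can be checked separately. A minimal sketch, assuming a hypothetical helper named page_line that takes the page length as its argument; the input is just 15 dummy lines:

```sh
# Print each input line number next to its position on the page (NR % page length).
# page_line is a hypothetical helper name; its first argument is the page length.
page_line() {
  awk -v LEN="$1" '{ LINE = NR % LEN; print NR, LINE }'
}

# Generate 15 dummy lines and number them against a 13-line page.
awk 'BEGIN { for (i = 1; i <= 15; i++) print "x" }' | page_line 13
```

The last line of each page maps to 0, not to the page length, which is worth remembering when writing a pattern such as LINE == 13.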

So awk reads the input file record by record, and checks against all the rules we've defined - More than one rule may apply, e.g. the first rule {LINE=NR...} will always be executed.
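
To see several rules firing on the same record, here is a small sketch; show_rules is a hypothetical helper name and the labels it prints are illustrative only:

```sh
# For each record, list which pattern rules matched it.
# show_rules is a hypothetical helper name used only for this illustration.
show_rules() {
  awk '
    { printf "%d:", NR }                  # no pattern: runs for every record
    /Order Number:/ { printf " order" }   # runs only when the record matches
    /Date:/         { printf " date" }    # a second rule can match the same record
    { print "" }                          # no pattern: ends the output line
  '
}

printf '%s\n' 'Order Number:  12345    Date: 04/08/05' 'Ship Via: UPS           Terms: VISA' | show_rules
```

The first record triggers all four rules in the order they are written; the second triggers only the two rules that have no pattern.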

The AIX man page for awk is quite good; A tutorial can be found at http://www.gnu.org/software/gawk/manual/gawk.html


>  Also, can it be changed so that we specify a line number and then grab the variables?  For example somehow specify line 1 get 20 characters starting at position 10.  Line 2 get 12 characters starting at position 10, etc.

Certainly, instead of searching for the string "Order Number:", we can just specify the pattern as "LINE == 1"; then we use substr to extract the characters we want:
LINE == 1 { ONO=substr( $0, 10, 20 ) }
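
A sketch of the same idea against the sample invoice header. The column positions (16 and 31) are assumptions read off the two-line sample in the question, not values for the real packing list, and header_by_line is a hypothetical helper name:

```sh
# Select a record by its line number on the page, then cut fields by column.
# header_by_line is a hypothetical helper; the columns match the sample header only.
header_by_line() {
  awk '
    { LINE = NR % 13 }
    LINE == 1 {
      ONO  = substr($0, 16, 5)   # order number: 5 characters from column 16
      DATE = substr($0, 31, 8)   # date: 8 characters from column 31
      print "Order Number: " ONO
      print "Date: " DATE
    }
  '
}

printf '%s\n' 'Order Number:  12345    Date: 04/08/05' 'Ship Via: UPS           Terms: VISA' | header_by_line
```

Because the rule is keyed on LINE rather than on the text, it would still fire if the label on line 1 changed, as long as the columns stayed put.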
 

Author Comment

by:9thTee
ID: 13749367
Thank you so much for the wonderful explanation.  This clears up a lot.  
I am looking over the man page that you mentioned.

So far I have gone through my actual packing list (which is quite a bit more complex than the example I originally used) and, using the substr() command, I have all of the fields for the header in variables.  I have also used the print command to verify that I have all the variables set up correctly.  

I have run into a small problem and am not sure how to fix it.  Say that the last field that I need to grab is at the end of a line and that particular line is 80 characters long.   If I need to grab 15 characters (since I know the field can be up to 15 characters long) and I do SHIPVIA=substr($0,70,15), I am trying to read past the end of the line, and this seems to be the cause of the problem.  Since this particular Ship Via is only 10 characters long, it causes something weird when I use the printf "%15s", SHIPVIA command.  It is fine if I use the print command, but it looks like I need the printf command since I need to specify a width for the fields so that everything lines up properly.  When I do printf it looks like about 8 spaces are added to the front of what prints.  When I do printf "%12s", TERMS it actually spells the field wrong (Caah Only instead of Cash Only).  I tried removing the last parameter in the substr command so the command became SHIPVIA=substr($0,70), but that did not make any difference.  

Do you know how to overcome this problem or am I just not using the correct command?  Basically I just need the fields to be a specific width when printed so that they line up with an overlay form.  Maybe testing the line length and making sure the substr command does not specify a length that goes over this??  I did try setting the length in substr to the actual shorter length for this particular Ship Via and it works properly, so I am sure the problem is going beyond the end of line.
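
For reference, a small sketch of how substr and printf widths interact. It does not claim to diagnose the misspelled output described above; it only shows that "%15s" pads on the left while "%-15s" pads on the right. pad_demo is a hypothetical helper name and the input line is made up for the illustration:

```sh
# Compare right-justified and left-justified fixed-width output.
# pad_demo is a hypothetical helper name; the input line is illustrative.
pad_demo() {
  awk '{
    SHIPVIA = substr($0, 11, 15)    # asks for 15 characters; fewer come back if the line ends first
    printf "[%15s]\n", SHIPVIA      # right-justified: padding goes in front
    printf "[%-15s]\n", SHIPVIA     # left-justified: padding goes behind
    print length(SHIPVIA)           # how many characters substr actually returned
  }'
}

printf '%s\n' 'Ship Via: UPS Ground' | pad_demo
```

Here substr returns only the 10 characters that exist, so the extra spaces in front of the value with "%15s" come from the format, not from reading past the end of the line.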

Thanks for your help.

Mark
 
LVL 21

Expert Comment

by:tfewster
ID: 13749401
Hi Mark,

Sorry, I can't think what might be the problem in this case; I'd say it was reasonable to post another question for that, and if you can give some sample text as well as the relevant sections of your code I'm sure someone less dopy will be able to help ;-)

Incidentally, you can delete the other question if no one has posted in it.

Regards,
Tim
 

Author Comment

by:9thTee
ID: 13749420
Ok, cool.  Thanks.

Mark