
Reformatting or rearranging a text file using awk or sed...

Hi,

I am trying to reformat or rearrange a packing list (text file) on my AIX Unix machine that our database program produces.  I need to do this outside of the database system as I do not currently have the source code.  I assume the program should be written in awk or sed, or something similar?  To make this simple I will show an example of what I have and what I need.

+++++Begin Existing Invoice++++++++++++
Order Number:  12345    Date: 04/08/05
Ship Via: UPS           Terms: VISA

  Item   Description   Qty  Price
1 34567  Modem          1    100.00
2 31266  Router         2    125.00
3 21890  Phone          1    100.00




Entered By: MRM    
+++++End Existing Invoice++++++++++++++

+++++Begin Invoice Like I Want+++++++++
Date: 04/08/05         Ship Via: UPS
Terms: VISA            Entered By: MRM

  Item   Description   Qty  Price
1 34567  Modem          1    100.00
2 31266  Router         2    125.00
3 21890  Phone          1    100.00




Order Number: 12345
++++++End Invoice Like I Want++++++++++

Note that the number of items can be variable but the Entered By line is always on line 12 of the input.  To make things more complicated, this can be a multipage packing list, so I need to account for a form feed (or count the lines and assume the top of the next page at 14 lines) and then make additional pages as required.

Any thoughts on how to do this?

I will throw in an additional 500 points for a sample script..

Thanks,
Mark
Asked by: 9thTee
1 Solution
 
tfewster commented:
Parse the whole "page" into variables and when you reach the line "Entered By", print the reformatted page:

awk '
{LINE=NR; LINE %= 13}
/Order Number:/{ONO=$3;DATE=$5}
/Ship Via:/{SHIP=$3;TERMS=$5}
LINE == 3 , LINE == 11  {ITEM[LINE]=$0}
/Entered By:/{BY=$3;
             print "Date: " DATE "\tShip Via: " SHIP
             print "Terms: " TERMS "\tEntered By: " BY
             for ( i=3; i <=11; i++ ) { print ITEM[i] }
             print "Order Number: " ONO "\f"
             }
' A.txt

From your example, it actually looks like Page 2 would start at line 13 - If so, change "LINE %= 13" to "LINE %= 12"
 
9thTee (Author) commented:
Hi tfewster,

How do I run it from a $.  I do not use awk much at all.

Thanks,
Mark

 
brettmjohnson commented:
The easiest thing to do would be to write the awk script to a file (myprog.awk in the following example).
The file should contain everything between the single quotes, but not the single quotes themselves.
Then invoke awk with the -f option:

$ awk -f myprog.awk A.txt
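
As a rough end-to-end sketch of the steps above, the following shell session writes tfewster's original script to myprog.awk, builds a one-page A.txt from the sample invoice in the question, and runs it. The names myprog.awk and A.txt come from the thread; out.txt and the use of here-documents are assumptions made only for this illustration.

```shell
# Save the awk program (everything between the single quotes) to a file.
cat > myprog.awk <<'EOF'
{LINE=NR; LINE %= 13}
/Order Number:/{ONO=$3;DATE=$5}
/Ship Via:/{SHIP=$3;TERMS=$5}
LINE == 3 , LINE == 11  {ITEM[LINE]=$0}
/Entered By:/{BY=$3;
             print "Date: " DATE "\tShip Via: " SHIP
             print "Terms: " TERMS "\tEntered By: " BY
             for ( i=3; i <=11; i++ ) { print ITEM[i] }
             print "Order Number: " ONO "\f"
             }
EOF

# Build a one-page sample input: 12 lines, "Entered By" on line 12.
cat > A.txt <<'EOF'
Order Number:  12345    Date: 04/08/05
Ship Via: UPS           Terms: VISA

  Item   Description   Qty  Price
1 34567  Modem          1    100.00
2 31266  Router         2    125.00
3 21890  Phone          1    100.00




Entered By: MRM
EOF

# Run the saved program against the sample and keep the result (out.txt is a hypothetical name).
awk -f myprog.awk A.txt > out.txt
cat out.txt
```

The output should start with the Date / Ship Via line and end with the Order Number line followed by a form feed.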

 
9thTee (Author) commented:
Ok, what if the 2nd line changes to something like this? :

++++++++++++++
Ship Via: Hickory Truck   Terms: 2% 10th Prox Net 25th  
++++++++++++++

Ship via is a 16 place field and can be multiple words right padded with spaces.
Terms is a 24 place field and can be multiple words right padded with spaces.

Thanks,
Mark
 
tfewster commented:
If those character positions are always fixed, just change the line to extract those strings to:
/Ship Via:/{SHIP=substr($0,11,16);TERMS=substr($0,34,24)}
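
To check the column positions, the fixed-width extraction can be tried on the sample line by itself. This is only a sketch: the square brackets are there to make the padding visible, and the sub() calls that trim the trailing spaces are an optional extra, not part of the original script. The shell variable names line and fields are made up for the example.

```shell
# The sample line from the question, including its two trailing spaces.
line='Ship Via: Hickory Truck   Terms: 2% 10th Prox Net 25th  '

fields=$(printf '%s\n' "$line" | awk '
/Ship Via:/ {
    SHIP  = substr($0, 11, 16)   # 16 characters starting at column 11
    TERMS = substr($0, 34, 24)   # up to 24 characters starting at column 34
    print "[" SHIP "][" TERMS "]"
    sub(/ +$/, "", SHIP)         # optional: strip the right padding
    sub(/ +$/, "", TERMS)
    print "[" SHIP "][" TERMS "]"
}')
printf '%s\n' "$fields"
```

With the sample line this prints the padded fields first and the trimmed ones second:

[Hickory Truck   ][2% 10th Prox Net 25th  ]
[Hickory Truck][2% 10th Prox Net 25th]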
 
9thTee (Author) commented:
Hi tfewster,

I am closing up this question and starting another that asks other questions about your script.  Please continue here:
 http://www.experts-exchange.com/Operating_Systems/Unix/Q_21383387.html

Thanks for the help.

Mark
 
tfewster commented:
Thanks for the points, but there is no need for a separate question for the explanation! In fact, awarding more than 500 points to a question is against the rules.

> Please briefly explain how this script works.

Each line in the awk program takes the form [Optional pattern] {actions to take}
Multiple actions are separated by semicolons or newlines.

{LINE=NR; LINE %= 13} # There's no [pattern] here, as we want to do this with every line. This part deals with multiple pages: it sets the variable LINE to the number of "records" (NR) read so far; in this case, each line of the input file is a record.  "LINE %= 13" then replaces LINE with the remainder of LINE divided by 13, i.e. the line number on a page: 1 % 13 is 1 (Line 1); 14 % 13 is 1 (Line 1 on the second page) and so on.  Note that every 13th line gives a remainder of 0, not 13.

/Order Number:/{ONO=$3;DATE=$5} # If the string "Order Number:" is found in the record we're processing, capture field 3 and field 5 into variables (appropriately named, and in upper case so we can see the difference between variables & text); the field separator in this case is any whitespace.

/Ship Via:/{SHIP=substr($0,11,16);TERMS=substr($0,34,24)} # If the string "Ship Via:" is found, capture the relevant substring into a variable;  Here, we're doing it by character start position+known length of the text as we can't be sure of the number of fields.  Note that "$0" is the entire record.

LINE == 3 , LINE == 11  {ITEM[LINE]=$0} # For lines 3-11, just save the line in an array ITEM[3], ITEM[4]..ITEM[11]

/Entered By:/{BY=$3;    # Capture "By" into a variable;  And as this is the end of a page, print out the newly formatted page
             print "Date: " DATE "\tShip Via: " SHIP
             print "Terms: " TERMS "\tEntered By: " BY
             for ( i=3; i <=11; i++ ) { print ITEM[i] }  # for i= 3-11, output the items we saved in the ITEM array
             print "Order Number: " ONO "\f"
             }

So awk reads the input file record by record, and checks against all the rules we've defined - More than one rule may apply, e.g. the first rule {LINE=NR...} will always be executed.
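
To see how the page-line arithmetic behaves, here is a small sketch that runs the same "LINE %= 13" operation on the numbers 1 to 15 without needing an input file. The loop variable n stands in for NR and the shell variable pages is made up for the example.

```shell
pages=$(awk 'BEGIN {
    for (n = 1; n <= 15; n++) {
        LINE = n          # stands in for NR
        LINE %= 13        # same operation as in the script
        printf "NR=%d LINE=%d\n", n, LINE
    }
}')
printf '%s\n' "$pages"
```

Records 1 to 12 keep their own number, record 13 becomes 0, and record 14 starts again at 1. If a page really is 12 lines long, the same loop with "LINE %= 12" shows the wrap happening one line earlier.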

The AIX man page for awk is quite good; A tutorial can be found at http://www.gnu.org/software/gawk/manual/gawk.html


>  Also, can it be changed so that we specify a line number and then grab the variables?  For example somehow specify line 1 get 20 characters starting at position 10.  Line 2 get 12 characters starting at position 10, etc.

Certainly, instead of searching for the string "Order Number:", we can just specify the pattern as "LINE == 1"; then we use substr to extract the characters we want:
LINE == 1 { ONO=substr( $0, 10, 20 ) }
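
As a sketch of the same idea applied to the sample invoice from the question, the following picks both header fields off line 1 by position. The column numbers (16 and 31) are counted from that sample line and are assumptions about the real layout; they are not the 10 and 20 used in the one-liner above. The shell variable header is made up for the example.

```shell
header=$(printf '%s\n' 'Order Number:  12345    Date: 04/08/05' | awk '
{ LINE = NR; LINE %= 13 }
LINE == 1 {
    ONO  = substr($0, 16, 5)   # columns 16-20 of the sample line
    DATE = substr($0, 31, 8)   # columns 31-38 of the sample line
    print "Order Number: " ONO
    print "Date: " DATE
}')
printf '%s\n' "$header"
```

For the sample line this prints "Order Number: 12345" and "Date: 04/08/05".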
 
9thTee (Author) commented:
Thank you so much for the wonderful explanation.  This clears up a lot.  
I am looking over the man page that you mentioned.

So far I have gone through my actual packing list (which is quite a bit more complex than the example I originally used) and, using the substr() function, I have all of the fields for the header in variables.  I have also used the print command to verify that I have all the variables set up correctly.

I have run into a small problem and am not sure how to fix it.  Say that the last field that I need to grab is at the end of a line and that particular line is 80 characters long.   If I need to grab 15 characters (since I know the field can be up to 15 characters long) and I do SHIPVIA=substr($0,70,15), I am trying to read past the end of the line and this seems to be the cause of the problem.  Since this particular Ship Via is only 10 characters long, it causes something weird when I use the printf "%15s", SHIPVIA command.  It is fine if I use the print command, but it looks like I need the printf command since I need to specify a width for the fields so that everything lines up properly.  When I do printf it looks like about 8 spaces are added to the front of what prints.  When I do printf "%12s", TERMS it actually spells the field wrong (Caah Only instead of Cash Only).  I tried removing the last parameter in the substr command so the command became SHIPVIA=substr($0,70) but that did not make any difference.

Do you know how to overcome this problem or am I just not using the correct command?  Basically I just need the fields to be a specific width when printed so that they line up with an overlay form.  Maybe testing the line length and making sure the substr command does not specify a length that goes over this??  I did try setting the length in substr to the actual shorter length for this particular Ship Via and it works properly, so I am sure the problem is going beyond the end of line.

Thanks for your help.

Mark
 
tfewster commented:
Hi Mark,

Sorry, I can't think what might be the problem in this case; I'd say it was reasonable to post another question for that, and if you can give some sample text as well as the relevant sections of your code I'm sure someone less dopey will be able to help ;-)

Incidentally, you can delete the other question if no one has posted in it.

Regards,
Tim
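
For anyone who wants to narrow this down, here is a small sketch of two behaviours that are standard in awk: substr() asked for more characters than remain simply returns what is there, and "%15s" pads on the left while "%-15s" pads on the right. The input line "Ship Via: UPS Ground" and the shell variable widths are made up for the example. It does not explain the misspelled "Caah Only" output, which is left unresolved in this thread.

```shell
widths=$(printf '%s\n' 'Ship Via: UPS Ground' | awk '{
    SHIPVIA = substr($0, 11, 15)      # asks for 15 characters, only 10 remain
    printf "[%s] length=%d\n", SHIPVIA, length(SHIPVIA)
    printf "[%15s]\n",  SHIPVIA       # right-justified: padding goes in front
    printf "[%-15s]\n", SHIPVIA       # left-justified: padding goes behind
}')
printf '%s\n' "$widths"
```

This prints:

[UPS Ground] length=10
[     UPS Ground]
[UPS Ground     ]

So if the fields need to line up on an overlay form with the text starting at a fixed column, "%-15s" may be the format to try.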
 
9thTee (Author) commented:
Ok, cool.  Thanks.

Mark
Question has a verified solution.