Reformatting or rearranging a text file using awk or sed...

Hi,

I am trying to reformat or rearrange a packing list (text file) that our database program produces on my AIX Unix machine. I need to do this outside of the database system as I do not currently have the source code. I assume the program should be written in awk or sed, or something else? To make this simple I will show an example of what I have and what I need.

+++++Begin Existing Invoice++++++++++++
Order Number: 12345 Date: 04/08/05
Ship Via: UPS Terms: VISA

Item Description Qty Price
1 34567 Modem 1 100.00
2 31266 Router 2 125.00
3 21890 Phone 1 100.00

Entered By: MRM
+++++End Existing Invoice++++++++++++++

+++++Begin Invoice Like I Want+++++++++
Date: 04/08/05 Ship Via: UPS
Terms: VISA Entered By: MRM

Item Description Qty Price
1 34567 Modem 1 100.00
2 31266 Router 2 125.00
3 21890 Phone 1 100.00

Order Number: 12345
++++++End Invoice Like I Want++++++++++

Note that the number of items can be variable but the Entered By line is always on line 12 of the input. To make things more complicated, this can be a multipage packing list, so I need to account for a form feed (or count the lines and assume the top of the next page at 14 lines) and then make additional pages as required.

Any thoughts on how to do this?

I will throw in an additional 500 points for a sample script..

Thanks,
Mark

ASKER CERTIFIED SOLUTION

tfewster

This solution is only available to members.

9thTee

ASKER

Hi tfewster,

How do I run it from a $ prompt? I do not use awk much at all.

Thanks,
Mark

brettmjohnson

The easiest thing to do would be to write the awk script to a file (myprog.awk in the following example).
Put everything between the single quotes in the file, but not the single quotes themselves,
then invoke awk with the -f option:

$ awk -f myprog.awk A.txt
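
As a minimal sketch of that workflow: the one-rule program and the two-line input below are placeholders made up for illustration, not the script from the accepted solution. The program is saved with a here-document and then run with -f:

```shell
# Save a (hypothetical) one-rule awk program to myprog.awk.
# The file holds only the program text, with no surrounding single quotes.
cat > myprog.awk <<'EOF'
/Order Number:/ { print "Order", $3, "dated", $5 }
EOF

# A.txt stands in for the packing list produced by the database.
cat > A.txt <<'EOF'
Order Number: 12345 Date: 04/08/05
Ship Via: UPS Terms: VISA
EOF

# Run the program file against the input.
awk -f myprog.awk A.txt
```

With this sample input the command prints "Order 12345 dated 04/08/05".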

9thTee

ASKER

Ok, what if the 2nd line changes to something like this?

++++++++++++++
Ship Via: Hickory Truck Terms: 2% 10th Prox Net 25th
++++++++++++++

Ship via is a 16 place field and can be multiple words right padded with spaces.
Terms is a 24 place field and can be multiple words right padded with spaces.

Thanks,
Mark

tfewster

If those character positions are always fixed, just change the line to extract those strings to:
/Ship Via:/{SHIP=substr($0,11,16);TERMS=substr($0,34,24)}
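
A small sketch of that rule in action. It assumes a layout in which "Ship Via: " occupies columns 1-10, the 16-character field follows immediately, and "Terms: " sits directly after it so that the 24-character field starts at column 34. The file names line2.txt and shipvia.awk are made up for the example:

```shell
# Build one sample line with the assumed fixed-width layout.
printf 'Ship Via: %-16sTerms: %-24s\n' 'Hickory Truck' '2% 10th Prox Net 25th' > line2.txt

cat > shipvia.awk <<'EOF'
/Ship Via:/ {
    SHIP  = substr($0, 11, 16)   # columns 11-26
    TERMS = substr($0, 34, 24)   # columns 34-57
    # Brackets make the right-hand padding visible.
    print "[" SHIP "]"
    print "[" TERMS "]"
}
EOF

awk -f shipvia.awk line2.txt
```

Both values come back with their trailing pad spaces intact (three after "Hickory Truck" and three after "2% 10th Prox Net 25th"), so multi-word values survive where splitting on whitespace would not. If the real file has a different gap before "Terms:", the start column 34 would need adjusting.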

9thTee

ASKER

Hi tfewster,

I am closing up this question and starting another that asks other questions about your script. Please continue here:
https://www.experts-exchange.com/questions/21383387/Awk-subscr-0-70-15-and-printf-programming-question.html

Thanks for the help.

Mark

tfewster

Thanks for the points, but there is no need for a separate question for the explanation! In fact, awarding more than 500 points to a question is against the rules.

> Please briefly explain how this script works.

Each line in the awk program takes the form [Optional pattern] {actions to take}
Multiple actions are separated by semicolons or newlines.

{LINE=NR; LINE %= 13} # There's no [pattern] here, as we want to do this with every line. This part deals with multiple pages: set variable LINE to the number of "records" (NR) read so far (here, each line of the input file is a record); then "LINE %= 13" replaces LINE with the remainder of LINE divided by 13, i.e. the line number on a page: 1 % 13 is 1 (line 1); 14 % 13 is 1 (line 1 of the second page), and so on. Line 13 of each page gives 0.

/Order Number:/{ONO=$3;DATE=$5} # If the string "Order Number:" is found in the record we're processing, capture field 3 and field 5 into variables (appropriately named, and in upper case so we can see the difference between variables & text). The field separator in this case is any whitespace.

/Ship Via:/{SHIP=substr($0,11,16);TERMS=substr($0,34,24)} # If the string "Ship Via:" is found, capture the relevant substring into a variable; Here, we're doing it by character start position+known length of the text as we can't be sure of the number of fields. Note that "$0" is the entire record.

LINE == 3 , LINE == 11 {ITEM[LINE]=$0} # For lines 3-11, just save the line in an array ITEM[3], ITEM[4]..ITEM[11]

/Entered By:/{BY=$3; # Capture "By" into a variable; And as this is the end of a page, print out the newly formatted page
print "Date: " DATE "\tShip Via: " SHIP
print "Terms: " TERMS "\tEntered By: " BY
for ( i=3; i <=11; i++ ) { print ITEM[i] } # for i= 3-11, output the items we saved in the ITEM array
print "Order Number: " ONO "\f"
}

So awk reads the input file record by record and checks each record against all the rules we've defined. More than one rule may apply to the same record, e.g. the first rule {LINE=NR...} will always be executed.
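
The rules explained above can be assembled and tried on a two-page sample. This is a sketch under assumptions: the make_page helper, the file names and the sample data are invented for the test, each page is 13 lines with "Entered By" on line 12, and line 2 uses the fixed-width layout discussed earlier:

```shell
# Hypothetical helper: write one 13-line page ("Entered By" is line 12).
make_page() {   # arguments: order number, entered-by initials
    printf 'Order Number: %s Date: 04/08/05\n' "$1"
    printf 'Ship Via: %-16sTerms: %-24s\n' 'UPS' 'VISA'
    printf '\n'
    printf 'Item Description Qty Price\n'
    printf '1 34567 Modem 1 100.00\n'
    printf '2 31266 Router 2 125.00\n'
    printf '3 21890 Phone 1 100.00\n'
    printf '\n\n\n\n'
    printf 'Entered By: %s\n' "$2"
    printf '\n'
}
{ make_page 12345 MRM; make_page 12346 TIM; } > pages.txt

cat > packlist.awk <<'EOF'
{ LINE = NR % 13 }                       # line number within the page
/Order Number:/ { ONO = $3; DATE = $5 }
/Ship Via:/     { SHIP = substr($0, 11, 16); TERMS = substr($0, 34, 24) }
LINE == 3, LINE == 11 { ITEM[LINE] = $0 }   # save the body of the page
/Entered By:/ {                          # end of page: print the new layout
    BY = $3
    print "Date: " DATE "\tShip Via: " SHIP
    print "Terms: " TERMS "\tEntered By: " BY
    for (i = 3; i <= 11; i++) print ITEM[i]
    print "Order Number: " ONO "\f"
}
EOF

awk -f packlist.awk pages.txt
```

Each input page produces 12 output lines: two header lines, the nine saved body lines, and the order number followed by a form feed.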

The AIX man page for awk is quite good; A tutorial can be found at http://www.gnu.org/software/gawk/manual/gawk.html

> Also, can it be changed so that we specify a line number and then grab the variables? For example somehow specify line 1 get 20 characters starting at position 10. Line 2 get 12 characters starting at position 10, etc.

Certainly. Instead of searching for the string "Order Number:", we can just specify the pattern as "LINE == 1", then use substr to extract the characters we want:
LINE == 1 { ONO=substr( $0, 10, 20 ) }
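
A short sketch of selecting by line number rather than by text. The column numbers here are chosen to fit the sample header from the question and are assumptions, as are the file names header.txt and bypos.awk; a real packing list would need its own positions:

```shell
cat > header.txt <<'EOF'
Order Number: 12345 Date: 04/08/05
Ship Via: UPS Terms: VISA
EOF

cat > bypos.awk <<'EOF'
{ LINE = NR % 13 }                        # line number within the page
LINE == 1 { ONO  = substr($0, 15, 5)      # 5 characters from column 15
            DATE = substr($0, 27, 8) }    # 8 characters from column 27
LINE == 2 { SHIP = substr($0, 11, 3) }    # 3 characters from column 11
END { print ONO "|" DATE "|" SHIP }
EOF

awk -f bypos.awk header.txt
```

For this input the output is "12345|04/08/05|UPS". Because the rules key on LINE instead of a search string, they keep working even if the label text on those lines changes.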

9thTee

ASKER

Thank you so much for the wonderful explanation. This clears up a lot.
I am looking over the man page that you mentioned.

So far I have gone through my actual packing list (which is quite a bit more complex than the example I originally used) and, using the substr() function, I have all of the fields for the header in variables. I have also used the print command to verify that I have all the variables set up correctly.

I have run into a small problem and am not sure how to fix it. Say that the last field that I need to grab is at the end of a line and that particular line is 80 characters long. If I need to grab 15 characters (since I know the field can be up to 15 characters long) and I do SHIPVIA=substr($0,70,15), I am trying to read past the end of the line and this seems to be the cause of the problem. Since this particular Ship Via is only 10 characters long, it causes something weird when I use the printf "%15s", SHIPVIA command. It is fine if I use the print command, but it looks like I need the printf command since I need to specify a width for the fields so that everything lines up properly. When I do printf it looks like about 8 spaces are added to the front of what prints. When I do printf "%12s", TERMS it actually spells the field wrong (Caah Only instead of Cash Only). I tried removing the last parameter in the substr command so the command became SHIPVIA=substr($0,70) but that did not make any difference.

Do you know how to overcome this problem or am I just not using the correct command? Basically I just need the fields to be a specific width when printed so that they line up with an overlay form. Maybe testing the line length and making sure the substr command does not specify a length that goes over this?? I did try setting the length in substr to the actual shorter length for this particular Ship Via and it works properly, so I am sure the problem is going beyond the end of line.
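
For reference, a minimal sketch of the two awk behaviours involved, using made-up data and file names (short.txt, width.awk). It relies only on standard awk: substr() returns whatever characters exist when asked for more than the line holds, and "%15s" right-justifies its argument, while "%-15s" left-justifies it. It covers only the padding and does not reproduce or explain the misspelled field:

```shell
# A 79-character line whose last field starts at column 70 and is
# only 10 characters long (hypothetical data).
printf '%-69s%s\n' 'Terms: Cash Only' 'UPS Ground' > short.txt

cat > width.awk <<'EOF'
{
    SHIPVIA = substr($0, 70, 15)      # asks for 15, only 10 are available
    printf "len=%d\n", length(SHIPVIA)
    printf "[%15s]\n", SHIPVIA        # right-justified: padded on the left
    printf "[%-15s]\n", SHIPVIA       # left-justified: padded on the right
}
EOF

awk -f width.awk short.txt
```

The first line of output is "len=10", so reading past the end of the line simply yields a shorter string. The second line shows five spaces in front of "UPS Ground", and the third shows the same five spaces after it instead, which is usually what a left-aligned column on an overlay form needs.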

Thanks for your help.

Mark

tfewster

Hi Mark,

Sorry, I can't think what might be the problem in this case; I'd say it was reasonable to post another question for that, and if you can give some sample text as well as the relevant sections of your code I'm sure someone less dopy will be able to help ;-)

Incidentally, you can delete the other question if no one has posted in it.

Regards,
Tim

9thTee

ASKER

Ok, cool. Thanks.

Mark