I'm converting credit card data from one system to another. The data as is sits today has a card holder name field which consists of a single column in my Sybase DB. To properly extract/format for loading into the new system, I have to break apart the name into two parts (firstname, lastname). The problem is, due to user creativity (LOL) the single field is not consistent in containing just a first name and a last name - sometimes there are middle initials, some times there are "notes" (e.g. do not use, etc.) in the field. So, what we decided to act on the data as follows:
If there is only one element in the field, we assume it to be a last name and want to return it as "VERIFY, Lastname" (where VERIFY would end up in the first name field on the new system).
If there are two elements to the field, we'll place the first into first name and second into last name.
If there are three elements, we want to check the length of the second....if the length of the 2nd is equal to 1, we'll assume a middle initial (e.g. "J" ), and we'll place element one into the first name, and then element three into the last name (thus leaving the middle initial or element two behind).
If there are more than three elements in the field, or there are three elements and element two has a length greater than 1, then we want to place the first into first name, and all the remaining into the last name field.
I think I am very close, and I can get this to work, if the card holder name is the only field in the input file (since then NF applies to the total of the line as well as the field to be modified, since they are one in the same) - but the problem is, that this field is the forth field in the delimited record - so, I need to be able to count the number of elements/fields within that forth field so as to be able to act on it accordingly as indicated above.
Currently, I am trying to accomplish this via the following (but am open for other suggestions- if easier ways)
#! /bin/awk -f
BEGIN {FS="|"} { while ((getline) > 0)
print $1 "," $2 "," $3 "," chk_data($4) "," $5 "," $6 "," $7 "," $8 "," $9 "," $10 }
function chk_data(input, result,words,n,i)
{
n = split(input, words, " ")
i=NF
if (i > 3) {
for (r = 2; r <= i; r++) {
result = result " " words[r]
}
result = words[1] "," result
return result
}
if (i == 3 && length(words[2]) == 1 ) {
result = words[1] "," words[3]
return result
}
if (i == 3 && length(words[2]) > 1 ) {
for (r = 2; r <= i; r++) {
result = result " " words[r]
}
result = words[1] "," result
return result
}
if (i == 2) {
result = words[1] "," words[2]
return result
}
if (i == 1) {
result = "VERIFY," words[1]
return result
{
}
Examples of each scenario:
File being read in is pipe delimited. Final file needed is comma delimited.
READ FROM FILE:
304|VISACREDIT|40941234567
89012|DOE |9|2000|N|||250.00
Expected Out---> "304,VISACREDIT,4094123456
789012,VER
IFY,DOE,9,
2000,N,,,2
50.00
304|VISACREDIT|40941234567
89012|JOHN
DOE |9|2000|N|||250.00
Expected Out---> "304,VISACREDIT,4094123456
789012,JOH
N,DOE,9,20
00,N,,,250
.00
304|VISACREDIT|40941234567
89012|JOHN
X DOE |9|2000|N|||250.00
Expected Out---> "304,VISACREDIT,4094123456
789012,JOH
N,DOE,9,20
00,N,,,250
.00
304|VISACREDIT|40941234567
89012|JOHN
EDWARD DOE |9|2000|N|||250.00
Expected Out---> "304,VISACREDIT,4094123456
789012,JOH
N,EDWARD DOE,9,2000,N,,,250.00
304|VISACREDIT|40941234567
89012|DO NOT USE THIS CARD |9|2000|N|||250.00
Expected Out---> "304,VISACREDIT,4094123456
789012,DO,
NOT USE THIS CARD,9,2000,N,,,250.00
The problem I am having getting to these results is that NF even after calling the function (and passing it $4) is still referencing the total # of elements/fields in the entire line - not just that of field #4. Within the function I need to be able to know the count of elements in $4 that was passed in.
Hopefully this makes sense.... =)
Any and all help is appreciated....
Start Free Trial