Link to home
Start Free TrialLog in
Avatar of tel2
tel2Flag for New Zealand

asked on

Converting seconds to hh:mm:ss

I'm trying to convert raw phone call data collected from a PABX, into a format suitable for loading into a database.  As part of this, I need to be able to convert a 4 digit field which represents the call duration, in 6 second periods, into hours:minutes:seconds.  Eg: "0011" should be converted to "0:1:6" (or alternatively "00:01:06", ie: 11 x 6 = 66, which = 0 hours, 1 minutes & 6 seconds).

This field is the 3rd "|" delimited field in a record, and there may be thousands of records (lines) in the input file.  For every 3rd field of each record, the conversion should be made.
 
Here's a sample input file:
  01/01/1999|01:20|0011|12|678...
  01/01/1999|03:20|0621||4...
  01/01/1999|05:41|0001|1|77...
 
And here's the expected output file:
  01/01/1999|01:20|0:1:6|12|678...
  01/01/1999|03:20|1:2:6||4...
  01/01/1999|05:41|0:0:6|1|77...
 
Your mission, is to write a script (preferably using Perl, or some standard (fast) UNIX utility (preferably not C) which will do this very quickly (ie: NOT with a shell loop)), which takes the 1st file as standard input, and writes the 2nd as standard output.  You should preferably take the field as being between the 2nd and 3rd "|" delimiters, or if you can't do that, take bytes 18-21.

For extra points, subtract this newly generated call duration from the call end date/time (fields 1 and 2), to generate call start date/time fields, which should replace the original fields 1 and 2.

Thanks.  Any questions?
ASKER CERTIFIED SOLUTION
Avatar of geotiger
geotiger

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of tel2

ASKER

2500 records in 6 seconds on my machine!  Almost as fast as your reply.
Well done geotiger!  Keep up the good work.

Any ideas how the 2nd part could be done?  Is awk capable of doing such calculations?  Can be quite complex when going back over the midnight barrier.
Avatar of geotiger
geotiger

Instead of using awk, the nawk is used. It is more powerful than awk. I had built an application with 10k lines of codes to create Oracle tables, load data into Oracle tables, and build a web interface. What else you want to do with nawk?

If you do not cross the year, the following code should do what you have asked for. If you do have cases of crossing years, please let me know.

How many points the following codes worth? :-)

GeoTiger


#!/bin/nawk -f
#
BEGIN { FS="|"; OFS="|";
}
{ str = $1 "|" $2
  tot = $3 * 6 ;
  hr = int(tot / 3600); rst = tot - hr * 3600;
  mm = int (rst / 60);  
  ss = rst - mm * 60;
  str = str "|" hr ":" mm ":" ss;
  str = str "|" getCallTime($1,$2,tot);
  for (i=4; i<=NF; i++) { str = str "|" $i }
  print str;
}
END {}
function getCallTime(dt,tm,gap,   ymd,yr,a,b,t,r) {
    # Input variables:
    #   dt  - in the format of MM/DD/YYYY
    #   tm  - in the format of hh:mm
    #   gap - in seconds
    #
    a[0] = split(dt, a, "/")
    b[0] = split(tm, b, ":")
    ymd  = sprintf("%04d%02d%02d.%02d%02d%02d", a[3], a[1], a[2],
           b[1], b[2], 0)
    yr = int(ymd)                # get year
    t  = getSecondsInYear(ymd)   # year got dropped
    r  = t - gap;
    return yr cvtSecondsToTime(r)
}
# <a name="cvtSecondsToTime"></a>
function cvtSecondsToTime(sec,   rst,Y,M,D,hh,mm,ss,tm,ds,f,a,r) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    Y = int(sec / f["ys"]);  rst = sec -  Y * f["ys"]
    ds = int(rst / f["ds"]); rst = rst - ds * f["ds"]
    f["ymd"] = getNewDate(sprintf("%04d0000", Y), ds, 10)
    hh = int(rst / f["hs"]); rst = rst - hh * f["hs"]
    mm = int(rst / f["ms"]); rst = rst - hh * f["ms"]
    ss = rst
    if (Y==0) {
        r = substr(ymd, 5,4) "." sprintf("%02d%02d%02d", hh, mm, ss)
    } else {
        r = f["ymd"] "." sprintf("%02d%02d%02d", hh, mm, ss)
    }
    return r
}
# <a name="getSecondsInYear"></a>
function getSecondsInYear(tm,  f,a,sec) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    # ymd is in format of YYYYMMDD or YYYYMMDD.hhmmss
    tm=tm ""; split(decompTime(tm), a, ":")
    sec = getDaysInYear(tm) * f["ds"]
    sec = sec + a[4] * f["hs"] + a[5] * f["ms"] + a[6]
    return sec
}
# <a name="decompTime"></a>
function decompTime(tm,  a) {
    # Input variable:
    #   tm - time in the format of YYYYMMDD.hhmmss or YYYYMMDD
    # Local variable:
    #   a - a string variable
    # Global variables used: None
    # Output global variables: None
    # Calls-To: None
    # Return:
    #   a  - the string of YYYY:MM:DD:hh:mm:ss
    #
    a = substr(tm, 1, 4)        # current year:  0 ~ N
    a = a ":" substr(tm, 5, 2)  # current month: 1~12
    a = a ":" substr(tm, 7, 2)  # current date:  1~31
    a = a ":" substr(tm, 10, 2) # current hour:  0~23
    a = a ":" substr(tm, 12, 2) # current minute: 0~59
    a = a ":" substr(tm, 14, 2) # current second: 0~59
    return a
}
# <a name="getDaysInYear"></a>
function getDaysInYear(ymd,  year,mon,days,mds,i) {
    # Input variables:
    #   ymd - input date in format of YYYYMMDD or YYYYMMDD.hhmmss
    # Local variables:
    #   year - a string of YYYY
    #   mon  - a string of MM
    #   days - a string of DD
    #   mds  - array of days in months
    #   i    - loop index
    # Global variables used: None
    # Output global variables: None
    # Calls-to: None
    # Return: days
    #
    year = substr(ymd, 1, 4) + 0  # current year:  0 ~ N
    mon  = substr(ymd, 5, 2) + 0  # current month: 1~12
    days = substr(ymd, 7, 2) + 0  # current date:  1~31
    if (year%4==0)
        split("31 29 31 30 31 30 31 31 30 31 30 31", mds, " ")
    else
        split("31 28 31 30 31 30 31 31 30 31 30 31", mds, " ")
    for (i=1; i<=(mon - 1); ++i)  
        days += mds[i]
    return days
}

# <a name="obtain_ymd"></a>
function getNewDate(ymd,gap,it,  cy,a,b,Y,M,D,i,n,r,z) {
    # Input variables:
    #   ymd - date in the format of YY/MM/DD or MM/DD/YY or DD/MM/YY
    #   gap - number of days from ymd
    #   it  - input date format: default is YY/MM/DD or YY MM DD
    #         1=MM/DD/YY 2=DD/MM/YY 10=YYYYMMDD
    # Local variables:
    #   cy  - Y2K cutoff year
    #   a,b - date and time array
    #   Y   - year in YYYY
    #   M   - month in MM
    #   D   - date in DD
    #   n   - total days in the year ( ymd + gap)
    #   i   - loop index
    #   z   - days in each month
    # Global varialbes used: None
    # Output global variables: None
    # Return: date in the format of YYYY/MM/DD
    #
    cy = 31
    split(ymd, a, " "); split(a[1], b, "/")
    if (it==1) {          # in_type is MM/DD/YY
        M = b[1];  D = b[2];  Y = b[3]
    } else if (it==2) {   # in_type is DD/MM/YY
        D = b[1];  M = b[2];  Y = b[3]
    } else if (it==10) {  # in_type is YYYYMMDD.hhmmss
        Y = substr(ymd, 1, 4)+0
        M = substr(ymd, 5, 2)+0
        D = substr(ymd, 7, 2)+0
    } else {
        Y = b[1]; M = b[2];  D = b[3]
    }
    if (Y>cy && Y<100 ) Y += 1900
    else if (Y<=cy)     Y += 2000
    n = getDaysInYear(sprintf("%04d%02d%02d", Y, M, D)) + gap + 0
    while (n < 0) {
        if (Y%4==0) { n += 366; --Y }
        else {        n += 365; --Y }
    }
    while (n > 365) {
        if (Y%4 == 0) { n -= 366; ++Y }
        else {          n -= 365; ++Y }
    }
    z[0] = split("31 28 31 30 31 30 31 31 30 31 30 31", z, " ")
    if( Y%4 == 0 ) { z[2]=29 }
    else z[2]=28
    for (i=1; i<=12; ++i) {
        if (n<=z[i]) {M=i; D=n; break}
        n -= z[i]
    }
    if (it==10) {
        return sprintf("%04d%02d%02d", Y, M, D)
    } else {
        return sprintf("%04d/%02d/%02d", Y, M, D)
    }
}


The output is

$ ./tst03 tst03.txt
  01/01/1999|01:20|0:1:6|19990101.01181074|12|678...
  01/01/1999|03:20|1:2:6|19990101.0217954||4...
  01/01/1999|05:41|0:0:6|19990101.05402154|1|77...
Avatar of tel2

ASKER

geotiger,

WOW!  That's an amazing amount of work you've done for me, and I appreciate it.  I wasn't expecting such a prompt and lengthy answer!
I'm sorry I took so long to get back to you on this.

I still haven't used your first answer yet (though I have tested it), and I may never get the chance to, but I'll definitely keep it in case I need it.
The 2nd answer is less likely to be used, but it would also be good to have.

Here are my comments:
1. I don't recognise your date/time format.  Looks like: YYYYMMDD.HHMM???? or something.  What is it exactly?
2. In the first example, I would have expected 1:20:00 - 0:01:06 = 1:18:54, but your answer is ...01181074.  Please explain.
3. The answer should be in "DD/MM/YYYY|HH:MM:SS" format, because it is meant to "replace the original fields 1 & 2".  In the above example, the output should be:
  01/01/1999|01:18:54|0:1:6|12|678...
4. Yes, it would need to handle year boundaries.

If you're still interested, and you can do all the above, I'll give you 100 points.  Otherwise, I'll give you 30 points for what you've done so far.  
Which do you prefer?

I know it's not many points for all that work, but it was meant to be just an option add-on extra.

Anyway, keep up the good work, and it's good to have you on the EE team!
Ok, you asked for it. :-)

1. I had a typo in the previous program, that is why you got more digits in the date output. The correct format should be YYYYMMDD.hhmmss.

2. The answer to 1 should correct this one.

3. Running the program as ./tst03.awk your_data_file, you should get the format you requested.

4. It handles leap years and year boundaries. See the last record I added. It gets the correct date (2/29) and time.


$ more tst03.txt
  01/01/1999|01:20|0011|12|678...
  01/01/1999|03:20|0621||4...
  01/01/1999|05:41|0001|1|77...
  01/01/1999|00:41|0015|1|77...
  03/01/1996|00:01|0135|1|77...
$ more ./tst03.awk
#!/bin/nawk -f
# File name tst03.awk
# Purpose: 1. convert seconds into hh:mm:ss format
#          2. get dt_tm from dt_tm minus gap in seconds
#
#
BEGIN { FS="|"; OFS="|";
}
{ str = $1 "|" $2   # $1 - year in MM/DD/YYYY
                    # $2 - call end time in hh:mm
  tot = $3 * 6 ;    # $3 - count of each 6 second
  tm1 = cvtSec2Tm(tot)
  tm2 = getCallTime($1,$2,tot)
  str = str "|" $3 "|" tot
  str = str "|" tm1
  str = str "|" tm2
  tel = cvtTimeFormat(tm2) "|" tm1
  for (i=4; i<=NF; i++) {
      str = str "|" $i; tel = tel "|" $i
  }
  if (mytest==1) { print str} else { print tel }
}
END {}
function getCallTime(dt,tm,gap,   ymd,yr,a,b,t,r) {
    # Input variables:
    #   dt  - in the format of MM/DD/YYYY
    #   tm  - in the format of hh:mm
    #   gap - in seconds
    #
    a[0] = split(dt, a, "/")
    b[0] = split(tm, b, ":")
    ymd  = sprintf("%04d%02d%02d.%02d%02d%02d", a[3], a[1], a[2],
           b[1], b[2], 0)
    yr = substr(ymd,1,4)                # get year
    t  = getSecondsInYear(ymd)   # year got dropped
    r  = t - gap;
    if (r < 0) {                 # not enough seconds in the year
        if (isLeapYear(yr)) {
            t = t + 366 * 86400  # 1 day = 86400 seconds
        } else {
            t = t + 365 * 86400
        }
        --yr;                    # reduce one year
        r = t - gap
    }
    return yr cvtSecondsToTime(r)
}
function cvtTimeFormat(tm, Y, M, D, hh, mm, ss) {
    # convert YYYYMMDD.hhmmss to DD/MM/YYYY|hh:mm:ss
    Y  = substr(tm, 1, 4)
    M  = substr(tm, 5, 2)
    D  = substr(tm, 7, 2)
    hh = substr(tm, 10,2)
    mm = substr(tm, 12,2)
    ss = substr(tm, 14,2)
    return sprintf("%02d/%02d/%04d|%02d:%02d:%02d",D,M,Y,hh,mm,ss)
}
function cvtSec2Tm(sec, hh, mm, ss, rst) {
  hr = int(sec / 3600); rst = sec - hr * 3600;
  mm = int (rst / 60);  
  ss = rst - mm * 60;
  return hr ":" mm ":" ss
}
# <a name="cvtSecondsToTime"></a>
function cvtSecondsToTime(sec,   rst,Y,M,D,hh,mm,ss,tm,ds,f,a,r) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    Y = int(sec / f["ys"]);  rst = sec -  Y * f["ys"]
    ds = int(rst / f["ds"]); rst = rst - ds * f["ds"]
    f["ymd"] = getNewDate(sprintf("%04d0000", Y), ds, 10)
    hh = int(rst / f["hs"]); rst = rst - hh * f["hs"]
    mm = int(rst / f["ms"]); rst = rst - mm * f["ms"]
    ss = rst
    if (Y==0) {
        r = substr(f["ymd"], 5,4) "." sprintf("%02d%02d%02d",hh,mm,ss)
    } else {
        r = f["ymd"] "." sprintf("%02d%02d%02d", hh, mm, ss)
    }
    return r
}
# <a name="getSecondsInYear"></a>
function getSecondsInYear(tm,  f,a,sec) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    # ymd is in format of YYYYMMDD or YYYYMMDD.hhmmss
    tm=tm ""; split(decompTime(tm), a, ":")
    sec = getDaysInYear(tm) * f["ds"]
    sec = sec + a[4] * f["hs"] + a[5] * f["ms"] + a[6]
    return sec
}
# <a name="decompTime"></a>
function decompTime(tm,  a) {
    # Input variable:
    #   tm - time in the format of YYYYMMDD.hhmmss or YYYYMMDD
    # Local variable:
    #   a - a string variable
    # Global variables used: None
    # Output global variables: None
    # Calls-To: None
    # Return:
    #   a  - the string of YYYY:MM:DD:hh:mm:ss
    #
    a = substr(tm, 1, 4)        # current year:  0 ~ N
    a = a ":" substr(tm, 5, 2)  # current month: 1~12
    a = a ":" substr(tm, 7, 2)  # current date:  1~31
    a = a ":" substr(tm, 10, 2) # current hour:  0~23
    a = a ":" substr(tm, 12, 2) # current minute: 0~59
    a = a ":" substr(tm, 14, 2) # current second: 0~59
    return a
}
# <a name="getDaysInYear"></a>
function getDaysInYear(ymd,  year,mon,days,mds,i) {
    # Input variables:
    #   ymd - input date in format of YYYYMMDD or YYYYMMDD.hhmmss
    # Local variables:
    #   year - a string of YYYY
    #   mon  - a string of MM
    #   days - a string of DD
    #   mds  - array of days in months
    #   i    - loop index
    # Global variables used: None
    # Output global variables: None
    # Calls-to: None
    # Return: days
    #
    year = substr(ymd, 1, 4) + 0  # current year:  0 ~ N
    mon  = substr(ymd, 5, 2) + 0  # current month: 1~12
    days = substr(ymd, 7, 2) + 0  # current date:  1~31
    if (year%4==0)
        split("31 29 31 30 31 30 31 31 30 31 30 31", mds, " ")
    else
        split("31 28 31 30 31 30 31 31 30 31 30 31", mds, " ")
    for (i=1; i<=(mon - 1); ++i)  
        days += mds[i]
    return days
}

# <a name="obtain_ymd"></a>
function getNewDate(ymd,gap,it,  cy,a,b,Y,M,D,i,n,r,z) {
    # Input variables:
    #   ymd - date in the format of YY/MM/DD or MM/DD/YY or DD/MM/YY
    #   gap - number of days from ymd
    #   it  - input date format: default is YY/MM/DD or YY MM DD
    #         1=MM/DD/YY 2=DD/MM/YY 10=YYYYMMDD
    # Local variables:
    #   cy  - Y2K cutoff year
    #   a,b - date and time array
    #   Y   - year in YYYY
    #   M   - month in MM
    #   D   - date in DD
    #   n   - total days in the year ( ymd + gap)
    #   i   - loop index
    #   z   - days in each month
    # Global varialbes used: None
    # Output global variables: None
    # Return: date in the format of YYYY/MM/DD
    #
    cy = 31
    split(ymd, a, " "); split(a[1], b, "/")
    if (it==1) {          # in_type is MM/DD/YY
        M = b[1];  D = b[2];  Y = b[3]
    } else if (it==2) {   # in_type is DD/MM/YY
        D = b[1];  M = b[2];  Y = b[3]
    } else if (it==10) {  # in_type is YYYYMMDD.hhmmss
        Y = substr(ymd, 1, 4)+0
        M = substr(ymd, 5, 2)+0
        D = substr(ymd, 7, 2)+0
    } else {
        Y = b[1]; M = b[2];  D = b[3]
    }
    if (Y>cy && Y<100 ) Y += 1900
    else if (Y<=cy)     Y += 2000
    n = getDaysInYear(sprintf("%04d%02d%02d", Y, M, D)) + gap + 0
    while (n < 0) {
        if (Y%4==0) { n += 366; --Y }
        else {        n += 365; --Y }
    }
    while (n > 365) {
        if (Y%4 == 0) { n -= 366; ++Y }
        else {          n -= 365; ++Y }
    }
    z[0] = split("31 28 31 30 31 30 31 31 30 31 30 31", z, " ")
    if( Y%4 == 0 ) { z[2]=29 }
    else z[2]=28
    for (i=1; i<=12; ++i) {
        if (n<=z[i]) {M=i; D=n; break}
        n -= z[i]
    }
    if (it==10) {
        return sprintf("%04d%02d%02d", Y, M, D)
    } else {
        return sprintf("%04d/%02d/%02d", Y, M, D)
    }
}
# <a name="isLeapYear"></a>
function isLeapYear(year) {
    # Input variables:
    #   year - year in the format of YYYY
    # Local variables: None
    # Global variables used: None
    # Output global variables: None
    # Calls-To: None
    # Return: 1 or 0  
    #
    if ((year % 400) == 0)      return 1
    else if ((year % 100) == 0) return 1
    else if ((year % 4) == 0)   return 1
    else                        return 0
}




$ ./tst03.awk tst03.txt
01/01/1999|01:18:54|0:1:6|12|678...
01/01/1999|02:17:54|1:2:6||4...
01/01/1999|05:40:54|0:0:6|1|77...
01/01/1999|00:39:30|0:1:30|1|77...
29/02/1996|23:47:30|0:13:30|1|77...
$ ./tst03.awk mytest=1 tst03.txt
  01/01/1999|01:20|0011|66|0:1:6|19990101.011854|12|678...
  01/01/1999|03:20|0621|3726|1:2:6|19990101.021754||4...
  01/01/1999|05:41|0001|6|0:0:6|19990101.054054|1|77...
  01/01/1999|00:41|0015|90|0:1:30|19990101.003930|1|77...
  03/01/1996|00:01|0135|810|0:13:30|19960229.234730|1|77...
$
Avatar of tel2

ASKER

Amazing response geotiger!  Looks really good on the test data I provided, but I don't understand why it's not working on some real data of mine (see below).  Note that the 2 spaces at the start of my previous test data were just to indent the records for "clarity", but I don't think this is the issue, since it still failed when I inserted spaces before my real data.  Real data also has digits etc in place of the "..." at the end of each line.  Here it is:

# cat real1
21/11/1999|17:28|0014||042382258|43363||
  21/11/1999|17:57|0001||078967783|43363||
  21/11/1999|17:58|0003||078966767|43363||

# ./tst03.awk real1
01/20/1999|11:01:17|0:1:24||042382258|43363||
01/20/1999|11:01:17|0:0:6||078967783|43363||
01/20/1999|11:01:17|0:0:18||078966767|43363||

Eg: 21/11/1999 17:28:00 - 00:01:24 = 21/11/1999 17:26:36 (not 11/20/1999 11:01:17).  Note that both the date and time are incorrect.


Any ideas, geotiger?
Avatar of tel2

ASKER

geotiger,

In my previous message, lines 2-3 of the data under "# cat real1" have spaces in front of them.  Please ignore these spaces.  Somehow I must have pasted them wrongly.  None of my data actually starts with spaces.

Thanks.
Sorry, I thought your input date in the first column was MM/DD/YYYY. It is very easy to fix, just use the following line to replace the corresponding line in getCallTime function:

ymd  = sprintf("%04d%02d%02d.%02d%02d%02d", a[3], a[2], a[1], b[1], b[2], 0)

You should not worry about the leading spaces. After splitting the field, the program treats the split fields as numerical. Try the about change and let me know.
   

Avatar of tel2

ASKER

geotiger,

I made that change, and here's some sample input:

21/11/1999|17:28|0014||042382258|43363||
21/11/1999|17:57|0001||078967783|43363||
01/01/1999|00:00|0006|NJ|98274|43363||

which results in this output:

20/11/1999|17:26:36|0:1:24||042382258|43363||
20/11/1999|17:56:54|0:0:6||078967783|43363||
00/01/1999|23:59:24|0:0:36|NJ|98274|43363||

As you can see, the 1st 2 dates are 1 day early, and the 3rd is an invalid date.  I look forward to hearing from you.  Thanks.
Change one place is not enough. Use the program I posted in this message. Look the example, the dates are coming out correctly.

$ more tst03.awk
#!/bin/nawk -f
# File name tst03.awk
# Purpose: 1. convert seconds into hh:mm:ss format
#          2. get dt_tm from dt_tm minus gap in seconds
#
#
BEGIN { FS="|"; OFS="|";
}
{ str = $1 "|" $2   # $1 - year in MM/DD/YYYY
                    # $2 - call end time in hh:mm
  tot = $3 * 6 ;    # $3 - count of each 6 second
  tm1 = cvtSec2Tm(tot)
  tm2 = getCallTime($1,$2,tot)
  str = str "|" $3 "|" tot
  str = str "|" tm1
  str = str "|" tm2
  tel = cvtTimeFormat(tm2) "|" tm1
  for (i=4; i<=NF; i++) {
      str = str "|" $i; tel = tel "|" $i
  }
  if (mytest==1) { print str} else { print tel }
}
END {}
function getCallTime(dt,tm,gap,   ymd,yr,a,b,t,r) {
    # Input variables:
    #   dt  - in the format of MM/DD/YYYY
    #   tm  - in the format of hh:mm
    #   gap - in seconds
    #
    a[0] = split(dt, a, "/")
    b[0] = split(tm, b, ":")
    ymd  = sprintf("%04d%02d%02d.%02d%02d%02d", a[3], a[2], a[1],
           b[1], b[2], 0)
    yr = substr(ymd,1,4)                # get year
    t  = getSecondsInYear(ymd)   # year got dropped
    # print ymd ":" t ":" yr ":" gap
    r  = t - gap;
    if (r < 0) {                 # not enough seconds in the year
        if (isLeapYear(yr)) {
            t = t + 366 * 86400  # 1 day = 86400 seconds
        } else {
            t = t + 365 * 86400
        }
        --yr;                    # reduce one year
        r = t - gap
    }
    # print t " - " gap " = " r
    return yr cvtSecondsToTime(yr, r)
}
function cvtTimeFormat(tm, Y, M, D, hh, mm, ss) {
    # convert YYYYMMDD.hhmmss to DD/MM/YYYY|hh:mm:ss
    Y  = substr(tm, 1, 4)
    M  = substr(tm, 5, 2)
    D  = substr(tm, 7, 2)
    hh = substr(tm, 10,2)
    mm = substr(tm, 12,2)
    ss = substr(tm, 14,2)
    return sprintf("%02d/%02d/%04d|%02d:%02d:%02d",D,M,Y,hh,mm,ss)
}
function cvtSec2Tm(sec, hh, mm, ss, rst) {
  hr = int(sec / 3600); rst = sec - hr * 3600;
  mm = int (rst / 60);  
  ss = rst - mm * 60;
  return hr ":" mm ":" ss
}
# <a name="cvtSecondsToTime"></a>
function cvtSecondsToTime(yr,sec,   rst,Y,M,D,hh,mm,ss,tm,ds,f,a,r) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    Y = int(sec / f["ys"]);  rst = sec -  Y * f["ys"]
    ds = int(rst / f["ds"]); rst = rst - ds * f["ds"]
    f["ymd"] = getNewDate(sprintf("%04d0101", yr), ds, 10)
    hh = int(rst / f["hs"]); rst = rst - hh * f["hs"]
    mm = int(rst / f["ms"]); rst = rst - mm * f["ms"]
    ss = rst
    if (Y==0) {
        r = substr(f["ymd"], 5,4) "." sprintf("%02d%02d%02d",hh,mm,ss)
    } else {
        r = f["ymd"] "." sprintf("%02d%02d%02d", hh, mm, ss)
    }
    return r
}
# <a name="getSecondsInYear"></a>
function getSecondsInYear(tm,  f,a,sec,t) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    # ymd is in format of YYYYMMDD or YYYYMMDD.hhmmss
    tm=tm ""; split(decompTime(tm), a, ":")
    t = getDaysInYear(tm)
    if (t>0) { sec = (t-1) * f["ds"] }   # so we get accurate seconds
    sec = sec + a[4] * f["hs"] + a[5] * f["ms"] + a[6]
    return sec
}
# <a name="decompTime"></a>
function decompTime(tm,  a) {
    # Input variable:
    #   tm - time in the format of YYYYMMDD.hhmmss or YYYYMMDD
    # Local variable:
    #   a - a string variable
    # Global variables used: None
    # Output global variables: None
    # Calls-To: None
    # Return:
    #   a  - the string of YYYY:MM:DD:hh:mm:ss
    #
    a = substr(tm, 1, 4)        # current year:  0 ~ N
    a = a ":" substr(tm, 5, 2)  # current month: 1~12
    a = a ":" substr(tm, 7, 2)  # current date:  1~31
    a = a ":" substr(tm, 10, 2) # current hour:  0~23
    a = a ":" substr(tm, 12, 2) # current minute: 0~59
    a = a ":" substr(tm, 14, 2) # current second: 0~59
    return a
}
# <a name="getDaysInYear"></a>
function getDaysInYear(ymd,  year,mon,days,mds,i) {
    # Input variables:
    #   ymd - input date in format of YYYYMMDD or YYYYMMDD.hhmmss
    # Local variables:
    #   year - a string of YYYY
    #   mon  - a string of MM
    #   days - a string of DD
    #   mds  - array of days in months
    #   i    - loop index
    # Global variables used: None
    # Output global variables: None
    # Calls-to: None
    # Return: days
    #
    year = substr(ymd, 1, 4) + 0  # current year:  0 ~ N
    mon  = substr(ymd, 5, 2) + 0  # current month: 1~12
    days = substr(ymd, 7, 2) + 0  # current date:  1~31
    if (year%4==0)
        split("31 29 31 30 31 30 31 31 30 31 30 31", mds, " ")
    else
        split("31 28 31 30 31 30 31 31 30 31 30 31", mds, " ")
    for (i=1; i<=(mon - 1); ++i)  
        days += mds[i]
    return days
}

# <a name="obtain_ymd"></a>
function getNewDate(ymd,gap,it,  cy,a,b,Y,M,D,i,n,r,z) {
    # Input variables:
    #   ymd - date in the format of YY/MM/DD or MM/DD/YY or DD/MM/YY
    #   gap - number of days from ymd
    #   it  - input date format: default is YY/MM/DD or YY MM DD
    #         1=MM/DD/YY 2=DD/MM/YY 10=YYYYMMDD
    # Local variables:
    #   cy  - Y2K cutoff year
    #   a,b - date and time array
    #   Y   - year in YYYY
    #   M   - month in MM
    #   D   - date in DD
    #   n   - total days in the year ( ymd + gap)
    #   i   - loop index
    #   z   - days in each month
    # Global varialbes used: None
    # Output global variables: None
    # Return: date in the format of YYYY/MM/DD
    #
    cy = 31
    split(ymd, a, " "); split(a[1], b, "/")
    if (it==1) {          # in_type is MM/DD/YY
        M = b[1];  D = b[2];  Y = b[3]
    } else if (it==2) {   # in_type is DD/MM/YY
        D = b[1];  M = b[2];  Y = b[3]
    } else if (it==10) {  # in_type is YYYYMMDD.hhmmss
        Y = substr(ymd, 1, 4)+0
        M = substr(ymd, 5, 2)+0
        D = substr(ymd, 7, 2)+0
    } else {
        Y = b[1]; M = b[2];  D = b[3]
    }
    if (Y>cy && Y<100 ) Y += 1900
    else if (Y<=cy)     Y += 2000
    # if (M==0) { M=1 }
    # if (D==0) { D=1 }
    n = getDaysInYear(sprintf("%04d%02d%02d", Y, M, D)) + gap + 0
    while (n < 0) {
        if (Y%4==0) { n += 366; --Y }
        else {        n += 365; --Y }
    }
    while (n > 365) {
        if (Y%4 == 0) { n -= 366; ++Y }
        else {          n -= 365; ++Y }
    }
    z[0] = split("31 28 31 30 31 30 31 31 30 31 30 31", z, " ")
    if( Y%4 == 0 ) { z[2]=29 }
    else z[2]=28
    for (i=1; i<=12; ++i) {
        if (n<=z[i]) {M=i; D=n; break}
        n -= z[i]
    }
    if (it==10) {
        return sprintf("%04d%02d%02d", Y, M, D)
    } else {
        return sprintf("%04d/%02d/%02d", Y, M, D)
    }
}
# <a name="isLeapYear"></a>
function isLeapYear(year) {
    # Input variables:
    #   year - year in the format of YYYY
    # Local variables: None
    # Global variables used: None
    # Output global variables: None
    # Calls-To: None
    # Return: 1 or 0  
    #
    if ((year % 400) == 0)      return 1
    else if ((year % 100) == 0) return 1
    else if ((year % 4) == 0)   return 1
    else                        return 0
}




$ more tst03.txt
  01/01/1999|01:20|0011|12|678...
  01/01/1999|03:20|0621||4...
  01/01/1999|05:41|0001|1|77...
  01/01/1999|00:41|0015|1|77...
  01/01/1999|00:01|1015|1|77...
  01/03/1996|00:01|0135|1|77...
$ ./tst03.awk tst03.txt
01/01/1999|01:18:54|0:1:6|12|678...
01/01/1999|02:17:54|1:2:6||4...
01/01/1999|05:40:54|0:0:6|1|77...
01/01/1999|00:39:30|0:1:30|1|77...
31/12/1998|22:19:30|1:41:30|1|77...
29/02/1996|23:47:30|0:13:30|1|77...

Avatar of tel2

ASKER

Again, apologies for the delay in getting back to you earlier geotiger, and thanks again for your efforts in fixing the last bug!

Unfortunately, after trying a bit more test data, I found another bug.  The following input:

01/01/1999|00:00|0001|1|77...
01/01/2000|00:00|0001|1|77...
01/01/2001|00:00|0001|1|77...

Produces this output:

31/12/1998|23:59:54|0:0:6|1|77...
99/19/1999|10:01:23|0:0:6|1|77...
30/12/2000|23:59:54|0:0:6|1|77...

As you can see, the first record is correct, but the next 2 are wrong (last should be 31/12/2000).  Any ideas?  If you want to give up that's OK, but if you can get it working perfectly, the 100 points are still yours.
This script is now taking 80 secs for 2500 records.  Still not too bad.

I might be asking EE for a Perl solution sometime.  I think Perl can be quite compact and fast.  It will be interesting to compare Perl & awk.  Do you know any Perl?
It is not the issue of languages. It is an issue of date and time calculation. Please post a new thread in "Unix programming area" and I will try to fix it. I think the problem is coming from leap year.
Avatar of tel2

ASKER

I know the language is not the issue.  It's just a logic error.
I'm interested in a Perl solution because I'd be interested to compare code size and performance with what you've written in awk.  I plan to repost the question sometime.  Thanks again.
I do not know whether you are still interested in awk solution. For the past a few days, I re-examine the code and finally got it to work. The debugginh is REALLY time consuming!!! The last three records are added from your message.

Hope the following codes deserve the 100 points.

$ cat tst03.awk
#!/bin/nawk -f
# File name tst03.awk
# Purpose: 1. convert seconds into hh:mm:ss format
#          2. get dt_tm from dt_tm minus gap in seconds
#
#
BEGIN { FS="|"; OFS="|";
}
{ str = $1 "|" $2   # $1 - year in MM/DD/YYYY
                    # $2 - call end time in hh:mm
  tot = $3 * 6 ;    # $3 - count of each 6 second
  tm1 = cvtSec2Tm(tot)
  tm2 = getCallTime($1,$2,tot)
  str = str "|" $3 "|" tot
  str = str "|" tm1
  str = str "|" tm2
  tel = cvtTimeFormat(tm2) "|" tm1
  for (i=4; i<=NF; i++) {
      str = str "|" $i; tel = tel "|" $i
  }
  if (mytest==1) { print str} else { print tel }
}
END {}
function getCallTime(dt,tm,gap,   ymd,yr,a,b,t,r) {
    # Input variables:
    #   dt  - in the format of MM/DD/YYYY
    #   tm  - in the format of hh:mm
    #   gap - in seconds
    #
    a[0] = split(dt, a, "/")
    b[0] = split(tm, b, ":")
    ymd  = sprintf("%04d%02d%02d.%02d%02d%02d", a[3], a[2], a[1],
           b[1], b[2], 0)
    yr = substr(ymd,1,4)                # get year
    t  = getSecondsInYear(ymd)   # year got dropped
    # print "\n" ymd ":" t ":" yr ":" gap
    r  = t - gap;
    if (r < 0) {                 # not enough seconds in the year
        if (isLeapYear(yr-1)) {
            t = t + 366 * 86400  # 1 day = 86400 seconds
        } else {
            t = t + 365 * 86400
        }
        --yr;                    # reduce one year
        r = t - gap
    }
    # print t " - " gap " = " r "   yr=" yr
    return yr cvtSecondsToTime(yr, r)
}
function cvtTimeFormat(tm, Y, M, D, hh, mm, ss) {
    # convert YYYYMMDD.hhmmss to DD/MM/YYYY|hh:mm:ss
    Y  = substr(tm, 1, 4)
    M  = substr(tm, 5, 2)
    D  = substr(tm, 7, 2)
    hh = substr(tm, 10,2)
    mm = substr(tm, 12,2)
    ss = substr(tm, 14,2)
    return sprintf("%02d/%02d/%04d|%02d:%02d:%02d",D,M,Y,hh,mm,ss)
}
function cvtSec2Tm(sec, hh, mm, ss, rst) {
  hr = int(sec / 3600); rst = sec - hr * 3600;
  mm = int (rst / 60);  
  ss = rst - mm * 60;
  return hr ":" mm ":" ss
}
# <a name="cvtSecondsToTime"></a>
function cvtSecondsToTime(yr,sec,   rst,Y,M,D,hh,mm,ss,tm,ds,f,a,r) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    if (isLeapYear(yr)) {
        f["ys"]  = 31622400    # 1 leap year = 31622400 seconds
    } else {
        f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    }
    Y = int(sec / f["ys"]);  rst = sec -  Y * f["ys"]
    ds = int(rst / f["ds"]); rst = rst - ds * f["ds"]
    f["ymd"] = getNewDate(sprintf("%04d0101", yr), ds, 10)
    # print "ymd=" f["ymd"] "   ds=" ds
    hh = int(rst / f["hs"]); rst = rst - hh * f["hs"]
    mm = int(rst / f["ms"]); rst = rst - mm * f["ms"]
    ss = rst
    if (Y==0) {
        r = substr(f["ymd"], 5,4) "." sprintf("%02d%02d%02d",hh,mm,ss)
    } else {
        r = f["ymd"] "." sprintf("%02d%02d%02d", hh, mm, ss)
    }
    return r
}
# <a name="getSecondsInYear"></a>
function getSecondsInYear(tm,  f,a,sec,t) {
    # Input variables:
    #   tm - time in the format of YYYYMMDD.hhmmss
    # Local variables:
    #   f  - array of time factors
    #   a  - time array for tm
    #   sec - total seconds
    # Global variables used: None
    # Output global variables: None
    # Calls-To:
    #   decompTime("YYYYMMDD.hhmmss")
    #   getDaysInYear(tm)
    # Return: seconds
    #  
    # set constant variables
    f["ms"]  = 60          # 1  minute =         60 seconds
    f["hs"]  = 3600        # 1  hour   =       3600 seconds
    f["ds"]  = 86400       # 1  day    =      86400 seconds
    f["ys"]  = 31536000    # 1  year   =   31536000 seconds
    # ymd is in format of YYYYMMDD or YYYYMMDD.hhmmss
    tm=tm ""; split(decompTime(tm), a, ":")
    t = getDaysInYear(tm)
    if (t>0) { sec = (t-1) * f["ds"] }   # so we get accurate seconds
    sec = sec + a[4] * f["hs"] + a[5] * f["ms"] + a[6]
    return sec
}
# <a name="decompTime"></a>
function decompTime(tm,  a) {
    # Input variable:
    #   tm - time in the format of YYYYMMDD.hhmmss or YYYYMMDD
    # Local variable:
    #   a - a string variable
    # Global variables used: None
    # Output global variables: None
    # Calls-To: None
    # Return:
    #   a  - the string of YYYY:MM:DD:hh:mm:ss
    #
    a = substr(tm, 1, 4)        # current year:  0 ~ N
    a = a ":" substr(tm, 5, 2)  # current month: 1~12
    a = a ":" substr(tm, 7, 2)  # current date:  1~31
    a = a ":" substr(tm, 10, 2) # current hour:  0~23
    a = a ":" substr(tm, 12, 2) # current minute: 0~59
    a = a ":" substr(tm, 14, 2) # current second: 0~59
    return a
}
# <a name="getDaysInYear"></a>
function getDaysInYear(ymd,  year,mon,days,mds,i) {
    # Input variables:
    #   ymd - input date in format of YYYYMMDD or YYYYMMDD.hhmmss
    # Local variables:
    #   year - a string of YYYY
    #   mon  - a string of MM
    #   days - a string of DD
    #   mds  - array of days in months
    #   i    - loop index
    # Global variables used: None
    # Output global variables: None
    # Calls-to: None
    # Return: days
    #
    year = substr(ymd, 1, 4) + 0  # current year:  0 ~ N
    mon  = substr(ymd, 5, 2) + 0  # current month: 1~12
    days = substr(ymd, 7, 2) + 0  # current date:  1~31
    if (year%4==0)
        split("31 29 31 30 31 30 31 31 30 31 30 31", mds, " ")
    else
        split("31 28 31 30 31 30 31 31 30 31 30 31", mds, " ")
    for (i=1; i<=(mon - 1); ++i)  
        days += mds[i]
    return days
}

# <a name="obtain_ymd"></a>
function getNewDate(ymd,gap,it,  cy,a,b,Y,M,D,i,n,r,z,ds) {
    # Input variables:
    #   ymd - date in the format of YY/MM/DD or MM/DD/YY or DD/MM/YY
    #   gap - number of days from ymd
    #   it  - input date format: default is YY/MM/DD or YY MM DD
    #         1=MM/DD/YY 2=DD/MM/YY 10=YYYYMMDD
    # Local variables:
    #   cy  - Y2K cutoff year
    #   a,b - date and time array
    #   Y   - year in YYYY
    #   M   - month in MM
    #   D   - date in DD
    #   n   - total days in the year ( ymd + gap)
    #   i   - loop index
    #   z   - days in each month
    # Global varialbes used: None
    # Output global variables: None
    # Return: date in the format of YYYY/MM/DD
    #
    cy = 31
    split(ymd, a, " "); split(a[1], b, "/")
    if (it==1) {          # in_type is MM/DD/YY
        M = b[1];  D = b[2];  Y = b[3]
    } else if (it==2) {   # in_type is DD/MM/YY
        D = b[1];  M = b[2];  Y = b[3]
    } else if (it==10) {  # in_type is YYYYMMDD.hhmmss
        Y = substr(ymd, 1, 4)+0
        M = substr(ymd, 5, 2)+0
        D = substr(ymd, 7, 2)+0
    } else {
        Y = b[1]; M = b[2];  D = b[3]
    }
    if (Y>cy && Y<100 ) Y += 1900
    else if (Y<=cy)     Y += 2000
    # if (M==0) { M=1 }
    # if (D==0) { D=1 }
    n = getDaysInYear(sprintf("%04d%02d%02d", Y, M, D)) + gap + 0
    # print "n=" n
    while (n < 0) {
        if (isLeapYear(Y)==1) { n += 366; --Y }
        else {        n += 365; --Y }
    }
    if (isLeapYear(Y) == 1) { ds = 366; } else { ds = 365; }
    while (n > ds) {
        if (isLeapYear(Y) == 1) { n -= 366; ++Y }
        else {          n -= 365; ++Y }
    }
    z[0] = split("31 28 31 30 31 30 31 31 30 31 30 31", z, " ")
    if( Y%4 == 0 ) { z[2]=29 }
    else z[2]=28
    for (i=1; i<=12; ++i) {
        if (n<=z[i]) {M=i; D=n; break}
        n -= z[i]
    }
    if (it==10) {
        return sprintf("%04d%02d%02d", Y, M, D)
    } else {
        return sprintf("%04d/%02d/%02d", Y, M, D)
    }
}
# <a name="isLeapYear"></a>
function isLeapYear(year) {
    # Input variables:
    #   year - year in the format of YYYY
    # Local variables: None
    # Global variables used: None
    # Output global variables: None
    # Calls-To: None
    # Return: 1 or 0  
    #
    if ((year % 400) == 0)      return 1
    else if ((year % 100) == 0) return 1
    else if ((year % 4) == 0)   return 1
    else                        return 0
}




$ ./tst03.awk tst03.txt
01/01/1999|01:18:54|0:1:6|12|678...
01/01/1999|02:17:54|1:2:6||4...
01/01/1999|05:40:54|0:0:6|1|77...
01/01/1999|00:39:30|0:1:30|1|77...
31/12/1998|22:19:30|1:41:30|1|77...
29/02/1996|23:47:30|0:13:30|1|77...
31/12/1998|23:59:54|0:0:6|1|77...
31/12/1999|23:59:54|0:0:6|1|77...
31/12/2000|23:59:54|0:0:6|1|77...
$
Tel2, does it work for you and deserve  100 points?
Avatar of tel2

ASKER

geotiger,
Yes it does, and yes it is!
I hadn't forgotten you, geo.  I tested it yesterday, and I was going to reply today.  Your points are in Q_10492298
Good work.