9thTee

Splitting a long field into multiple shorter fields.

On a Linux machine, I need to split product description fields that are up to 256 characters long into multiple fields no longer than 76 characters.  The catch is that each split has to fall on a space, so the resulting fields need to be as long as possible but no longer than 76 characters.  I assume awk or sed can do this, but I am not sure where to start.

Any help would be appreciated.
Bill Prew

Can you provide a sample data file for testing / clarification?  Is the product description the only thing on each line?

Do you want each "chunk" less than 76 characters output on a separate line?


»bp
Hi Tee,

What do you need to do with the items once they're split?

awk is probably the right tool.  It uses a space for the default separator.  The only question is once split, then what?
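
For reference, awk's default field splitting treats any run of spaces or tabs as a single separator and ignores leading and trailing blanks. A quick sketch to see this (the function name count_words is made up for illustration):

```shell
# Hypothetical helper: print how many whitespace-separated fields awk
# sees in the string passed as the first argument.
count_words() {
    printf '%s\n' "$1" | awk '{ print NF }'
}

# Extra spaces between and around the words do not create empty fields.
count_words "  Deluxe   golf bag  "    # prints 3
```

One consequence is that any script which rebuilds a line from $1 through $NF will collapse repeated spaces in the description into single spaces.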
Perl is the way to go: identify all the space positions, then cut the data accordingly.
Could probably be optimized slightly, but here's a basic AWK script that seems to get the job done.

BEGIN {
    # set max length of output lines
    maxLen = 76

    # initialize work variables for output line
    outLine = ""
    outLen = 0
}

{
    # loop through all space delimited fields in this input line
    for (i=1; i<=NF; i++) {
        # get length of this chunk
        l = length($i)

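        # will adding this chunk (plus a separating space) exceed the max
        # output line length? (skip the check while the output line is
        # still empty, so a very long chunk does not print a blank line)
        if (outLine != "" && outLen + l + 1 > maxLen) {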
            # print accumulated output line, and clear it
            print outLine
            outLine = ""
        }

        # if this is the first chunk added to the output line, no space separator is added
        if (outLine == "") {
            outLine = $i
            outLen = l
        } else {
            outLine = outLine " " $i
            outLen = outLen + l + 1
        }
    }

    # print any pending output line that was built
    if (outLine != "") {
        print outLine
    }

    # reset work variables before the next input line
    outLine = ""
    outLen = 0
}

EDITED: Added comments, and fixed output length calculation.

»bp
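
A quick way to sanity-check the output of a script like the one above is to measure its longest line. This is only a sketch; the helper name max_line_len is made up for illustration.

```shell
# Hypothetical check: read lines on stdin and print the length of the
# longest one (0 if there is no input).
max_line_len() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Example with inline data:
printf '%s\n' "short" "a somewhat longer line" | max_line_len
```

Assuming the script is saved as split.awk and the descriptions are in descriptions.txt, one per line (both file names are placeholders), `awk -f split.awk descriptions.txt | max_line_len` should report a value of 76 or less, unless a single word in the data is itself longer than 76 characters.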

ASKER

Hi Bill,
This does exactly what I asked, but I need one small change to the output: I would like the newly created fields (76 characters maximum each) to all be on one line and pipe ("|") delimited.  Is that possible?

abc….76 characters max...xyz|abc….76 characters max...xyz|abc….76 characters max...xyz

Thanks,
Mark
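
The accepted answer is not reproduced in this thread, so the following is only a minimal sketch of one way to get that output, not necessarily the solution that was posted. It assumes each input line holds nothing but the description, that descriptions contain no literal "|", and that no single word is longer than the limit (such a word would be emitted unsplit). The function name split_pipe and its optional length argument are made up for illustration.

```shell
# Hypothetical wrapper: read descriptions on stdin, one per line, and print
# each as a single line of "|"-delimited chunks. The optional first
# argument is the maximum chunk length (default 76).
split_pipe() {
    awk -v max="${1:-76}" '
    {
        out = ""      # finished chunks, each followed by "|"
        chunk = ""    # chunk currently being built
        for (i = 1; i <= NF; i++) {
            if (chunk == "") {
                chunk = $i
            } else if (length(chunk) + 1 + length($i) <= max) {
                # the word still fits after a separating space
                chunk = chunk " " $i
            } else {
                # the word does not fit: close this chunk, start a new one
                out = out chunk "|"
                chunk = $i
            }
        }
        print out chunk
    }'
}

# Small demonstration with a limit of 10 instead of 76:
printf '%s\n' "the quick brown fox jumps" | split_pipe 10
# prints: the quick|brown fox|jumps
```

With a real data file, something like `split_pipe < descriptions.txt` would apply the default limit of 76; the file name is a placeholder.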
ASKER CERTIFIED SOLUTION
Bill Prew

This solution is only available to members.

ASKER

Perfect, thanks for your help.
Welcome.


»bp