Can't seem to get my head around my loops

Posted on 2008-11-15
Last Modified: 2012-05-05
In printing, quite often we need to rearrange data so it comes out in a different order than it arrives.  Once it's been printed, it's set up on cutters and needs to come out in stream order (it's called north-south splitting).  I need to write a 4-way north-south split, which divides the file into 4 sections and determines any leftovers (in this case, 3 records: 19/4 = 4.75, round down to 4; 4 x 4 = 16, remainder 3); example below.

The data arrives in this order: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19

It needs to be rearranged so that when the records are printed and cut into 4 "ribbons" they are presented in this way:

1   6   11   16        Print direction
2   7   12   17                |
3   8   13   18                |
4   9   14   19                |
5  10  15                      V

So, the natural order of the file needs to be rebuilt as an output file in this order: 1,6,11,16,2,7,12,17,3,8,13,18.........

Does Python have a reasonably efficient way of doing this, other than seeking to file positions inside loops with counters?  These files are quite large, with record lengths in the hundreds and record counts in the millions.
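
The target order can be worked out with index arithmetic alone, before touching the file. What follows is a minimal sketch, not code from the thread: `column_starts` and `output_order` are hypothetical helper names, and it assumes the leftover records go one each to the leftmost columns, as in the 19-record example above.

```python
def column_starts(nrec, ncols):
    """Return a (start_index, length) pair for each column, 0-based."""
    base, leftovers = divmod(nrec, ncols)
    spans = []
    start = 0
    for col in range(ncols):
        # Assumption: the first `leftovers` columns get one extra record.
        length = base + (1 if col < leftovers else 0)
        spans.append((start, length))
        start += length
    return spans


def output_order(nrec, ncols):
    """Yield 0-based record indices in print order (row by row)."""
    spans = column_starts(nrec, ncols)
    nrows = spans[0][1] if spans else 0  # the first column is the longest
    for row in range(nrows):
        for start, length in spans:
            if row < length:  # short columns have no record in the last row
                yield start + row


if __name__ == "__main__":
    # 19 records in 4 ribbons, shown 1-based to match the example.
    print([i + 1 for i in output_order(19, 4)])
```

For 19 records and 4 columns this gives 1, 6, 11, 16, 2, 7, 12, 17 and so on, ending with 5, 10, 15. Multiplying each index by the record length gives the byte offset to seek to.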

Question by:TommyMac501

    Expert Comment

    It looks like a matrix calculation.
    Try Numeric Python?

    Expert Comment

    by:Roger Baklund
    You mention seeking, so I assume you have a fixed record length. You can open the same file multiple times, position each file handle at the right spot, and then read from each handle in turn inside a loop. An example is provided below.
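    class splitreader:
        def __init__(self, filename, reclen, splitcount):
            self.filename = filename
            self.reclen = reclen
            self.splitcount = splitcount
            self.fh = []
        def open(self):
            # One handle per column, all on the same file (binary mode,
            # so that positions are plain byte offsets).
            for i in range(self.splitcount):
                self.fh.append(open(self.filename, 'rb'))
            self.fh[0].seek(0, 2)  # seek to the end to find the file size
            self.filesize = self.fh[0].tell()
            self.recordcount = self.filesize // self.reclen
            self.chunksize = (self.recordcount // self.splitcount) * self.reclen
            self.leftovers = self.recordcount % self.splitcount
            startpos = 0
            for i in range(self.splitcount):
                self.fh[i].seek(startpos)
                startpos += self.chunksize
                if i < self.leftovers:  # the first columns hold one extra record
                    startpos += self.reclen
        def get_row(self, count=None):
            # Read one record from each of the first `count` columns (default: all).
            if count is None:
                count = self.splitcount
            row = []
            for i in range(count):
                row.append(self.fh[i].read(self.reclen).decode('ascii'))
            return row
        def close(self):
            for i in range(self.splitcount):
                self.fh[i].close()
    def test():
        sr = splitreader('fixedreclen_3.dat', 3, 4)
        sr.open()
        print('recordcount = %d' % sr.recordcount)
        print('leftovers = %d' % sr.leftovers)
        for n in range(sr.recordcount // sr.splitcount):  # the full rows
            print(' '.join(sr.get_row()))
        if sr.leftovers:
            # Final, shorter row: only the first `leftovers` columns have a record.
            print(' '.join(sr.get_row(sr.leftovers)))
        sr.close()
    if __name__ == '__main__':
        test()
    # fixedreclen_3.dat contains this single line:
    # 111222333444555666777888999AAABBBCCCDDDEEEFFFGGGHHHIIIJJJ
    # output:
    # recordcount = 19
    # leftovers = 3
    # 111 666 BBB GGG
    # 222 777 CCC HHH
    # 333 888 DDD III
    # 444 999 EEE JJJ
    # 555 AAA FFF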


    Expert Comment

    I do not know the background. Anyway, do you really want to print all of the "million records"? Probably only some selection of them is printed, isn't it? You should also explain why it is important to print in columns instead of rows. It would be understandable if they were, say, pages of a phone list or the like; in that case, however, you would want to make sections by pages, say 300 items per page. Or perhaps the cutter needs the four columns arranged this way because the printed media is cut into four piles (say, price cards for a shop) that are then stacked one on another to get the correct ordering.

    Still, "millions of printed records" seems unrealistic to me. So I guess that you really want to print a number of items that could easily fit into memory. In Python, you could read them into a list and access them by index.
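
If the records do fit in memory, the list-based idea can be sketched as below. This is an illustration under assumptions, not code from the thread: `reorder_in_memory` is a hypothetical name, and the leftover records are assumed to go to the leftmost columns, as in the question's 19-record example.

```python
from itertools import zip_longest


def reorder_in_memory(records, ncols=4):
    """Return the records rearranged row by row across ncols columns."""
    base, leftovers = divmod(len(records), ncols)
    columns = []
    start = 0
    for col in range(ncols):
        # Assumption: the first `leftovers` columns get one extra record.
        length = base + (1 if col < leftovers else 0)
        columns.append(records[start:start + length])
        start += length

    missing = object()  # sentinel that pads the shorter columns
    out = []
    # zip_longest walks the columns in parallel, i.e. one printed row at a time.
    for row in zip_longest(*columns, fillvalue=missing):
        out.extend(rec for rec in row if rec is not missing)
    return out
```

Calling `reorder_in_memory(list(range(1, 20)))` reproduces the order from the question, starting 1, 6, 11, 16. The slices copy the list once, so this only suits inputs comfortably smaller than available memory.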

    Please give more details.

    Expert Comment

    Some real world data would help a lot.

    Are the records fixed format? fixed length?

    Author Comment

    "I do not know the background. Anyway, do you really want to print all of the "million records"?"

    In direct mail, it's very common to print millions of pieces in a run.  Datasets normally run anywhere from 150,000 records to 6 or 7 million in a job.  It's normal.  The output structure is built for the reason you specified.  In this particular case, the customer is printing postcards four columns across on an 18" wide form.  They are "slit" into columns, then chopped, effectively creating postcards that are "stacked" at the end of the slitter/cutter.  Since you may have dozens of these cutters, the files are run like this so that they can be reassembled in natural order on skids.

    The data is almost always drawn from mainframe systems and consists of fixed-length ASCII records with CRLF.  Since I need to "jump around the file", I am using file pointers (seeking) to get to the start of the next output record.  The problem I'm having is the algorithm.

    My plan was to create a loop counter equal to the record count / 4, and have an array list with 4 elements; one to hold the seek position for each subsequent record in the output leg.

    My loops got out of control, and I'm having a tough time working out why I end up with either more or fewer records than the input file.
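
The plan described above (one seek position per output leg) can be sketched with a single file handle as follows. This is a hedged illustration, not the poster's code: `north_south_split` is a hypothetical name, `reclen` is assumed to include the trailing CRLF, and leftover records are assumed to go to the leftmost columns. Keeping a remaining-record count per column next to each seek position is what prevents writing too many or too few records.

```python
import os


def north_south_split(src_path, dst_path, reclen, ncols=4):
    """Copy fixed-length records from src to dst in column-interleaved order.

    reclen is the full record length in bytes, including the trailing CRLF.
    Returns the number of records written.
    """
    size = os.path.getsize(src_path)
    if size % reclen:
        raise ValueError("file size is not a multiple of the record length")
    nrec = size // reclen
    base, leftovers = divmod(nrec, ncols)

    # One seek position and one remaining-record count per column.
    positions = []
    remaining = []
    pos = 0
    for col in range(ncols):
        length = base + (1 if col < leftovers else 0)
        positions.append(pos)
        remaining.append(length)
        pos += length * reclen

    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while remaining[0] > 0:  # the first column is never shorter than the rest
            for col in range(ncols):
                if remaining[col] == 0:  # short column: no record in this row
                    continue
                src.seek(positions[col])
                dst.write(src.read(reclen))
                positions[col] += reclen
                remaining[col] -= 1
                written += 1

    assert written == nrec, "record count mismatch"
    return written
```

Seeking before every read defeats the file object's read buffering, so on files with millions of records the one-handle-per-column approach suggested earlier in the thread is likely to be faster; the bookkeeping stays the same.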

    As an aside, I'm using Wing as an editor for the autocomplete and integrated debugger, but it's slow in step-mode debugging.  Does anyone have a better editor suggestion to try?

    cxr:  I will give that solution a try (after I read and understand it).. :)

    Expert Comment

    Please show us your code. Else we are shooting in the dark.

    Expert Comment

    Then cxr's approach is probably the correct one. Try the alternative snippet below, which simulates the printing by writing to an output file. It can easily be modified to produce the reorganized output file from the input source.

    The simulated input reads the records as lines; however, the same approach could read, say, multiline fixed-length records.

    The simulated output looks like (997 records counted from zero):

    ... and the last ones


    import os
    # Generate a sample file (binary mode, so offsets are plain byte counts).
    fname = 'test.dat'
    f = open(fname, 'wb')
    for n in range(1000 - 3):  # i.e. simulate 3 missing items for the last case
        s = b'rec%050i\n' % n
        f.write(s)
    f.close()
    # Determine the fixed length (assumed) of the record.
    f = open(fname, 'rb')
    pos1 = f.tell()
    s = f.readline()
    pos2 = f.tell()
    f.close()
    recsize = pos2 - pos1
    print('Record size: %d' % recsize)
    # Determine the length of the file and compute the number
    # of records.
    fsize = os.stat(fname).st_size
    print('File size: %d' % fsize)
    nrec = fsize // recsize
    print('No. of records: %d' % nrec)
    # Compute the seek offset. We know that we want 4 columns; hence, add 3 to get
    # the length of the longest column after the "floor division".
    nrows = (nrec + 3) // 4
    print('Num of rows: %d' % nrows)
    offset = nrows * recsize
    print('Offset: %d' % offset)
    # Simulate the printing by output to another file.
    fout = open('output.txt', 'wb')
    # Open the four input files with different offsets.
    f1 = open(fname, 'rb')
    f2 = open(fname, 'rb')
    f3 = open(fname, 'rb')
    f4 = open(fname, 'rb')
    # Seek to the right offsets (f1 stays at the start of the file).
    f2.seek(offset)
    f3.seek(offset * 2)
    f4.seek(offset * 3)
    # Loop the known number of times through the records.
    # The last column may contain empty printings at the end.
    rec4 = b'init'
    for n in range(nrows):
        # Read the four records...
        rec1 = f1.readline()
        rec2 = f2.readline()
        rec3 = f3.readline()
        if rec4 != b'':          # not reading after EOF
            rec4 = f4.readline()
        # ... and put them to the output.
        fout.write(rec1)
        fout.write(rec2)
        fout.write(rec3)
        fout.write(rec4)
        # Visualize the printed row.
        fout.write(b'-' * recsize + b'\n')
    # Close all input file objects and the output file.
    f1.close()
    f2.close()
    f3.close()
    f4.close()
    fout.close()



    Author Comment

    Sorry, I've been away.  Thanks, everyone, for contributing.  I thought pepr's solution was the most unique, entering the file in four different places.  You all were a great help, and I'm happy to pay my monthly dues to belong here.


    Accepted Solution

    Right... my code also entered the file in four different places, but I guess you did not understand it. That's my problem, not yours! :)

    Expert Comment

    Yes, and I did emphasize that in the now accepted solution: "Then cxr's approach is probably the correct one." ;)

    Author Comment

    cxr: You are right, I didn't understand that.  It's my fault, not yours.  Help me figure out how to assign some points to you and I'll happily oblige.  :)


    Expert Comment

    by:Roger Baklund

    Author Comment

    Ok, it may be me, but I cannot find a "Request Attention" button anyplace...

    Expert Comment

    by:Roger Baklund
    See the lower right corner of the original question, at the top of this page, just above the Google Translate function. :)
