Can't seem to get my head around my loops

Posted on 2008-11-15
Medium Priority
Last Modified: 2012-05-05
In printing, we quite often need to rearrange data so that it comes out in a different order than it arrives.  Once it's been printed, it's set up on cutters and needs to come out in stream order (this is called north-south splitting).  I need to write a 4-way north-south split: divide the file into four sections and determine any leftovers (in this case, 3 records:  19/4 = 4.75, round down to 4;  4 x 4 = 16, remainder 3).  Example below.

The data comes in this order: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19

It needs to be rearranged so that when the records are printed and cut into 4 "ribbons" they are presented in this way:

1   6   11   16        Print direction
2   7   12   17                |
3   8   13   18                |
4   9   14   19                |
5  10  15                      V

So, the natural order of the file needs to be rebuilt as an output file in this order: 1,6,11,16,2,7,12,17,3,8,13,18.........

Does Python have a reasonably efficient way of doing this, other than seeking to file positions inside loops with counters?  These files are quite large, with record lengths in the hundreds and record counts in the millions.
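
Before any file I/O is involved, the requested order can be written down as plain index arithmetic. The following is a minimal sketch, assuming the leftover records go one each to the leftmost ribbons, as in the 19-record example above; the function name `north_south_order` is made up for illustration.

```python
def north_south_order(nrec, ways=4):
    """Return 0-based record indices in north-south output order."""
    base, extra = divmod(nrec, ways)
    # Column lengths: the first `extra` columns hold one more record.
    lengths = [base + (1 if c < extra else 0) for c in range(ways)]
    # Index of the first record in each column.
    starts = [sum(lengths[:c]) for c in range(ways)]
    order = []
    for row in range(base + (1 if extra else 0)):
        for c in range(ways):
            if row < lengths[c]:  # skip columns that have run out
                order.append(starts[c] + row)
    return order


# Shown 1-based, to match the numbering used in the question.
print([i + 1 for i in north_south_order(19)])
```

For 19 records this prints 1, 6, 11, 16, 2, 7, 12, 17 and so on, ending with the short last row 5, 10, 15.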

Question by:TommyMac501

Expert Comment

ID: 22970296
It looks like a matrix calculation.
Try Numeric Python?
LVL 39

Expert Comment

by:Roger Baklund
ID: 22970525
You mention seeking, so I assume you have a fixed record length. You can open the same file multiple times, position each file handle at the start of its column, and then read one record from each file handle in a loop. An example is provided below.
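# splitreader.py
class splitreader:
    def __init__(self,filename,reclen,splitcount):
        self.filename = filename
        self.reclen = reclen
        self.splitcount = splitcount
        self.fh = []
        self.endpos = []  # end offset of each column
        self.open()
    def open(self):
        # one file handle per column, all on the same file
        for i in range(self.splitcount):
            self.fh.append(open(self.filename,'rb'))
        self.fh[0].seek(0,2)  # seek to the end of the file to find its size
        self.filesize = self.fh[0].tell()
        self.recordcount = self.filesize // self.reclen
        self.chunksize = (self.recordcount // self.splitcount) * self.reclen
        self.leftovers = self.recordcount % self.splitcount
        startpos = 0
        for i in range(self.splitcount):
            self.fh[i].seek(startpos)
            startpos += self.chunksize
            if i < self.leftovers:  # the first columns take one extra record each
                startpos += self.reclen
            self.endpos.append(startpos)
    def get_row(self):
        row = []
        for i in range(self.splitcount):
            if self.fh[i].tell() < self.endpos[i]:  # stay inside this column
                row.append(self.fh[i].read(self.reclen).decode('ascii'))
        return row
    def close(self):
        for i in range(self.splitcount):
            self.fh[i].close()
def test():
    sr = splitreader('fixedreclen_3.dat',3,4)
    print('recordcount = %d' % sr.recordcount)
    print('leftovers = %d' % sr.leftovers)
    while sr.fh[-1].tell() < sr.endpos[-1]:
        row = sr.get_row()
        print(' '.join(row))
    if sr.leftovers:
        row = sr.get_row()  # read final row with leftovers
        print(' '.join(row))
    sr.close()
if __name__ == '__main__':
    test()
# fixedreclen_3.dat contains this single line:
# 111222333444555666777888999AAABBBCCCDDDEEEFFFGGGHHHIIIJJJ
# output:
recordcount = 19
leftovers = 3
111 666 BBB GGG
222 777 CCC HHH
333 888 DDD III
444 999 EEE JJJ
555 AAA FFF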


LVL 29

Expert Comment

ID: 22971183
I do not know the background. Anyway, do you really want to print all of the "million records"? There is probably some selection of them, isn't there? You should also explain why it is important to print them in columns instead of rows. It would be understandable if they were, say, pages of a phone list or the like. In that case, however, you would want to make sections by pages, say 300 items per page. Or perhaps the cutter needs the four columns arranged this way because the printed media is cut into four piles of, say, price cards for a shop, and the piles are then put one on top of another to get the correct ordering.

Still, "millions of printed records" seems unrealistic to me. So I guess that you really want to print a number of items that could easily fit into memory. In Python, you could read them into a list, which can also be accessed through indexing.
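
If the records do fit in memory, the list approach could look like the sketch below. It assumes the leftover records go to the leftmost columns, as in the question's example; `rearrange` is a hypothetical helper name, not something from the thread.

```python
from itertools import zip_longest


def rearrange(records, ways=4):
    """Reorder an in-memory list of records for `ways` ribbons."""
    base, extra = divmod(len(records), ways)
    columns, start = [], 0
    for c in range(ways):
        length = base + (1 if c < extra else 0)
        columns.append(records[start:start + length])  # one slice per ribbon
        start += length
    missing = object()  # sentinel that marks the gap in a short column
    return [rec
            for row in zip_longest(*columns, fillvalue=missing)
            for rec in row
            if rec is not missing]


print(rearrange(list(range(1, 20))))
```

Each ribbon is a plain list slice, and `zip_longest` walks the slices row by row, so no counters have to be kept by hand.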

Please, write more details.
LVL 17

Expert Comment

ID: 22972591
Some real world data would help a lot.

Are the records fixed format? fixed length?

Author Comment

ID: 22972816
"I do not know the background. Anyway, do you really want to print all of the "million records"?"

In direct mail, it's very common to print millions of pieces in a run.  Datasets normally run anywhere from 150,000 records to 6 or 7 million in a job.  It's normal.  The output structure is built for the reason you specified. In this particular case, the customer is printing postcards in columns of four across an 18" wide form.  They are "slit" into columns, then chopped, effectively creating postcards that are "stacked" at the end of the slitter/cutter.  Since you may have dozens of these cutters, the files are run like this so that they can be reassembled in natural order on skids.

The data is almost always drawn from mainframe systems and consists of fixed-length ASCII records with CRLF.  Since I need to "jump around the file", I am using file pointers (seeking) to get to the start of the next output record.  The problem I'm having is the algorithm.

My plan was to create a loop counter equal to the record count / 4, and have an array list with 4 elements: one to hold the seek position of the next record in each output leg.

My loops got out of control, and I'm having a tough time keeping track of why I'm getting either more or fewer records than the input file has.
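
The plan described above (one file handle, a list of four seek positions) could be sketched roughly as follows. This is only an illustration under stated assumptions: `north_south_split` and its parameters are hypothetical names, `reclen` is taken to be the full record length in bytes including the CRLF, and leftovers go to the leftmost columns as in the question's example. Tracking a remaining-record count per column, instead of a single loop counter, is what keeps the output count equal to the input count.

```python
import os


def north_south_split(src, dst, reclen, ways=4):
    """Copy fixed-length records from src to dst in north-south order.

    Returns the number of records written.
    """
    nrec = os.path.getsize(src) // reclen
    base, extra = divmod(nrec, ways)
    # Records left to read per column; the first `extra` get one more.
    remaining = [base + (1 if c < extra else 0) for c in range(ways)]
    # One seek position per column.
    positions = []
    pos = 0
    for count in remaining:
        positions.append(pos)
        pos += count * reclen
    written = 0
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while remaining[0] > 0:          # column 0 is never the shortest
            for c in range(ways):
                if remaining[c] == 0:    # this column is used up
                    continue
                fin.seek(positions[c])
                fout.write(fin.read(reclen))
                positions[c] += reclen
                remaining[c] -= 1
                written += 1
    assert written == nrec, 'record count mismatch'
    return written
```

With a single handle every read is preceded by a seek, which may work against read buffering on large files; that is one reason the multiple-handle approach suggested in the other comments may turn out faster.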

As an aside, I'm using "Wing" as an editor for the autocomplete and integrated debugger, but it's slow in step-mode debugging.  Does anyone have a better editor suggestion to try?

CXR:  I will give that solution a try (after I read and understand it).. :)
LVL 17

Expert Comment

ID: 22972837
Please show us your code. Else we are shooting in the dark.
LVL 29

Expert Comment

ID: 22982898
Then cxr's approach is probably the correct one. Try the alternative snippet below, which simulates the printing by writing to an output file. It can easily be modified to produce the reorganized output file from the input source.

The simulated input reads the records as lines; however, it is also possible to read, say, multiline fixed-length records.

The simulated output looks like (997 records counted from zero):

... and the last ones


import os
# Generate some sample file.
fname = 'test.dat'
f = open(fname, 'wb')  # binary mode keeps the byte offsets exact
for n in range(1000 - 3):  # i.e. simulate 3 missing items for the last case
    s = 'rec%050i\n' % n
    f.write(s.encode('ascii'))
f.close()
# Determine the fixed length (assumed) of the record.
f = open(fname, 'rb')
pos1 = f.tell()
s = f.readline()
pos2 = f.tell()
f.close()
recsize = pos2 - pos1
print('Record size: %d' % recsize)
# Determine the length of the file and compute the number
# of records.
fsize = os.stat(fname).st_size
print('File size: %d' % fsize)
nrec = fsize // recsize
print('No. of records: %d' % nrec)
# Compute the seek offset. We know that we want 4 columns; hence, add 3 to get
# the length of the longest column after the "floor division".
nrows = (nrec + 3) // 4
print('Num of rows: %d' % nrows)
offset = nrows * recsize
print('Offset: %d' % offset)
# Simulate the printing by output to another file.
fout = open('output.txt', 'wb')
# Open the four input files with different offsets.
f1 = open(fname, 'rb')
f2 = open(fname, 'rb')
f3 = open(fname, 'rb')
f4 = open(fname, 'rb')
# Seek to the right offsets (f1 stays at the beginning).
f2.seek(offset)
f3.seek(offset * 2)
f4.seek(offset * 3)
# Loop the known number of times through the records.
# The last column may contain empty printings at the end.
rec4 = b'init'
for n in range(nrows):
    # Read the four records...
    rec1 = f1.readline()
    rec2 = f2.readline()
    rec3 = f3.readline()
    if rec4 != b'':           # not reading after EOF
        rec4 = f4.readline()
    # ... and put them to the output.
    fout.write(rec1)
    fout.write(rec2)
    fout.write(rec3)
    fout.write(rec4)
    # Visualize the printed row.
    fout.write(b'-' * recsize + b'\n')
# Close all input file objects and the output file.
f1.close()
f2.close()
f3.close()
f4.close()
fout.close()



Author Comment

ID: 23035931
Sorry, I've been away.  Thanks everyone for contributing.  I thought pepr's solution was the most unique, entering the file in four different places.  You all were a great help, and I'm happy to pay my monthly dues to belong here.

LVL 39

Accepted Solution

Roger Baklund earned 1000 total points
ID: 23039267
Right... my code also entered the file in four different places, but I guess you did not understand it. That's my problem, not yours! :)
LVL 29

Expert Comment

ID: 23040036
Yes, and I did emphasize that in the now accepted solution: "Then cxr's approach is probably the correct one." ;)

Author Comment

ID: 23042870
cxr: You are right, I didn't understand that.  It's my fault, not yours..  help me figure out how to assign some points to you and I'll happily oblige..  :)

LVL 39

Expert Comment

by:Roger Baklund
ID: 23043255

Author Comment

ID: 23086992
Ok, it may be me, but I cannot find a "Request Attention" button anyplace...
LVL 39

Expert Comment

by:Roger Baklund
ID: 23087333
See the lower right corner of the original question, on the top of this page. Just above the google translate function. :)


Question has a verified solution.
