Python convert long row into multiple columns

Posted on 2013-05-12
Last Modified: 2013-05-15
I have a CSV file that has just one  column but needs to be multiple columns.

Example only one column

row 1    aaaaaaa        aaaa    aaaaaaaaa     aaaaa a    aaaaaaa

I want it be with multiple columns

column(fixed length) column2(fixed length) column3(fixed length) column4(fixed length)

and the rows will be parsed into multiple fixed column lengths.

Anybody have an idea to do this in Python?
Question by:sonofstimpy1
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
LVL 29

Expert Comment

ID: 39160521
Firstly, CSV originally means "Comma Separated Values". These days, also tab or semicolon is used sometimes. Definitely, the fixed width of columns is something very different in principle. There is some ambiguity in your question. The example seems to be fixed width. But it is not clear whether you want the fixed width values after reading the resulting file (i.e. memory representation) while the file itself should be stored as CSV, or whether the wanted file should be stored using fixed width. In other words... Are you asking how to read the fixed-width columns file or to write the read values back as CSV?

Try the following code. The reading part shows how to extract the values from the fixed-width columns, the writing part shows how to write tha values to a CSV file.

import csv

fname_in = 'a.txt'
fname_out = 'a.csv'

cols = [0, 9, 24, 32, 46, 52, 57 ]  # positions of the columns (zero based)
with open(fname_in) as fin, open(fname_out, 'wb') as fout:  
                            # for the csv file, must be open in binary mode
    writer = csv.writer(fout)
    for line in fin:
        line = line.rstrip()  # removing the '\n' and other trailing whitespaces
        # if needed, you can remove only the '\n' char like this:
        # if line[-1] == '\n':
        #     line = line[:-1]

        # Get the row (the list) of the column values.        
        row = []                     # init -- empty list
        pos1 = cols[0]
        for pos2 in cols[1:]:
            value = line[pos1:pos2]  # slice the column value
            value = value.rstrip()   # if you want to remove trailing whitespaces
            row.append(value)        # append to the row of values    
            pos1 = pos2              # get ready for the next column
        value = line[pos1:]          # last column value
        value = value.rstrip()       # if you want to remove trailing whitespaces
        row.append(value)            # append to the row of values    
        # Process the row -- here print and writing to the CSV file.
        print row

Open in new window

The sample of 3 rows was saved to a.txt. It prints
['row 1', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 2', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 3', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']

Open in new window

and it writes the following content to the a.csv:
row 1,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 2,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 3,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa

Open in new window


Author Comment

ID: 39165130
Thanks for replying. Lemmie be more specific.

I have a CSV file with 1,000+ rows.

Each row is just a single column, looking something like this...

Here is an example Here is a second example blah blah

I need this row should be cut into multiple columns. I know the fixed length of each column...

column 1 is the first 8 characters
column 2 is the next 4 characters
column 3 is the next 5 characters
column 4 is the next 6 characters

So I want to open the file and read, then write to CSV with the revised rows with columns.
LVL 29

Accepted Solution

pepr earned 500 total points
ID: 39167210
Firstly, do not call it CSV file as it is not a CSV file. I do not have problem with understanding you, but you may confuse someone who did not dig deep enough to your problem. It may have the .csv extension because you have saved it this way, but it is not CSV inside. This way, it is confusing to speak about a single column inside the file. It is just a text file.

Try the following code that reads the values as substrings of the columns of the predefined width. Notice your wanted widths assigned to the list of the name. Read the comments:

fname_in = 'b.txt'
fname_out = 'b.out'

# These are your widths of the columns.
widths = [8, 4, 5, 6]

# Let's build the list of the starting positions. Initialize the first one
# and loop through all the widths and calculate the starting positions.
pos = 0                 # position of the first column (zero based index)
start = [pos]           # first position in the list
for w in widths:
    pos = pos + w       # previous position plus the width is the next position
    start.append(pos)   # position of the next column appended to the list

# Just to see what was constructe (remove it from the production code).
# We have also calculated the position of the column that would be placed
# after the last one. The last position simplifies the slicing below.
# Also, there is no need to strip the '\n' -- it will be sliced off.
print 'Positions:', start

# Open the input file for reading and the output file for writing.
with open(fname_in) as fin, \
     open(fname_out, 'w') as fout:

    # Loop through the lines of the input file. The '\n' is the part
    # of the strings read from the line.
    for line in fin:
        # Build the row (the list) of the column values.
        row = []                     # init -- empty list
        pos1 = start[0]              # index of the first char of the column
        for pos2 in start[1:]:       # index just after the column
            value = line[pos1:pos2]  # slice the column substring
            row.append(value)        # append to the row of values
            pos1 = pos2              # get ready for the next column

        # Process the row -- here print and writing to the output file.
        # Here the length of the lines changed because we have added
        # the bar characters.
        print row   # just to see what was extracted -- remove in production code
        fout.write('|'.join(row) + '|\n')

Open in new window

For the attached file b.txt with the content
line 2  abc 12345    xx

Open in new window

(just two lines but it does not matter), it displays the following lines
Positions: [0, 8, 12, 17, 23]
['88888888', '4444', '55555', '666666']
['line 2  ', 'abc ', '12345', '    xx']

Open in new window

and it produces the b.out file with the following content
line 2  |abc |12345|    xx|

Open in new window

Of course, it depends on you how you want to process the values and how you want to store the result. Here the row is simply joined to a single string by the bar characters, and the bar is also added to the end of the line to visualize the end of the content.

Author Closing Comment

ID: 39167776
Thanks Pepr. I was re-checking your original post and it works exactly how I want it. I've noted your point about CSV files in general. Thanks for taking the time to help.
LVL 29

Expert Comment

ID: 39168091
I am glad if it helped. Anyway, there is another part of the problem when working with fixed-width columns -- the formatting of the value to the width. Feel free to continue in discussion just below if you have the related questions.

Featured Post

MS Dynamics Made Instantly Simpler

Make Your Microsoft Dynamics Investment Count  & Drastically Decrease Training Time by Providing Intuitive Step-By-Step WalkThru Tutorials.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Strings in Python are the set of characters that, once defined, cannot be changed by any other method like replace. Even if we use the replace method it still does not modify the original string that we use, but just copies the string and then modif…
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…

739 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question