Python convert long row into multiple columns

Posted on 2013-05-12
Last Modified: 2013-05-15
I have a CSV file that has just one  column but needs to be multiple columns.

Example only one column

row 1    aaaaaaa        aaaa    aaaaaaaaa     aaaaa a    aaaaaaa

I want it be with multiple columns

column(fixed length) column2(fixed length) column3(fixed length) column4(fixed length)

and the rows will be parsed into multiple fixed column lengths.

Anybody have an idea to do this in Python?
Question by:sonofstimpy1
  • 3
  • 2
LVL 29

Expert Comment

ID: 39160521
Firstly, CSV originally means "Comma Separated Values". These days, also tab or semicolon is used sometimes. Definitely, the fixed width of columns is something very different in principle. There is some ambiguity in your question. The example seems to be fixed width. But it is not clear whether you want the fixed width values after reading the resulting file (i.e. memory representation) while the file itself should be stored as CSV, or whether the wanted file should be stored using fixed width. In other words... Are you asking how to read the fixed-width columns file or to write the read values back as CSV?

Try the following code. The reading part shows how to extract the values from the fixed-width columns, the writing part shows how to write tha values to a CSV file.

import csv

fname_in = 'a.txt'
fname_out = 'a.csv'

cols = [0, 9, 24, 32, 46, 52, 57 ]  # positions of the columns (zero based)
with open(fname_in) as fin, open(fname_out, 'wb') as fout:  
                            # for the csv file, must be open in binary mode
    writer = csv.writer(fout)
    for line in fin:
        line = line.rstrip()  # removing the '\n' and other trailing whitespaces
        # if needed, you can remove only the '\n' char like this:
        # if line[-1] == '\n':
        #     line = line[:-1]

        # Get the row (the list) of the column values.        
        row = []                     # init -- empty list
        pos1 = cols[0]
        for pos2 in cols[1:]:
            value = line[pos1:pos2]  # slice the column value
            value = value.rstrip()   # if you want to remove trailing whitespaces
            row.append(value)        # append to the row of values    
            pos1 = pos2              # get ready for the next column
        value = line[pos1:]          # last column value
        value = value.rstrip()       # if you want to remove trailing whitespaces
        row.append(value)            # append to the row of values    
        # Process the row -- here print and writing to the CSV file.
        print row

Open in new window

The sample of 3 rows was saved to a.txt. It prints
['row 1', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 2', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 3', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']

Open in new window

and it writes the following content to the a.csv:
row 1,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 2,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 3,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa

Open in new window


Author Comment

ID: 39165130
Thanks for replying. Lemmie be more specific.

I have a CSV file with 1,000+ rows.

Each row is just a single column, looking something like this...

Here is an example Here is a second example blah blah

I need this row should be cut into multiple columns. I know the fixed length of each column...

column 1 is the first 8 characters
column 2 is the next 4 characters
column 3 is the next 5 characters
column 4 is the next 6 characters

So I want to open the file and read, then write to CSV with the revised rows with columns.
LVL 29

Accepted Solution

pepr earned 500 total points
ID: 39167210
Firstly, do not call it CSV file as it is not a CSV file. I do not have problem with understanding you, but you may confuse someone who did not dig deep enough to your problem. It may have the .csv extension because you have saved it this way, but it is not CSV inside. This way, it is confusing to speak about a single column inside the file. It is just a text file.

Try the following code that reads the values as substrings of the columns of the predefined width. Notice your wanted widths assigned to the list of the name. Read the comments:

fname_in = 'b.txt'
fname_out = 'b.out'

# These are your widths of the columns.
widths = [8, 4, 5, 6]

# Let's build the list of the starting positions. Initialize the first one
# and loop through all the widths and calculate the starting positions.
pos = 0                 # position of the first column (zero based index)
start = [pos]           # first position in the list
for w in widths:
    pos = pos + w       # previous position plus the width is the next position
    start.append(pos)   # position of the next column appended to the list

# Just to see what was constructe (remove it from the production code).
# We have also calculated the position of the column that would be placed
# after the last one. The last position simplifies the slicing below.
# Also, there is no need to strip the '\n' -- it will be sliced off.
print 'Positions:', start

# Open the input file for reading and the output file for writing.
with open(fname_in) as fin, \
     open(fname_out, 'w') as fout:

    # Loop through the lines of the input file. The '\n' is the part
    # of the strings read from the line.
    for line in fin:
        # Build the row (the list) of the column values.
        row = []                     # init -- empty list
        pos1 = start[0]              # index of the first char of the column
        for pos2 in start[1:]:       # index just after the column
            value = line[pos1:pos2]  # slice the column substring
            row.append(value)        # append to the row of values
            pos1 = pos2              # get ready for the next column

        # Process the row -- here print and writing to the output file.
        # Here the length of the lines changed because we have added
        # the bar characters.
        print row   # just to see what was extracted -- remove in production code
        fout.write('|'.join(row) + '|\n')

Open in new window

For the attached file b.txt with the content
line 2  abc 12345    xx

Open in new window

(just two lines but it does not matter), it displays the following lines
Positions: [0, 8, 12, 17, 23]
['88888888', '4444', '55555', '666666']
['line 2  ', 'abc ', '12345', '    xx']

Open in new window

and it produces the b.out file with the following content
line 2  |abc |12345|    xx|

Open in new window

Of course, it depends on you how you want to process the values and how you want to store the result. Here the row is simply joined to a single string by the bar characters, and the bar is also added to the end of the line to visualize the end of the content.

Author Closing Comment

ID: 39167776
Thanks Pepr. I was re-checking your original post and it works exactly how I want it. I've noted your point about CSV files in general. Thanks for taking the time to help.
LVL 29

Expert Comment

ID: 39168091
I am glad if it helped. Anyway, there is another part of the problem when working with fixed-width columns -- the formatting of the value to the width. Feel free to continue in discussion just below if you have the related questions.

Featured Post

U.S. Department of Agriculture and Acronis Access

With the new era of mobile computing, smartphones and tablets, wireless communications and cloud services, the USDA sought to take advantage of a mobilized workforce and the blurring lines between personal and corporate computing resources.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
How to get all the API from website? 11 104
import 13 row from .csv file into another csv file 11 56
Master DB with Masterkey 1 34
Getting the NAO robot to play soccer 1 25
The purpose of this article is to demonstrate how we can upgrade Python from version 2.7.6 to Python 2.7.10 on the Linux Mint operating system. I am using an Oracle Virtual Box where I have installed Linux Mint operating system version 17.2. Once yo…
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question