Solved

Python convert long row into multiple columns

Posted on 2013-05-12
5
1,525 Views
Last Modified: 2013-05-15
I have a CSV file that has just one  column but needs to be multiple columns.

Example only one column

row 1    aaaaaaa        aaaa    aaaaaaaaa     aaaaa a    aaaaaaa

I want it be with multiple columns

column(fixed length) column2(fixed length) column3(fixed length) column4(fixed length)

and the rows will be parsed into multiple fixed column lengths.

Anybody have an idea to do this in Python?
0
Comment
Question by:sonofstimpy1
  • 3
  • 2
5 Comments
 
LVL 28

Expert Comment

by:pepr
Comment Utility
Firstly, CSV originally means "Comma Separated Values". These days, also tab or semicolon is used sometimes. Definitely, the fixed width of columns is something very different in principle. There is some ambiguity in your question. The example seems to be fixed width. But it is not clear whether you want the fixed width values after reading the resulting file (i.e. memory representation) while the file itself should be stored as CSV, or whether the wanted file should be stored using fixed width. In other words... Are you asking how to read the fixed-width columns file or to write the read values back as CSV?

Try the following code. The reading part shows how to extract the values from the fixed-width columns, the writing part shows how to write tha values to a CSV file.
#!python2

import csv

fname_in = 'a.txt'
fname_out = 'a.csv'

cols = [0, 9, 24, 32, 46, 52, 57 ]  # positions of the columns (zero based)
with open(fname_in) as fin, open(fname_out, 'wb') as fout:  
                            # for the csv file, must be open in binary mode
    writer = csv.writer(fout)
    for line in fin:
        line = line.rstrip()  # removing the '\n' and other trailing whitespaces
        # if needed, you can remove only the '\n' char like this:
        # if line[-1] == '\n':
        #     line = line[:-1]

        # Get the row (the list) of the column values.        
        row = []                     # init -- empty list
        pos1 = cols[0]
        for pos2 in cols[1:]:
            value = line[pos1:pos2]  # slice the column value
            value = value.rstrip()   # if you want to remove trailing whitespaces
            row.append(value)        # append to the row of values    
            pos1 = pos2              # get ready for the next column
            
        value = line[pos1:]          # last column value
        value = value.rstrip()       # if you want to remove trailing whitespaces
        row.append(value)            # append to the row of values    
        
        # Process the row -- here print and writing to the CSV file.
        print row
        writer.writerow(row)

Open in new window

The sample of 3 rows was saved to a.txt. It prints
c:\tmp\___python\sonofstimpy1\Q_28125767>py a.py
['row 1', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 2', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 3', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']

Open in new window

and it writes the following content to the a.csv:
row 1,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 2,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 3,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa

Open in new window

0
 

Author Comment

by:sonofstimpy1
Comment Utility
Thanks for replying. Lemmie be more specific.

I have a CSV file with 1,000+ rows.

Each row is just a single column, looking something like this...

Here is an example Here is a second example blah blah

I need this row should be cut into multiple columns. I know the fixed length of each column...

column 1 is the first 8 characters
column 2 is the next 4 characters
column 3 is the next 5 characters
column 4 is the next 6 characters

So I want to open the file and read, then write to CSV with the revised rows with columns.
0
 
LVL 28

Accepted Solution

by:
pepr earned 500 total points
Comment Utility
Firstly, do not call it CSV file as it is not a CSV file. I do not have problem with understanding you, but you may confuse someone who did not dig deep enough to your problem. It may have the .csv extension because you have saved it this way, but it is not CSV inside. This way, it is confusing to speak about a single column inside the file. It is just a text file.

Try the following code that reads the values as substrings of the columns of the predefined width. Notice your wanted widths assigned to the list of the name. Read the comments:
#!python2

fname_in = 'b.txt'
fname_out = 'b.out'

# These are your widths of the columns.
widths = [8, 4, 5, 6]

# Let's build the list of the starting positions. Initialize the first one
# and loop through all the widths and calculate the starting positions.
pos = 0                 # position of the first column (zero based index)
start = [pos]           # first position in the list
for w in widths:
    pos = pos + w       # previous position plus the width is the next position
    start.append(pos)   # position of the next column appended to the list

# Just to see what was constructe (remove it from the production code).
# We have also calculated the position of the column that would be placed
# after the last one. The last position simplifies the slicing below.
# Also, there is no need to strip the '\n' -- it will be sliced off.
print 'Positions:', start

# Open the input file for reading and the output file for writing.
with open(fname_in) as fin, \
     open(fname_out, 'w') as fout:

    # Loop through the lines of the input file. The '\n' is the part
    # of the strings read from the line.
    for line in fin:
        # Build the row (the list) of the column values.
        row = []                     # init -- empty list
        pos1 = start[0]              # index of the first char of the column
        for pos2 in start[1:]:       # index just after the column
            value = line[pos1:pos2]  # slice the column substring
            row.append(value)        # append to the row of values
            pos1 = pos2              # get ready for the next column

        # Process the row -- here print and writing to the output file.
        # Here the length of the lines changed because we have added
        # the bar characters.
        print row   # just to see what was extracted -- remove in production code
        fout.write('|'.join(row) + '|\n')

Open in new window

For the attached file b.txt with the content
88888888444455555666666
line 2  abc 12345    xx

Open in new window

(just two lines but it does not matter), it displays the following lines
c:\tmp\___python\sonofstimpy1\Q_28125767>py b.py
Positions: [0, 8, 12, 17, 23]
['88888888', '4444', '55555', '666666']
['line 2  ', 'abc ', '12345', '    xx']

Open in new window

and it produces the b.out file with the following content
88888888|4444|55555|666666|
line 2  |abc |12345|    xx|

Open in new window

Of course, it depends on you how you want to process the values and how you want to store the result. Here the row is simply joined to a single string by the bar characters, and the bar is also added to the end of the line to visualize the end of the content.
0
 

Author Closing Comment

by:sonofstimpy1
Comment Utility
Thanks Pepr. I was re-checking your original post and it works exactly how I want it. I've noted your point about CSV files in general. Thanks for taking the time to help.
0
 
LVL 28

Expert Comment

by:pepr
Comment Utility
I am glad if it helped. Anyway, there is another part of the problem when working with fixed-width columns -- the formatting of the value to the width. Feel free to continue in discussion just below if you have the related questions.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
When we want to run, execute or repeat a statement multiple times, a loop is necessary. This article covers the two types of loops in Python: the while loop and the for loop.
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now