• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2074
  • Last Modified:

Python convert long row into multiple columns

I have a CSV file that has just one  column but needs to be multiple columns.

Example only one column

row 1    aaaaaaa        aaaa    aaaaaaaaa     aaaaa a    aaaaaaa

I want it be with multiple columns

column(fixed length) column2(fixed length) column3(fixed length) column4(fixed length)

and the rows will be parsed into multiple fixed column lengths.

Anybody have an idea to do this in Python?
0
sonofstimpy1
Asked:
sonofstimpy1
  • 3
  • 2
1 Solution
 
peprCommented:
Firstly, CSV originally means "Comma Separated Values". These days, also tab or semicolon is used sometimes. Definitely, the fixed width of columns is something very different in principle. There is some ambiguity in your question. The example seems to be fixed width. But it is not clear whether you want the fixed width values after reading the resulting file (i.e. memory representation) while the file itself should be stored as CSV, or whether the wanted file should be stored using fixed width. In other words... Are you asking how to read the fixed-width columns file or to write the read values back as CSV?

Try the following code. The reading part shows how to extract the values from the fixed-width columns, the writing part shows how to write tha values to a CSV file.
#!python2

import csv

fname_in = 'a.txt'
fname_out = 'a.csv'

cols = [0, 9, 24, 32, 46, 52, 57 ]  # positions of the columns (zero based)
with open(fname_in) as fin, open(fname_out, 'wb') as fout:  
                            # for the csv file, must be open in binary mode
    writer = csv.writer(fout)
    for line in fin:
        line = line.rstrip()  # removing the '\n' and other trailing whitespaces
        # if needed, you can remove only the '\n' char like this:
        # if line[-1] == '\n':
        #     line = line[:-1]

        # Get the row (the list) of the column values.        
        row = []                     # init -- empty list
        pos1 = cols[0]
        for pos2 in cols[1:]:
            value = line[pos1:pos2]  # slice the column value
            value = value.rstrip()   # if you want to remove trailing whitespaces
            row.append(value)        # append to the row of values    
            pos1 = pos2              # get ready for the next column
            
        value = line[pos1:]          # last column value
        value = value.rstrip()       # if you want to remove trailing whitespaces
        row.append(value)            # append to the row of values    
        
        # Process the row -- here print and writing to the CSV file.
        print row
        writer.writerow(row)

Open in new window

The sample of 3 rows was saved to a.txt. It prints
c:\tmp\___python\sonofstimpy1\Q_28125767>py a.py
['row 1', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 2', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']
['row 3', 'aaaaaaa', 'aaaa', 'aaaaaaaaa', 'aaaaa', 'a', 'aaaaaaa']

Open in new window

and it writes the following content to the a.csv:
row 1,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 2,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa
row 3,aaaaaaa,aaaa,aaaaaaaaa,aaaaa,a,aaaaaaa

Open in new window

0
 
sonofstimpy1Author Commented:
Thanks for replying. Lemmie be more specific.

I have a CSV file with 1,000+ rows.

Each row is just a single column, looking something like this...

Here is an example Here is a second example blah blah

I need this row should be cut into multiple columns. I know the fixed length of each column...

column 1 is the first 8 characters
column 2 is the next 4 characters
column 3 is the next 5 characters
column 4 is the next 6 characters

So I want to open the file and read, then write to CSV with the revised rows with columns.
0
 
peprCommented:
Firstly, do not call it CSV file as it is not a CSV file. I do not have problem with understanding you, but you may confuse someone who did not dig deep enough to your problem. It may have the .csv extension because you have saved it this way, but it is not CSV inside. This way, it is confusing to speak about a single column inside the file. It is just a text file.

Try the following code that reads the values as substrings of the columns of the predefined width. Notice your wanted widths assigned to the list of the name. Read the comments:
#!python2

fname_in = 'b.txt'
fname_out = 'b.out'

# These are your widths of the columns.
widths = [8, 4, 5, 6]

# Let's build the list of the starting positions. Initialize the first one
# and loop through all the widths and calculate the starting positions.
pos = 0                 # position of the first column (zero based index)
start = [pos]           # first position in the list
for w in widths:
    pos = pos + w       # previous position plus the width is the next position
    start.append(pos)   # position of the next column appended to the list

# Just to see what was constructe (remove it from the production code).
# We have also calculated the position of the column that would be placed
# after the last one. The last position simplifies the slicing below.
# Also, there is no need to strip the '\n' -- it will be sliced off.
print 'Positions:', start

# Open the input file for reading and the output file for writing.
with open(fname_in) as fin, \
     open(fname_out, 'w') as fout:

    # Loop through the lines of the input file. The '\n' is the part
    # of the strings read from the line.
    for line in fin:
        # Build the row (the list) of the column values.
        row = []                     # init -- empty list
        pos1 = start[0]              # index of the first char of the column
        for pos2 in start[1:]:       # index just after the column
            value = line[pos1:pos2]  # slice the column substring
            row.append(value)        # append to the row of values
            pos1 = pos2              # get ready for the next column

        # Process the row -- here print and writing to the output file.
        # Here the length of the lines changed because we have added
        # the bar characters.
        print row   # just to see what was extracted -- remove in production code
        fout.write('|'.join(row) + '|\n')

Open in new window

For the attached file b.txt with the content
88888888444455555666666
line 2  abc 12345    xx

Open in new window

(just two lines but it does not matter), it displays the following lines
c:\tmp\___python\sonofstimpy1\Q_28125767>py b.py
Positions: [0, 8, 12, 17, 23]
['88888888', '4444', '55555', '666666']
['line 2  ', 'abc ', '12345', '    xx']

Open in new window

and it produces the b.out file with the following content
88888888|4444|55555|666666|
line 2  |abc |12345|    xx|

Open in new window

Of course, it depends on you how you want to process the values and how you want to store the result. Here the row is simply joined to a single string by the bar characters, and the bar is also added to the end of the line to visualize the end of the content.
0
 
sonofstimpy1Author Commented:
Thanks Pepr. I was re-checking your original post and it works exactly how I want it. I've noted your point about CSV files in general. Thanks for taking the time to help.
0
 
peprCommented:
I am glad if it helped. Anyway, there is another part of the problem when working with fixed-width columns -- the formatting of the value to the width. Feel free to continue in discussion just below if you have the related questions.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now