Link to home
Start Free TrialLog in
Avatar of Anne H
Anne H

asked on

How to transpose a wide file to a skinny file with key-value pairs using python

Hi all,
I am new to python and would appreciate your help in transpose a wide file to a key value pairs to be loaded to database. Here is the input and output file format. Thanks!

Input file          
visitor_id  param_1 param_2 param_3
1           vp1     vp2     vp3
2           vp12    vp22    vp23

Output file        
vistor_id   converted_key   converted_value
1           param_1         vp1
1           param_2         vp2
1           param_3         vp3
2           param_1         vp12    
2           param_2         vp22    
2           param_3         vp23
Avatar of Pawan Kumar
Pawan Kumar
Flag of India image

which Db are you using?
Avatar of Anne H
Anne H

ASKER

redshift
Avatar of Anne H

ASKER

we have over 13k fields in the file and the order could change. that's why we design the table this way.
Just to be clear, your input and output files are tab-delimited text files? Comma-delimited text files?
Avatar of Anne H

ASKER

thanks for responding. the input and output files can be any delimited file. And the order of the output lines do not matter either since I can sort them in the database.
ASKER CERTIFIED SOLUTION
Avatar of Flabio Gates
Flabio Gates

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Anne H

ASKER

Thank you so much Flabio!
I am using python3. After slightly change I got this to work.
import csv

if __name__ == '__main__':
    inpath = 'input.txt'
    outpath = 'output.txt'
    with open(inpath, 'r') as inf, open(outpath, 'w') as outf:
        reader = csv.reader(inf, delimiter='|')
        writer = csv.writer(outf, delimiter='|')
       
        header=next(reader)
        h0=header.pop(0)
        print (h0)
        writer.writerow((h0, 'converted_key', 'converted_value'))
       
        for row in reader:
            vi = row.pop(0)
            print (vi)
            for values in zip(header, row):
                print (values)
                writer.writerow((vi,)+values)

another question, since the order of original fields in the raw file could change, I might need to dynamically look for the visitor_id in the header and write the value of visitor id in the first column. How could I do that?

Thanks a lot!
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Anne H

ASKER

Thank you so much! It worked. :)
Question inactive