catalini

parse data from txt files into csv

Hi! I've got some txt files that look like this:

ABCDEFG:
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3

and so on...

I would like to create a csv file with this structure:
ABCDEFG     GSFSTSS     UTUSJSJSSKKSK
value1      value2      value3

with all the data from the txt files.
The column names are always the same across files.

Thank you so much!
catalini

ASKER

By the way, some of the values span multiple lines...

e.g.

ABCDEFG:
value1
value1
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3
all the lines that are column names have a ":" at the end...
ozo
What would be the structure of the file you want to create from this input?
ABCDEFG:
value1
value1
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3
I would like to create a csv file with this structure:
ABCDEFG                 GSFSTSS     UTUSJSJSSKKSK
value1 value1 value1    value2      value3


(i.e. replace the \n with a space)
Are there spaces between value1 and value2, between value2 and value3, between ABCDEFG and GSFSTSS, and between GSFSTSS and UTUSJSJSSKKSK?
Will the column names always be in the same order?
Will every column always be in every file?
adam314:
not all columns will be in every file, but the order is the same

ozo:
i would like to have a csv file, so the real format should be something like this

"ABCDEFG","GSFSTSS","UTUSJSJSSKKSK"         (column headers)
"value1 value1 value1","value2","value3"    (row from first txt file)
"value1 value1 value1","value2","value3"    (row from second txt file)
.....
Can you be sure the first file will have all the columns?  Or can you define all the possible columns ahead of time?

If so, you can process one file at a time and write each row as soon as that file has been read. Otherwise you need to process all the files and keep their data in memory before writing any output, because the full set of columns is only known at the end. This isn't difficult programming-wise, but with a lot of large files it can require a lot of memory.
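The "store everything first" approach could look roughly like the sketch below. It is only an illustration under some assumptions: the sub names parse_record and build_csv are made up, any line ending in ":" is treated as a column name, and columns are written in the order they are first seen, which may not be the order you actually want.

```perl
use strict;
use warnings;

# Parse the lines of one txt file into a hash: column name => joined value.
# New column names are appended to @$order in first-seen order.
sub parse_record {
    my ($lines, $order, $seen) = @_;
    my (%record, $current);
    for my $line (@$lines) {
        (my $text = $line) =~ s/\r?\n\z//;      # strip the line ending
        if ($text =~ /^(.+):\s*$/) {            # assumption: header = line ending in ":"
            $current = $1;
            push @$order, $current unless $seen->{$current}++;
            next;
        }
        next unless defined $current;           # ignore lines before the first header
        $record{$current} = defined $record{$current}
            ? "$record{$current} $text"         # multi-line values are joined by a space
            : $text;
    }
    return \%record;
}

# Build every CSV line only after all records (and so all columns) are known.
sub build_csv {
    my (@files_lines) = @_;                     # one array ref of lines per input file
    my (@order, %seen);
    my @records = map { parse_record($_, \@order, \%seen) } @files_lines;
    # Quote a field, doubling embedded double quotes; missing values become "".
    my $quote = sub { my $t = defined $_[0] ? $_[0] : ''; $t =~ s/"/""/g; qq("$t") };
    my @out = join ',', map { $quote->($_) } @order;
    push @out, map { my $r = $_; join ',', map { $quote->($r->{$_}) } @order } @records;
    return @out;
}
```

Each element passed to build_csv is an array ref holding the lines of one file, so the memory cost is the whole data set, as noted above.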
adam, thanks for your help,

I can define all the columns I need before and skip the others... furthermore memory should not be a problem
ASKER CERTIFIED SOLUTION
ozo
ozo thanks! how should I tell the script to run through all the files in the directory?
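If you can define the columns ahead of time:
# Columns to export, in output order
@k=("ABCDEFG","GSFSTSS","UTUSJSJSSKKSK");
# Lookup hash so that unwanted columns can be skipped
@k{@k}=@k;
# Header row
print join(',',map"\"$_\"",@k),"\n";
while( <> ){
    chomp;
    # A line such as "ABCDEFG:" starts a new column
    $k=$1 and next if /^(\S+):\s*$/;
    # Collect value lines, but only for the wanted columns
    push @{$v{$k}},$_ if $k{$k};
    # At the end of each input file, print one row and start over
    print join(',',map"\"@$_\"",@v{@k} ),"\n" and %v=() if eof;
}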
thanks ozo, but how do I call the script on all the files in a subdirectory?
script subdirectory/*
in that case I only get blank lines returned...
what do you get from
cat subdirectory/*
A lot of your questions seem to be about Microsoft programs.
If you are under MS Windows or DOS, the shell does not expand * for you, so you may have to add
@ARGV=<@ARGV>;
before the while loop in order to make * work.
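A slightly more explicit way to write the same idea is sketched below. It uses Perl's built-in glob; expand_args is a made-up helper name, and patterns that contain spaces would need extra care because glob splits its argument on whitespace.

```perl
use strict;
use warnings;

# Expand shell-style wildcards in a list of arguments.
# An argument without wildcards is returned unchanged by glob.
sub expand_args {
    my (@patterns) = @_;
    return map { glob($_) } @patterns;
}

# Replace the raw command-line arguments with the expanded file list,
# so that a later while( <> ) loop reads every matching file.
@ARGV = expand_args(@ARGV);
```

On Linux or macOS the shell has already expanded the wildcard, so this step does nothing harmful there.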
I'm running your code under Ubuntu...

cat subdirectory/*

returns all the text files at once...
I've found the mistake...

actually I wasn't totally precise, some column headers can be like

Aasdadas Basda:
value1

so not always capital letters :-(
change /^(\S+):\s*$/ to /(.+):\s*$/
assuming values never have :
values never have a trailing ":" but may have it in the middle... :-(
headers have always a ":" at the end...
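Since only a trailing ":" marks a header, a ":" in the middle of a value does no harm with the pattern above. For anyone who prefers a longer, commented version, here is a rough sketch of the same approach. The sub names csv_field, lines_to_row and main are made up for illustration, the column list is just the example names from this thread, and doubling embedded double quotes is an addition that the original one-liner does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the wanted columns are known up front (example names only).
my @columns = ('ABCDEFG', 'GSFSTSS', 'UTUSJSJSSKKSK', 'Aasdadas Basda');

# Quote one CSV field, doubling any embedded double quotes.
sub csv_field {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/"/""/g;
    return qq("$text");
}

# Turn the lines of one txt file into one CSV row.
sub lines_to_row {
    my ($columns, @lines) = @_;
    my %wanted = map { $_ => 1 } @$columns;
    my (%values, $current);
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                 # strip the line ending
        if ($line =~ /^(.+):\s*$/) {          # header: any line ending in ":"
            $current = $1;
            next;
        }
        # Skip lines before the first header and lines of unwanted columns.
        next unless defined $current && $wanted{$current};
        push @{ $values{$current} }, $line;
    }
    # Multi-line values are joined with a space; missing columns become "".
    return join ',', map { csv_field(join ' ', @{ $values{$_} || [] }) } @$columns;
}

# Print the header row, then one row per file named on the command line.
sub main {
    print join(',', map { csv_field($_) } @columns), "\n";
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        print lines_to_row(\@columns, <$fh>), "\n";
        close $fh;
    }
}

main() unless caller;
```

Saved under a name of your choice, it would be called the same way as before, with subdirectory/* as the argument and the output redirected to a csv file.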
it works like a charm!!!!!

thank you sooooooo much!!!!
ozo, you saved me hours of work! thank you so much!