catalini
asked on
parse data from txt files into csv
Hi! I've got some txt files that look like this:
ABCDEFG:
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3
and so on...
I would like to create a csv file with structure
ABCDEFG GSFSTSS UTUSJSJSSKKSK
value1 value2 value3
with all the data from the txt files.
The column names are always the same across files.
Thank you soo much!
ASKER
all the lines that are column names have a ":" at the end...
What structure would you want the output file to have for input like this?
ABCDEFG:
value1
value1
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3
ASKER
I would like to create a csv file with structure
ABCDEFG GSFSTSS UTUSJSJSSKKSK
value1 value1 value1 value2 value3
(i.e. replace the \n with a space)
Are there spaces between value1 and value2, between value2 and value3, between ABCDEFG and GSFSTSS, and between GSFSTSS and UTUSJSJSSKKSK?
Will the column names always be in the same order?
Will every column always be in every file?
ASKER
adam314:
not all columns will be in every file, but the order is the same
ozo:
I would like to have a CSV file, so the real format should be something like this:
"ABCDEFG","GSFSTSS","UTUSJSJSSKKSK" (column headers)
"value1 value1 value1","value2","value3" (row from first txt file)
"value1 value1 value1","value2","value3" (row from second txt file)
.....
Can you be sure the first file will have all the columns? Or can you define all the possible columns ahead of time?
If so, you can process one file at a time and write each row as you go. Otherwise you need to process all files and hold them in memory before writing the output. This isn't difficult programming-wise, but with a lot of large files it can require a lot of memory.
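The second option (read every file first, then write) could be sketched roughly as below. This is only an illustration, not a solution posted in the thread. The sub names slurp, parse_record and records_to_csv are made up for the example, and header lines are assumed to end with ":". Columns are ordered by first appearance, so the order can be off if an early file lacks a column.

```perl
use strict;
use warnings;

# Read a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                                  # undef separator: read everything
    return scalar <$fh>;
}

# Parse one file's text into { column => "joined values" } and record
# any column name not seen before in @$order.
sub parse_record {
    my ($text, $order, $seen) = @_;
    my (%record, $column);
    for my $line (split /\n/, $text) {
        if ($line =~ /^(.+):\s*$/) {           # header line: ends with ":"
            $column = $1;
            push @$order, $column unless $seen->{$column}++;
            next;
        }
        next unless defined $column;           # ignore text before the first header
        $record{$column} = defined $record{$column}
            ? "$record{$column} $line"         # newline becomes a space
            : $line;
    }
    return \%record;
}

# Turn a list of file contents into CSV lines, header line first.
sub records_to_csv {
    my @texts = @_;
    my (@order, %seen);
    my @records = map { parse_record($_, \@order, \%seen) } @texts;
    my $quote = sub {
        my $v = defined $_[0] ? $_[0] : '';    # missing column: empty field
        $v =~ s/"/""/g;                        # double any embedded quotes
        return qq("$v");
    };
    my @lines = (join ',', map { $quote->($_) } @order);
    push @lines, join(',', map { $quote->($_) } @{$_}{@order}) for @records;
    return @lines;
}
```

With file names on the command line, print "$_\n" for records_to_csv(map { slurp($_) } @ARGV); would print the table.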
ASKER
adam, thanks for your help,
I can define all the columns I need before and skip the others... furthermore memory should not be a problem
ASKER CERTIFIED SOLUTION
ASKER
ozo thanks! how should I tell the script to run through all the files in the directory?
If you can define the columns ahead of time
@k=("ABCDEFG","GSFSTSS","UTUSJSJSSKKSK");
@k{@k}=@k;
print join(',',map"\"$_\"",@k),"\n";
while( <> ){
chomp;
$k=$1 and next if /^(\S+):\s*$/;
push @{$v{$k}},$_ if $k{$k};
print join(',',map"\"@$_\"",@v{@k} ),"\n" and %v=() if eof;
}
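For readers who find the compact script hard to follow, here is a longer sketch of the same one-file-at-a-time idea under use strict. It is not ozo's code. The sub name write_csv and its arguments are hypothetical, and the header pattern is assumed to allow spaces in column names. It also doubles embedded double quotes, which the short version does not do.

```perl
use strict;
use warnings;

# Stream each file into one CSV row, keeping only the predefined columns.
# Only one file's values are held in memory at a time.
sub write_csv {
    my ($columns, $paths, $out) = @_;
    my %wanted = map { $_ => 1 } @$columns;
    my $quote  = sub {
        my $v = join ' ', @_;                  # multi-line values joined by spaces
        $v =~ s/"/""/g;                        # double any embedded quotes
        return qq("$v");
    };

    print {$out} join(',', map { $quote->($_) } @$columns), "\n";
    for my $path (@$paths) {
        open my $in, '<', $path or die "Cannot open $path: $!";
        my (%values, $column);                 # reset for every file
        while (my $line = <$in>) {
            chomp $line;
            if ($line =~ /^(.+):\s*$/) {       # header line ends with ":"
                $column = $1;
                next;
            }
            push @{ $values{$column} }, $line
                if defined $column && $wanted{$column};
        }
        close $in;
        # Columns missing from this file become empty fields.
        print {$out} join(',', map { $quote->(@{ $values{$_} || [] }) } @$columns), "\n";
    }
}
```

A call such as write_csv(["ABCDEFG","GSFSTSS","UTUSJSJSSKKSK"], \@ARGV, \*STDOUT); would write the header and one row per file to standard output.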
ASKER
thanks ozo, but how do I call the script on all the files in a subdirectory?
script subdirectory/*
ASKER
in that case I only get blank lines returned...
what do you get from
cat subdirectory/*
A lot of your questions seem to be about Microsoft programs.
If you are under MS Windows or DOS, you may have to add
@ARGV=<@ARGV>;
in order to make * work.
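A slightly more explicit way to do the same expansion inside the script might look like this. It is a sketch only, and expand_args is a hypothetical helper name. It uses Perl's built-in glob, which splits patterns on whitespace, so paths containing spaces would need extra care.

```perl
use strict;
use warnings;

# Expand wildcard patterns that the shell left untouched (for example
# under cmd.exe). Arguments without wildcard characters pass through as-is.
sub expand_args {
    return map { /[*?\[]/ ? glob($_) : $_ } @_;
}
```

Placing @ARGV = expand_args(@ARGV); before the while( <> ) loop lets script subdirectory/* behave the same whether or not the shell expands the *.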
ASKER
I'm running your code under Ubuntu...
cat subdirectory/*
returns all the text files at once...
ASKER
I've found the mistake...
actually I wasn't totally precise, some column headers can be like
Aasdadas Basda:
value1
so not always capital letters :-(
change /^(\S+):\s*$/ to /(.+):\s*$/
assuming values never have :
ASKER
values never have a trailing ":" but may have it in the middle... :-(
ASKER
headers have always a ":" at the end...
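Under those two conditions the relaxed pattern is still safe, because it only matches when the ":" is the last non-blank character on the line. A small sketch of the check, where header_name is a hypothetical helper and the leading ^ anchor is added for clarity:

```perl
use strict;
use warnings;

# Return the column name if the line is a header, otherwise nothing.
# A header is any non-empty text followed by ":" and optional trailing
# whitespace; a ":" in the middle of a value does not match.
sub header_name {
    my ($line) = @_;
    return $1 if $line =~ /^(.+):\s*$/;
    return;
}
```

Here header_name("Aasdadas Basda:") gives the column name, while a value such as "see: note" is not treated as a header.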
ASKER
it works like a charm!!!!!
thank you sooooooo much!!!!
ASKER
ozo, you saved me hours of work! thank you so much!
ASKER
e.g.
ABCDEFG:
value1
value1
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3