catalini asked on

parse data from txt files into csv

Hi! I've got some txt files that look like this:

ABCDEFG:
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3

and so on...

I would like to create a CSV file with this structure:
ABCDEFG     GSFSTSS          UTUSJSJSSKKSK
value 1           value2               value3

with all the data from the txt files.
The column names are always the same across files.

Thank you soo much!
Perl

Last comment: catalini, 8/22/2022 (Mon)
ASKER
catalini

BTW, some of the values span multiple lines...

e.g.

ABCDEFG:
value1
value1
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3
ASKER
catalini

All the lines that are column names have a ":" at the end...
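Under that rule (a line ending in ":" is a header, and every other line belongs to the most recent header), one file can be grouped into a hash of column name to value lines. This is a minimal sketch, not code from the thread; parse_record is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: split the lines of one file into { header => [value lines] },
# treating any line that ends in ":" as a column header.
sub parse_record {
    my @lines = @_;
    my (%rec, $col);
    for my $line (@lines) {
        chomp(my $l = $line);
        if ($l =~ /^(.+):\s*$/) {
            $col = $1;                 # header line: start a new column
            $rec{$col} = [];
        } elsif (defined $col) {
            push @{ $rec{$col} }, $l;  # value line belongs to the current column
        }
    }
    return \%rec;
}

my $rec = parse_record("ABCDEFG:\n", "value1\n", "value1\n", "GSFSTSS:\n", "value2\n");
print "$_ => @{ $rec->{$_} }\n" for sort keys %$rec;
```

Multi-line values simply accumulate in the array for their header, so joining them later is a separate, trivial step.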
ozo

What would be the structure of the file you would want to create from
ABCDEFG:
value1
value1
value1
GSFSTSS:
value2
UTUSJSJSSKKSK:
value3
ASKER
catalini

I would like to create a CSV file with this structure:
ABCDEFG                             GSFSTSS          UTUSJSJSSKKSK
value 1 value1 value1           value2               value3

(i.e., replace each \n with a space)
ozo

Are there spaces between value1 and value2, between value2 and value3, between ABCDEFG and GSFSTSS, and between GSFSTSS and UTUSJSJSSKKSK?
Adam314

Will the column names always be in the same order?
Will every column always be in every file?
ASKER
catalini

adam314:
Not all columns will be in every file, but the order is the same.

ozo:
I would like to have a CSV file, so the real format should be something like this:

"ABCDEFG","GSFSTSS","UTUSJSJSSKKSK"         (column headers)
"value 1 value1 value1","value2","value3"              (row from first txt file)
"value 1 value1 value1","value2","value3"            (row from second txt file)
.....
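A sketch of producing rows in that quoted format, assuming each multi-line value is passed as an array reference of its lines; csv_row is a hypothetical helper. It also doubles any embedded double quote, as RFC 4180 CSV requires, which the sample rows don't need but real values might:

```perl
use strict;
use warnings;

# Hypothetical helper: format one CSV row with every field quoted.
# Array refs (multi-line values) are joined with single spaces,
# missing values become "", and embedded " characters are doubled.
sub csv_row {
    my @cells = map {
        my $v = ref $_ ? join(' ', @$_) : (defined $_ ? $_ : '');
        $v =~ s/"/""/g;
        qq{"$v"};
    } @_;
    return join(',', @cells);
}

print csv_row("ABCDEFG", "GSFSTSS", "UTUSJSJSSKKSK"), "\n";
print csv_row([qw(value1 value1 value1)], "value2", "value3"), "\n";
```

Passing undef for a column that is absent from a file yields an empty quoted field, which keeps every row aligned with the header.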
Adam314

Can you be sure the first file will have all the columns?  Or can you define all the possible columns ahead of time?

If so, you can process one file at a time, and write the data.  Otherwise you need to process all files, and store them all in memory before writing the output.  This isn't difficult programming wise, but if you have a lot of large files, it can require a lot of memory.
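A minimal sketch of the "read everything first" approach for when the column set is not known in advance. It assumes, as the asker says, that columns always appear in the same relative order, and merges each file's header list into one ordered list by inserting unseen names right after the last column already matched. merge_columns is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: merge per-file header lists (array refs) into one
# ordered column list, assuming every file uses the same relative order.
sub merge_columns {
    my @cols;
    for my $file_cols (@_) {
        my $pos = 0;    # insertion point: just after the last matched column
        for my $c (@$file_cols) {
            my ($i) = grep { $cols[$_] eq $c } 0 .. $#cols;
            if (defined $i) {
                $pos = $i + 1;                  # already known: move past it
            } else {
                splice @cols, $pos, 0, $c;      # new column: insert in place
                $pos++;
            }
        }
    }
    return @cols;
}

my @per_file = (["ABCDEFG", "UTUSJSJSSKKSK"], ["ABCDEFG", "GSFSTSS", "UTUSJSJSSKKSK"]);
print join(',', merge_columns(@per_file)), "\n";
```

With the merged list in hand, the stored per-file values can be written row by row, leaving columns a file lacks empty.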
ASKER
catalini

Adam, thanks for your help.

I can define all the columns I need beforehand and skip the others. Furthermore, memory should not be a problem.
ASKER CERTIFIED SOLUTION
ozo

ASKER
catalini

Thanks, ozo! How should I tell the script to run through all the files in the directory?
ozo

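If you can define the columns ahead of time:
use strict;
use warnings;
# Columns to output, in order; headers not listed here are skipped
my @k = ("ABCDEFG", "GSFSTSS", "UTUSJSJSSKKSK");
my %k; @k{@k} = @k;                         # lookup set of the wanted columns
print join(',', map "\"$_\"", @k), "\n";    # CSV header row
my ($k, %v);
while (<>) {
    chomp;
    if (/^(\S+):\s*$/) { $k = $1 }                         # "NAME:" line starts a new column
    elsif (defined $k && $k{$k}) { push @{ $v{$k} }, $_ }  # value line of a wanted column
    if (eof) {    # last line of the current file: print its row, then reset
        print join(',', map { '"' . join(' ', @{ $_ || [] }) . '"' } @v{@k}), "\n";
        %v = (); undef $k;
    }
}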
ASKER
catalini

Thanks, ozo, but how do I call the script on all the files in a subdirectory?
ozo

script subdirectory/*
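If you would rather pass the directory itself (script subdirectory), here is a sketch that expands directory arguments into the files they contain before the while (<>) loop runs. files_in is a hypothetical helper; it picks up only plain files directly inside the directory (no recursion):

```perl
use strict;
use warnings;

# Hypothetical helper: the plain files directly inside $dir, sorted by name
# (no recursion into subdirectories).
sub files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { -f "$dir/$_" } readdir $dh;   # drops ".", ".." and subdirectories
    closedir $dh;
    return map { "$dir/$_" } sort @names;
}

# Replace each directory argument with the files it contains; arguments that
# are already file names (e.g. from the shell's subdirectory/*) pass through.
@ARGV = map { -d $_ ? files_in($_) : $_ } @ARGV;

# ozo's while (<>) { ... } loop would follow here; for illustration, list
# the files it would read.
print "$_\n" for @ARGV;
```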
ASKER
catalini

In that case I only get blank lines...
ozo

What do you get from
cat subdirectory/*
ozo

A lot of your questions seem to be about Microsoft programs.
If you are under MS Windows or DOS, you may have to add
@ARGV=<@ARGV>;
in order to make * work.
ASKER
catalini

I'm running your code under Ubuntu...

cat subdirectory/*

returns all the text files at once...
ASKER
catalini

I've found the mistake...

Actually, I wasn't totally precise: some column headers can look like

Aasdadas Basda:
value1

so they are not always single words in capital letters :-(
ozo

Change /^(\S+):\s*$/ to /(.+):\s*$/,
assuming values never contain a ":".
ASKER
catalini

Values never have a trailing ":" but may have one in the middle... :-(
ASKER
catalini

Headers always have a ":" at the end...
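To see why the revised pattern fits those constraints, here is a small check comparing both patterns on a one-word header, a header containing a space, and a value with a colon in the middle. header_name is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: return the header name if $line matches $re, else undef.
sub header_name {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

my $old = qr/^(\S+):\s*$/;   # original pattern: a single word with no spaces, then ":"
my $new = qr/(.+):\s*$/;     # revised pattern: any text that ends in ":"

for my $line ("ABCDEFG:", "Aasdadas Basda:", "value with: a colon") {
    for my $re ($old, $new) {
        my $h = header_name($line, $re);
        print defined $h ? "[$h] " : "(value) ";
    }
    print "  <- $line\n";
}
```

Only the revised pattern accepts "Aasdadas Basda:", and neither treats "value with: a colon" as a header, because the colon is not at the end of the line.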
ASKER
catalini

it works like a charm!!!!!

thank you sooooooo much!!!!
ASKER
catalini

ozo, you saved me hours of work! thank you so much!