imad imad

filter a file to a table

I have a file that contains the following lines. It lists schedules, one block after another.

at 12:00 the schedule of james version1 is :
first_task:eating:nothing
second_task:rest:onehour
third_task:watching:nothing 

at 12:00 the schedule of james version2 is :
first_task:eating:fruits
second_task:rest:twohour
third_task:watching:manga 

at 12:00 the schedule of alex version1 is :
first_task:eating:fruit
second_task:rest:halfhour
third_task:watching:horrorfilm 

at 12:00 the schedule of alex version2 is :
first_task:eating:meal
second_task:rest:nothing
third_task:watching:nothing 

at 18:00 the schedule of james version1 is :
first_task:eating:fastfood
second_task:rest:twohours
third_task:watching:series 

at 18:00 the schedule of james version2 is :
first_task:eating:nothing
second_task:rest:onehours
third_task:watching:series 

at 18:00 the schedule of alex version1 is :
first_task:eating:vegetals
second_task:rest:threehours
third_task:watching:manga 

at 18:00 the schedule of alex version2 is :
first_task:eating:bread
second_task:rest:fivehours
third_task:watching:manga 

at 22:00 the schedule of james version1 is :
first_task:eating:nothing
second_task:rest:sevenhours
third_task:watching:nothing

at 22:00 the schedule of james version2 is :
first_task:eating:meal
second_task:rest:sixnhours
third_task:watching:nothing

at 22:00 the schedule of alex version1 is :
first_task:eating:vegetals
second_task:rest:sevehours
third_task:watching:manga 

at 22:00 the schedule of alex version2 is :
first_task:eating:icecream
second_task:rest:sevenhours
third_task:watching:nothing 


I tried to filter it into this form:
12:00 eating:fruit
18:00 eating:vegetals
22:00 eating:nothing

12:00 rest:onhour 
18:00 rest:threehour
22:00 rest:sevenhour

12:00 watching:horrorfilm 
18:00 watching:manga
22:00 watching:nothing


using these commands:

awk -F '[ :]' '/the schedule of/{h=$2;m=$3} /eating/{print h":"m" eating:"$3}' f.txt
awk -F '[ :]' '/the schedule of/{h=$2;m=$3} /rest/{print h":"m" rest:"$3}' f.txt
awk -F '[ :]' '/the schedule of/{h=$2;m=$3} /watching/{print h":"m" watching:"$3}' f.txt


Now I would like to improve the filtered output by dropping all non-significant words and arranging the valuable information in a table. I have tried to work out (and searched for) how to get this format, but in vain.


james version1,12:00,18:00,22:00
eating,nothing,fastfood,nothing
rest,onehour,twohours,sevenhours
watching,nothing,series,nothing

james version2,12:00,18:00,22:00
eating,fruits,nothing,meal
rest,twohour,onehours,sixnhours
watching,manga,series,nothing 

alex version1,12:00,18:00,22:00
eating,fruit,vegetals,vegetals
rest,halfhour,threehours,sevehours
watching,horrorfilm,manga,manga 

alex version2,12:00,18:00,22:00
eating,meal,bread,icecream
rest,nothing,fivehours,sevenhours
watching,nothing,manga,nothing

ozo

What are the non-significant words, and what is the most valuable information?
imad imad

ASKER

The non-significant words are:

first_task:
second_task:
third_task:
the schedule of

The most valuable information is what remains, arranged as a CSV table, for example:

james version1,12:00,18:00,22:00
eating,nothing,fastfood,nothing
rest,onehour,twohours,sevenhours
watching,nothing,series,nothing

Why is "james version1" one attribute and not two, like "james" and "version1"?
Why is eating,nothing two (unrelated) attributes and not one, like eating:nothing?
Why did you split relations like "james version2 12:00 eating:fruits" into unrelated parts?

You could get two entities from the data:

entity 1: ID,Name,Version

1,james,1
2,james,2
3,alex,1
4,alex,2


entity 2: ID,Schedule,Task,Activity,What

1,12:00,1,eating,nothing
1,12:00,2,rest,onehour
1,12:00,3,watching,nothing
2,12:00,1,eating,fruits
2,12:00,2,rest,twohour
2,12:00,3,watching,manga
...
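
For what it's worth, the two entities above can be produced with a short awk script. This is only a sketch under a few assumptions: every header line has the form "at HH:MM the schedule of NAME versionN is :", the task number is simply the position of the task line inside its block, and the helper name split_entities and the output file names people.csv and tasks.csv are made up for this example.

```shell
#!/bin/sh
# Small sample in the format from the question (two blocks only).
cat > f.txt <<'EOF'
at 12:00 the schedule of james version1 is :
first_task:eating:nothing
second_task:rest:onehour
third_task:watching:nothing 

at 12:00 the schedule of james version2 is :
first_task:eating:fruits
second_task:rest:twohour
third_task:watching:manga 
EOF

# split_entities INPUT PEOPLE_CSV TASKS_CSV  (hypothetical helper name)
split_entities() {
    awk -v people="$2" -v tasks="$3" '
    # Header line: remember the time and assign an ID to each new name/version pair.
    /the schedule of/ {
        time = $2; name = $6; version = $7
        sub(/^version/, "", version)          # "version1" -> "1"
        key = name SUBSEP version
        if (!(key in id)) {
            id[key] = ++n
            print n "," name "," version > people
        }
        cur = id[key]
        task = 0                              # restart task numbering per block
        next
    }
    # Task line: label:activity:value, numbered by position within the block.
    /_task:/ {
        split($0, f, ":")
        sub(/[ \t\r]+$/, "", f[3])            # drop trailing blanks
        task = task + 1
        print cur "," time "," task "," f[2] "," f[3] > tasks
    }' "$1"
}

split_entities f.txt people.csv tasks.csv
cat people.csv tasks.csv
```

On the sample this writes 1,james,1 and 2,james,2 to people.csv, and rows such as 1,12:00,1,eating,nothing to tasks.csv, without header rows, to match the listings above.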


But I am not good at Perl. Is there an awk command?
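
One possible awk approach for the requested table, offered as a sketch and not as the accepted solution: read the whole file, remember each value under (person, activity, time), and print one CSV group per person at the end. It assumes every header line looks like "at HH:MM the schedule of NAME versionN is :" and every task line like "label:activity:value". The function name pivot_schedule is made up for this example, and the sample file is a shortened copy of the data from the question.

```shell
#!/bin/sh
# Shortened sample: two people, three times each, in the original order.
cat > f.txt <<'EOF'
at 12:00 the schedule of james version1 is :
first_task:eating:nothing
second_task:rest:onehour
third_task:watching:nothing 

at 12:00 the schedule of alex version2 is :
first_task:eating:meal
second_task:rest:nothing
third_task:watching:nothing 

at 18:00 the schedule of james version1 is :
first_task:eating:fastfood
second_task:rest:twohours
third_task:watching:series 

at 18:00 the schedule of alex version2 is :
first_task:eating:bread
second_task:rest:fivehours
third_task:watching:manga 

at 22:00 the schedule of james version1 is :
first_task:eating:nothing
second_task:rest:sevenhours
third_task:watching:nothing

at 22:00 the schedule of alex version2 is :
first_task:eating:icecream
second_task:rest:sevenhours
third_task:watching:nothing 
EOF

# pivot_schedule INPUT  (hypothetical helper name)
pivot_schedule() {
    awk '
    # Header line: note the time and the person, keeping first-seen order.
    /the schedule of/ {
        time = $2
        who = $6 " " $7
        if (!(who in seen_who))   { seen_who[who] = 1;   whos[++nw] = who }
        if (!(time in seen_time)) { seen_time[time] = 1; times[++nt] = time }
        next
    }
    # Task line: store the value under (person, activity, time).
    /_task:/ {
        split($0, f, ":")
        act = f[2]; val = f[3]
        sub(/[ \t\r]+$/, "", val)             # drop trailing blanks
        if (!(act in seen_act)) { seen_act[act] = 1; acts[++na] = act }
        cell[who, act, time] = val
    }
    # Print one group per person; a missing cell comes out empty.
    END {
        for (i = 1; i <= nw; i++) {
            line = whos[i]
            for (j = 1; j <= nt; j++) line = line "," times[j]
            print line
            for (k = 1; k <= na; k++) {
                line = acts[k]
                for (j = 1; j <= nt; j++)
                    line = line "," cell[whos[i], acts[k], times[j]]
                print line
            }
            if (i < nw) print ""
        }
    }' "$1"
}

pivot_schedule f.txt
```

On the sample, the first group comes out as james version1,12:00,18:00,22:00 followed by eating,nothing,fastfood,nothing, then rest,onehour,twohours,sevenhours, then watching,nothing,series,nothing. Everything is held in memory until the END block, which should be fine for a file of this size.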
I would write a little parser program in C++.

Sara
It works.