imad imad


Filtering a file to table

I have a log file that contains many entries like these:

at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:
academy/course1:oftheory:SMTGHO:nothing:
academy/course1:ofapplicaton:SMTGHP:onehour:

at 10:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8:
academy/course2:oftheory:SMTGHM:math:
academy/course2:ofapplicaton:SMTGHN:twohour:

at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8:
academy/course3:oftheory:SMTGHK:geo:
academy/course3:ofapplicaton:SMTGHL:halfhour:

at 10:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR87> [STR6 STR74] STR8:
academy/course4:oftheory:SMTGH:SMTGHI:history:
academy/course4:ofapplicaton:SMTGHJ:nothing:

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:oftheory:SMTGHG:nothing:
academy/course5:ofapplicaton:SMTGHH:twohours:

at 14:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR86> [STR6 STR85] STR8:
academy/course6:oftheory:SMTGHE:music:
academy/course6:ofapplicaton:SMTGHF:twohours:

at 14:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR96> [STR6 STR01] STR8:
academy/course7:oftheory:SMTGHC:programmation:
academy/course7:ofapplicaton:SMTGHD:onehours:

at 14:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR335> [STR6 STR66] STR8:
academy/course8:oftheory:SMTGHA:philosophy:
academy/course8:ofapplicaton:SMTGHB:nothing:



Is there any way to get rid of these STR* and SMTGH* strings and produce the following output, using awk, Perl, or a shell script?

carl 1,10:00,14:00
applicaton,halfhour,onehours
theory,geo,programmation

carl 2,10:00,14:00
applicaton,nothing,nothing
theory,history,philosophy

david 1,10:00,14:00
applicaton,onehour,twohours
theory,nothing,nothing

david 2,10:00,14:00
applicaton,twohour,twohours
theory,math,music


jmcg

This was more complicated than I first thought. You are not simply asking to have the unwanted fields suppressed; you want the fields completely reorganized. I also have to assume that your example result shows how you want things to look in general, rather than the exact output expected from the sample input.

It may well be possible to do this in awk or the shell, but Perl is certainly up to it.

# perl

# for Experts-Exchange.com/questions/28693077

use strict;
use warnings;
use Data::Dumper;	# only needed for the commented-out debug dump below

# organize data with these
my %courseinfo = ();
my @namelist = ();

# these will carry over line boundaries
my ($keyname, $keytime);

while( <> ) {

	# for a line matching /^at/ we want to capture the name and time fields
	if( my @matches = m/^at (\d\d:\d\d) (\S+ \S+)/ ) {
		($keytime, $keyname) = @matches;
		unless (exists $courseinfo{$keyname}) {
			$courseinfo{$keyname} = {times=>[] };
			push @namelist, $keyname;
			}
		push @{$courseinfo{$keyname}{times}}, $keytime;
		next;
		}

	# for other lines, split on colons to find fields of interest, but first clean up SMTG fields
	s/:SMTG\w+//g;
	my ( $acad, $of, $wanted) = split /:/;
	next unless $acad =~ m/^academy/;
	
	(my $key2 = $of) =~ s/^of//; 
	$courseinfo{$keyname}{$key2} = [] unless exists $courseinfo{$keyname}{$key2};
	push @{$courseinfo{$keyname}{$key2}}, $wanted;
	}

	### print STDERR Data::Dumper->Dump( [ \%courseinfo], [ qw( *courseinfo ) ] ); ### DEBUG
	
# now, after all records have been read, put out the output...
foreach $keyname (@namelist) {
	printf "%s\n", join ',', $keyname, @{$courseinfo{$keyname}{times}};
	for my $key2 ( sort keys %{$courseinfo{$keyname}} ) {
		next if $key2 eq "times";
		printf "%s\n", join ',', $key2, @{$courseinfo{$keyname}{$key2}};
		}
	print "\n";
	}



So if I save that as imad1.pl and run it against your sample input saved as imad1.txt:

perl imad1.pl imad1.txt > imad3.txt

I get this:

carl 1,10:00,14:00
applicaton,onehour,twohours
theory,nothing,nothing

carl 2,10:00,14:00
applicaton,twohour,twohours
theory,math,music

david 1,10:00,14:00
applicaton,halfhour,onehours
theory,geo,programmation

david 2,10:00,14:00
applicaton,nothing,nothing
theory,history,philosophy



I had to make a number of generalizing assumptions about what variations might appear in your real input, so let me know if you run into trouble applying this script outside of the small sample input.
ozo
perl -lan00e '
$n="@F[2,3]";
push @{$s{$n}{""}},$F[1];
push @{$s{$n}{$1}},$2 while/:of(\w+):.*:(\w+):/g;
END{
  print $k,map{join(",",$_,@{$v->{$_}}),"\n"}sort keys %$v  while ($k,$v)=each %s;
} ' <<HERE
at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR5> [STR6 STR7] STR8:
academy/course1:oftheory:SMTGHO:nothing:
academy/course1:ofapplicaton:SMTGHP:onehour:

at 10:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8:
academy/course2:oftheory:SMTGHM:math:
academy/course2:ofapplicaton:SMTGHN:twohour:

at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8:
academy/course3:oftheory:SMTGHK:geo:
academy/course3:ofapplicaton:SMTGHL:halfhour:

at 10:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR87> [STR6 STR74] STR8:
academy/course4:oftheory:SMTGH:SMTGHI:history:
academy/course4:ofapplicaton:SMTGHJ:nothing:

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:oftheory:SMTGHG:nothing:
academy/course5:ofapplicaton:SMTGHH:twohours:

at 14:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR86> [STR6 STR85] STR8:
academy/course6:oftheory:SMTGHE:music:
academy/course6:ofapplicaton:SMTGHF:twohours:

at 14:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR96> [STR6 STR01] STR8:
academy/course7:oftheory:SMTGHC:programmation:
academy/course7:ofapplicaton:SMTGHD:onehours:

at 14:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR335> [STR6 STR66] STR8:
academy/course8:oftheory:SMTGHA:philosophy:
academy/course8:ofapplicaton:SMTGHB:nothing:
                                 
HERE
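The one-liner above packs a lot into its switches. Here is a long-hand sketch of what it does, my own expansion for readability rather than ozo's code: -00 reads one blank-line-separated block per record, -a splits each block on whitespace into @F, and -l adds a newline after every print. Two small deviations are assumptions of this sketch, not part of the one-liner: blocks that do not start with "at" are skipped, and the names are sorted, because each() returns hash entries in no particular order. The file name summarize.pl is hypothetical.

```perl
#!/usr/bin/perl
# Long-hand sketch of the "perl -lan00e" one-liner (an expansion, not the original).
use strict;
use warnings;

sub summarize {
    my @records = @_;    # one blank-line-separated block per element (what -00 gives)
    my %s;
    for my $rec (@records) {
        my @F = split ' ', $rec;                  # -a: whitespace split into @F
        next unless @F >= 4 && $F[0] eq 'at';     # added guard: skip blocks without a header
        my $n = "@F[2,3]";                        # name + number, e.g. "carl 1"
        push @{ $s{$n}{''} }, $F[1];              # the '' key collects the times
        # One match per data line: the word after "of" and the last :word: on that
        # line ('.' does not match a newline, so .* stays within one line).
        while ($rec =~ /:of(\w+):.*:(\w+):/g) {
            push @{ $s{$n}{$1} }, $2;
        }
    }
    my $out = '';
    # Sorted for stable output; the one-liner's each() uses hash order instead.
    for my $k (sort keys %s) {
        my $v = $s{$k};
        # '' sorts first, so the times line ("carl 1,10:00,14:00") comes first.
        $out .= $k . join('', map { join(',', $_, @{ $v->{$_} }) . "\n" } sort keys %$v) . "\n";
    }
    return $out;
}

if (@ARGV) {
    local $/ = '';    # -00: paragraph mode
    print summarize(<>);
}
```

Run it as perl summarize.pl imad1.txt; on the first sample input it should print the same four blocks as jmcg's script.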
imad imad

ASKER

@jmcg Here is a slightly modified version of my input:


academy/course1:offdf5D:SMTGHP:twohur:
academy/course1:zfd6X:SMTGHP:nonehour:
academy/course1:sd99R:SMTGHP:somthing :
academy/course1:qs35H:SMTGHP:nothing:
academy/course1:odf33G:SMTGHP:onehour:

at 10:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8:
academy/course2:thefsf8A:SMTGHM:math:
academy/course2:fdf5B:SMTGHN:twohour:
academy/course2:offdf5D:SMTGHP:twohur:
academy/course2:zfd6X:SMTGHP:nonehour:
academy/course2:sd99R:SMTGHP:somthing :
academy/course2:qs35H:SMTGHP:nothing:
academy/course2:odf33G:SMTGHP:onehour:

at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8:
academy/course3:thefsf8A:SMTGHK:geo:
academy/course3:fdf5B:SMTGHL:halfhour:
academy/course3:offdf5D:SMTGHb:twohur:
academy/course3:zfd6X:SMTGHPx:nonehour:
academy/course3:sd99R:SMTGHw:somthing :
academy/course3:qs35H:SMTGHbP:nothing:
academy/course3:odf33G:SMTGHPs:onehour:

at 10:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR87> [STR6 STR74] STR8:
academy/course4:thefsf8A:SMTGH:SMTGHI:history:
academy/course4:fdf5B:SMTGHJ:nothing:
academy/course4:offdf5D:SMTGHd:twohur:
academy/course4:zfd6X:SMTGHg:nonehour:
academy/course4:sd99R:SMTGHs:somthing :
academy/course4:qs35H:SMTGHb:nothing:
academy/course4:odf33G:SMTGHs:onehour:

at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:thefsf8A:SMTGHG:nothing:
academy/course5:fdf5B:SMTGHH:twohours:
academy/course5:offdf5D:SMTGHf:twohur:
academy/course5:zfd6X:SMTGHgd:nonehour:
academy/course5:sd99R:SMTGHsf:somthing :
academy/course5:qs35H:SMTGHbs:nothing:
academy/course5:odf33G:SMTGHsf:onehour:

at 14:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR86> [STR6 STR85] STR8:
academy/course6:thefsf8A:SMTGHEx:music:
academy/course6:fdf5B:SMTGHF:twohours:
academy/course6:offdf5D:SMTGHdf:twohur:
academy/course6:zfd6X:SMTGHs:nonehour:
academy/course6:sd99R:SMTGHqf:somthing :
academy/course6:qs35H:SMTGHv:nothing:
academy/course6:odf33G:SMTGHw:onehour:
at 14:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR96> [STR6 STR01] STR8:
academy/course7:thefsf8A:SMTGHC:programmation:
academy/course7:fdf5B:SMTGHDs:onehours:
academy/course7:thefsf8A:SMTGHdx:music:
academy/course7:fdf5B:SMTGHsF:twohours:
academy/course7:offdf5D:SMTGHqf:twohur:
academy/course7:zfd6X:SMTGHws:nonehour:
academy/course7:sd99R:SMTGHwf:somthing :
academy/course7:qs35H:SMTGHcv:nothing:
academy/course7:odf33G:SMTGHv:onehour:
at 14:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR335> [STR6 STR66] STR8:
academy/course8:thefsf8A:SMTGHA:philosophy:
academy/course8:fdf5B:SMTGHhB:nothing:
academy/course8:offdf5D:SMTGeHqf:twohur:
academy/course8:zfd6X:SMTGHfws:nonehour:
academy/course8:sd99R:SMTGHdwf:somthing :
academy/course8:qs35H:SMTGHcvv:nothing:
academy/course8:odf33G:SMTGHbv:onehour:




The output I would like:


carl 1,10:00,14:00
A, --,--
B, --,--
D --,--
X --,--
R --,--
H --,--
G --,--

carl 2,10:00,14:00
A, --,--
B, --,--
D --,--
X --,--
R --,--
H --,--
G --,--

david 1,10:00,14:00
A, --,--
B, --,--
D --,--
X --,--
R --,--
H --,--
G --,--

david 2,10:00,14:00
A, --,--
B, --,--
D --,--
X --,--
R --,--
H --,--
G --,--




The '--' stands for the actual values (onehour, nothing, math, ...), which should normally be displayed.
Are the blank lines between the last 3 entries really missing?

perl -ln00e 'for(split/(?=^at )/m){ ($t,$n)=/\s(\S+) (\S+ \S+)/;
push @{$s{$n}{""}},$t;
push @{$s{$n}{$1}},$2 while/:\w*(\w):.*:(\w+):/g;
}END{
  print $k,map{join(",",$_,@{$v->{$_}}),"\n"}sort keys %$v  while ($k,$v)=each %s;
} '
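The split/(?=^at )/m in that one-liner is what copes with records that are not separated by blank lines. A minimal sketch of just that step, assuming every record header starts at the beginning of a line with "at " (split_records is a hypothetical name):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split text into records at each line beginning with "at ", whether or not a
# blank line precedes it. (?=^at ) is a zero-width lookahead, so the "at " text
# stays at the front of each record; /m lets ^ match after any newline.
sub split_records {
    my ($text) = @_;
    return grep { /\S/ } split /(?=^at )/m, $text;    # drop whitespace-only pieces
}

if (@ARGV) {
    my $text = do { local $/; <> };    # slurp the first file named on the command line
    my @records = split_records($text);
    print scalar(@records), " records\n";
}
```

Because the lookahead consumes nothing, no characters are lost between records; a zero-length match at the very start of the text does not produce an empty leading field.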
Oh, my bad. Here is the correct version of the input and output:
at 10:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8:
academy/course1:thefsf8A:SMTGHM:Philo:
academy/course1:fdf5B:SMTGHN:twohour:
academy/course1:offdf5D:SMTGHP:twohur:
academy/course1:zfd6X:SMTGHP:nonehour:
academy/course1:sd99R:SMTGHP:somthing:
academy/course1:qs35H:SMTGHP:nothing:
academy/course1:odf33G:SMTGHP:onehour:
at 10:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR78> [STR6 STR111] STR8:
academy/course2:thefsf8A:SMTGHM:math:
academy/course2:fdf5B:SMTGHN:twohour:
academy/course2:offdf5D:SMTGHP:twohur:
academy/course2:zfd6X:SMTGHP:nonehour:
academy/course2:sd99R:SMTGHP:somthing:
academy/course2:qs35H:SMTGHP:nothing:
academy/course2:odf33G:SMTGHP:onehour:
at 10:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR758> [STR6 STR155] STR8:
academy/course3:thefsf8A:SMTGHK:geo:
academy/course3:fdf5B:SMTGHL:halfhour:
academy/course3:offdf5D:SMTGHb:twohur:
academy/course3:zfd6X:SMTGHPx:nonehour:
academy/course3:sd99R:SMTGHw:somthing:
academy/course3:qs35H:SMTGHbP:nothing:
academy/course3:odf33G:SMTGHPs:onehour:
at 10:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR87> [STR6 STR74] STR8:
academy/course4:thefsf8A:SMTGH:SMTGHI:history:
academy/course4:fdf5B:SMTGHJ:nothing:
academy/course4:offdf5D:SMTGHd:twohur:
academy/course4:zfd6X:SMTGHg:nonehour:
academy/course4:sd99R:SMTGHs:somthing :
academy/course4:qs35H:SMTGHb:nothing:
academy/course4:odf33G:SMTGHs:onehour:
at 14:00 carl 1 STR0 STR1 STR2 STR3 <STR4 STR11> [STR6 STR784] STR8:
academy/course5:thefsf8A:SMTGHG:nothing:
academy/course5:fdf5B:SMTGHH:twohours:
academy/course5:offdf5D:SMTGHf:twohur:
academy/course5:zfd6X:SMTGHgd:nonehour:
academy/course5:sd99R:SMTGHsf:somthing:
academy/course5:qs35H:SMTGHbs:nothing:
academy/course5:odf33G:SMTGHsf:onehour:
at 14:00 carl 2 STR0 STR1 STR2 STR3 <STR4 STR86> [STR6 STR85] STR8:
academy/course6:thefsf8A:SMTGHEx:music:
academy/course6:fdf5B:SMTGHF:twohours:
academy/course6:offdf5D:SMTGHdf:twohur:
academy/course6:zfd6X:SMTGHs:nonehour:
academy/course6:sd99R:SMTGHqf:somthing:
academy/course6:qs35H:SMTGHv:nothing:
academy/course6:odf33G:SMTGHw:onehour:
at 14:00 david 1 STR0 STR1 STR2 STR3 <STR4 STR96> [STR6 STR01] STR8:
academy/course7:thefsf8A:SMTGHC:programmation:
academy/course7:fdf5B:SMTGHDs:onehours:
academy/course7:thefsf8A:SMTGHdx:music:
academy/course7:fdf5B:SMTGHsF:twohours:
academy/course7:offdf5D:SMTGHqf:twohur:
academy/course7:zfd6X:SMTGHws:nonehour:
academy/course7:sd99R:SMTGHwf:somthing:
academy/course7:qs35H:SMTGHcv:nothing:
academy/course7:odf33G:SMTGHv:onehour:
at 14:00 david 2 STR0 STR1 STR2 STR3 <STR4 STR335> [STR6 STR66] STR8:
academy/course8:thefsf8A:SMTGHA:philosophy:
academy/course8:fdf5B:SMTGHhB:nothing:
academy/course8:offdf5D:SMTGeHqf:twohur:
academy/course8:zfd6X:SMTGHfws:nonehour:
academy/course8:sd99R:SMTGHdwf:somthing:
academy/course8:qs35H:SMTGHcvv:nothing:
academy/course8:odf33G:SMTGHbv:onehour:





Here is how the output should look:

carl 1,10:00,14:00
A,--,--
B,--,--
D,--,--
X,--,--
R,--,--
H,--,--
G,--,--

carl 2,10:00,14:00
A,--,--
B,--,--
D,--,--
X,--,--
R,--,--
H,--,--
G,--,--

david 1,10:00,14:00
A,--,--
B,--,--
D,--,--
X,--,--
R,--,--
H,--,--
G,--,--

david 2,10:00,14:00
A,--,--
B,--,--
D,--,--
X,--,--
R,--,--
H,--,--
G,--,--



The '--' stands for the actual values (onehour, nothing, math, ...), which should normally be displayed.
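For the corrected input, a minimal Perl sketch that fills in the '--' cells might look like this. It rests on assumptions about the request, not on anything the asker confirmed: the row label is the last character of the second field (thefsf8A gives A), the value is the last field that is neither empty nor an SMTG* code, rows keep the order in which their labels first appear (A, B, D, X, R, H, G), and each "at" header adds one time column. Where a label repeats inside one record (course7 has two thefsf8A and two fdf5B lines), this sketch joins the values with '/', and missing cells are left empty. The names tabulate.pl and tabulate are hypothetical.

```perl
#!/usr/bin/perl
# Hypothetical tabulate.pl: a sketch for the corrected log format.
use strict;
use warnings;

sub tabulate {
    my @lines = @_;
    my (@names, %seen_name, %times, %labels, %seen_label, %grid);
    my ($name, $slot);

    for my $line (@lines) {
        chomp(my $l = $line);
        if ($l =~ /^at (\d\d:\d\d) (\S+ \S+)/) {
            # A header starts a new record, i.e. a new time column for this name.
            my ($time, $who) = ($1, $2);
            $name = $who;
            push @names, $name unless $seen_name{$name}++;
            push @{ $times{$name} }, $time;
            $slot = $#{ $times{$name} };
            next;
        }
        next unless defined $name && $l =~ /^academy/;

        # Trim each colon-separated field, then drop empty fields and SMTG* codes.
        my @f = grep { length($_) && $_ !~ /^SMTG/ }
                map  { my $x = $_; $x =~ s/^\s+|\s+$//g; $x } split /:/, $l;
        next unless @f >= 3;
        my $label = substr $f[1], -1;    # thefsf8A -> A (assumed rule)
        my $value = $f[-1];

        push @{ $labels{$name} }, $label unless $seen_label{$name}{$label}++;
        my $cell = \$grid{$name}{$label}[$slot];
        $$cell = defined $$cell ? "$$cell/$value" : $value;
    }

    my $out = '';
    for my $n (@names) {
        my @cols = 0 .. $#{ $times{$n} };
        $out .= join(',', $n, @{ $times{$n} }) . "\n";
        for my $label (@{ $labels{$n} }) {
            $out .= join(',', $label, map { $grid{$n}{$label}[$_] // '' } @cols) . "\n";
        }
        $out .= "\n";
    }
    return $out;
}

# Only read input when file names are given, e.g.: perl tabulate.pl test20.txt
print tabulate(<>) if @ARGV;
```

Unlike a sort keys approach, this keeps the labels in input order, which is what the requested layout (A, B, D, X, R, H, G) seems to need. It also does not depend on blank lines between records.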
ASKER CERTIFIED SOLUTION
ozo
This solution is only available to members.
I wrote this:
perl -lane 'BEGIN{$/="at "} $n="@F[1,2]";
push @{$s{$n}{""}},$F[0];
push @{$s{$n}{$1}},$2 while/:\w*(\w):.*:(\w+):/g;
END{
  print $k,map{join(",",$_,@{$v->{$_}}),"\n"}sort keys %$v  while ($k,$v)=each %s;
} ' test20.txt



I got this:

String found where operator expected at 1.pl line 6, near "}'"
  (Might be a runaway multi-line '' string starting on line 1)
        (Missing semicolon on previous line?)
Bareword found where operator expected at 1.pl line 6, near "}' test20"
        (Missing operator before test20?)
syntax error at 1.pl line 6, near "}'"
Execution of 1.pl aborted due to compilation errors.



The command I executed was:

perl  1.pl  


The command to execute, typed at the shell prompt, would be:
perl -lane 'BEGIN{$/="at "}
$n="@F[1,2]";
push @{$s{$n}{""}},$F[0];
push @{$s{$n}{$1}},$2 while/:\w*(\w):.*:(\w+):/g;
END{
  print $k,map{join(",",$_,@{$v->{$_}}),"\n"}sort keys %$v  while ($k,$v)=each %s;
} ' test20.txt
not
perl  1.pl
The relationship between input and output has become too obscure for me to work it out.