How to parse a text file and get a substring in bash

Hi friends,
Please check the sample input file below.

InputFile.txt content:
        [added] ProjectoNe/outbound/internal/wsdl/content/
        [added] projectTwo/Resources/Interfaces/
        [added] ProjectoNe/inbound/internal/xsd/collection/
        [added] ProjectThree/Resources/
        [added] ProjectoNe/outbound/external/wsdl/
        [added] ProjectoNe/inbound/internal/wsdl/ins/
        [added] ProjectFour
        [changed] ChangedProjectOne/Resources/Interfaces/_v1_0.WSDL  ( size 2418403 : 2418403 )
        [changed] ProjectoNe/outbound/external/xsd/credit/_3_.XMLSchema  ( size 122920 : 122920 )
        [changed] ChangedProjectTwo/Resources/Interfaces/CPlan.WSDL  ( size 2122404 : 2122404 )
        [removed] removedProjectOne/BusinessService/HTTPBS_v_1.BusinessService  ( size 2454 : 2547 )
        [changed] ProjectoNe/outbound/internal/xsd/_banking.XMLSchema  ( size 104564 : 104564 )
        [removed] removedProjectTwo/ProxyService/_v2_0.ProxyService  ( size 182292 : 182292 )
        [changed] ProjectoNe/outbound/internal/xsd/request.XMLSchema  ( size 294003 : 294003 )


I need your help getting the output below, please:
added=ProjectoNe,projectTwo,ProjectThree,ProjectFour
changed=ChangedProjectOne,ChangedProjectTwo
removed=removedProjectOne,removedProjectTwo

Basically, I want the unique values (duplicates removed) for each action property (added, changed, removed).

I've been trying but couldn't figure it out.

Thanks in advance.
Asked by enthuguy
woolmilkporc commented:
awk 'BEGIN {A="added=";C="changed=";R="removed="} {
     if($2~"/") S=substr($2,1,index($2,"/")-1); else S=$2;
     if($1~/added/) {if(A!~S) A=A S ","}
     if($1~/changed/) {if(C!~S) C=C S ","}
     if($1~/removed/) {if(R!~S) R=R S ","}}
             END {print substr(A,1,length(A)-1);
                  print substr(C,1,length(C)-1);
                  print substr(R,1,length(R)-1)}'  inputfile

How should we handle this line:

[changed] ProjectoNe/outbound/external/xsd/credit/_3_.XMLSchema  ( size 122920 : 122920 )

In your sample output "ProjectoNe" does not appear as "changed"! Is this a mistake of yours, or is it intended?

enthuguy (author) commented:
Thanks a lot, woolmilkporc, that was a great and interesting script :)

I checked my actual log file for ProjectoNe, which appears under both actions, but you are right, that should not be possible. I will check the script that produced that log.

Another request: could you explain each line for me, please?
ozo commented:
#!/bin/bash
cat > InputFile.txt <<EOF
        [added] ProjectoNe/outbound/internal/wsdl/content/
        [added] projectTwo/Resources/Interfaces/
        [added] ProjectoNe/inbound/internal/xsd/collection/
        [added] ProjectThree/Resources/
        [added] ProjectoNe/outbound/external/wsdl/
        [added] ProjectoNe/inbound/internal/wsdl/ins/
        [added] ProjectFour
        [changed] ChangedProjectOne/Resources/Interfaces/_v1_0.WSDL  ( size 2418403 : 2418403 )
        [changed] ProjectoNe/outbound/external/xsd/credit/_3_.XMLSchema  ( size 122920 : 122920 )
        [changed] ChangedProjectTwo/Resources/Interfaces/CPlan.WSDL  ( size 2122404 : 2122404 )
        [removed] removedProjectOne/BusinessService/HTTPBS_v_1.BusinessService  ( size 2454 : 2547 )
        [changed] ProjectoNe/outbound/internal/xsd/_banking.XMLSchema  ( size 104564 : 104564 )
        [removed] removedProjectTwo/ProxyService/_v2_0.ProxyService  ( size 182292 : 182292 )
        [changed] ProjectoNe/outbound/internal/xsd/request.XMLSchema  ( size 294003 : 294003 )
EOF
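# One associative array per action; using the project name as the key
# removes duplicates automatically.
declare -A added changed removed
while read -r a b ; do
  # ${a//[\[\]]} strips the brackets ("[added]" becomes "added");
  # ${b%%/*} keeps only the text before the first slash.
  eval ${a//[\[\]]}[${b%%/*}]=${b%%/*}
done < InputFile.txt
# Join each array's values with spaces, then turn the spaces into commas.
added=${added[*]}
changed=${changed[*]}
removed=${removed[*]}
echo added=${added// /,}
echo changed=${changed// /,}
echo removed=${removed// /,}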
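
The loop above relies on eval, which executes whatever the input line expands to, so it is best kept to trusted log files. Below is a minimal sketch of the same idea without eval or associative arrays, assuming only the three actions shown in the sample. The helper names add_unique and summarize are invented for this illustration and are not part of the script above.

```shell
# add_unique LIST NAME: print LIST with NAME appended, unless NAME is
# already one of the comma-separated entries (exact comparison).
add_unique() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;
    *)        printf '%s\n' "${1:+$1,}$2" ;;
  esac
}

# summarize: read the log on stdin and print one line per action.
summarize() {
  added='' changed='' removed=''
  while read -r action path; do
    name=${path%%/*}    # keep the text before the first slash
    name=${name%% *}    # no slash: drop anything after the first space
    case $action in
      '[added]')   added=$(add_unique "$added" "$name") ;;
      '[changed]') changed=$(add_unique "$changed" "$name") ;;
      '[removed]') removed=$(add_unique "$removed" "$name") ;;
    esac
  done
  printf 'added=%s\nchanged=%s\nremoved=%s\n' "$added" "$changed" "$removed"
}
```

It would be called as `summarize < InputFile.txt`. Unlike the associative arrays, whose iteration order bash does not guarantee, this version lists the names in the order they first appear in the file.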
ozo commented:
If it doesn't have to be bash:
perl -F'\W+' -alne '$h{$F[1]}{$F[2]}++;END{print "$_=",join",",keys %{$h{$_}} for keys %h}'  InputFile.txt
woolmilkporc commented:
Thx for the points - and here is the explanation:

BEGIN {A="added=";C="changed=";R="removed="}

sets initial values for the output lines, before reading any input (that's what "BEGIN" is meant for)

if($2~"/") S=substr($2,1,index($2,"/")-1); else S=$2;

finds out which value ("S") to add to the respective output line.
If there is a slash in the second field we take the substring up to (but not including)
that slash, otherwise we just take the second field as a whole (needed for lines like
" [added] ProjectFour").
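
This extraction step can be tried on its own. In the sketch below, first_component wraps the same expression in a shell function, and first_component_split shows an equivalent based on split(). Both function names are invented for this illustration.

```shell
# first_component: hypothetical wrapper around the expression explained above.
first_component() {
  printf '%s\n' "$1" | awk '{
    if ($2 ~ "/") S = substr($2, 1, index($2, "/") - 1); else S = $2
    print S
  }'
}

# first_component_split: same result via split(), which puts the whole
# field into element 1 when the field contains no slash.
first_component_split() {
  printf '%s\n' "$1" | awk '{ split($2, P, "/"); print P[1] }'
}

first_component "        [added] ProjectoNe/outbound/internal/wsdl/content/"  # prints ProjectoNe
first_component "        [added] ProjectFour"                                 # prints ProjectFour
```

The split() form needs no if/else because a field without a slash simply comes back as a single piece.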

if($1~/added/) {if(A!~S) A=A S ","}
if($1~/changed/) {if(C!~S) C=C S ","}
if($1~/removed/) {if(R!~S) R=R S ","}}


Here we parse the first field to decide which output line to manipulate (A/C/R).
If the respective output line doesn't yet contain the value found in $2 ("S", see above)
then we append it, followed by a comma.
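
One caveat about this check: "!~" is a regular-expression match, so it tests whether S occurs anywhere in the output line, not whether S is already one of the listed values. The posted sample is not affected, but a name that is a substring of an earlier one would be skipped. The sketch below compares the regex test with an exact lookup in an array. The function name compare_dedup, the array seen, and the two-line input are made up for illustration.

```shell
compare_dedup() {
  awk '{
    split($2, P, "/"); S = P[1]
    if (A !~ S) A = A S ","                         # regex test, as in the script
    if (!(S in seen)) { seen[S] = 1; B = B S "," }  # exact lookup
  } END { print "regex: " A; print "exact: " B }'
}

printf '%s\n' '[added] ProjectoNe/a/' '[added] Project/b/' | compare_dedup
# regex: ProjectoNe,
# exact: ProjectoNe,Project,
```

In a full script the lookup would also need to distinguish the actions, for example by indexing the array with both the action and the name, as in seen[$1, S].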

END {print substr(A,1,length(A)-1);
                  print substr(C,1,length(C)-1);
                  print substr(R,1,length(R)-1)}'


At the end, when all input is read (that's what "END" is meant for) we just print out the results
which are now stored in the variables "A", "C" or "R", respectively.
Since we always had to append a comma along with the particular value
we now need to remove the very last, unnecessary comma.
There are several ways to do this, here I just took the substring of the respective output line
with a length which is one less than the actual length, thus eliminating the last character.
An alternative would have been using "sub" (substitute) instead of "substr":
sub(",$","",A); print A
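
The two variants differ in one edge case. If an action never occurs in the input, its line still holds only the label (for example "removed="). In that case substr cuts off the "=", while sub leaves the line intact because there is no trailing comma to replace. A small sketch, with strip_substr and strip_sub as hypothetical helper names:

```shell
# Each helper passes the line to awk as variable A and prints the result.
strip_substr() { awk -v A="$1" 'BEGIN { print substr(A, 1, length(A) - 1) }'; }
strip_sub()    { awk -v A="$1" 'BEGIN { sub(",$", "", A); print A }'; }

strip_substr "added=ProjectoNe,projectTwo,"   # prints added=ProjectoNe,projectTwo
strip_sub    "added=ProjectoNe,projectTwo,"   # prints added=ProjectoNe,projectTwo
strip_substr "removed="                       # prints removed
strip_sub    "removed="                       # prints removed=
```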
enthuguy (author) commented:
Thanks, ozo,

and thanks, woolmilkporc, for your explanation. I'm gaining confidence now :)