  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 334

Extract filer name from large text file.

I have a huge file with a list of filers in the following format:

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


I need to get each unique filer name, which in this case is svaashna02. How can I do that with a regular expression?
Asked by: jaxstorm
1 Solution
 
woolmilkporc commented:
Assuming that the names in question always appear between "|" and ":" you could try this:

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | uniq

or, for sorted output

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | sort -u

wmp
 
woolmilkporc commented:
... makes no real difference, but in this particular case

... i<NF ...

seems better.
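
A minimal sketch of why the two loop bounds differ, assuming a shortened, hypothetical sample line in the same format (the mount options are trimmed for readability). Because each line ends with "|", awk sees one empty field after the last separator, and "i<=NF" prints it as an empty entry while "i<NF" skips it. The square brackets are added only to make the empty field visible.

```shell
# Hypothetical sample line in the question's format; note the trailing "|".
line='hostname|filer:/vol/path on /private/mount type nfs (rw,addr=xxx.yyy.zz.qq)|filer:/vol/path on /private/mountpoint type nfs (rw,addr=xxx.yyy.zz.qq)|'

# i<=NF also visits the empty field that follows the trailing pipe.
with_le=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print "[" $i "]"}')

# i<NF stops before that last, empty field.
with_lt=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=2;i<NF;i+=2) print "[" $i "]"}')

printf 'i<=NF:\n%s\n\ni<NF:\n%s\n' "$with_le" "$with_lt"
```

After "sort -u" the extra empty entries would collapse into a single blank line, which is probably why the difference is easy to overlook.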
 
jaxstorm (author) commented:
Hi wmp,

That looked good until I realised that sometimes, there are multiple filers on a line, like in this example

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


So getting the filers between | and : is a good plan, but I don't think it's gotten all the filers when there are multiple filers on a line, or has it? The list is so big it's hard to tell!
woolmilkporc commented:
Sure it has!

Take just the line you posted and run my command against it, omitting the "sort" or "uniq" filters, and see what you get!
Then simply duplicate the line and rerun. You should see twice the number of names.
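
The check described here could be sketched roughly as follows. The sample line is a shortened, hypothetical stand-in for the real data, the helper name "extract" is made up for this example, and the loop uses the "i<NF" bound so that the empty field after the trailing pipe is not counted.

```shell
# Hypothetical sample line with two filers on it (mount options trimmed).
line='hostname|filer:/vol/path on /private/mount type nfs (rw)|filer:/vol/path on /private/mountpoint type nfs (rw)|'

# The awk command from above, without the "sort" or "uniq" filter.
extract() { awk -F'[|:]' '{for (i=2;i<NF;i+=2) print $i}'; }

# Count the names found in one copy of the line, then in two copies.
once=$(printf '%s\n' "$line" | extract | awk 'END{print NR}')
twice=$(printf '%s\n%s\n' "$line" "$line" | extract | awk 'END{print NR}')

echo "one line: $once names, two lines: $twice names"
```

If every filer on a line is picked up, the second count should be exactly double the first.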



 
jaxstorm (author) commented:
In that case, if you can explain each awk switch I'll give you the points. I'd like to improve my AWK knowledge and have no idea what your awk script did ;)
 
woolmilkporc commented:
OK, here you go:

"-F" configures the field separator to be applied by awk.
"[|:]" means that either character, the pipe or the colon, indicates a field boundary.
By the way, this is the "regular expression" you've been asking for.

Taking these separators and looking at your data, you'll see that the second field is one of the fields we're searching for. It is the one following the first separator ("|" in this case) and terminated by the next separator (":" in this case, although fields between "|" and "|", between ":" and ":", or any other combination would also match).

Continuing the search we see that the next relevant fields are number 4, then 6, 8 and so on.
The odd fields from number 3 onwards contain the stuff between the colon and the next pipe ("/vol/xxxxxxxx03_gti_xxxx_dev_01/ ...."), which is irrelevant here. Field 1 is the hostname.

So we only need to print out the even numbered fields.
To accomplish this we set up a "for" loop using the variable "i", starting at 2 ("i=2") and incrementing it by 2 ("i+=2") for as long as "i" is still less than the number of fields on the line ("i<NF", where "NF" is an internal awk variable meaning "number of fields").
"$i" will thus stand for "$2", "$4" etc., which is the awk notation for the contents of a numbered field.
Finally we simply "print" out these fields.
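
To see the field numbering described above, a small sketch like the following prints every field with its index. The sample line and the names filerA and filerB are hypothetical, chosen only so the two filers can be told apart.

```shell
# Hypothetical sample line: one hostname, two differently named filers.
line='hostname|filerA:/vol/a on /mnt/a type nfs (rw)|filerB:/vol/b on /mnt/b type nfs (rw)|'

# Split on "|" or ":" and print each field number next to its content.
fields=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=1;i<=NF;i++) printf "%d: [%s]\n", i, $i}')

printf '%s\n' "$fields"
```

Fields 2 and 4 hold the filer names, fields 3 and 5 hold the paths and mount options, and field 6 is the empty remainder after the trailing pipe.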

awk takes its data from the file inputlist, reading and processing it line by line until the end of inputlist is reached.

You asked for "unique" names, so we must filter the output.
Sorry, I made a mistake above: "uniq" only removes duplicates that are directly adjacent, so it works correctly only on already sorted lists. Better use "sort -u".

"sort -u" sorts its input data in ascending order, suppressing all but one line in each set of identical lines, thus making the output lines "unique".
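
A short illustration of the difference, using three made-up names in unsorted order (filerA and filerB are hypothetical):

```shell
# Unsorted list in which the two copies of "filerB" are not adjacent.
names=$(printf '%s\n' filerB filerA filerB)

# uniq alone only drops adjacent repeats, so both "filerB" lines survive.
with_uniq=$(printf '%s\n' "$names" | uniq)

# sort -u sorts first, so each name appears exactly once.
with_sort_u=$(printf '%s\n' "$names" | sort -u)

printf 'uniq:\n%s\n\nsort -u:\n%s\n' "$with_uniq" "$with_sort_u"
```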

Have fun with awk!

wmp
 
jaxstorm (author) commented:
Excellent, thorough answer. Well deserved