Solved

Extract filer name from large text file.

Posted on 2011-03-10
Last Modified: 2012-06-27
I have a huge file with a list of filers in the following format:

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


I need to get each unique filer name, which in this case is svaashna02. How can I do that with a regular expression?
Question by:jaxstorm
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093548
Assuming that the names in question always appear between "|" and ":" you could try this:

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | uniq

or, for sorted output

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | sort -u

wmp
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093596
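... it makes little difference, but in this particular case

... i<NF ...

seems better: the sample line ends with "|", so its last field is empty and "i<=NF" would print a blank line for it.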
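
To see the difference between the two loop conditions, here is a small sketch on a made-up line (host1 and filerA are hypothetical names, not taken from the real list) that counts the lines each variant prints:

```shell
# Hypothetical one-mount line, ending with "|" like the sample in the question.
line='host1|filerA:/vol/v1 on /mnt/a type nfs (rw)|'

# Count the output lines of each loop condition (tr strips padding some wc versions add).
with_le=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' | wc -l | tr -d ' ')
with_lt=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=2;i<NF;i+=2) print $i}' | wc -l | tr -d ' ')

echo "i<=NF prints $with_le line(s), i<NF prints $with_lt line(s)"
```

The extra line from "i<=NF" is the empty field after the trailing "|". After "sort -u" it shows up as a single blank line at most, which is why the difference is easy to miss.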
 
LVL 8

Author Comment

by:jaxstorm
ID: 35093646
Hi wmp,

That looked good until I realised that sometimes, there are multiple filers on a line, like in this example

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


So getting the filers between | and : is a good plan but I don't think it's gotten all the filers when there are multiple lines, or has it? The list is so big it's hard to tell!
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093718
Sure it has!

Take just the line you posted and run my command against it, omitting the "sort" or "uniq" filters, and see what you get!
Then simply duplicate the line and rerun. You should see twice the number of names.



 
LVL 8

Author Comment

by:jaxstorm
ID: 35093879
In that case, if you can explain each awk switch I'll give you the points. I'd like to improve my AWK knowledge and have no idea what your awk script did ;)
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 500 total points
ID: 35094106
OK, here you go:

"-F" sets the field separator that awk uses to split each line.
"[|:]" is a bracket expression: either character, the pipe or the colon, marks a field boundary.
By the way, this is the "regular expression" you asked for.

With these separators, look at your data: the second field (the one that follows the first separator, "|" in this case, and ends at the next one, ":" in this case) is one of the fields we're searching for. Fields between "|" and "|", between ":" and ":", or any other combination would be split off the same way.

Continuing along the line, the next relevant fields are number 4, then 6, 8 and so on.
The first field is the hostname, and the remaining odd fields contain the stuff between the colon and the next pipe ("/vol/xxxxxxxx03_gti_xxxx_dev_01/ ....") which is irrelevant here.

So we only need to print out the even-numbered fields.
To accomplish this we set up a "for" loop using the variable "i", starting at 2 ("i=2") and incrementing it by 2 ("i+=2") for as long as the last field is not yet reached ("i<NF", where "NF" is a built-in awk variable holding the number of fields in the current line).
"$i" thus stands for "$2", "$4" etc., which is the awk notation for the contents of a field.
Finally we simply "print" out these fields.

awk takes its data from the file inputlist reading and processing line by line until the end of inputlist is reached.
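
As a rough illustration of the loop described above, here is the same script spread over several lines and run against a made-up input line. The names host1, filerA and filerB and the extract_filers wrapper are hypothetical and only serve the example:

```shell
# Hypothetical sample line: one host with mounts from two filers.
line='host1|filerA:/vol/v1/p on /mnt/a type nfs (rw,addr=x)|filerB:/vol/v2/p on /mnt/b type nfs (rw,addr=y)|'

# Reads lines on stdin and prints every even-numbered field, one per line.
extract_filers() {
  awk -F'[|:]' '{
    # Fields alternate: $1 = hostname, $2 = filer, $3 = path and options, $4 = filer, ...
    for (i = 2; i < NF; i += 2)   # i < NF skips the empty field after the trailing "|"
      print $i
  }'
}

printf '%s\n' "$line" | extract_filers
```

For the real list, the same function could be fed from the file instead, for example extract_filers < inputlist | sort -u.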

You asked for "unique" names, so we must filter the output.
Sorry, I made a mistake above: "uniq" only removes duplicates on adjacent lines, so it works correctly only on already sorted lists. Better use "sort -u".

"sort -u" sorts its input data in ascending order, suppressing all but one line in each set of identical lines, thus making the output lines "unique".

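A quick way to check the difference between the two filters, assuming a made-up list in which the hypothetical name filerA appears twice but not on adjacent lines:

```shell
# Hypothetical extractor output: "filerA" is repeated, with "filerB" in between.
names=$(printf '%s\n' filerA filerB filerA)

printf '%s\n' "$names" | uniq      # the duplicates are not adjacent, so nothing is removed
printf '%s\n' "$names" | sort -u   # sorted first, so each name is printed once
```
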
Have fun with awk!

wmp
 
LVL 8

Author Closing Comment

by:jaxstorm
ID: 35094177
Excellent, thorough answer. Well deserved