
Extract filer name from large text file.

Posted on 2011-03-10
Last Modified: 2012-06-27
I have a huge file with a list of filers in the following format:

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


I need to get each unique filer name, which in this case is svaashna02 (it appears as "filer" in the sample above). How can I do that with a regular expression?
Question by: jaxstorm

Expert Comment

by:woolmilkporc
ID: 35093548
Assuming that the names in question always appear between "|" and ":", you could try this:

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | uniq

or, for sorted output

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | sort -u
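A quick way to try the two commands above is to save the sample line from the question to a file and run both. This is a minimal sketch under stated assumptions: a temporary file created with mktemp stands in for the real inputlist, and the variable names are illustrative only.

```shell
#!/bin/sh
# Temporary file standing in for "inputlist" (illustrative only).
inputlist=$(mktemp)
# Sample line copied verbatim from the question.
printf '%s\n' 'hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|' > "$inputlist"

# First variant: uniq drops only adjacent duplicate lines.
out_uniq=$(awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' "$inputlist" | uniq)
# Second variant: sort -u sorts and keeps one copy of each distinct line.
out_sort=$(awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' "$inputlist" | sort -u)

printf 'uniq:\n%s\n\nsort -u:\n%s\n' "$out_uniq" "$out_sort"
rm -f "$inputlist"
```

Both variants report filer once. Note that the loop bound i<=NF also yields one empty string per line (the field after the trailing "|"), which is what the i<NF remark in the follow-up comment is about.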

wmp

Expert Comment

by:woolmilkporc
ID: 35093596
... makes no real difference, but in this particular case

... i<NF ...

seems better: each line ends with "|", which leaves an empty last field that i<=NF would print as a blank line.
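The effect of the trailing "|" can be checked directly. A minimal sketch, assuming the sample line from the question: it splits the line with the same separators, shows NF and the (empty) last field, and counts how many lines each loop bound prints. Variable names are illustrative.

```shell
#!/bin/sh
line='hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|'

# Number of fields when splitting on "|" or ":".
nf=$(printf '%s\n' "$line" | awk -F'[|:]' '{print NF}')
# Content of the last field: empty, because the line ends with "|".
last=$(printf '%s\n' "$line" | awk -F'[|:]' '{print $NF}')
# Count output lines for each loop bound (awk END/NR avoids wc padding).
n_le=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' | awk 'END {print NR}')
n_lt=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=2;i<NF;i+=2) print $i}' | awk 'END {print NR}')

echo "NF=$nf last=[$last] lines(i<=NF)=$n_le lines(i<NF)=$n_lt"
# prints: NF=6 last=[] lines(i<=NF)=3 lines(i<NF)=2
```

When a line does not end with "|", NF is odd and both bounds visit the same fields, which is why the change "makes no real difference" in general.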

Author Comment

by:jaxstorm
ID: 35093646
Hi wmp,

That looked good until I realised that sometimes there are multiple filers on a line, like in this example:

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


So getting the filers between | and : is a good plan, but I don't think it gets all the filers when there are multiple filers on a line, or does it? The list is so big it's hard to tell!

Expert Comment

by:woolmilkporc
ID: 35093718
Sure it has!

Take just the line you posted and run my command against it, omitting the "sort" or "uniq" filters, and see what you get!
Then simply duplicate the line and rerun. You should see twice the number of names.
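The check suggested above can be scripted. This sketch uses the sample line from the question and the i<NF bound from the follow-up comment; the helper name extract and the variable names are illustrative, not part of the original answer.

```shell
#!/bin/sh
line='hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|'

# Hypothetical helper: print the even-numbered fields of every stdin line.
extract() {
    awk -F'[|:]' '{for (i=2;i<NF;i+=2) print $i}'
}

# Count names for one copy of the line, then for the same line twice.
one=$(printf '%s\n' "$line" | extract | awk 'END {print NR}')
two=$(printf '%s\n%s\n' "$line" "$line" | extract | awk 'END {print NR}')

echo "one line: $one names, two lines: $two names"
# prints: one line: 2 names, two lines: 4 names
```

Every filer on every line is printed; only the final sort -u step reduces them to unique names.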




Author Comment

by:jaxstorm
ID: 35093879
In that case, if you can explain each awk switch, I'll give you the points. I'd like to improve my awk knowledge and have no idea what your awk script did ;)

Accepted Solution

by: woolmilkporc (earned 500 total points)
ID: 35094106
OK, here you go:

"-F" configures the field separator to be applied by awk.
"[|:]" is a bracket expression: either character, the pipe or the colon, marks a field boundary.
By the way, this is the "regular expression" you've been asking for.
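To see what that separator does to the sample line, a small sketch can print every field with its number. It uses the line from the question; the "number: [content]" output format is only for readability.

```shell
#!/bin/sh
line='hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|'

# Split on "|" or ":" and print "number: [content]" for every field.
fields=$(printf '%s\n' "$line" | awk -F'[|:]' '{for (i=1;i<=NF;i++) printf "%d: [%s]\n", i, $i}')
printf '%s\n' "$fields"
```

The filer names land in fields 2 and 4, the paths and mount options in fields 3 and 5, the hostname in field 1, and field 6 is empty.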

Splitting your data on these separators, you'll see that the second field (the one after the first separator, "|" here, and ending at the next separator, ":" here) is one of the fields we're searching for. Note that awk does not care which separator opens or closes a field: text between "|" and "|", between ":" and ":", or any combination would count as a field too.

Continuing along the line, the next relevant fields are numbers 4, 6, 8 and so on.
The odd fields other than field 1 (the hostname) contain the text between the colon and the next pipe ("/vol/xxxxxxxx03_gti_xxxx_dev_01/ ...."), which is irrelevant here.

So we only need to print out the even numbered fields.
To accomplish this we set up a "for" loop using the variable "i", starting at 2 ("i=2") and incrementing it by 2 ("i+=2") as long as i is less than the number of fields ("i<NF", where "NF" is an internal awk variable holding the number of fields on the current line).
"$i" will thus refer to "$2", "$4" and so on, since "$n" is the awk notation for field number n.
Finally we simply "print" out these fields.
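The loop described above can also be written out over several lines with comments. This sketch is the same logic as the one-liner (with the i<NF bound), run on the sample line from the question; printing "$2 = ..." instead of just the name is only there to make the field numbers visible.

```shell
#!/bin/sh
line='hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|'

out=$(printf '%s\n' "$line" | awk -F'[|:]' '
{
    # Start at field 2 and step by 2, so only $2, $4, ... are visited.
    # Stopping at i < NF skips the empty last field left by the trailing "|".
    for (i = 2; i < NF; i += 2)
        printf "$%d = %s\n", i, $i   # $i is the field whose number is stored in i
}')
printf '%s\n' "$out"
# prints:
# $2 = filer
# $4 = filer
```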

awk reads its data from the file inputlist, processing it line by line until the end of the file is reached.

You asked for "unique" names, so we must filter the output.
Sorry, I made a mistake above: "uniq" only removes duplicate lines that are adjacent, so it works correctly only on already sorted input. Better use "sort -u".
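The difference between uniq and sort -u shows up as soon as duplicates are not on adjacent lines. A minimal sketch with hypothetical filer names (filerA, filerB are made up for illustration):

```shell
#!/bin/sh
# Hypothetical names: filerA appears twice, but not on adjacent lines.
names='filerA
filerB
filerA'

# uniq only collapses adjacent duplicates, so filerA is still listed twice.
with_uniq=$(printf '%s\n' "$names" | uniq)
# sort -u sorts first, so all copies of a name become adjacent and collapse to one.
with_sort=$(printf '%s\n' "$names" | sort -u)

printf 'uniq:\n%s\n\nsort -u:\n%s\n' "$with_uniq" "$with_sort"
```

"sort | uniq" would give the same result as "sort -u"; "sort -u" just does it in one step.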

"sort -u" sorts its input data in ascending order, suppressing all but one line in each set of identical lines, thus making the output lines "unique".

Have fun with awk!

wmp

Author Closing Comment

by:jaxstorm
ID: 35094177
Excellent, thorough answer. Well deserved.