Solved

Extract filer name from large text file.

Posted on 2011-03-10
7
330 Views
Last Modified: 2012-06-27
I have a huge file with a list of filers in the following format

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


I need to get each unique filer name which in this case is svaashna02. How can I do that with regular expression?
0
Comment
Question by:jaxstorm
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
7 Comments
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093548
Assuming that the names in question always appear between "|" and ":" you could try this:

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | uniq

or, for sorted output

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | sort -u

wmp
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093596
... makes no real difference, but in this particular case

... i<NF ...

seems better.
0
 
LVL 8

Author Comment

by:jaxstorm
ID: 35093646
Hi wmp,

That looked good until I realised that sometimes, there are multiple filers on a line, like in this example

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


So getting the filers between | and : is a good plan but I don't think it's gotten all the filers when there are multiple lines, or has it? The list is so big it's hard to tell!
0
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

 
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093718
Sure it has!

Take just the line you posted and run my command against it, omitting the "sort" or "uniq" filters, and see what you get!
Then simply duplicate the line and rerun. You should see twice the number of names.



0
 
LVL 8

Author Comment

by:jaxstorm
ID: 35093879
In that case, if you can explain each awk switch I'll give you the points. I'd like to improve my AWK knowledge and have no idea what your awk script did ;)
0
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 500 total points
ID: 35094106
OK, here you go:

"-F" configures the field separator to be applied by awk.
"[|:]" means that each of both characters - the pipe as well as the colon - should indicate a field boundary.
By the way, this is the "regular expression" you've been asking for.

Taking these separators and looking at your data you'll see that the second field (the one following a first separator ("|" in this case) and terminated by the next separator (":" in this case, but fields between "|" and "|" or between ":" and ":" or any combination would also match) is one of the fields we're searching for.

Continuing the search we see that the next relevant fields are number 4, then 6, 8 and so on.
The odd fileds contain the stuff between the colon and the next pipe ("/vol/xxxxxxxx03_gti_xxxx_dev_01/ ....") which is irrelevant here.

So we only need to print out the even numbered fields.
To accomplish this we set up a "for" loop using the variable "i", starting at 2 ("i=2"), incrementing it by 2 ("i+=2") as long as  the end of the line is not yet reached ("i<NF", where "NF" is an internal awk variable meaning "number of fields").
"$i" will thus contain "$2", "$4" etc., which is the awk notation for field numbers.
Finally we simply "print" out these fields.  

awk takes its data from the file inputlist reading and processing line by line until the end of inputlist is reached.

You asked for "unique" names, so we must filter the output.
Sorry, I made a mistake above, "uniq" only works correctly on already sorted lists, so better use "sort -u".

"sort -u" sorts its input data in ascending order, suppressing all but one line in each set of identical lines, thus making the output lines "unique".

Have fun with awk!

wmp
0
 
LVL 8

Author Closing Comment

by:jaxstorm
ID: 35094177
Excellent, thorough answer. Well deserved
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I. Introduction There's an interesting discussion going on now in an Experts Exchange Group — Attachments with no extension (http://www.experts-exchange.com/discussions/210281/Attachments-with-no-extension.html). This reminded me of questions tha…
This article will show, step by step, how to integrate R code into a R Sweave document
The viewer will learn how to dynamically set the form action using jQuery.
Connecting to an Amazon Linux EC2 Instance from Windows Using PuTTY.

717 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question