Solved

Extract filer name from large text file.

Posted on 2011-03-10
Last Modified: 2012-06-27
I have a huge file with a list of filers in the following format:

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


I need to get each unique filer name, which in this case is svaashna02. How can I do that with a regular expression?
Question by:jaxstorm
7 Comments
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093548
Assuming that the names in question always appear between "|" and ":" you could try this:

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | uniq

or, for sorted output

awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' inputlist | sort -u

wmp
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093596
... makes no real difference, but in this particular case

... i<NF ...

seems better.
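
A small sketch of where the two loop bounds differ, assuming every line ends with a trailing "|" as in the posted sample. The sample line is shortened and made up, and the variable names le_count and lt_count are only illustrative. With a trailing "|" the last field is empty and has an even number, so "i<=NF" prints one extra blank line per input line, while "i<NF" skips it.

```shell
# Shortened, hypothetical stand-in for one line of the posted data.
line='hostname|filer:/vol/path on /private/mount type nfs (rw)|filer:/vol/path on /private/mountpoint type nfs (rw)|'

# Bound i<=NF also visits the empty field that follows the trailing "|".
le_count=$(printf '%s\n' "$line" |
    awk -F'[|:]' '{for (i=2;i<=NF;i+=2) print $i}' | wc -l | tr -d ' ')

# Bound i<NF stops before that empty field.
lt_count=$(printf '%s\n' "$line" |
    awk -F'[|:]' '{for (i=2;i<NF;i+=2) print $i}' | wc -l | tr -d ' ')

echo "i<=NF printed $le_count lines, i<NF printed $lt_count lines"
```

After "sort -u" the extra blank lines would collapse into a single empty line at the top of the output, so the result is nearly the same either way.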
LVL 8

Author Comment

by:jaxstorm
ID: 35093646
Hi wmp,

That looked good until I realised that sometimes there are multiple filers on a line, like in this example:

hostname|filer:/vol/path/path1/path on /private/mount type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|filer:/vol/path/path1/path on /private/mountpoint type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=120,nfsvers=3,timeo=600,addr=xxx.yyy.zz.qq)|


So getting the filers between | and : is a good plan, but I don't think it's gotten all the filers when there are several on one line, or has it? The list is so big it's hard to tell!
LVL 68

Expert Comment

by:woolmilkporc
ID: 35093718
Sure it has!

Take just the line you posted and run my command against it, omitting the "sort" or "uniq" filters, and see what you get!
Then simply duplicate the line and rerun. You should see twice the number of names.
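
The check described above could look roughly like this. It is only a sketch: the sample line is a shortened, made-up version of the posted one, the loop uses the "i<NF" bound from the follow-up comment, and the names once and twice are illustrative.

```shell
# Shortened, hypothetical stand-in for the posted line (two filers on one line).
line='hostname|filer:/vol/path on /private/mount type nfs (rw)|filer:/vol/path on /private/mountpoint type nfs (rw)|'

# Run the loop on the single line, without sort or uniq.
once=$(printf '%s\n' "$line" |
    awk -F'[|:]' '{for (i=2;i<NF;i+=2) print $i}' | wc -l | tr -d ' ')

# Feed the same line twice and run it again.
twice=$(printf '%s\n%s\n' "$line" "$line" |
    awk -F'[|:]' '{for (i=2;i<NF;i+=2) print $i}' | wc -l | tr -d ' ')

echo "one line gave $once names, two lines gave $twice names"
```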



LVL 8

Author Comment

by:jaxstorm
ID: 35093879
In that case, if you can explain each awk switch I'll give you the points. I'd like to improve my AWK knowledge and have no idea what your awk script did ;)
LVL 68

Accepted Solution

by:
woolmilkporc earned 500 total points
ID: 35094106
OK, here you go:

"-F" configures the field separator to be applied by awk.
"[|:]" means that either character, the pipe as well as the colon, indicates a field boundary.
By the way, this is the "regular expression" you've been asking for.

Taking these separators and looking at your data, you'll see that the second field is one of the fields we're searching for. It is the one that follows the first separator ("|" in this case) and is terminated by the next separator (":" in this case, although fields between "|" and "|", between ":" and ":", or any other combination would also match).

Continuing the search, we see that the next relevant fields are number 4, then 6, 8 and so on.
The odd fields from 3 onwards contain the stuff between the colon and the next pipe ("/vol/xxxxxxxx03_gti_xxxx_dev_01/ ....") and field 1 is the hostname, all of which is irrelevant here.
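
To see the numbering for yourself, a sketch like the following prints every field next to its index. The sample line is a shortened, made-up stand-in for the posted data, and the variable name fields is only illustrative.

```shell
# Shortened, hypothetical stand-in for one line of the posted data.
line='hostname|filer:/vol/path on /private/mount type nfs (rw)|filer:/vol/path on /private/mountpoint type nfs (rw)|'

# Split on "|" or ":" and print each field together with its number.
fields=$(printf '%s\n' "$line" |
    awk -F'[|:]' '{for (i=1;i<=NF;i++) printf "%d=[%s]\n", i, $i}')

printf '%s\n' "$fields"
```

Fields 2 and 4 should show the filer name, and the last field should come out empty because of the trailing "|".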

So we only need to print out the even numbered fields.
To accomplish this we set up a "for" loop using the variable "i", starting at 2 ("i=2") and incrementing it by 2 ("i+=2") as long as the end of the line is not yet reached ("i<NF", where "NF" is an internal awk variable meaning "number of fields").
"$i" will thus stand for "$2", "$4" etc., which is the awk notation for fields addressed by number.
Finally we simply "print" out these fields.

awk takes its data from the file inputlist, reading and processing it line by line until the end of inputlist is reached.
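
Put together, the whole thing can be laid out with one comment per part, as in this sketch. The two-line input file and the names filerA and filerB are made up for illustration, and the temporary file merely stands in for inputlist.

```shell
# Hypothetical two-line stand-in for inputlist (filerA and filerB are made-up names).
inputlist=$(mktemp)
printf '%s\n' \
    'host1|filerA:/vol/a on /private/a type nfs (rw)|filerB:/vol/b on /private/b type nfs (rw)|' \
    'host2|filerA:/vol/c on /private/c type nfs (rw)|' > "$inputlist"

# awk runs the block once per input line, splitting each line on "|" or ":".
names=$(awk -F'[|:]' '
{
    # i starts at field 2 and steps by 2, so it visits $2, $4, $6 and so on.
    # The bound i<NF stops before the empty field left by the trailing "|".
    for (i = 2; i < NF; i += 2)
        print $i
}' "$inputlist")

printf '%s\n' "$names"
rm -f "$inputlist"
```

The output at this stage still contains duplicates, one name per mount in file order.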

You asked for "unique" names, so we must filter the output.
Sorry, I made a mistake above: "uniq" only removes duplicate lines that are adjacent, so it works correctly only on already sorted lists. Better use "sort -u".
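
A quick sketch of the difference, using a made-up unsorted list of names (filerA and filerB are hypothetical) like the one the awk loop might emit:

```shell
# Hypothetical unsorted output: filerA appears twice, but not on adjacent lines.
names='filerA
filerB
filerA'

# uniq only collapses adjacent duplicates, so both filerA lines survive.
uniq_count=$(printf '%s\n' "$names" | uniq | wc -l | tr -d ' ')

# sort -u sorts first, so each name is left exactly once.
sorted=$(printf '%s\n' "$names" | sort -u)

echo "uniq alone left $uniq_count lines"
printf '%s\n' "$sorted"
```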

"sort -u" sorts its input data in ascending order, suppressing all but one line in each set of identical lines, thus making the output lines "unique".

Have fun with awk!

wmp
LVL 8

Author Closing Comment

by:jaxstorm
ID: 35094177
Excellent, thorough answer. Well deserved