Bash sort / cut / uniq script to analyze domains

I have a list of domains that I'd like to trim down a little bit. I only want the domains listed and not the subdomains of each domain. For example, the list below:

ff.search.yahoo.com
files.opensuse.org
files.widgetbox.com
yahoo.com
forums.dpreview.com
forums.opensuse.org
images.dpreview.com
geo.yahoo.com
go.microsoft.com
updates.vista.microsoft.com

I want to reduce to:

yahoo.com
opensuse.org
widgetbox.com
dpreview.com
microsoft.com

It would just be the cat's pajamas if I could use the cut command, specify '.' as the field separator, and return the RIGHT two fields instead of numbering from the left. But I don't think that's possible, at least from what I've read and searched so far.

To say it another way, I only want to deal with two fields (as separated by '.'), not three, four, or six. And the number of fields varies, depending on how many subdomains need to be deleted. I can't use awk, but sed is available on my system, if I have to use something other than cut or sort.
Asked by thinkwelldesigns

ozo commented:
sed 's/.*\.\(.*\.\)/\1/' | sort -u
 
Tintin commented:
Why the restriction on awk?

awk -F. '{printf"%s.%s\n", $(NF-1),$NF}' file  |sort -u

otherwise I was going to suggest the same sed option as ozo.
 
ozo commented:
awk -F. '{d[$(NF-1) "." $NF]++} END {for (u in d) print u}'
 
thinkwelldesigns (author) commented:
Thanks, ozo. The sed command works perfectly. I'm not quite sure exactly HOW it works, and if you'd give me an explanation, I'd be grateful. But it does exactly what I need, so points to you!

Thanks a million.

BTW, this script will be used on a number of gateway machines that don't have awk installed, but do have sed. I dowanna hafta install awk on every machine to make the script work.

But is awk "better" than sed?
 
Tintin commented:
awk isn't better than sed, it's just another tool that is better at some tasks than sed.  In this instance, there's not much difference between using sed or awk.  I'm surprised you don't have awk on your gateway servers, as it is a very standard utility.

Anyway, breaking the sed statement down:

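.*  zero or more of any character; it is greedy, so it consumes as much of the line as it can
\.  a literal dot
\(  start of the capture group
.*\.  zero or more of any character followed by a literal dot
\)  end of the capture group
\1  in the replacement, stands for the captured text

Because the first .* is greedy, the capture group ends up holding only the second-to-last label and its trailing dot (for example "yahoo."). The whole match is replaced by that text, and the final label ("com"), which was never part of the match, stays in place. A line with only one dot, such as yahoo.com, does not match at all and passes through unchanged.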
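
To see the breakdown in action, here is a small sketch that runs ozo's sed command over the sample list from the question. The variable names (domains, result) are just for illustration, and it assumes a POSIX shell with sed and sort available.

```shell
# Sample input copied from the question.
domains='ff.search.yahoo.com
files.opensuse.org
files.widgetbox.com
yahoo.com
forums.dpreview.com
forums.opensuse.org
images.dpreview.com
geo.yahoo.com
go.microsoft.com
updates.vista.microsoft.com'

# The greedy leading .* swallows every subdomain label, the capture
# group keeps the second-to-last label plus its dot, and the last
# label is left untouched. sort -u then removes the duplicates.
result=$(printf '%s\n' "$domains" | sed 's/.*\.\(.*\.\)/\1/' | sort -u)

printf '%s\n' "$result"
```

On that input the result should be the five domains from the question (dpreview.com, microsoft.com, opensuse.org, widgetbox.com, yahoo.com), in sorted order. One caveat: the command always keeps exactly the last two labels, so a name such as www.example.co.uk would be reduced to co.uk. That is fine for the list in the question, but worth knowing if the real data includes such domains.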
Question has a verified solution.
