Solved

Extract IP addresses from any file

Posted on 2008-10-23
Last Modified: 2012-05-05
I need to extract IP addresses from an unformatted text file. The addresses do not have brackets or any other delimiters around them. I need to write the result to another file.
Question by:jeffsmall
8 Comments
 
LVL 1

Expert Comment

by:LosBear
ID: 22787424
Is the file tab-delimited at least? Can you provide a sample of the file you want to parse?

 
LVL 1

Expert Comment

by:WANM
ID: 22787698
Assuming they at least have a space on each side of the address, and one address somewhere on each line:

cat file | sed -r 's/(.*?) ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}) (.*)/\2/'

If you could provide an example of the file, it would make this easier.
 

Author Comment

by:jeffsmall
ID: 22788244
Sorry, I want this to work on just about any text file and there may or may not be spaces around the address.  The addresses might be interspersed and there could be more than one on a line.

Here is some sample text

; generated by /sbin/dhclient-script
nameserver 192.168.1.147
nameserver 192.168.1.1
ip address=192.168.1.148
 

Author Comment

by:jeffsmall
ID: 22788450
search aus.us.siteprotect.com
nameserver 216.139.253.2
nameserver 216.139.253.3

The regex provided above outputs the first line as well. I just need to pull the IP addresses and output them to a temp file.

>cat /etc/resolv.conf | sed -r 's/(.*?) ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}) (.*)/\2/'
search aus.us.siteprotect.com
nameserver 216.139.253.2
nameserver 216.139.253.3
 
LVL 5

Accepted Solution

by:
zmo earned 125 total points
ID: 22788998
Well, to remove those lines you can do :

% cat /etc/resolv.conf | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
nameserver 216.139.253.2
nameserver 216.139.253.3

but if you join both you'll still get :
% cat /etc/resolv.conf | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sed -r 's/(.*?) ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}) (.*)/\2/'
nameserver 216.139.253.2
nameserver 216.139.253.3

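and that's because the sed pattern needs a ' ' between the address and the rest of the line. Here each line ends right after the address, so the substitution never matches and sed prints the line unchanged.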

A fix would be :
% cat /etc/resolv.conf | sed -r 's/^.* ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*$/\1/' | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
216.139.253.2
216.139.253.3

but you just can't get several IPs from the same line this way (that's fun, I said the same thing this morning on another topic about regexps). The pattern is anchored to the whole line, so each s/// replaces the line with exactly one captured address; it can't output the same pattern n times. You have to do it programmatically.

A solution would be to get the first IP of every line containing IPs, then use sed to remove those IPs and pipe the result to sed again so it gets the next IP of every line, and so on until there are no more IPs.
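
For several addresses on one line, a shorter route than chaining sed passes is to let grep print only the matching parts. This is a minimal sketch, not something tested in this thread: it assumes a grep that supports the non-POSIX -o option (GNU and BSD grep do), and extract_ips is a made-up helper name.

```shell
#!/bin/sh
# extract_ips: hypothetical helper, not part of any tool.
# Reads text on stdin and prints every dotted quad it finds, one per line,
# even when a single input line contains several of them.
# Assumption: grep supports -o (print only the matched text).
# Octets are NOT range-checked here, so 999.1.1.1 would be printed too.
extract_ips() {
    grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
}
```

It could be used as `extract_ips < /etc/resolv.conf > /tmp/ips.txt`, where the output path is only an example.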
 
LVL 5

Expert Comment

by:zmo
ID: 22789028
% cat /etc/resolv.conf | sed -r 's/^[^0-9]*([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*$/\1/' | grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
216.139.253.2
216.139.253.3

is what you want (i.e. no need for leading white space, as long as no other digit appears earlier on the line)
 

Author Closing Comment

by:jeffsmall
ID: 31509269
Thank you!  That does what I need very nicely. I appreciate your help and the knowledge I have gained from your work.
 
LVL 5

Expert Comment

by:zmo
ID: 22789190
to be more picky, and correct all previous mistakes, you could use :

% cat /etc/resolv.conf | sed -r 's/^(.*[^0-9])?(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)([^0-9].*)?$/\2.\3.\4.\5/' | grep -E "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
216.139.253.2
216.139.253.3

the differences are :
1/ '' ^(.*[^0-9])? '' : optionally matches everything before the address, ending with one character that is not a digit. sed -r has no lazy quantifiers, so writing .*? would not make the match non-greedy; with several addresses on a line the last one is kept
2/ '' (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) '' : matches an IP address that is valid (ie 265.24.256.3 is not valid)
3/ '' ([^0-9].*)?$ '' : optionally matches one character that is not a digit plus the rest of the line. It is optional so that an address at the very end of a line, as in resolv.conf, still matches

\2.\3.\4.\5 reconstructs the address from the four parts in parentheses in pattern 2/ (group 1 is the text before the address, group 6 the text after it)
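
The same range check can be combined with picking up several addresses per line. The sketch below is one possible approach, not the method used in this thread: it splits the input into runs of digits and dots, then keeps only the runs that are a complete, valid address. extract_valid_ips is a made-up helper name, and the sketch assumes tr pads the short replacement set with its last character (GNU and BSD tr do).

```shell
#!/bin/sh
# extract_valid_ips: hypothetical helper, not part of any tool.
# Step 1: tr turns every character that is not a digit or a dot into a
#         newline, so each run of digits and dots lands on its own line.
# Step 2: sed strips leading and trailing dots (e.g. a sentence-ending ".").
# Step 3: grep -x keeps only lines that are, as a whole, four octets in the
#         range 0-255, so 265.24.256.3 and 1.2.3.4.5 are both rejected.
extract_valid_ips() {
    octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
    tr -c '0-9.' '\n' |
        sed -e 's/^\.*//' -e 's/\.*$//' |
        grep -Ex "${octet}(\.${octet}){3}"
}
```

For example, `extract_valid_ips < /etc/resolv.conf > /tmp/ips.txt` would write the addresses to a file; the output path is only an example.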