Solved

grep awk text in variable positions text file

Posted on 2013-06-21
Last Modified: 2013-06-21
I have text files from which I need to extract website links, but the links appear at variable positions within each line.

These lines will all be in one text file, and I need to find and pull out only the links.

Examples:

I received the link and it is http://www.google.com/
Bob sent me the best website and I sent it to him:http://www.yahoo.com
Please update to http://www.dropbox.com and I'll get it back to you asap.

DOS, Linux, or Python suggestions?
Question by:fkn
3 Comments
 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 39266773
In Python:

import re

# file read code modified from http://stackoverflow.com/questions/8369219/how-do-i-read-a-text-file-into-a-string-variable-in-python#answer-8369272
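# Raw string so the backslash in the Windows path is taken literally.
with open(r'C:\input.txt', 'r') as inFile:
    # Scan line by line: joining the stripped lines with "" would glue a link
    # at the end of one line to the first word of the next line.
    for line in inFile:
        # \S+ stops at any whitespace; https? also accepts https:// links.
        for match in re.findall(r'https?://\S+', line):
            print(match)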
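
If the same search is needed in more than one place, it could be wrapped in a small function. The following is only a sketch, not part of the accepted answer: the names `extract_links` and `URL_PATTERN` are made up for illustration, and stripping trailing punctuation is an assumption about what counts as part of a link. It is run here against the three example lines from the question.

```python
import re

# Hypothetical helper names, chosen for this sketch only.
URL_PATTERN = re.compile(r'https?://\S+')

def extract_links(text):
    """Return every http/https link found in text, in order of appearance."""
    links = []
    for match in URL_PATTERN.findall(text):
        # Assumption: punctuation directly after a link belongs to the
        # sentence, not the link, e.g. the period in "see http://example.com."
        links.append(match.rstrip('.,;:!?)'))
    return links

# The three example lines from the question.
sample = (
    "I received the link and it is http://www.google.com/\n"
    "Bob sent me the best website and I sent it to him:http://www.yahoo.com\n"
    "Please update to http://www.dropbox.com and I'll get it back to you asap.\n"
)

for link in extract_links(sample):
    print(link)
```

The trade-off is that a link which legitimately ends in one of the stripped characters, such as a closing parenthesis, would be shortened, so the `rstrip` call may need adjusting for real data.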

 

Author Closing Comment

by:fkn
ID: 39266831
On the money.
 
LVL 48

Expert Comment

by:Tintin
ID: 39267352
Much easier to do with GNU grep:

grep -Po 'http://[\w.-]+' file
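
One thing to be aware of with this character class: it allows only word characters, dots, and hyphens, so the match ends at the first slash and any path after the host name is left out. The short Python comparison below illustrates the difference. It is a sketch only, and the `example.com` link is an invented test string, not something from the question.

```python
import re

# Invented sample line for illustration.
line = "Docs are at http://www.example.com/docs/index.html today"

# Same character class as the grep pattern, with the hyphen placed last
# so it is unambiguously a literal hyphen.
host_only = re.findall(r'http://[\w.-]+', line)

# Alternative: take everything up to the next whitespace character.
full_link = re.findall(r'http://\S+', line)

print(host_only)  # ['http://www.example.com']
print(full_link)  # ['http://www.example.com/docs/index.html']
```

Which form is preferable depends on whether only the site or the complete link is wanted.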

