
Regex search for company names

I have a large HTML file, and I am trying to scrape company names out of it. The company names always appear in the following format:

<a href="offsite_quotes.asp?content=http://www.someplace.com">Some Place, Inc.</a>

I would want "Some Place, Inc." as the result here. A company name can be one or more words and may contain special characters (@, -, etc.). It is always preceded by <a href="offsite_quotes.asp?content= followed by a URL and then ">, and the company name comes right after that.

There might be more than one company name per line. If so, each one should be printed on its own line. I don't know whether opening the file and reading it in a while loop is the right way to go.
Asked by: stakor
1 Solution
 
ozo Commented:
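{
    # Read records terminated by "</a>" instead of by newline, so that
    # several links on one physical line are handled one at a time.
    local $/ = "</a>";
    while (<>) {
        chomp;    # strips the trailing "</a>" (the current value of $/)
        # Skip to the end of the opening tag, then capture the rest of the record.
        print "$1\n" if /<a href="offsite_quotes\.asp\?content=[^>]*>(.*)/s;
    }
}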
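
If the HTML is already in a string (for example, one line read in an ordinary while loop), a global match in list context is another way to collect every name on that line. The following is only a minimal sketch, not part of the answer above: the helper name extract_companies and the second sample URL are made up for illustration, and it assumes that a company name never contains nested tags and that the closing tag is always written as a lowercase </a>.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns every company
# name found in the given HTML string, in the order the links appear.
sub extract_companies {
    my ($html) = @_;
    # In list context, a /g match returns all captured groups.
    # [^"]* skips the URL; [^<]* assumes the name contains no nested tags.
    my @names = $html =~ /<a href="offsite_quotes\.asp\?content=[^"]*">([^<]*)<\/a>/g;
    return @names;
}

# Two links on the same line; the second one is invented sample data.
my $line = '<a href="offsite_quotes.asp?content=http://www.someplace.com">Some Place, Inc.</a> '
         . '<a href="offsite_quotes.asp?content=http://www.other.example">Other-Co @ Work</a>';

print "$_\n" for extract_companies($line);
```

Unlike the record-separator approach, this one will miss a link whose opening and closing tags fall on different lines if the file is read line by line, so it fits best when each link sits entirely on one line or the whole file is read into one string.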
 
stakor (Author) Commented:
Thank you very much.
