Solved

foxpro - finding domain name within URL with Regular Expressions

Posted on 2013-11-05
Last Modified: 2013-11-07
Yesterday pcelba posted the exact code that I needed to find all email addresses in text:

www.experts-exchange.com/Microsoft/Applications/FoxPro/Q_28284572.html (see last reply in solution)

I'm going to use the same code for extracting the domain name from a URL, but I don't know which regular expression pattern I need.

e.g.
from "www.google.com"
to extract "google"

or

from "https://secure.experts-exchange.com"
to extract "experts-exchange"

I've tried various patterns found here:
http://stackoverflow.com/questions/569137/how-to-get-domain-name-from-url

but they don't seem to work in FoxPro with the code in the answer from yesterday.
Does anyone have a tried and tested pattern that works?
Question by:esak2000
3 Comments
 
LVL 42

Accepted Solution

by:
pcelba earned 167 total points
ID: 39623888
If you already have the URL, you don't need RegExp to extract the domain name, BUT you have to define what "domain" means for you (the StackOverflow thread already points this out).

www.google.com ... google
www.celba.cz  ... celba
celba.cz  ... celba

that was easy... but

www.abc.co.uk ... co   or   abc.co   or   abc ?
etc.

And what about visualfoxpro.application, etc.?

So you'll need to distinguish between domains and, for example, OLE class names like the one above, as well as other similar elements. Their names can be very similar, if not identical, to domain names.

Defining a list of TLDs or domain suffixes is probably a good idea.

And to do the whole task you just need STREXTRACT() or AT() and SUBSTR() functions in FoxPro.
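
A minimal sketch of that approach, written in Python rather than FoxPro so it can be run and checked easily. The suffix list and the function names are hypothetical and only illustrate the idea; `find()` and slicing play the role that AT() and SUBSTR() would play in FoxPro.

```python
# Hypothetical, deliberately tiny suffix list; a real one would be much longer.
KNOWN_SUFFIXES = ("co.uk", "com", "cz", "uk")


def host_from_url(url):
    """Return the host part of a URL using plain string operations."""
    text = url.strip().lower()
    # Drop an optional scheme such as "https://".
    pos = text.find("://")
    if pos >= 0:
        text = text[pos + 3:]
    # Cut at the first path, query, fragment, or port separator.
    for sep in ("/", "?", "#", ":"):
        pos = text.find(sep)
        if pos >= 0:
            text = text[:pos]
    return text


def domain_label(url, suffixes=KNOWN_SUFFIXES):
    """Return the label directly left of the longest matching suffix, or ""."""
    host = host_from_url(url)
    # Try longer suffixes first so "co.uk" wins over "uk".
    for suffix in sorted(suffixes, key=len, reverse=True):
        tail = "." + suffix
        if host.endswith(tail):
            rest = host[: -len(tail)]
            # The wanted label is whatever follows the last remaining dot.
            return rest[rest.rfind(".") + 1:]
    return ""
```

With this list, `domain_label("www.abc.co.uk")` gives "abc", while "visualfoxpro.application" gives an empty string because it ends in no known suffix. Which suffixes belong in the list is exactly the "what does domain mean for you" decision described above.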
 
LVL 29

Assisted Solution

by:Olaf Doschke
Olaf Doschke earned 167 total points
ID: 39623981
Pavel has got the main point: there are many more top-level domains, and more are coming. I no longer see that as a task for regexp; it's too complicated.

Looking at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
What you want to extract is only the second level domain part of the host name without the top level domain.

The same second-level domain combined with other top-level domains may be interesting for domain grabbers or for research into a potential product name, etc.

So there is a use for extracting that info, but another top-level domain normally means another hoster, maintainer, owner, service, brand, market, etc. Only the full hostname SLD.TLD is technically relevant and stands for an IP (or a series of IPs related to the same host). Being able to extract or strip off a subdomain is perhaps more important.

I'd let DNS do that for me. Extract the part between :// and the first following / and see what IP you get from DNS, then repeat with partial names. The shortest name giving you the same IP from DNS is the hostname, and the last part(s) of that is/are a TLD.

Bye, Olaf.
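
The DNS idea above could be sketched roughly as follows, in Python rather than FoxPro. All function names are hypothetical, and the resolver is passed in as a parameter (defaulting to the standard library's `socket.gethostbyname`) so the logic can be tried without network access. This is only an illustration of the heuristic under those assumptions, not a tested production routine.

```python
import socket


def host_from_url(url):
    """Return the text between "://" and the first following "/"."""
    start = url.find("://")
    start = start + 3 if start >= 0 else 0
    end = url.find("/", start)
    return url[start:] if end < 0 else url[start:end]


def candidate_names(host):
    """List the host and its progressively shorter right-hand parts."""
    labels = host.split(".")
    # Stop before the last label alone, which would be a bare TLD.
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def shortest_name_with_same_ip(host, resolve=socket.gethostbyname):
    """Return the shortest partial name resolving to the same IP as host."""
    try:
        target = resolve(host)
    except OSError:
        return None  # the full name does not resolve at all
    best = host
    for name in candidate_names(host):
        try:
            if resolve(name) == target:
                best = name  # later candidates are shorter
        except OSError:
            continue  # this partial name has no DNS entry
    return best
```

Note that a subdomain frequently resolves to a different IP than its parent domain, and hosts with several IPs may answer differently on each lookup. In those cases this sketch simply returns the full host name, so the result is a hint rather than a guarantee.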
 
LVL 9

Assisted Solution

by:Derek Jensen
Derek Jensen earned 166 total points
ID: 39624832
The regexp I most commonly use for locating a URL goes something like this:

/https?:\/\/(\w*\.?[^\/]+?\.\w{2,3})/

But as for grabbing the rest of the URL, I'll have to defer to prior experts' comments. :-)
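
To see what the pattern above actually captures, here is a small sketch using Python's `re` module (not the FoxPro/VBScript RegExp object from the earlier thread, whose behaviour may differ in details). The surrounding `/` delimiters are dropped because Python does not use them; `find_host` is a hypothetical helper name.

```python
import re

# The pattern from the post, without the surrounding / delimiters.
URL_HOST = re.compile(r"https?:\/\/(\w*\.?[^\/]+?\.\w{2,3})")


def find_host(text):
    """Return the first captured host in text, or None if nothing matches."""
    match = URL_HOST.search(text)
    return match.group(1) if match else None
```

With Python's engine, `find_host("https://secure.experts-exchange.com")` returns "secure.experts-exchange.com", so the capture is the whole host and still needs to be split further to get "experts-exchange". Two limits are worth knowing: input without a scheme, such as "www.google.com", does not match at all, and for "http://www.abc.co.uk/x" the capture stops early at "www.abc.co" because the pattern ends at the first dot followed by two or three word characters.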

Question has a verified solution.