TLD algorithm challenge!

Posted on 2005-04-10
Last Modified: 2008-02-01
How could you extract the subdomain, domain and tld from an FQDN?

I'm hung up on determining "second-level" TLDs, or whatever their proper name is.


The domain is "google", not "co", but my current algo would return:

subdomain: ""
domain: "co"
tld: "uk"

Any thoughts on how to accurately handle the dotted tlds?  I don't know of any tld lists that include these.

Question by:elmoredaniel
    LVL 15

    Expert Comment

    Hi elmoredaniel,
    The simple answer is that there isn't a simple answer.

    Each TLD (such as .com or .uk) has a registrar which defines what (if any) SLDs exist.
    Essentially, you'd need to check with each TLD registrar to find out whether or not it uses second-level domains.  Once you've got that info, you can parse the domain name correctly.

    You can find a list of country TLDs at and a list of "generic" TLDs at
    These lists will let you know who the administrator of each TLD is, so you can check on their website.

    You can find a more detailed explanation of this at

    Does that help?
    LVL 37

    Expert Comment

    by:Harisha M G
    Hi elmoredaniel,
        Since you are asking for an algorithm...
        1) Get the whole string ""
        2) Find whether the string has "://" and find its location, say x. In your case "://" is at the fifth position.
        3) Remove the characters up to x + 2. Now you are left with ""
        4) Now find the first occurrence of "/". If it exists, remove everything from that position onward. You are now left with ""
        5) Count the number of "." in the string. Here there are 3.
        6) Split the string into substrings using a function similar to Split in VB.
        7) If the dots are 1, first one is domain and second one is tld (
        8) If the dots are 3, second one is domain, fourth one is tld, third one is subdomain. Ignore first substring(typically www) (
        9) If the dots are 2, check the first substring. If it is "www", then second is domain and third is tld. (
        10) If the dots are 2 and first substring is not "www", then first one is subdomain, second one is domain, third is tld (
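
The steps above can be sketched roughly as follows in Python. This is only an illustration of the dot-counting idea as written, not a recommended parser: the function name `naive_split` and the example hostnames are assumptions made for the sketch, and step 8 is implemented exactly as stated (third label reported as the subdomain).

```python
def naive_split(url):
    """Return (subdomain, domain, tld) by counting dots only (steps 1-10)."""
    # Steps 2-3: drop the scheme if "://" is present (find() is 0-based).
    pos = url.find("://")
    host = url[pos + 3:] if pos != -1 else url
    # Step 4: drop everything from the first "/" onward.
    host = host.split("/", 1)[0]
    # Steps 5-6: split on dots and count them.
    parts = host.split(".")
    dots = len(parts) - 1
    if dots == 1:
        # Step 7: domain.tld
        return ("", parts[0], parts[1])
    if dots == 2:
        # Steps 9-10: a leading "www" is ignored, anything else is a subdomain.
        if parts[0] == "www":
            return ("", parts[1], parts[2])
        return (parts[0], parts[1], parts[2])
    if dots == 3:
        # Step 8 as written: first label ignored, second is the domain,
        # third is the subdomain, fourth is the tld.
        return (parts[2], parts[1], parts[3])
    raise ValueError("unsupported number of dots: %d" % dots)


print(naive_split("http://www.example.com/index.html"))
print(naive_split("http://example.co.uk/"))
```

For the second call the sketch still reports "co" as the domain and "uk" as the tld, which is the problem described in the question: counting dots cannot tell a second-level label such as "co" from a registered name.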

    Hope this helps :)


    Author Comment

    scampgb,  I feared that was the case. Do you know any links to get more detailed information on the format of these SLDs, particularly I wonder if there is a length restriction. Two or three characters seems to be all that I see. If that's the case, then I could check if the TLD is a CC and then check the length of the SLD, if 2 or 3 I could probably conclude that it's "part" of the TLD. What do you think?
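
The length check proposed here could look something like the sketch below. It assumes that a two-letter TLD is a country code and that a 2 or 3 character label directly before it is an SLD; the function name `guess_by_length` and the hostnames are hypothetical, chosen only to show where the guess holds and where it breaks.

```python
def guess_by_length(host):
    """Return (subdomain, domain, tld) using the 2-3 character SLD guess."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a domain and a TLD")
    tld = labels.pop()
    # Assumption: two-letter TLD means country code, and a short label in
    # front of it (with another label before that) is treated as an SLD.
    if len(tld) == 2 and len(labels) >= 2 and len(labels[-1]) in (2, 3):
        tld = labels.pop() + "." + tld
    domain = labels.pop()
    return (".".join(labels), domain, tld)


print(guess_by_length("www.example.co.uk"))  # short SLD: the guess works
print(guess_by_length("www.abc.de"))         # short registered name: it does not
```

The second call folds the three-letter name into the TLD and reports "www" as the domain, so short registered names under a country code defeat the heuristic, and longer SLDs are missed in the other direction.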

    Your links were very helpful!
    LVL 15

    Accepted Solution

    Sorry, once again it's not that straightforward :-(
    For example, there's an SLD for - this has the same number of characters as "google".

    I think what you'll have to do is go through all the CC registrars and see whether they use SLDs for administrative reasons (as "UK" does for example).  
    You could then build the TLD list into your process and look up whether or not it uses SLDs.  If it does, you know how to treat the domain name.
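
A minimal sketch of that lookup approach, assuming a hand-built table keyed by TLD. The table below is deliberately tiny and incomplete, and the names `SLDS_BY_TLD` and `split_host` are made up for illustration; a real table would have to be compiled from each registry as described above.

```python
# Illustrative, incomplete table: TLD -> second-level labels it uses.
SLDS_BY_TLD = {
    "uk": {"co", "org", "ac", "gov"},
    "au": {"com", "net", "org", "edu", "gov"},
    "jp": {"co", "ne", "or", "ac", "go"},
}


def split_host(host, table=SLDS_BY_TLD):
    """Return (subdomain, domain, tld), folding known SLDs into the tld."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a domain and a TLD")
    tld = labels.pop()
    # Fold the next label into the suffix only when the table lists it
    # as an SLD for this TLD and a domain label is still left over.
    if len(labels) >= 2 and labels[-1] in table.get(tld, ()):
        tld = labels.pop() + "." + tld
    domain = labels.pop()
    return (".".join(labels), domain, tld)


print(split_host("google.co.uk"))
print(split_host("www.abc.de"))
```

With the table in place, "google.co.uk" yields the domain "google" and the tld "co.uk", while a name directly under a TLD that is not in the table is left alone. The result is only as good as the table, so exceptions have to be recorded there; the community-maintained Public Suffix List is one existing source of this kind of data.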

    Well, that's the theory - Canada could prove a notable exception.

    LVL 84

    Expert Comment

    #!/usr/bin/perl -w
    $_ = "";
    print /([^.]*)\.\w*(?:\.(?:ac|at|au|be|ca|cn|co|ec|fr|hk|il|in|jp|kr|mc|mm|mx|pl|ro|ru|sg|th\

    LVL 7

    Expert Comment

    in javascript :)

    <script type="text/javascript">
     for (i=0;i<url.length;i++)
    LVL 55

    Expert Comment

    Just to add another spanner in the works,, bl is a domain rather than a ccSLD.
    LVL 15

    Expert Comment

    ozo: Just as a matter of interest, how did you get the list "ac|at|au|be|ca|cn|co|ec|fr|hk|il|in|jp|kr|mc|mm|mx|pl|ro|ru|sg|th|tr|uk|za" ?

    andyalder: Very good point - as is :-)
    LVL 55

    Expert Comment

    Good old Nominet. Notice that their whois (at least the web based version) knows who are but it doesn't resolve, and a handful of others that still have domains rather than ccSLDs under the UK ccTLD. Other bodies may have their own quirks but at least is back again ;)
