hankknight

Populate a database with domain names from an rdf file

Hello,

I use a script, provided by za-k/ (adrpo), to populate a database with URLs from an RDF file:
https://www.experts-exchange.com/questions/23120051/Placing-1-8-GB-of-data-in-database-without-hogging-resources.html

Here is the RDF file:
http://rdf.dmoz.org/rdf/content.rdf.u8.gz

It populates my database with over 400,000 URLs, many of which are sub-pages of the same domain.

Now I only want domain names in my database.

http://www.example.com

should be included as is, but

http://www.domain.com/subdirectory/

should be placed into the database as

http://www.domain.com/

because I only want the domain names and not the full URL.
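
The rule described above could be sketched roughly as follows. This is only an illustration in Python, not the script from the linked question (its code is not shown here), and the function names `to_domain_url` and `unique_domains` are hypothetical. It assumes every domain should be stored in one normalized form, `scheme://host/`, so that http://www.example.com and http://www.example.com/ collapse to a single entry; note that this adds a trailing slash to URLs that had none and drops any port number.

```python
from urllib.parse import urlsplit


def to_domain_url(url):
    """Return scheme://host/ for a URL, or None if no host can be found."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit raises ValueError for some malformed hosts (e.g. bad IPv6)
        return None
    if not parts.scheme or not parts.hostname:
        return None
    # .hostname is lowercased and excludes any port or credentials
    return f"{parts.scheme}://{parts.hostname}/"


def unique_domains(urls):
    """Collapse an iterable of URLs to unique domain URLs, in first-seen order."""
    seen = set()
    result = []
    for url in urls:
        domain = to_domain_url(url)
        if domain is not None and domain not in seen:
            seen.add(domain)
            result.append(domain)
    return result
```

With roughly 400,000 URLs an in-memory set like this should be manageable, though a unique index on the domain column in the database would be another way to reject duplicates at insert time.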

Thanks for the help!
Adam314

Do you have a smaller version of the RDF file?

I took a quick look at the other script. At first glance, I don't see a need for the sleep; it just slows the script down, and it should run fine without it.
ASKER CERTIFIED SOLUTION
Adrian Pop