Thanks for the quick reply.
To be honest, I have no clue about PHP, but I realized there was probably a good way to do this using that method. If you have any coding experience you can lend to the solution you proposed, I would be grateful. My weakness is in parsing the pages and getting the data into a DB (MySQL or otherwise). Once it is there I, like most, have ample tools to extract and manipulate it.
So if you have any idea how to code the extraction, parsing and insertion into the DB please let me know.
Is this too much to ask? If so, I can take it out and ask for it to be done commercially. Any savings in time for my wife and other parents would be worth a little investment, I guess.
by: routinet, posted on 2008-09-23 at 11:39:38 (ID: 22552552)
If you're doing this in PHP, you can use the curl library to pull the pages:
http://www.php.net/manual/en/ref.curl.php
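To make the curl step a bit more concrete, here is a rough sketch of a fetch helper. The name fetchPage() and the timeout default are my own assumptions, it assumes PHP 7 or later for the type declarations, and it has not been tried against the real site, so treat it as a starting point only.

```php
<?php
// Sketch only: fetch one page's HTML with PHP's curl extension.
// fetchPage() is a hypothetical helper name, not part of curl itself.
function fetchPage(string $url, int $timeoutSeconds = 30): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException("Could not initialise curl for $url");
    }

    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Give up if the whole transfer takes longer than this many seconds.
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    return $html;
}
```

Calling fetchPage() with the address of one listing page should hand back that page's HTML as a single string, or throw if the request fails.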
In the querystring for the tables, the var1 variable seems to indicate A-Z (AA for the '#' entry). Using the base querystring for the first page, you should be able to create the links necessary to grab them one at a time.
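Building those links could look something like the sketch below. The function name buildPageUrls() and the example.com address in the usage note are placeholders of mine; the only parts taken from the site are the var1 name and the AA / A-Z values, and even those are my reading of the querystring rather than anything documented.

```php
<?php
// Sketch only: build one URL per index page by varying the var1 parameter.
// buildPageUrls() is a hypothetical helper; pass in the first page's real URL.
function buildPageUrls(string $baseUrl): array
{
    // 'AA' stands for the '#' entry, followed by 'A' through 'Z'.
    $keys = array_merge(['AA'], range('A', 'Z'));

    // Keep any other querystring variables the first page's URL carries.
    parse_str((string) parse_url($baseUrl, PHP_URL_QUERY), $params);

    // Everything before the querystring.
    $prefix = strtok($baseUrl, '?');

    $urls = [];
    foreach ($keys as $key) {
        $params['var1'] = $key;
        $urls[$key] = $prefix . '?' . http_build_query($params);
    }
    return $urls;
}
```

For a placeholder address such as http://example.com/list.php?var1=A&page=1, this returns 27 URLs keyed by 'AA', 'A', ... 'Z', each with only var1 changed.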
As far as parsing the content, you could do it the hard way with regex, but that could fail if the format of the page ever changes. Instead, try the DOM classes for PHP:
http://www.php.net/dom
Once you have a page's HTML from curl, load that string into a DOM instance with:
http://www.php.net/manual/en/domdocument.loadhtml.php
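A minimal sketch of that loading step follows, assuming PHP 7 or later. loadHtmlString() is a name I made up; the libxml error handling is there because real pages are rarely clean markup and loadHTML() tends to emit a warning for each problem it finds.

```php
<?php
// Sketch only: turn an HTML string into a DOMDocument.
// loadHtmlString() is a hypothetical helper name.
function loadHtmlString(string $html): DOMDocument
{
    // Collect libxml's parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Discard the collected warnings and restore the earlier setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new RuntimeException('Could not parse the HTML string');
    }
    return $doc;
}
```

The returned object can then be searched with methods such as getElementsByTagName('table').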
Then it's just a matter of drilling down into the document to find the table you need and beginning a row-by-row extraction. This all sounds very easy in theory, but I've never used the class myself, so I'm not sure how well it would go in practice. I'm happy to help experiment, though. :)
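As a first experiment, the row-by-row extraction might be sketched like this. extractTableRows() and its $tableIndex parameter are hypothetical, and which table on the real pages actually holds the data is something that would have to be checked by looking at the page source.

```php
<?php
// Sketch only: read every row of one table into an array of cell texts.
// extractTableRows() and $tableIndex are hypothetical names.
function extractTableRows(DOMDocument $doc, int $tableIndex = 0): array
{
    $table = $doc->getElementsByTagName('table')->item($tableIndex);
    if ($table === null) {
        return [];  // no table at that position on this page
    }

    $rows = [];
    // Note: this also visits rows of any tables nested inside this one.
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $node) {
            // Keep only the row's own <td>/<th> cells, skipping whitespace nodes.
            if ($node instanceof DOMElement
                && in_array(strtolower($node->nodeName), ['td', 'th'], true)) {
                $cells[] = trim($node->textContent);
            }
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

Each entry in the result is one table row as a plain array of strings, which should be a convenient shape for writing into a database afterwards.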