I am looking for advanced information to help me develop a script similar to cutestat, and I want to know how I should proceed. How should I crawl pages/websites? Should I fetch them via PHP cURL and store the HTML output (or part of it) in a database, or is there a better practice?
I will be doing custom digging with regex patterns against the database later on, but initially I just want to start with 1 million domains. What's the fastest way to get the HTML of all of those million domains/sites?
Is PHP efficient enough for this, or do I need to use a different crawler?
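To make my current idea concrete, here is a rough sketch of the kind of fetch loop I have in mind: parallel requests with curl_multi, then storing each page body into a table. The domain list, table name, and DSN are just placeholders, not my real setup:

```php
<?php
// Placeholder input; in practice this would be read from my 1M-domain list.
$domains = ['example.com', 'example.org'];

$mh = curl_multi_init();
$handles = [];

foreach ($domains as $domain) {
    $ch = curl_init('http://' . $domain . '/');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects (e.g. to www or https)
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,     // don't let slow hosts stall the batch
        CURLOPT_USERAGENT      => 'MyCrawler/0.1',
    ]);
    curl_multi_add_handle($mh, $ch);
    $handles[$domain] = $ch;
}

// Drive all transfers concurrently until every handle is finished.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

// Hypothetical storage target: a `pages` table with (domain, html) columns.
$pdo  = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO pages (domain, html) VALUES (?, ?)');

foreach ($handles as $domain => $ch) {
    $html = curl_multi_getcontent($ch); // empty string on failure/timeout
    if ($html !== '' && $html !== null) {
        $stmt->execute([$domain, $html]);
    }
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

For a million domains I assume I'd run this in batches of a few hundred handles at a time rather than adding all of them at once. Is this the right general direction, or is there a fundamentally better approach?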