Website Not being Crawled

Hi,

We have a website, and I tried to check its links with link-checking software, but only the index page was checked and no other page. At first I thought it was the software's settings, so I checked with GSiteCrawler and got the same result. I tried the crawlers on our other sites and they worked correctly.

It seems there is some problem with the settings or configuration for the website on the host.

We are using a cPanel host. Can anyone tell me what the problem might be? What settings could prevent us from crawling our own website? We have already removed our robots.txt file, and the .htaccess file does not contain any restrictions, but the crawlers still won't check the site.

giltjr

How are the links defined in the page? Some pages use JavaScript instead of href attributes for links, and site crawlers (some, most, all?) will not follow those.
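
To illustrate the difference, here is a minimal sketch of how a simple crawler might collect links, assuming it only reads the href attribute of anchor tags and never executes JavaScript. The class name LinkCollector and the sample markup are made up for illustration; real crawlers vary in what they follow.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, roughly as a basic crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip javascript: pseudo-links; a non-scripting crawler cannot follow them.
            if name == "href" and value and not value.lower().startswith("javascript:"):
                self.links.append(value)


# Hypothetical sample page: one plain link and two JavaScript-only "links".
SAMPLE_HTML = """
<a href="/about.html">About</a>
<a href="javascript:void(0)" onclick="location='/contact.html'">Contact</a>
<span onclick="location='/products.html'">Products</span>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)  # ['/about.html']
```

Only the plain href link is found; the two script-driven links are invisible to this kind of crawler.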

openaccount3

ASKER

We use normal href links.

Geert G

Is there a line in the header section like
<meta name="robots" content="none" />
or something similar?
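
One way to check a page for this is sketched below, assuming the standard form of the tag (a meta element with name="robots" and a comma-separated content attribute). The names RobotsMetaFinder and blocks_link_following, and the sample pages, are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Gathers the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def blocks_link_following(html):
    """Return True if the page asks crawlers not to follow its links."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return any(d in ("none", "nofollow") for d in finder.directives)


# Hypothetical sample pages.
restricted = '<head><meta name="robots" content="noindex, nofollow" /></head>'
unrestricted = "<head><title>Home</title></head>"

print(blocks_link_following(restricted))    # True
print(blocks_link_following(unrestricted))  # False
```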

openaccount3

ASKER

No, we don't use a robots meta tag or anything similar.

giltjr

I would try running a packet capture on the machine where you are running the site crawler, to see whether it is making additional requests.

If the site crawler is not making additional requests, then it doesn't like something about the way your HTML is written. Could you post your index.html code, with anything that should not be made public removed or replaced with "XXXX", of course?

openaccount3

ASKER

It was working before; there is no problem with the HTML file. The index page is the same as the other pages we made. Only this one doesn't work; the rest work properly.

ASKER CERTIFIED SOLUTION

giltjr

openaccount3

ASKER

Thank you for your time and the idea; it's working now.
I backed up and removed all the files from the host, then uploaded fresh copies from my PC, except for the .htaccess and robots files. Then I checked for broken links again and it worked.

giltjr

Have you always had a robots file? This file controls how web crawlers access your site, such as preventing crawling entirely. This was an important piece of information that you left out. I would suggest that you examine your robots file carefully.
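
As a rough way to examine it, the sketch below uses Python's standard urllib.robotparser module to test whether a given robots.txt would let a crawler fetch a page. The helper name is_allowed, the two sample files, and the example.com URL are assumptions for illustration, not the contents of the asker's actual file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # blocks every compliant crawler
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # an empty Disallow permits everything

PAGE = "http://www.example.com/page.html"

print(is_allowed(BLOCK_ALL, PAGE))  # False
print(is_allowed(ALLOW_ALL, PAGE))  # True
```

A single "Disallow: /" line is enough to stop a well-behaved crawler at the first request, which would match the symptom of nothing past the index page being checked.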

Geert G

What was in the robots file?

follow links = no?