• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 374
  • Last Modified:

Website Not Being Crawled

Hi,

We have a website, and I tried to check its links using link checker software; only the index page was checked and no other pages. At first I thought it was the settings of the software, so I checked using GSiteCrawler and got the same result. I tried the crawlers on our other sites and they worked correctly.

It seems there is some problem with the settings or configuration for the website on the host.

We are using a cPanel host. Can anyone tell me what the problem might be? What settings could prevent our website from being crawled? We have already removed our robots.txt file, and the .htaccess file does not contain any restrictions, but the crawler still won't check the site.
Asked: openaccount3
1 Solution
 
giltjr commented:
How are the links defined in the page? Some pages use JavaScript instead of hrefs for links, and site crawlers (some, most, all?) will not follow those.
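For illustration (generic markup, not taken from the poster's site), this is the difference being described: a plain anchor with a real href is something any crawler can follow, while a link that only navigates through JavaScript often is not.

    <!-- crawler-friendly: the target URL is in the href itself -->
    <a href="/products.html">Products</a>

    <!-- often invisible to crawlers: no real href, navigation happens in script -->
    <a href="#" onclick="window.location='/products.html'; return false;">Products</a>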
 
openaccount3 (Author) commented:
We use normal href links.
 
Geert Gruwez (Oracle DBA) commented:
Is there a line in the head section like
<meta robots=none />
or something similar?
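For reference, the standard form of that tag uses name and content attributes inside <head>; a generic example that would stop compliant crawlers from indexing a page or following its links looks like this:

    <meta name="robots" content="noindex, nofollow">

The value none is shorthand for noindex, nofollow.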
 
openaccount3 (Author) commented:
None, we don't use that code <meta robots=none /> or anything similar to it.
 
giltjr commented:
I would try running a packet capture on the box where you are running the site crawler and see if it is making additional requests.

If the site crawler is not making additional requests, then it doesn't like something about the way your HTML is written. Could you post your index.html code, with anything that should not be made public removed or "XXXX"-ed out, of course?
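As a sketch of that check (assuming the crawler runs on a machine where tcpdump is available, and using a placeholder hostname), something like the following captures the HTTP traffic to the site so you can see whether any requests beyond the index page ever leave the box:

    tcpdump -i any -n -w crawl.pcap host www.example.com and tcp port 80

Opening crawl.pcap in Wireshark afterwards and looking for GET requests to pages other than / would show which side is at fault.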
 
openaccount3 (Author) commented:
It was working before; there is no problem with the HTML file. The index page is the same as the other page we made. Only this one doesn't work, and the rest work properly.
 
giltjr commented:
--> It was working before; there is no problem with the HTML file.

This implies that you have had the same HTML file and were using the same crawler program.

--> The index page is the same as the other page we made.

This implies that the page has changed. So it sounds like you have two pages, one that worked and one that does not. Are the pages for the same site, or do you have two sites where you used the index page from one site as a template for the index page of the second site?

Doing a network trace would show what HTTP traffic is going back and forth, and whether the crawler is making the requests and the server is not responding, or whether the crawler is not making the requests at all.
 
openaccount3 (Author) commented:
Thank you for your time and the ideas; it's working now.
I removed and backed up all the files from the host, then uploaded the new files from my PC to the host, except for the .htaccess and robots files. Then I tried checking the broken links and it works.
 
giltjr commented:
Have you always had a robots file? This file controls how web crawlers access your site, such as preventing crawling totally. This was an important piece of information that you left out. I would suggest that you examine your robots file carefully.
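For reference, a robots.txt as short as the following (a generic example, not necessarily what was on this host) is enough to tell all well-behaved crawlers to stay off the entire site:

    User-agent: *
    Disallow: /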
 
Geert Gruwez (Oracle DBA) commented:
What was in the robots file?

Something like follow links = no?
