Solved

Website Not Being Crawled

Posted on 2008-06-11
Last Modified: 2010-05-18
Hi,

We have a website, and I tried to check its links using link-checker software. Only the index page was checked and no other page. At first I thought the problem was the software's settings, so I checked using Gsite Crawler and got the same result. I tried the crawlers on our other sites and they worked correctly.

It seems there is some problem with the settings or configuration for the website on the host.

We are using a cPanel host. Can anyone tell me what the problem might be? What settings can prevent our website from being crawled? We have already removed our robots.txt file, and the .htaccess file does not contain any restrictions, but the crawler still won't check the site.
Question by:openaccount3
13 Comments
 
LVL 57

Expert Comment

by:giltjr
ID: 21765636
How are the links defined in the page? Some pages use JavaScript instead of hrefs for links, and site crawlers (some, most, all?) will not follow those.
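To make the distinction concrete, here is a minimal Python sketch that separates plain href links from JavaScript-driven ones in a page. The class name `LinkCollector`, the function `classify_links`, and the sample markup are hypothetical and only illustrate the idea; a real crawler may apply different rules.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Split <a> tags into links a simple crawler can follow and ones it likely cannot."""

    def __init__(self):
        super().__init__()
        self.crawlable = []    # plain href targets
        self.script_only = []  # anchors that rely on JavaScript

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = (attr_map.get("href") or "").strip()
        if href and href != "#" and not href.lower().startswith("javascript:"):
            self.crawlable.append(href)
        else:
            # Record the script that does the navigation, if there is one.
            self.script_only.append(attr_map.get("onclick") or href)


def classify_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.crawlable, collector.script_only


# Hypothetical page fragment: one normal link, two JavaScript-driven ones.
sample = """
<a href="about.html">About</a>
<a href="javascript:go('contact')">Contact</a>
<a href="#" onclick="go('products')">Products</a>
"""
crawlable, script_only = classify_links(sample)
print("crawlable:", crawlable)
print("script only:", script_only)
```

If every link on the index page ends up in the second list, a crawler that does not execute JavaScript would stop at the index page.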
 

Author Comment

by:openaccount3
ID: 21766141
We use normal href links.
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 21767058
Is there a line in the header section like
<meta name="robots" content="none" />
or something similar?
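A quick way to check for such a tag is to parse the page and look for a robots meta element. The sketch below assumes the standard form `<meta name="robots" content="...">`; the names `RobotsMetaFinder`, `robots_directives`, and `blocks_link_following` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, nofollow".
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.append(part)


def robots_directives(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


def blocks_link_following(directives):
    # "none" is shorthand for "noindex, nofollow".
    return "nofollow" in directives or "none" in directives


# Hypothetical header fragment for illustration.
page = '<head><meta name="robots" content="noindex, nofollow" /></head>'
found = robots_directives(page)
print(found, blocks_link_following(found))
```

A crawler that honors this tag would read the index page and then decline to follow any of its links, which matches the symptom described in the question.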

Author Comment

by:openaccount3
ID: 21767146
No, we don't use a robots meta tag or any code similar to that.
 
LVL 57

Expert Comment

by:giltjr
ID: 21773578
I would try to run a packet capture on the box where you are running the site crawler and see if it is making additional requests.

If the site crawler is not making additional requests, then it doesn't like something about the way your HTML is written. Could you post your index.html code, with anything that should not be made public removed or replaced with "XXXX", of course?
 

Author Comment

by:openaccount3
ID: 21775994
It was working before, there is no problem with the html file. The index page is the same as the other page we made. Only this one doesn't work, and the rest works properly.
 
LVL 57

Accepted Solution

by:
giltjr earned 2000 total points
ID: 21778504
--> It was working before, there is no problem with the html file.

This implies that you have had the same html file and were using the same crawler program.

--> The index page is the same as the other page we made.

This implies that the page has changed. So it sounds like you have two pages, one that worked and one that does not. Are the pages for the same site, or do you have two sites where you used the index page of one site as a template for the index page of the second?

Doing a network trace would show what HTTP traffic is going back and forth, and whether the crawler is making the requests and the server is not responding, or the crawler is not making the requests at all.
 

Author Comment

by:openaccount3
ID: 21783891
Thank you for the time and the idea; it's working now.
I backed up and removed all the files from the host, then uploaded the new files from my PC to the host, except for the .htaccess and robots files. Then I tried checking for broken links and it worked.
 
LVL 57

Expert Comment

by:giltjr
ID: 21784965
Have you always had a robots file? This file controls how web crawlers access your site, including preventing crawling entirely. This was an important piece of information that you left out. I would suggest that you examine your robots file carefully.
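As a rough illustration of how a robots file can stop a crawler, the sketch below feeds two example robots.txt bodies to Python's standard `urllib.robotparser`. The helper `can_crawl`, the two rule sets, and the example.com URL are assumptions for demonstration; they are not the contents of the file from this question, which was never posted.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets: the first blocks every path, the second blocks nothing.
blocking = "User-agent: *\nDisallow: /\n"
permissive = "User-agent: *\nDisallow:\n"

url = "http://example.com/products.html"  # placeholder URL
print("blocking rules allow crawl:", can_crawl(blocking, url))
print("permissive rules allow crawl:", can_crawl(permissive, url))
```

A single `Disallow: /` line under `User-agent: *` is enough to keep a well-behaved crawler off the whole site, so it is worth checking the old file for a rule like that.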
 
LVL 38

Expert Comment

by:Geert Gruwez
ID: 21810521
What was in the robots file?

Something like follow links = no?

Question has a verified solution.
