Solved

Need the way

Posted on 2015-01-08
Last Modified: 2015-03-10
Hi,
What is the way to make a website searchable by Google, Yahoo and other search engines?
Question by:HuaMinChen
50 Comments
 
LVL 4

Expert Comment

by:Tony Pitt
ID: 40537459
There are places on most of the search engines to submit your website.  Here's the one for Google in the UK, for example: http://www.google.co.uk/submityourcontent/website-owner/ - you'll need to use your local Google site instead.  Here's the one for Yahoo: http://search.yahoo.com/info/submit.html.  And here's the one for Bing: http://www.bing.com/toolbox/submit-site-url.

In general, you'll find lots of pointers by asking Google the question "Submit website to search engines" ...

/T
0
 
LVL 23

Expert Comment

by:Eirman
ID: 40537560
You should make a sitemap for your website and submit that to the search engines.
There are many free tools out there that will make the sitemap such as
https://www.xml-sitemaps.com/
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40539442
Many thanks all.
Tony,
I did once submit the website to

https://www.google.com/webmasters/tools/submit-url?hl=zh-TW&mesd=AB9YKzJkVAi3LK0sxvxzFDQ4SeEs27AV3YiHkgzAETCDmMaOkZoScvXSJk57yrYHTYDbjzpk9qcvTnqfqGBtCtGZKvhpYHBdQXFuXyLlHEpTJvgPdrvApYMEgZeUwmwFx4Fjr2v5VIE1

but after that, I still cannot find the site in Google. What do you think about this?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40548627
When you have submitted your site, they drop your info into the "what to visit next" queue of the search engine's crawlers. The crawlers will fetch your content, index it and store it in their database. That usually takes time, depending on the current workload.

Just be patient.

BTW: I don't have a URL at hand, but I remember that there are meta-sites that submit your site to multiple (even small and specialized) search engines with one action.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40550460
Thanks.
I have one url

http://my-friend.co/Start

and I submitted it to Google a long time ago, but nothing has helped.
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40550877
Hmmm ...

I'm sure that you're in Google's indexes already. I've done a search on Google for a phrase on your site's landing page ("Are you still on the way to seek your 2nd half there?") and got exactly one hit: your site's landing page.

If you simply don't get more hits, I presume your page simply doesn't attract visitors ... they don't find it with their search terms, the results don't reach the top ranks in the search, or the results shown by Google don't invite them to click through. If that's the case, you need some SEO (search engine optimization), a technique that helps to attract visitors by optimizing your site for better ranking. There are many tutorials on the net for that, and there are numerous commercial services waiting to do it for you ...

A first start would be Google's webmaster tools.

By the way: If I do a site-specific search (just "site:my-friend.co" in the search field), I get only that one hit, which tells me that your site is very small (just that page) or very locked up. You should add some free content to it, like previews, a tour, a limited test account, pricing scheme info, etc. ... the more information, the better you get ranked.

Hope that helps.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40556875
Thanks. Can I know what you put into Google, to do the search?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40557213
Sure - I literally entered the phrase

"Are you still on the way to seek your 2nd half there?"

(with the double quotes to tell Google to search the exact phrase) as it appeared on your landing page.

Here's what I got:
Search and Result
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40557224
Sorry, I entered this phrase into Google and clicked "Search", but I cannot find my site.
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40557474
Hmmm - that's weird. I can reproduce that result whenever I try. May I ask for the region you're in ?

The top level domain of your site points to Colombia ...

If it's possible for you, maybe you could try the same search with TorBrowser, maybe several times, restarting TorBrowser for each try? It's not about the encryption but about the fact that you probably get another exit node every time, thereby testing access from elsewhere. Please ignore it if Google nags about "irregular traffic" etc. - just restart TorBrowser again (it's caused by other traffic through that exit node ...).
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40559182
I am in Hong Kong and only get these

https://www.google.com.hk/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=Are%20you%20still%20on%20the%20way%20to%20seek%20your%202nd%20half%20there%3F

in Google, after searching for "Are you still on the way to seek your 2nd half there?".
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40559428
I've tried the link ... it looks like you have omitted the double quotes ( " ) around the phrase. That would search for pages containing the words, but not necessarily in that order ... and there are numerous pages meeting these criteria.

Please try again with double quotes ...
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40561615
Many thanks Frank.
How can I further improve the search, so that searches like

Friend search
Find friend

are able to locate the current site?
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40561617
And it seems that people usually do not put quotes around the words when searching, right?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40561641
OK - seems you need a little intro into searching on Google ;-)

If you type in only the words (w/o quotes), Google searches for pages containing all given words (or, depending on the results, most of 'em). The space character serves as the delimiter in that case. If you want to search for a phrase, e.g. "friend search" exactly as given (omitting pages with "search friend", only "friend" and only "search"), you tell Google so by enclosing that part in double quotes.

The next step: If you want to narrow your search down to only one site, you enter a site address prefixed with

  site:

An example: Searching for

  "Friend Search" site:my-friend.co

would list all (known to Google) occurrences of the phrase "Friend Search" on pages at my-friend.co. (That search doesn't find any results, by the way.)

Be aware that Google can only find text it can reach and read, therefore text in images is generally hidden from Google (as long as there's no alternative text coded in the HTML). Everything that needs a login to be shown is generally hidden, too. If present, your ROBOTS.TXT file (please search on Google for that topic) might prevent Google's crawler from indexing most of your site, too.
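
As a small illustration of the quoting and site: rules described above, here is a minimal Python sketch that assembles such a query string and the matching search URL. The helper names `build_query` and `search_url` are made up for this example, and the only outside fact assumed is Google's usual `https://www.google.com/search?q=...` URL form.

```python
from urllib.parse import urlencode


def build_query(phrase, site=None, exact=True):
    """Assemble a search query; `build_query` is a hypothetical helper."""
    # Double quotes ask the engine for the exact phrase instead of loose words.
    query = f'"{phrase}"' if exact else phrase
    # The site: prefix narrows the search to a single site.
    if site:
        query += f" site:{site}"
    return query


def search_url(query):
    """URL-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


loose = build_query("Friend Search", exact=False)
exact_on_site = build_query("Friend Search", site="my-friend.co")
print(loose)
print(exact_on_site)
print(search_url(exact_on_site))
```

The first query matches pages containing the words in any order, the second only the exact phrase on the one site.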
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40561662
If present, your ROBOTS.TXT file (please search on Google for that topic) might prevent Google's crawler from indexing most of your site, too.

How do I prevent this?
0
 
LVL 4

Expert Comment

by:Tony Pitt
ID: 40561681
To prevent the ROBOTS.TXT file preventing Google's crawler from indexing most of your site, you simply don't have a ROBOTS.TXT file.  If there is one, then delete it.  If it's present, you can't override the crawler's behaviour - that's what the file is there to do.

/T
0
 
LVL 13

Accepted Solution

by:frankhelk
frankhelk earned 500 total points
ID: 40561752
To make that more clear:

The file ROBOTS.TXT, located in the root of your site (that means http://my-friend.co/robots.txt), specifies which parts of your site search engine crawlers may index and which parts they should stay out of. Not every crawler respects it, but the big engines like Google, Yahoo, Bing, etc. do.

If that file is non-existent - like Tony Pitt recommended - you tell the crawlers to fetch simply everything they could get their hands on.

I've just checked if I could get that file. I got a 404 error, which means that it just doesn't exist.

I've taken the liberty of having a look at your landing page's HTML source. I think the problem is that most of your page's code is dynamic, with cryptic JavaScript etc.

I presume that Google's crawler is not dumb, but to index your site it needs links that it can follow. In its current state, the only thing that Google would know of your entire site is the textual content of your landing page. Beyond that, everything except the
landing page
"Register" page
"Login" page and
"Support" page
is hidden behind the login and therefore invisible to search engines. You could submit the other 3 pages to Google, too, but that won't get you much further. If you want to attract more visitors to register and enter, you need a heap of free pages, like a preview (or "Tour") or samples or "Single of the day" (a daily changing profile preview with most of the info hidden and the profile image blurred), some broad info pages, an imprint, a "who we are" page and one about "what could I expect from that site". The more free information about the site you provide, the more attention you get.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40563477
Many thanks all.
Is there any specific format to ROBOTS.TXT?
0
 
LVL 13

Assisted Solution

by:frankhelk
frankhelk earned 500 total points
ID: 40563750
yes - there sure is a specific format ... the file is to be read by the crawler software, not by humans ;-)

If you just google "robots.txt" you'll find numerous pages about it ... just have a look at some results and you'll find everything you need.

But keep in mind that having no robots.txt implies a "read whatever you can get your hands on" directive.
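
To give a rough idea of the format, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules and the `/members/` path are invented for illustration and are not taken from the actual site, and `parse_rules` and `ExampleBot` are hypothetical names. The last check shows the "no rules at all" case, which this parser treats as allow-everything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: each record names a crawler
# (User-agent) followed by the paths it should stay out of (Disallow).
EXAMPLE_RULES = """\
User-agent: *
Disallow: /members/
"""


def parse_rules(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


restricted = parse_rules(EXAMPLE_RULES)
empty = parse_rules("")  # no rules, comparable to having no robots.txt

start_ok = restricted.can_fetch("ExampleBot", "http://my-friend.co/Start")
members_ok = restricted.can_fetch("ExampleBot", "http://my-friend.co/members/profile")
empty_ok = empty.can_fetch("ExampleBot", "http://my-friend.co/members/profile")
print(start_ok, members_ok, empty_ok)
```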

I think that for attracting traffic to your site there's no simple alternative to "providing attractive free content" ...
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40563773
I created a

robots.txt

file in
C:\inetpub\wwwroot

Does that mean it should be readable at

http://my-friend.co/robots.txt

so that the search engines can read it?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40563919
If you use the default settings of IIS, I would think so.

(If you use a content management system that plugs in as an ISAPI filter, that might override the regular root ... depending on the CMS ... I don't know if all of them allow mixing dynamic content with static files.)
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40569929
But why can't I access

http://my-friend.co/robots.txt

now?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40570021
I can't access it either. Maybe something in your system blocks it from being accessed. Perhaps you need to add it through your CMS instead of dropping it into the IIS wwwroot folder?

I don't know anything about your CMS (I assume from your landing page code that you use one), and I'm not familiar with most of them. So I can only speculate.

If you use an ISP provided server, it might be that anything not in the CMS is blocked from web access, or similar things ... I don't know ... possibly you need to read your ISP's documentation in that case to find out.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40570032
Thanks a lot.
Can I have more details of CMS?
0
 
LVL 13

Assisted Solution

by:frankhelk
frankhelk earned 500 total points
ID: 40570185
Sorry for using an abbreviation. CMS stands for "Content Management System". It's a piece of software that simplifies the management of a web site for both developer and designer.

Basically, it separates the technical part of the web site (all the HTML, JavaScript, Java, ASP, etc. stuff) from the content (text, images, videos, data). This
allows the developer to focus on the technical part without being disturbed by the content,
minimizes the technical effort by using the same templates for many pages, thereby easing design, management and troubleshooting,
allows the designer to focus on visual design and content by shielding him from wrestling with the underlying web technologies, and
usually allows collaborative work on the content.

There are numerous CMSs around, some complex and expensive, some simple and cheap, some even free. The big ones store the content in SQL database systems and deliver live data from complex processes and sources.

If you rent a web server from an ISP, you might have a CMS included on the server so that you don't have to deal with HTML etc.

For more info I recommend the Wikipedia articles about "Content management system" and "Content management".
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40574713
Many thanks.
To enable searching by Google and Yahoo using a robots.txt file,

does it mean I should be able to access

http://my-friend.co/robots.txt

or not?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40575187
Basically:

                 Yes.

If you want to restrict crawlers from accessing parts of your site:

                 Definitely (it's the way to tell them what to index and what not).

If it's OK for crawlers to index everything they can get their hands on
(which includes parts only available with a login):

                 Not really - in that case you could omit robots.txt, which means
                 "no restrictions defined" to the crawlers.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40576648
Thanks a lot.
I'm able to access
https://my-friend.co/robots.txt

but not

http://my-friend.co/robots.txt

How do I resolve this?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40576976
I don't know exactly, but I presume it's buried in the settings of IIS or (if present) your CMS. It seems to allow only encrypted access to the root.

By the way: I was able to download your robots.txt:
User-agent: Google
Disallow:

User-agent: *
Disallow: /

But I wonder about the settings ... do you want to allow only Google to index your site, and keep the others (Yahoo, Bing, et al.) out? If you want to draw traffic, that would be counterproductive.
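
One way to see the effect of the file quoted above is to feed those same lines to Python's standard `urllib.robotparser`. This is only a sketch: the Python parser matches user-agent names by substring, real crawlers may match differently, and the crawler names passed in below are illustrative. Google documents `Googlebot` as its crawler's token, so that is arguably the safer name to use in the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt lines as quoted in this comment.
ROBOTS_TXT = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://my-friend.co/Start"
# An empty Disallow line means "nothing is disallowed" for that record.
google_allowed = parser.can_fetch("Googlebot", page)
# Every other crawler falls through to the "*" record, which blocks "/".
bing_allowed = parser.can_fetch("bingbot", page)
print("Googlebot:", google_allowed, "bingbot:", bing_allowed)
```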

Nonetheless: If you inspect the access log of your site, you could check if and how often Google visits your pages. It could also reveal which pages it visits (look for the user agent info).

And I want to remind you that the presence of /robots.txt is not the key to getting Google to index your site. The crawler definitely needs to find links on your landing page that lead the way deeper into the site (which I fear your source code doesn't provide), and it will automatically stop exploring when a login page comes along. It can only index content that it can find a linked way to and can access freely w/o any login procedure.

I would recommend you to dig a bit into sites with focus on "search engine optimization" and "search engine technologies" to get a deeper understanding of that matter.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40576977
Thanks.

And I want to remind you that the presence of /robots.txt is not the key to drive Google into indexing your site. The crawler definitely needs to find links on your landing page that lead the way deeper into the site (which I fear your source code doesn't provide), and it will automatically stop exploring the way when a login page comes along. It could only index content it could find a linked way to and could get free access w/o any login procedure.

Here is the start-up page
http://my-friend.co/Start

How can I ensure that crawlers can reach it easily?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40576992
As I said before, that page seems to be found correctly - if I enter the text phrase from the page into the Google search field, I get that page as the result. Definitely: it reaches the page.

If you're unsure: Read the logs and search for the user agent string
Googlebot/2.X (+http://www.googlebot.com/bot.html)
or, more generically, for "googlebot".
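
A minimal sketch of that log check in Python, assuming a plain-text access log in W3C extended style with a "#Fields:" header line (IIS can write this format, but the actual field layout depends on the server's logging settings). The sample lines are made up for illustration, and `count_googlebot_hits` is a hypothetical helper name.

```python
from collections import Counter

# Made-up sample in W3C extended log style; real logs will differ.
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status
2015-02-10 08:15:02 GET /Start 203.0.113.7 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200
2015-02-10 08:15:09 GET /robots.txt 203.0.113.7 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200
2015-02-10 09:01:44 GET /Start 198.51.100.23 Mozilla/5.0+(Windows+NT+6.1) 200
"""


def count_googlebot_hits(lines):
    """Count requested paths on lines whose user agent mentions googlebot."""
    fields = []
    hits = Counter()
    for line in lines:
        if line.startswith("#Fields:"):
            # The header names the columns used by the lines that follow.
            fields = line.split()[1:]
            continue
        if line.startswith("#") or not line.strip():
            continue
        row = dict(zip(fields, line.split()))
        if "googlebot" in row.get("cs(User-Agent)", "").lower():
            hits[row.get("cs-uri-stem", "?")] += 1
    return hits


hits = count_googlebot_hits(SAMPLE_LOG.splitlines())
for path, count in sorted(hits.items()):
    print(path, count)
```

On a real server you would pass an open log file instead of the sample text; where that file lives depends on the server configuration.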

I think your problem is mostly buried in the structure of that page, cause Google reaches that page and I think it's even unable to understand the links to the subsequent pages "Register", "Login", "Support". The links are not realized in plain HTML (which Google's crawler understands very well) but in Javascript, and the link targets are not detectable anywhere in the page source code. The same is true for the subsequent pages themselves.

The second problem is that the "crawler accessible" part of your site consists only of these 4 pages ... after that, the pages either contain no usable further links or need login credentials / form data to go further. Your login page even contains some kind of simple captcha. If there's deeper content after the login, Google would never find it that way (and simply can't).

I don't like to repeat myself, but if you want to draw traffic to your site, you'll need to
make your pages better readable for crawlers, and
provide free accessible content like a preview or tour.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40579219
I think your problem is mostly buried in the structure of that page, cause Google reaches that page and I think it's even unable to understand the links to the subsequent pages "Register", "Login", "Support". The links are not realized in plain HTML (which Google's crawler understands very well) but in Javascript, and the link targets are not detectable anywhere in the page source code. The same is true for the subsequent pages themselves.

What changes should be applied to such pages? Thanks.
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40579541
To explain that matter is IMHO a bit too complex for a question. To make it short: In HTML a usual link consists of some code like this:
<a href="http://my-friend.co/somepage">Some Page</a>

The important part is the quoted text after href= because that's what crawlers need: The link to the connected page.

On your page, for example, the link to your support page reads (you might ignore the class, id and style info for that example):
<a id="lb_supp" class="lb_log" href="javascript:__doPostBack(&#39;lb_supp&#39;,&#39;&#39;)" style="color:#0A2757;font-family:Times New Roman;font-size:8pt;">Support</a>

The href info tells the browser to call a JavaScript function with some cryptic parameters and receive the needed link from there. As far as I could see, the JavaScript code submits a hidden form to the server and receives the link from there.

I believe the crawlers of most search engines like Google to be pretty smart when trying to follow links. But that kind of content generation is highly dynamic - and due to that nature I bet that the crawlers don't even WANT to follow those links. That style of page is fine for database-driven things and pages that change content without reloading. But it's a dungeon of doom for every crawler.
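
To make the difference concrete, here is a rough Python sketch that treats a page the way a very simple crawler might: it collects href values and sets aside those that are only "javascript:" calls. It uses the standard `html.parser` module; the class name `LinkCollector` is made up, and real crawlers are far more sophisticated than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical toy crawler step: sort <a href> values into two piles."""

    def __init__(self):
        super().__init__()
        self.followable = []    # plain URLs a simple crawler can request
        self.script_only = []   # javascript: pseudo-links with no target URL

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        if href.strip().lower().startswith("javascript:"):
            self.script_only.append(href)
        else:
            self.followable.append(href)


# The two link styles discussed above (style attribute omitted for brevity).
PAGE = """
<a href="http://my-friend.co/somepage">Some Page</a>
<a id="lb_supp" class="lb_log"
   href="javascript:__doPostBack(&#39;lb_supp&#39;,&#39;&#39;)">Support</a>
"""

collector = LinkCollector()
collector.feed(PAGE)
print("followable:", collector.followable)
print("script only:", collector.script_only)
```

Only the first link gives this toy crawler a URL to visit next; the second gives it nothing but a function call.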
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40588149
Thanks a lot. Is this
<a href="http://my-friend.co/somepage">Some Page</a>

the line I should make sure exists in robots.txt?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40588615
No - what I've told you is not related to the robots.txt file. It belongs to your content pages, e.g. the page that is shown when you open http://my-friend.co/Start in your browser.

To view your page source code, open e.g. your landing page in your browser. Click the right mouse button on some area where no link is active. In the context menu, click on "View Page Source" (or something similar, depending on browser brand and version).

You'll be shown the code that describes the content, how it is to be arranged, where integrated images should be loaded from and much more. The language for that is called HTML (HyperText Markup Language), and the scripting parts of the source are done in a language named JavaScript. Formatting is done with a description language called CSS (Cascading Style Sheets). Explaining these languages is far beyond the scope of this question; if you're not familiar with them, you should do at least some online tutorial about each of 'em.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40596589
Thanks. Regarding robots file, should this

http://my-friend.co/robots.txt

be accessible for crawlers to check?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40596878
Since that file tells the crawlers what parts of the site to ignore and what to see: Definitely accessible.

But again: If you want all (reachable) pages of your site to be crawled, just omit that file.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40599987
To view your page source code, open e.g. your landing page in your browser. Click the right mouse button on some area where no link is active. In the context menu, click on "View Page Source" (or something similar, depending on browser brand and version).

You'll be shown the code that describes the content, how it is to be arranged, where integrated images should be loaded from and much more. The language for that is called HTML (HyperText Markup Language), and the scripting parts of the source are done in a language named JavaScript. Formatting is done with a description language called CSS (Cascading Style Sheets). Explaining these languages is far beyond the scope of this question; if you're not familiar with them, you should do at least some online tutorial about each of 'em.

The pages are created using Visual Studio. Can you please show in more detail how to improve the pages? Thanks
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40600415
Sorry, I would, but THAT definitely goes beyond the scope of a question at EE.

Since you're not that familiar with the coding behind the surface of web pages, I wouldn't recommend using a system like VS (which creates complex, dynamic web pages) ... especially if you try to optimize them.

I think you're best served with some online tutorial about how to author web pages. If you want to master the pitfalls of the highly complex pages which VS (and CMSs) generate, you should master writing your own pages with Notepad in the first place.

I recommend you begin with this EE article for a first glance:

HTML for the Beginner - My First Web Page

Then you could e.g. try some online tutorials like

HTML Tutorial - W3Schools

or
HTML Beginner Tutorial

When you're through with those tutorials, just take a look at your own pages' source code and try to understand what's going on there ... and after that, try (or at least imagine) to produce some code that's able to understand that, too.

I think that you've just picked the wrong system for creating your web pages ... but that's a subject for another question.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40609463
Nonetheless: If you inspect the access log of your site, you could check if and how often Google visits your pages. It could also reveal which pages it visits (look for the user agent info).

Where is the access log?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40609718
That depends on your system ... I can't know.

Read the manuals or (if your server is rented) ask your service provider's hotline staff.
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40618363
Hi Frank,
I have adjusted the meta tags of the site's start page, that is
http://my-friend.co/Start

My question is: if I expect to reach it by entering "make friend" on Google, without double quotes, what else should I adjust, according to the "Search engine optimization starter guide"?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40618419
I'm sorry, but I'm not a professional search engine optimizing coach ... I know some basic concepts.

The main point is that search engines like Google try to put the most interesting pages on top of the results. The algorithms for that resemble fine art or magic, and they're top secret. But there are some consistently important things that influence the ranking:

There must be interesting content, meaning the pages have to contain text that is not part of images (so the crawlers can read and parse it)
The coding of the pages must be easy for the machines to interpret
The page should be referenced, which means that pages of other sites contain links to it (the more links to a page exist, the more interesting its content seems to be)
The aforementioned links MUST NOT BE on so-called "free for all link farms", i.e. pages where everybody can place a link to his own page ... being listed there has a bad influence

To keep it short, a page like yours - with just a nice image and some buzzwords on it - wouldn't motivate Google to list it at the top for the search you intend.

There's a saying that success is 1% inspiration and 99% perspiration - you seem to already have that 1%, so you'll have to go on and get the 99% by generating FREE TO THE PUBLIC content for your pages.

Besides making the HTML coding of your pages simpler and easier to interpret, begin with a description of your site (who are we, what do we want, what's the intention of the site, and why is it better than each of the bazillion other friendmaking sites out there). Then generate some preview pages to show the user what he will get after registration and login. Show some "member of the week" pages where the images and personal info are blurred. Create a page with a pricing plan ... and so on.

A good piece of advice would be to meditate about this: If I were a person who stumbles on that page for the first time, would it lure me into disclosing my personal info to get in?
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40648603
Sorry, by following the search engine optimization guide, do you think I can make the site really searchable, or easy to reach, by Google, Yahoo, etc.?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40648683
To make it short ... anything hidden behind a login page is hidden to any search engine as it is to not-logged-in visitors, and can't be made searchable - due to simple logic reasoning.

If you need the interior of your site to be searchable for your users (to provide ease of use, without the need to attract additional visitors), you'll have to install some search engine software on your site. I'm sure that there are numerous products available for that (even though I lack experience with them), and the CMS you use might contain such things already ...
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40648703
Sorry, what is CMS?
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40648719
Sorry, what is CMS?

I've already explained that in a previous comment about 5 weeks ago ...
0
 
LVL 10

Author Comment

by:HuaMinChen
ID: 40648735
Do you have any examples of "search engine software"? thanks a lot
0
 
LVL 13

Expert Comment

by:frankhelk
ID: 40648818
Sorry, I don't have such software examples that I've tested myself.

I fear you have to do a Google search for it. I did a quick one for "intranet site search engine" (because intranets are intentionally invisible to Google et al.) and it revealed e.g.

http://www.wrensoft.com/zoom/
http://www.searchblox.com/solutions/intranet-search
http://www.searchtools.com/

Seems there are dozens of such tools available ...
0
