how does this site work..?

What is the trick behind the scenes here at EE? I mean, how come all questions are .htm files, how does it work, and why is it like that? would it be ASP equivalent if I do something like this?

Write out the entire page and then set the content type like so?

> Response.ContentType = "text/html"
davidlars99Asked:
aprestoCommented:
You would probably need an experienced EE expert such as Ftb, alorentz, sean powell or CD& (and others) to fully answer this.

>>>would it be ASP equivalent if I do something like this?

As far as I know EE was developed (voluntarily) by a number of experts; you would need to put in some SERIOUS man-hours to do something like this.

Now that you mention it... why are the files .htm files? Good spot, I never noticed that! I will be really interested in the outcome of this question. Is this open for discussion as well as a potential answer?
jmelikaCommented:
I did notice the .html thing when I first joined.  I was very curious too, but not curious enough to start a question on EE to find out :D

One thing though, I did notice about a week or so ago that the site crashed.  It happened for barely a few seconds, but I was able to find out that the web server is Apache.

If you look at the source code, everything is JSP.  This form I am typing in right now submits to
<form onSubmit="return submitOnce();" name=answerQuestionForm method=POST action="answerQuestion.jsp">

I think what answerQuestion.jsp does is BUILD an HTML file out of your answer.  So the HTML files are not pulling the "data" of the questions out of a database; they are static HTML files built when the form is submitted.  I have to say it's really smart, because imagine the database overhead if every page view on a site in this much demand had to hit the database.

The database is probably only handling the Subscribe, Bookmark, Experts Levels and their ranks, etc.  The questions and the answers are held inside static HTML files.

That's what I concluded.

JM
humeniukCommented:
These are all dynamic pages, but the URLs are changed using an Apache feature called Mod Rewrite.  It rewrites incoming URLs, which allows the site to use more user-friendly (and search-engine-friendly) URLs than the long, confusing ones that often appear on dynamic sites.

"would it be ASP equivalent if I do something like this?"
Actually, it's JSP in the case of E-E, although it could be done with PHP or ASP.  Check out the un-parsed URL for the Printer Friendly version of this page: www.experts-exchange.com/Web/viewQuestionPrinterFriendly.jsp?qid=21227030.  (Actually, it's 'less-parsed', not un-parsed - as you can see, you can still reach it at www.experts-exchange.com/viewQuestionPrinterFriendly.jsp?qid=21227030 showing that the '/Web/' represents the parsed URL, not the inherent directory structure of the website).

This was discussed fairly extensively here: 'HTML Pages in E-E' -  www.experts-exchange.com/Web/Q_21221680.html.  Take a look for more info.
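
As a rough illustration of the rewriting described above, here is a minimal Python sketch. It is not Apache configuration and not E-E's actual rule set. The URL pattern and the target script name viewQuestion.jsp are assumptions modeled on the printer-friendly URL; only the qid parameter comes from the thread.

```python
import re

# Assumed shape of a friendly question URL: /<Topic>/Q_<number>.html
FRIENDLY = re.compile(r"^/(?P<topic>[A-Za-z_]+)/Q_(?P<qid>\d+)\.html$")


def rewrite(path: str) -> str:
    """Translate a friendly .html path into the dynamic URL that would serve it."""
    match = FRIENDLY.match(path)
    if match is None:
        return path  # not a question URL; pass it through unchanged
    # "viewQuestion.jsp" is a hypothetical script name, not confirmed by the thread.
    return f"/viewQuestion.jsp?qid={match.group('qid')}"
```

Calling rewrite("/Web/Q_21221680.html") yields the dynamic form with qid=21221680, and the topic segment is simply discarded, which matches the observation that '/Web/' is not a real directory. The visitor and the search engine only ever see the .html address.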
humeniukCommented:
" If you look at the source code, everything is JSP.  This form I am typing in right now submits to
<form onSubmit="return submitOnce();" name=answerQuestionForm method=POST action="answerQuestion.jsp"> "
Exactly, the actual un-parsed URL would be www.experts-exchange.com/answerQuestion.jsp?qid=21227030 (or something like that), much like the Printer Friendly version above.  Click on that URL and see where you end up.
davidlars99Author Commented:
"would it be ASP equivalent if I do something like this?" means the same as "would it be the same in ASP if I do something like this?"
davidlars99Author Commented:
I asked the same thing a while ago and got this in response; I guess it does make sense:

http://www.experts-exchange.com/Community_Support/Q_21143159.html
humeniukCommented:
Yes, it could be the same.  The 'Mod Rewrite' is a server-based solution, not specific to scripting language (Apache how-to:  http://httpd.apache.org/docs-2.0/misc/rewriteguide.html).  It's a bit different if you're hosting on Windows/IIS (you need something like ISAPI ReWrite: www.isapirewrite.com).
davidlars99Author Commented:
OK, I sent an email to COBOLdinosaur, I guess he might know...
davidlars99Author Commented:
I would also like to know which experts built this site, or at least see their profiles...   :)
davidlars99Author Commented:
no wonder it is so fast, if I'm not mistaken there are 10 individual servers just for EE..? is that what it means..?
http://uptime.netcraft.com/up/graph/?host=www.experts-exchange.com

and company behind EE hosts pretty interesting company's websites too
http://uptime.netcraft.com/up/hosted?netname=LC-ORG-ARIN,64.152.0.0,64.159.255.255
humeniukCommented:
" if I'm not mistaken there are 10 individual servers just for EE..? is that what it means..?"
That represents the last ten servers E-E was hosted on, although several may be the same machine with an OS upgrade.  As you can see, the last 7 are all the same IP.  Presumably, the only real change came with the switch from  206.169.61.185 on a TCSN connection to  64.156.132.140 on a Level 3 connection.  At the same time, that could be just the primary server, with backup servers elsewhere or as part of a server cluster.  Likewise, the database is probably hosted on a stand-alone db server rather than on the web server machine.

"and company behind EE hosts pretty interesting company's websites too"
Actually, Level 3 (www.level3.com) isn't a hosting company.  They own a huge communications/internet backbone that lots of companies, data centers, etc. use for a variety of purposes, including internet access.  They also offer a lot of related services.
riyasjefCommented:
COBOLdinosaurCommented:
I'll post an evolution of the codebase a little later.  It makes it easier to understand how it works and what it takes to get there.

Cd&
COBOLdinosaurCommented:
Evolution of the site code:

The codebase for the site started out in the late summer of 2000.  The whole site was converted from a mix of static pages, Perl, and a patchwork of pieces.  Everything was converted to Java and Oracle databases by a bunch of contractors hired by the former owners, who managed to blow 5 million dollars in venture capital on the project.

In November 2000 a group of us beta tested it... we told them flat out it did not work.  It was full of problems and was nowhere near ready to go in.  However, they were just about out of money, so they put it in anyway.

When the new version was put in, the site was scheduled to be down for 2 days to completely re-organize the database.  It was actually down 6 days, and when it came back the site was basically trashed.  The servers were crashing all the time.  We had about 30% uptime during the first month.  When they were up, response time was so slow that the browser often timed out. The load balancing did not work, and the few of us that continued to work the site found workarounds for a lot of the problems, or just learned to live with some things.

For load balancing we changed www. to www1, www2, or www3, or came in through the partner sites we had at the time, like itworld.

There were rendering problems.  If you had & or < in your post, it was redisplayed as &amp; or &lt; because the Java parsers were screwed up.  The workaround was to enter them as &amp;amp; or &amp;lt;... then the parser got it right. All pages were returned as .jsp, and we did not show up much in search engines.

Any post over 40 lines of text returned an error page with a Java dump, so you had to break up your posts and hope you could get them in before the server went down.

Some questions went into a locked state so they could not be posted to, deleted, or accessed except for display.  The search crashed if there were more than 20 hits (one page), but it did not matter because it returned the wrong information anyway.

The autodelete and autograder had to be dropped because the delete was putting threads in a suspended state instead of actually deleting, and the autograder was grading everything with a C grade.  To address those problems a few of us started the cleanup effort in February of 2001.  

Plus all kinds of random rendering problems, wrong returns from queries, email problems and screwed-up points.  For example, in the WEBDEV TA half the questions and the points awarded for them disappeared overnight; many of them were never recovered.

Something else we discovered was that all the old questions prior to December 2000 were screwed up in the DB re-org. Most of the comments were out of order, and there was no way to reverse the problem until almost 2 years later.

By February 2001, the site was out of money, the dotcom bubble had burst and there was no ad revenue, and the partner sites had all pulled out.  In desperation they brought in Knowledge Pro ... a paid service that later became the Premium we have now. The programming for that used up the last of the money.

By this time most of the experts had left.  Those of us that stayed kept offering to fix the site, but the owners refused citing "security" concerns.

They brought back some of the original developers of the site, who started patching and trying to get things to work.  Those folks took some of our advice and managed to get the site reasonably stable, but the site was going broke.

By September 2001 it was clear the site was going to go bankrupt.  There were a number of experts using screen scrapers to download the database, and 5 or 6 alternate sites were set up by experts.  There were no developers, no engineers, no mods, and Wes Lennon was the only admin.  As for experts, some days there were no more than 25 of us answering questions, and nothing worked right.

When the servers crashed we posted on one of the alternate sites, where a former employee of the site would see the post and go down to the EE office and re-boot.  It was not unusual for the site to be down all day.

In November, 2001 Austin Miller and Randy Redberg bought the site out of bankruptcy.  They hired Ken Bell, who set about getting experts back on the site by spending time out on the alternate sites discussing it with us. Austin promised to fix the site, and give the experts input into what was going on.

In January 2002 Austin asked Dennis Waldron and me to form an Experts' Advisory Board, which we did.  Jansuper and Brian were brought in as site engineers to fix the site.  That is one of the really amazing parts of this.  The two of them completely re-built the codebase.  What we have today is that codebase extended and enhanced, and they did it on the fly.  Plus they were willing to work for a fraction of the going rates because they believed in the site, just like those of us who stuck it out and kept this site from dying.

By November 2003, we were ready to switch to the new format of generating everything from the database and returning HTML pages (primitive then compared to the way it is done today).  That was necessary so that Google could fully index the site.  That was the big boost the site needed: in less than a month we went from a 10,000th ranking on Alexa to the top 1,000, and we have been there ever since; in our market (tech help/support) we are number one.

AFAIK the database has three primary components... the question database; the comments database, and the members database.  Virtually every query generates off of those three.  Point totals are derived, though there is some caching of common information like the top 15 listing.  That is why the top-15 are not realtime.  They only get updated a couple of times a day.
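
To make the "derived totals plus a cached top list" idea concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the AWARDS list, the member names, and the CachedTopList class are all hypothetical stand-ins, and nothing here reflects E-E's real schema or code.

```python
import time

# Hypothetical stand-in for award records: (member, points awarded).
AWARDS = [("alice", 500), ("bob", 250), ("alice", 125), ("carol", 300)]


def point_totals(awards):
    """Derive each member's total from individual awards instead of storing it."""
    totals = {}
    for member, points in awards:
        totals[member] = totals.get(member, 0) + points
    return totals


class CachedTopList:
    """Recompute the ranking only when the cached copy is older than max_age seconds."""

    def __init__(self, awards, size=15, max_age=12 * 3600, clock=time.time):
        self.awards = awards
        self.size = size
        self.max_age = max_age  # 12 hours is an assumed value ("a couple of times a day")
        self.clock = clock
        self._cached = None
        self._built_at = None

    def get(self):
        now = self.clock()
        if self._cached is None or now - self._built_at >= self.max_age:
            ranked = sorted(point_totals(self.awards).items(),
                            key=lambda item: item[1], reverse=True)
            self._cached = ranked[: self.size]
            self._built_at = now
        return self._cached
```

Between refreshes, get() keeps returning the stale ranking even if new points have been awarded, which is the trade-off described above: the expensive aggregate runs rarely, at the cost of a list that is not realtime.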

Since the indexing by Google and later improvements to Premium, the site generates enough money that they have been able to add servers to improve reliability and response time, and they have been able to hire additional engineering staff.  As a result a lot of improvements were made and continue to be made, such as the ability to split points; Page Editor tools; improvements to the mod tools; better support for cleanup; the ability to create new Topics without major outages; improvements in email; site documentation; member comments; feedback comments.  There is a complete re-organization of nav in alpha, and expanded capabilities for editors that will allow us to post and maintain articles right on the site, instead of having to post them elsewhere and point members there with links.

The key to the whole thing working is JSP and the complete separation of content from programming.  They have a solid, stable database, and the data can be presented any way they want simply by packaging a set of classes to give the desired result; and it does not affect existing functionality.  There is no way you would be able to duplicate it using PHP or ASP, but C#.NET could probably be up to the task.  However, C#.NET is not going to work well except with M$ servers, and a good part of why EE is so stable is the Apache/Tomcat configuration on the back end.
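
A tiny Python sketch of what separating content from presentation can look like. This is an assumption-laden illustration, not the site's Java classes: the Question record and both renderer functions are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Question:
    """Hypothetical content record; it knows nothing about how it is displayed."""
    qid: int
    title: str
    comments: list = field(default_factory=list)


def render_page(q):
    """Full HTML view of a question, with user text escaped exactly once."""
    items = "".join(f"<li>{escape(c)}</li>" for c in q.comments)
    return f"<h1>{escape(q.title)}</h1><ul>{items}</ul>"


def render_printer_friendly(q):
    """Plain-text view built from the same record; the data layer is untouched."""
    lines = [q.title] + [f"- {c}" for c in q.comments]
    return "\n".join(lines)
```

Adding a third view (say, an email digest) would mean writing one more function over the same Question record, without touching storage or the existing renderers. That is the property being credited here for the site's flexibility.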

Cd&

davidlars99Author Commented:
That was a great story, I'm sure everybody including myself truly enjoyed it. I have some questions though. You said " the question database; the comments database, and the members database."... are these three separate servers with Oracle databases, or something else...?
COBOLdinosaurCommented:
I have no idea what the current physical setup is, or the details of how the database is organized.  I understand the logical design, but even that only at a high level.  The only ones who would have that information are the engineers and staff who are working with the system.  It is not something they would be willing to share, and they certainly would not publish it on a public forum, for security reasons.

I know they have added a lot of hardware, but how much is for database, I don't know.  It would be substantial.  There are well over a million threads, and I would guess probably 7 or 8 million comments, add to that the index hashes necessary for searches on all that text, and you are already into a lot of storage requirements.

It is probably safe to say that the DB is distributed, but there are a number of different ways to organize it based on the characteristics of the tables being used and what you are optimizing for.

Cd&
davidlars99Author Commented:
thanks again
COBOLdinosaurCommented:
Glad we could help.  Thanks for the A. :^)

Cd&
aprestoCommented:
Ditto, Thanks :o)
davidlars99Author Commented:
Hi, I finally figured it out by using a custom HttpHandler in .NET, and it's very cool. However, there is still one thing which bugs me... On this website, no matter how many comments you post, when you click the browser's back button it doesn't display those posted pages. It skips them all like they had never been there and takes you right back to where you came from. How is this possible? I'll open another question if anybody knows how.
davidlars99Author Commented:
Never mind, I got it. I used:

HttpContext.Current.Response.Cache.SetAllowResponseInBrowserHistory(False)
Response.Redirect("to itself")
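
The redirect half of that fix is commonly called Post/Redirect/Get. Below is a framework-free Python sketch of the idea, not the .NET code above: the handle function and the in-memory COMMENTS list are hypothetical, and the history-caching call (SetAllowResponseInBrowserHistory) has no counterpart in this sketch.

```python
# Hypothetical in-memory comment store; a real site would write to its database.
COMMENTS = []


def handle(method, path, body=""):
    """Return (status, headers, body) for a tiny comment thread."""
    if method == "POST":
        COMMENTS.append(body)
        # 303 See Other: the browser follows up with a GET of the same URL,
        # so the POST response itself is never shown as a page.
        return 303, {"Location": path}, ""
    page = "\n".join(COMMENTS)
    return 200, {"Content-Type": "text/plain"}, page
```

Because every submission is answered with a redirect instead of a page, refreshing or going back does not re-submit the form. How the browser's history then treats the repeated GETs of the same URL varies by browser, which is presumably why the cache setting was also needed.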