High Availability for Apache Servers

Hi

I have fairly good knowledge of Linux, and I am trying to find the most suitable way to let multiple Apache servers read the same content.

Here I will have 6 Apache servers for a single website.
That website will serve pages from these 6 physical Apache servers simultaneously.
There will be very frequent uploads and downloads on these servers,
so basically, when a user uploads or changes a file (a file in the filesystem, not in a database), that change will have to be replicated to the other 5 Apache servers.

Right now I have a few options in mind:

(a) Let all Apache servers read from one location:

have a centralized location (an NFS server or DRBD, e.g. /data) and let all 6 Apache servers read from there.

I have tried this, and I don't like the concept.

(b) Let each Apache server read from its own filesystem, but somehow synchronize those files between the 6 servers (currently this is the most suitable, but I am trying to find an effective way to do it).

(c) I don't know what kind of facilities SAN storage offers...

(d) Synchronization software (for example Acronis, or something similar) is not an option.

So I want to go with option (b), but the synchronization is a big problem (simple rsync is not a solution).


If you run this kind of production environment, can you tell me how you are doing it?

Thanks

 
fosiul01 Asked:

jeremycrussell Commented:
One option is to use a source code control system... like CVS, Subversion, etc.

Essentially, set up the root of your webserver as a project in the SCCS and have the web folks commit their changes to it.

Then, on each webserver, just check out the project into the doc root. When you want to push changes out, all you have to do is run an update (for instance, if you were using Subversion, on each machine cd to the docroot and run svn update). This is easily scripted, and you gain some benefits...

You can schedule changes so that the web folks can make many of them and you update only when you want to "publish" them. Another feature is the ability to easily "roll back" a change if there is a problem, or if something went out that shouldn't have... and you get a versioned history of your website, a good audit trail.
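For example, a minimal sketch of how that update could be scripted from a deploy host, assuming passwordless SSH to each web server (the hostnames and docroot path here are assumptions):

#!/bin/bash
# Update the Subversion working copy in the docroot on every web server.
# Hostnames and the docroot path are illustrative assumptions.
DOCROOT=/var/www/html
for host in web1 web2 web3 web4 web5 web6; do
    # svn update only fetches files that changed since the last update.
    ssh "$host" "cd $DOCROOT && svn update"
done

Running this after the web folks commit is the "publish" step; checking out an older revision on each server is the corresponding rollback.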


Something else you could look into is some type of CMS, where your website is mostly dynamically created and read from a backend DB... There are many open source/freely available CMSes out there. This would mean the "files" on the servers wouldn't change much, if at all, and content changes would be kept in a DB that all the servers connect to...

Yet another option would be to use the two previous methods together...
fosiul01 (Author) Commented:
@jeremycrussell
Thanks,

but I am not worried about source control here. I am using Puppet + Fenc,
with which I can easily deploy all source code to every server.

I am talking about the users' point of view: the users of the website will upload and download files (MS Word, Excel and others).

So I need those files to be replicated across the 6 Apache servers without any delay.
Hope this makes sense.

Thanks
jeremycrussell Commented:
Ah, I see now...

OK, well, that may be where you want to use a CMS of some sort, so the documents are actually inserted into a relational DB. I'm not sure what kind of programming skills are available to you, but this could be accomplished with some rather simple Perl, PHP or Python CGIs. Of course, you would need a DB infrastructure as well.

It sounds like you really want a distributed or HA filesystem of some sort... you might check out LAFS or Lustre...
fosiul01 (Author) Commented:
"It sounds like you really want a distributed or HA filesystem of some sort... you might check out LAFS or Lustre...":

Yes, some sort of distributed file system...

because the changes happen at the filesystem level.

What do you mean by LAFS?

I could use GFS, but it's the same problem... I would still need some sort of synchronization between directories...
jeremycrussell Commented:
I was referring to Tahoe-LAFS, an open source distributed filesystem project... but I don't think it would suit this purpose...

GFS doesn't really suit this either, due to how it works... Lustre is the closest fit to what you're trying to do... However, I'm not certain how involved it would be to get it working in this scenario.
fosiul01 (Author) Commented:
Hmm,
it's almost the same concept as AFS.

I might go with AFS...

arnold Commented:
How are users uploading files? Do they upload using FTP, or are they using a web-based interface?
If it is a web-based interface, part of the upload process could be to copy the uploaded data to the other servers.

DRBD works, but it adds a significant amount of network traffic, which is presumably what you do not like.

If the servers are in the same location, NFS is likely the best option.
That way, provided the servers have two NICs, you can have one NIC serving the pages while the other accesses the NFS share.
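For reference, a minimal sketch of such a mount on each web server, where the NFS traffic goes over the second NIC (the hostname and paths are assumptions):

# Mount the share over the dedicated storage NIC; "nfs-storage" is
# assumed to resolve to the NFS server's address on the private storage
# subnet, keeping NFS traffic off the NIC that serves HTTP.
mount -t nfs -o rw,hard,noatime nfs-storage:/data /var/www/data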
fosiul01 (Author) Commented:
Hi arnold, thanks.

If I use NFS, then all 6 Apache servers have to read and write from the same location, and that is a single point of failure. With DRBD I can prevent that by adding 2 nodes, but it puts a significant amount of pressure on the server, and data has to travel from Apache to NFS and back again.

So I want each Apache server to read and write from its own hard drive, but I want that data replicated to every server.

Here, users upload via FTP and a web interface.

arnold Commented:
What you could do is "cluster" the FTP/upload servers so they write to an NFS share, while each reading Apache server has a cron job that synchronizes the data from the NFS share to a local directory.

i.e. to upload data the user has to use upload.domain.com, while to view data they go to www.domain.com.
upload.domain.com -> FTP/web upload only
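A minimal sketch of that cron job on each Apache server (the mount point, docroot and one-minute interval are assumptions):

# /etc/cron.d/pull-uploads -- mirror the NFS share into the local
# docroot every minute. rsync only copies files that changed;
# --delete also propagates deletions made on the share.
* * * * *  root  rsync -a --delete /mnt/nfs-upload/ /var/www/data/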

The difficulty here is that you either have to trigger the sync, or the uploaded files have to wait out the set interval before the data is copied.

The most complex issue is when a user deletes a file/directory.

Having 6 places where a user can upload/access files could lead to conflicts if the user delegates updates to multiple people. The version control option jeremycrussell suggested in the first response could be useful here.


fosiul01 (Author) Commented:
"The difficulty here is that you either have to trigger the sync, or the uploaded files have to wait out the set interval before the data is copied.":

Yup, this is the issue,

and cron will not be able to handle this,

so I'm trying to find another approach that doesn't use cron...

Any advice on this?

Or I will check out Tahoe-LAFS, which is a distributed file system.



arnold Commented:
It does not have to be cron; you can set up a script that continually monitors the location for newly uploaded files and then triggers the event, which is likely what the tools suggested above do anyway.

For FTP you could monitor the FTP transfer log and trigger based on that, and likewise monitor the Apache log and trigger based on that.
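On Linux, inotify makes this event-driven rather than polled. A minimal sketch, assuming the inotify-tools package is installed and there is passwordless SSH to the peer servers (hostnames and paths are assumptions):

#!/bin/bash
# Watch the upload directory and push changes to the peers as they occur.
WATCH_DIR=/var/www/data
PEERS="web2 web3 web4 web5 web6"

# -m: keep watching; -r: recurse; fire on completed writes, moves, deletes.
inotifywait -m -r -e close_write,moved_to,delete --format '%w%f' "$WATCH_DIR" |
while read -r changed; do
    for host in $PEERS; do
        # rsync transfers only the files that actually changed.
        rsync -a --delete "$WATCH_DIR/" "$host:$WATCH_DIR/"
    done
done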
fosiul01 (Author) Commented:
Hmmm...
Deep down we know that rsync is not the correct solution, isn't it?
There will always be a race condition,

and you can't even guarantee the timing:
you upload a file to server A (say, 5 MB),

and another user hits server B and tries to access that file at almost the same time,

because Apache requests are distributed round-robin.

You cannot guarantee that the script will copy as quickly as it would need to.

Can you?



arnold Commented:
There are always alternatives. Upload to one location (a master) and then synchronize to the others, versus upload to any location and synchronize to the others without clobbering what might have been uploaded there in the meantime.

The scenario you describe often arises even with a single server and a single place to upload; it comes down to the cache settings on the Apache server.

Monitoring the directory for changes means you have an infinite loop that checks the directories of interest for files added since the last scan, while you monitor the FTP transfer log for deletions. The problem case is when a person deletes document A and then uploads document A again.

There is no absolute efficiency where you have to set up scripts/processes and logic. You presumably mean efficiency in the sense of transferring only the files that have changed, without adding load on the server.
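For the upload-anywhere variant, a minimal sketch of a no-clobber push (hostnames and path are assumptions):

#!/bin/bash
# Push local uploads to the peers without overwriting newer remote files.
# --update skips any file that is newer on the receiver, so a more recent
# upload of the same name on a peer is left alone; rsync's delta transfer
# means only changed files move over the wire.
SRC=/var/www/data
for host in web2 web3 web4 web5 web6; do
    rsync -a --update "$SRC/" "$host:$SRC/"
done

Note there is no --delete here: propagating deletions safely in the any-to-any model needs the log-based tracking described above.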

fosiul01 (Author) Commented:
Hi,

I just installed AFS (the Andrew File System).

It's the one I want.

It has a cell server, and you link the other servers into that cell.

You can give a user permission to access the cell, and whatever you upload to the cell server is automatically replicated to the rest of the servers, depending on your permissions.
It's basically an rsync + DRBD mixture.

I have implemented it on 2 servers so far. Looks amazing.
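For anyone following along, a minimal sketch of setting up a replicated OpenAFS volume, assuming the cell is already running (the server names, the /vicepa partition and the volume name are assumptions). Note that AFS replicas are read-only and only pick up changes when the read/write volume is released:

# Create the read/write volume and define replica sites on both servers.
vos create  afs1 /vicepa web.data
vos addsite afs1 /vicepa web.data
vos addsite afs2 /vicepa web.data

# Push the current contents of the RW volume out to the replicas.
vos release web.data

# Mount the volume in the AFS namespace so Apache can serve from it.
fs mkmount /afs/example.com/web web.data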






legolasthehansy Commented:
Hi fosiul01,

Is there a download link for AFS? I would like to try this.

Thanks!
Lego
fosiul01 (Author) Commented:
Hi legolasthehansy,

The main website is:

http://docs.openafs.org/index.html

The link below is more or less the same as the AFS website, but without the explanations:
http://redflo.de/tiki-index.php?page=AFS+Software+installation&structure=SSO+and+Central+Administration+with+Kerberos+and+LDAP&page_ref_id=100

For Fedora:
http://openafs-wiki.stanford.edu/AFSLore/FedoraAFSInstall/

But follow the first link and just run all the commands; it is not that hard to implement.

Remember, though, that you need a Kerberos authentication system for the AFS file system.
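Once the cell and Kerberos realm are up, day-to-day authentication is just the following (the realm and principal here are assumptions):

# Get a Kerberos ticket, convert it to an AFS token, and check it.
kinit fosiul@EXAMPLE.COM
aklog
tokens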