Rsync Mirroring Problem

This question has two parts, so I will award the points either together or half and half.  I am trying to set up mirroring between two servers so that I can start load balancing Apache.  I am running Pound on another machine to do the balancing.  None of that really matters other than as background.

I am planning to run a bash script that loops continually, and runs an rsync command to push the files.  It actually works great except that:

1) When the loop fires around again (I have a 30 second wait), rsync hits like 50-90% CPU use.  I think this is while it is compiling the list of files to send.  Is this normal for rsync?

2) How do I get this to run in the background?  I can get it to go away and get back to the command shell (I forget what I did), but when I exit the SSH session, it stops running.  I don't want it to be a cron job because it just loops... so I just want to start it once and go.

Here is the script:

#!/bin/bash

SLAVE="slave.server.com"

BASE_DIR=/home/

EXCLUDES=/etc/myscripts/excludes

# endless loop
while :
do
    /usr/bin/rsync -Cavz -e ssh --exclude-from="$EXCLUDES" "$BASE_DIR" "$SLAVE:$BASE_DIR"
    sleep 30
done
fiscAsked:

jonesy2kCommented:
Call it up with nohup and background it:

nohup ./script.sh &

that will keep running even if you log out.
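For example, a self-contained run-through (the /tmp paths here are just an illustration; in real use you would point nohup at the looping script from the question):

```shell
# stand-in for the mirror script from the question
cat > /tmp/mirror.sh <<'EOF'
#!/bin/bash
echo "sync pass"
EOF
chmod +x /tmp/mirror.sh

# nohup detaches the job from the terminal's hangup signal and '&'
# backgrounds it; redirect output so nohup.out doesn't grow unbounded
nohup /tmp/mirror.sh > /tmp/mirror.log 2>&1 &
wait   # in real use you would just log out here instead of waiting
```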

HTH
Jonesy
jonesy2kCommented:
As an alternative, you could use cron, remove the loop from your script, and just run it every minute with cron.
I know you didn't want to use cron, but that is another option....
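If overlapping runs are the worry with cron, flock (from util-linux) can guard the job so a new run is simply skipped while the previous one still holds the lock.  Paths below are illustrative:

```shell
# crontab entry (add via `crontab -e`): run the one-shot sync every
# minute, but only if no previous run still holds the lock
#   * * * * * /usr/bin/flock -n /tmp/mirror.lock /etc/myscripts/sync.sh

# the same guard can be exercised by hand; -n means "don't wait,
# just give up if the lock is already held"
flock -n /tmp/mirror.lock echo "got the lock"
```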
Jonesy

fiscAuthor Commented:
nohup will probably work... that's a good command to add to my growing Linux vocabulary.  I actually got it to work by doing at xx:xx script.sh where xx:xx was one minute in the future--that got the job done.  I'll try out nohup and, if it works, award you the points for that part.  The reason I don't want to use cron every minute is because I'm afraid of the cron jobs running into each other and sending the system into a downward spiral.

What about the CPU usage?  It's only for a few seconds, but when I have watched it there haven't been any new files either, so I'm worried about system performance.  But maybe it's just normal for it to shoot up that high at first; I don't really know.  I'm kind of hoping to hear from someone who has used rsync and experienced it.  It will go up like 70-90-80-60-30-10 and out in about a 3-4 second period.

Jason

decoleurCommented:
rsync starts its process by performing a differential analysis of the stores to be synced, and the resource utilization for the initial startup will always be a bit tight; it will only get worse as your stores get more convoluted.

rsync is probably not your friend for this, it handles open files poorly and has a hard time playing nice with other instances of itself.

Now what I am interested in finding out is what led you down this path to begin with.  There are less kludgy solutions to provide load balancing.

regards,

-t
fiscAuthor Commented:
decoleur - What do you suggest instead?  I read a lot of similar posts on Experts Exchange and elsewhere of people wanting to do mirroring and rsync was suggested much of the time.  It seems great in that it only transfers the diffs of the files, and not the whole file.  

Bottom line is I want a cheap, easy solution for both load balancing and failover, and this seems to provide it; although, I'm open to other ideas that accomplish the same.  

This rsync script has been running for a few days now with no noticeable performance hit; however, in the summer our web site also gets about 1/3 the traffic it does in the fall.
decoleurCommented:
rsync is great for making backups; I use it in production environments to back up a fleet of Linux and Red Hat servers.

It is hard to design a load balancing solution without understanding your application.

I also think you have two different issues that would need to be addressed in that design:
distributing load between your servers and if it is necessary to stick a user to a single box for the duration of their session.

What we do is provide a load balancer in front of a bunch of web servers that redirects users to a specific box and keeps them on it.  Our web applications are based on JSP and deployed in JBoss/Tomcat with a single database server store that is accessed by all the web servers.  The users are able to jump from box to box until they authenticate to the application, and from that point on we have them stuck to a single server.

It would be possible for you to use Apache to cluster or load balance between the Tomcat servers.  I like JBoss because you can drop the current site into a "farm" directory and it will deploy it to the other servers in the cluster in less than 30 seconds.

all of this is open source.

google apache load balance or cluster for more

some winners look like:

http://www.backhand.org/mod_backhand/
http://www.samag.com/documents/s=1155/sam0101a/0101a.htm
http://raibledesigns.com/tomcat/
http://www.jboss.org/products/index

HTH

-t
fiscAuthor Commented:
Thanks.  I am using Pound, which as I understand it has the ability to direct a user to the same machine every time based on their sessid.  I am either going to do that OR set up NFS to share the session directory; although, I suppose that would then again create a single point of failure.... so maybe I'll stick with the single session, single machine thing.

I have 5 servers: 1 MySQL, 2 Apache, 1 DNS/Mail/Pound, and 1 that is Apache now but that I will probably be turning into a caching server (memcached) and our ad server (Apache/PHPAdsNew), all Fedora Core 2/3.

fiscAuthor Commented:
The site is PHP/MySQL, heavily dynamic with news content and forums.  I'll be setting up a MySQL slave as well to remove that single point of failure.
decoleurCommented:
so where is the need for rsync? what role was it filling?
fiscAuthor Commented:
I need it to sync the two apache servers.... they will be mirrored and load balanced between them through the pound server.
decoleurCommented:
I am trying to understand what needs to be mirrored; if your site is dynamic, aren't you using the db to handle the dynamic bits?
If you are having the sites dynamically update text pages, then you are in for it; rsync will not help.

if you had two servers A and B and they were both hosting copies of the same dynamic blog that is built on flat files...

if a user added content to A, it would affect the content on A and it would get queued to update itself on B

but what if, before B got updated, another user updated the same file on B?  rsync could still overwrite it and cause the addition on B to be lost.

rsync isn't that sophisticated: good for duplicating an original to a backup, but not so hot if both sides are live.

IMO your best bet is to have a single store that is on a drive array that all your web servers can access.

HTH

-t
fiscAuthor Commented:
*sigh* Apache hosts the PHP scripts that call the database information, hosts photos, limited static content, etc.  A lot of our text content does come from the database, but you're missing the point.

B will never be updated, because A is where any FTPed files will go.

Situation #1 - Machine goes down.... I don't really care if the data is safe in a RAID Array or not, it's still down.  If I have them mirrored, Pound will see it's down and direct all to the other.

Situation #2 - We're getting pounded with traffic and Apache and/or PHP is bogging down... we want load balancing.  Again, RAID array or not, it doesn't matter: that server is still swamped!  Load balancing!
fiscAuthor Commented:
Okay, I didn't think what I was doing was too overly complicated or rare, but let me explain exactly what is going on here.

Servers.....
A = Apache (Master)
B = Apache (Slave--never updated directly, only mirrored from A, would be the rsync destination)
C = MySQL
D = DNS/Mail/Pound

A hosts photos, PHP scripts that call database content, and limited flat html files.  A pushes its updates to B.

C hosts the database and has articles, forum posts, etc pulled from it.

D is the first stop and Pound (running on this server) will balance the load between A and B.

I just want to get the best method to mirror the two.  Right now I think rsync will work great, but I can be convinced otherwise.  
decoleurCommented:
alright call me a primate, I get what you are doing now.

it is not overly complicated and you should be able to do what you need with rsync.

Issue 1: you can mitigate the processor and I/O utilization by using the --bwlimit flag for rsync; on a local switched network you can try --bwlimit=5000.

confirm by running while watching top on the rsync initiator

Issue 2: create your script and run it as a cron job every half hour.  That way you should be able to move the files over in a timely fashion and not have to worry about overlapping rsync runs.

sorry to be so dense.

-t
fiscAuthor Commented:
I'm sorry if I came off harsh.. I just wasn't looking to be talked out of the mirroring/load balancing, but rather get suggestions on how to best implement it.  I have thought it through quite a bit and think this is definitely a step in the right direction.

This is not hardware I have physical access to, so I can't add boxes and network them at will... I rent dedicated servers from a host in Jersey.

I'll look at the bandwidth switch; that might be a good idea.  I don't want to wait every half hour because that might not be frequent enough.  If we post an article with photos (photos are stored in the file system, not the db), then if someone gets balanced to the B server they won't see those photos until it syncs up 30 minutes later.  So it has to be as near instantaneous as possible.

I'm pretty good, but by no means an expert... I think at this point clustering is a bit over my head and this is not yet full time so I don't have the time to delve into it too much. But I'll take a look at the links you gave.  I was looking at Linux Virtual Server for one.

For the CPU usage issue... will the bandwidth flag help with that?  It seems like the CPU usage comes from compiling the list of files (i.e. before it transmits them to the mirror server), so it seems like limiting the bandwidth wouldn't affect that at all.  I was thinking maybe making it a low-priority process...?  I'll award points; I just want to be sure that either there is something I can do to use less CPU, or decide it's nothing to worry about.
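The low-priority idea would look like prefixing the rsync call with nice, e.g. nice -n 19 in front of the existing command (a sketch, not a tested tuning); a quick check that a wrapped child really gets the lower scheduling priority:

```shell
# nice -n 19 runs the child at the lowest CPU scheduling priority;
# ps reports the niceness of the spawned shell
nice -n 19 sh -c 'ps -o nice= -p $$'
```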
decoleurCommented:
No worries; it isn't easy trying to fix the world's problems for free.

the bandwidth switch will help, but i am not sure if rsync is going to give you the fast response that you want...

i just went round asking for a few bits from the locals, and here is what they said:

Red Hat Cluster has a global file system (GFS) that allows you to share files over multiple servers and can add or drop servers on the fly.

also why not use your databases, you can drop the files into a master db with a slave on the other box.

for higher volume you will want to move the db off of the web servers, but wait you already have that... I think you could leverage the db to store your images, but then you probably already decided against it.

The general consensus is that there really isn't good real-time, file-system-based replication off the cuff; the RH solution costs around $500.  But databases have to do it to scale, to be able to process the volume they can be given.

HTH

-t
fiscAuthor Commented:
I've thought about putting the photos in the database, obviously... but at this point it doesn't work for me because a) the database is already getting hammered, so until I get memcached and slave servers going I want to be careful how much load I put on it, and b) we sometimes publish photo galleries with hundreds of photos... what a pain it would be to load each into MySQL one at a time when we can just FTP up an entire directory.