fisc

Rsync Mirroring Problem

This question has two parts, so I will award it either together or half and half.  I am trying to do mirroring between two servers so that I can start load balancing Apache.  I am running Pound on another machine to do the balancing.  None of that really matters other than as background.

I am planning to run a bash script that loops continually, and runs an rsync command to push the files.  It actually works great except that:

1) When the loop fires around again (I have a 30 second wait) rsync hits like 50-90% CPU use.  I think this is while it is compiling the list to send.  Is this normal for rsync?????

2) How do I get this to run in the background?  I can get it to go away, and get back to the command shell (I forget what I did).  But when I exit the ssh session, it stops running.  I don't want it to be cron because it just loops.. so I just want to start it once and go.

Here is the script

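#!/bin/bash

SLAVE="slave.server.com"
# The trailing slash tells rsync to copy the contents of /home/, not the directory itself
BASE_DIR=/home/
EXCLUDES=/etc/myscripts/excludes

# endless loop
while true
do
      # Quote the variables so paths containing spaces do not break the command
      /usr/bin/rsync -Cavz -e ssh --exclude-from="$EXCLUDES" "$BASE_DIR" "$SLAVE:$BASE_DIR"

      sleep 30
done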
jonesy2k

call it up with nohup and background it :

nohup ./script.sh &

that will keep running even if you log out.
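
A slightly fuller sketch of the same idea, assuming you also want the output kept in a log and the PID printed so the loop can be stopped later. The helper name start_detached and the paths in the comments are made up for illustration; nothing above specifies them.

```shell
#!/bin/bash
# Hypothetical helper: start a command that ignores hangups, append its
# output to a log file, and print the PID of the background process.
start_detached() {
    local logfile=$1
    shift
    # Redirect stdin/stdout/stderr so nothing stays tied to the ssh session
    nohup "$@" >>"$logfile" 2>&1 </dev/null &
    echo $!
}

# Example (paths are assumptions, adjust to your setup):
#   pid=$(start_detached /var/log/mirror.log /etc/myscripts/script.sh)
#   kill "$pid"    # stops the loop later
```

Because the output goes to the named log, nohup should not create its usual nohup.out file.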

HTH
Jonesy

ASKER

nohup will probably work... that's a good command to know to add to my growing Linux vocabulary.  I actually got it to work by doing at xx:xx script.sh where xx:xx was one minute in the future--that got the job done.  I'll try out nohup and if it works award you the points for that part.  The reason I don't want to use cron every minute is because I'm afraid of the cron jobs running into each other and getting the system into a downward spiral.
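
For reference, the usual guard against cron runs piling up is a lock that a new run checks before starting. Below is a minimal sketch using mkdir as the lock, since creating a directory either succeeds or fails atomically. The function name run_exclusive, the exit code 75, and the lock path in the comment are assumptions for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: run a command only if no earlier run holds the lock.
# $1 = lock directory, remaining arguments = command to run.
run_exclusive() {
    local lockdir=$1
    shift
    # mkdir fails if the directory already exists, i.e. another run is active
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    local status=$?
    rmdir "$lockdir"
    return "$status"
}

# Example cron-driven use (lock path is an assumption):
#   run_exclusive /tmp/mirror.lock /usr/bin/rsync -Cavz -e ssh /home/ slave.server.com:/home/
```

One caveat with this approach: if the job is killed before it reaches rmdir, the stale lock has to be removed by hand.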

What about the CPU usage?  It's only for a few seconds, but when I have watched it there haven't been any new files either, so I'm worried about system performance.  But maybe it's just normal for it to shoot up that high at first, I don't really know.  I'm kind of hoping to hear from someone who has used rsync and experienced it.  It will go up like 70-90-80-60-30-10 and out in about a 3-4 second period.

Jason

ASKER

decoleur - What do you suggest instead?  I read a lot of similar posts on Experts Exchange and elsewhere of people wanting to do mirroring and rsync was suggested much of the time.  It seems great in that it only transfers the diffs of the files, and not the whole file.  

Bottom line is I want a cheap, easy solution for both load balancing and failover, and this seems to provide it; although, I'm open to other ideas that accomplish the same.  

This rsync script has been running for a few days now with no noticeable performance hit; however, in the summer our web site also gets about 1/3 the traffic it does in the fall.

ASKER

Thanks.  I am using Pound, which as I understand it has the ability to direct a user to the same machine every time based on their sessid.  I am either going to do that OR set up NFS to share the session directory; although, I suppose that would then again create a single point of failure.... so maybe I'll stick with the single session, single machine thing.

I have 5 servers.......    1 MySQL, 2 Apache, 1 DNS/Mail/Pound, 1 that is Apache now but I will probably be turning into a caching server (memcached) and our Adserver (Apache/PHPAdsNew)  all Fedora Core 2/3


ASKER

The site is PHP/MySQL, heavily dynamic with news content and forums.  I'll be setting up a MySQL slave as well to remove that single point of failure.
so where is the need for rsync? what role was it filling?

ASKER

I need it to sync the two apache servers.... they will be mirrored and load balanced between them through the pound server.
I am trying to understand what needs to be mirrored. If your site is dynamic, aren't you using the db to handle the dynamic bits?
If you are having the sites dynamically update text pages then you are in for it; rsync will not help.

if you had two servers A and B and they were both hosting copies of the same dynamic blog that is built on flat files...

if a user added content to A, it would affect the content on A and it would get queued to update itself on B

but what if, before B got updated, another user updated the same file on B? rsync could still overwrite it and cause the addition on B to be lost.

rsync isn't that sophisticated: it's good for duplicating an original to a backup but not so hot if both sides are live.

IMO your best bet is to have a single store that is on a drive array that all your web servers can access.

HTH

-t

ASKER

*sigh* Apache hosts the PHP scripts that call the database information, hosts photos, limited static content, etc.  A lot of our text content comes from the database, yes, but you're missing the point.

B will never be updated, because A is where any FTPed files will go.

Situation #1 - Machine goes down.... I don't really care if the data is safe in a RAID Array or not, it's still down.  If I have them mirrored, Pound will see it's down and direct all to the other.

Situation #2 - We're getting pounded with traffic and Apache and/or PHP is bogging down... we want load balancing.  Again RAID array or not... it doesn't matter that server is still swamped!  Load balancing!

ASKER

Okay, I didn't think what I was doing was too overly complicated or rare, but let me explain exactly what is going on here.

Servers.....
A = Apache (Master)
B = Apache (Slave--never updated directly, only mirrored from A, would be the rsync destination)
C = MySQL
D = DNS/Mail/Pound

A hosts photos, PHP scripts that call database content, and limited flat html files.  A pushes its updates to B.

C hosts the database and has articles, forum posts, etc pulled from it.

D is the first stop and Pound (running on this server) will balance the load between A and B.

I just want to get the best method to mirror the two.  Right now I think rsync will work great, but I can be convinced otherwise.  
alright call me a primate, I get what you are doing now.

it is not overly complicated and you should be able to do what you need with rsync.

issue 1: you can mitigate the processor and IO utilization by using the bwlimit flag for rsync; on a local switched network you can try --bwlimit=5000

confirm by running while watching top on the rsync initiator
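
A rough sketch of what that would look like, reusing the rsync options from the original script. rsync reads the --bwlimit value as kilobytes per second, so 5000 is roughly 5 MB/s. The wrapper name mirror_throttled and the RSYNC override are assumptions added here so the command line can be printed and checked before running it for real. The flag caps transfer bandwidth, so it may not change CPU spent before anything is sent.

```shell
#!/bin/bash
# Hypothetical wrapper around the rsync call from the original script.
# $1 = source dir, $2 = destination, $3 = cap passed to --bwlimit (KB/s).
# Set RSYNC to another command (e.g. echo) to print the arguments instead
# of starting a real transfer.
mirror_throttled() {
    "${RSYNC:-/usr/bin/rsync}" -Cavz --bwlimit="$3" -e ssh "$1" "$2"
}

# Example:
#   mirror_throttled /home/ slave.server.com:/home/ 5000
```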

issue two would be to create your script and run it as a cron job every half hour. that way you should be able to move the files over in a timely fashion and not have to worry about overlapping rsync.

sorry to be so dense.

-t

ASKER

I'm sorry if I came off harsh.. I just wasn't looking to be talked out of the mirroring/load balancing, but rather get suggestions on how to best implement it.  I have thought it through quite a bit and think this is definitely a step in the right direction.

This is not hardware I have physical access to, so I can't add boxes and network them at will... we rent dedicated servers from a host in Jersey.

I'll look at the bandwidth switch, that might be a good idea.  I don't want to wait every half hour because that might not be frequent enough.  If we post an article with photos (photos are stored in the file system, not db) then if someone gets balanced to the B server they won't see those photos until it syncs up 30 minutes later.  So it has to be as near instantaneous as possible.

I'm pretty good, but by no means an expert... I think at this point clustering is a bit over my head and this is not yet full time so I don't have the time to delve into it too much. But I'll take a look at the links you gave.  I was looking at Linux Virtual Server for one.

For the CPU usage issue... will the bandwidth flag help with that?  It seems like the CPU usage is because it is compiling the list of files (i.e. before it transmits them to the mirror server), so it seems like limiting the bandwidth wouldn't affect that at all.  I was thinking maybe making it a low priority process...?  I'll award points, just want to be sure that either I have something I can do to try to use less CPU % or decide it's nothing to worry about.
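
For what it's worth, the low-priority idea could be sketched with nice, which starts a command at a lower CPU scheduling priority. The helper name low_priority is made up for illustration. This would not reduce the total work rsync does to build its file list; it should only let Apache and PHP get the CPU first when they want it, and it only affects the local rsync process, not the one on the receiving server.

```shell
#!/bin/bash
# Hypothetical helper: run any command at the lowest CPU scheduling priority.
low_priority() {
    nice -n 19 "$@"
}

# Example, using the rsync call from the original script:
#   low_priority /usr/bin/rsync -Cavz -e ssh /home/ slave.server.com:/home/
```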

ASKER

I've thought about putting the photos in the database, obviously... but at this point it doesn't work for me because a) the database is already getting hammered, so until I get memcached and slave servers going I want to be careful how much load I put on it and b) we sometimes publish photo galleries with hundreds of photos... what a pain it would be to load each into mysql one at a time when we can just ftp up an entire directory.