cloudtechnician

RDS Single AZ vs Multi AZ Backup & Restore Benchmarking

Hi,

I need to benchmark backup and restore for Amazon RDS in Single-AZ vs Multi-AZ deployments and share the results with a client, so that he can make decisions with his customers before signing an SLA.

Could someone please give me some ideas on how to do the benchmarking and which tests I need to perform?

Thanks in advance
btan

I see this more as testing durability and availability between Single-AZ and Multi-AZ. And since users pay for what they use, the following billing items could serve as test metrics to measure the difference for your use cases. E.g. from the RDS FAQs:

- Throughput related -
•DB Instance hours – Based on the class (e.g. Standard Small, Large, Extra Large) of the DB Instance consumed. Partial DB Instance hours consumed are billed as full hours.
•I/O requests per month – Total number of storage I/O requests you have (for Amazon RDS Magnetic Storage only)
•Provisioned IOPS per month – Provisioned IOPS rate, regardless of IOPS consumed (for Amazon RDS Provisioned IOPS (SSD) Storage only)
•Data transfer – Internet data transfer in and out of your DB Instance.

- Storage related -
•Storage (per GB per month) – Storage capacity you have provisioned to your DB Instance. If you scale your provisioned storage capacity within the month, your bill will be pro-rated.
•Backup Storage – Backup storage is the storage associated with your automated database backups and any active database snapshots you have taken. RDS provides backup storage up to 100% of your provisioned database storage at no additional charge. Backup storage is only free for active DB Instances.
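To turn these billing items into a comparable number for the client, a rough monthly estimate can be computed once with Single-AZ rates and once with Multi-AZ rates. This is a minimal sketch under stated assumptions: the rate values are hypothetical placeholders you must take from the current RDS pricing page, I/O requests and Provisioned IOPS charges are left out, rounding the whole month's hours up is a simplification of the "partial hours billed as full hours" rule quoted above, and the instance is assumed to be active so the free backup allowance applies.

```python
import math
from dataclasses import dataclass


@dataclass
class RdsRates:
    # Hypothetical placeholder rates (USD); take real values from the RDS
    # pricing page for your region, engine, class and deployment type.
    instance_per_hour: float
    storage_per_gb_month: float
    backup_per_gb_month: float
    transfer_out_per_gb: float


def estimate_monthly_cost(rates, instance_hours, storage_gb,
                          backup_gb, transfer_out_gb):
    """Rough monthly estimate from the billing items listed in the FAQ."""
    # Simplification: round the month's total hours up to a full hour.
    billed_hours = math.ceil(instance_hours)
    # Backup storage up to 100% of provisioned storage is free
    # (assumes the DB instance is active).
    billable_backup_gb = max(0.0, backup_gb - storage_gb)
    return (billed_hours * rates.instance_per_hour
            + storage_gb * rates.storage_per_gb_month
            + billable_backup_gb * rates.backup_per_gb_month
            + transfer_out_gb * rates.transfer_out_per_gb)
```

Calling `estimate_monthly_cost` with one `RdsRates` object per deployment type, and the same usage figures, gives the cost side of the Single-AZ vs Multi-AZ comparison.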

I am not sure these metrics alone can effectively prove the point to your user. Benefits such as live (synchronous) data replication or shorter maintenance windows are transparent to users. A better strategy may be to verify one important aspect of running Multi-AZ (versus Single-AZ), namely availability, e.g.:

A) If one AZ in a region goes down, production application traffic is automatically routed to the RDS standby in the alternate AZ.

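B) Because DB maintenance (e.g. OS patching) is applied to a Multi-AZ RDS deployment one AZ at a time, there should be minimal impact on uptime while users keep accessing the application. Note that database engine version upgrades are applied to the primary and standby at the same time, so they still cause downtime.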
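Scenario A can be measured from the application's point of view by probing the database continuously while a failover is triggered (for example with boto3's `reboot_db_instance(DBInstanceIdentifier=..., ForceFailover=True)` or "Reboot with failover" in the console) and recording how long the probe fails. This is a minimal sketch: `probe` is a hypothetical callable you supply (e.g. open a connection to the RDS endpoint and run `SELECT 1`), and the clock and sleep functions are injectable only so the loop can be tested without a database.

```python
import time


def measure_outages(probe, duration_s, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` for `duration_s` seconds and return (start, end) outage windows.

    `probe` is a hypothetical callable: it should return True when the DB
    answers and False (or raise) when it does not.
    """
    outages = []
    down_since = None
    deadline = clock() + duration_s
    while clock() < deadline:
        now = clock()
        try:
            ok = bool(probe())
        except Exception:
            ok = False  # connection errors and timeouts count as downtime
        if not ok and down_since is None:
            down_since = now                    # outage starts
        elif ok and down_since is not None:
            outages.append((down_since, now))   # outage ends
            down_since = None
        sleep(interval_s)
    if down_since is not None:
        outages.append((down_since, clock()))   # still down when monitoring stopped
    return outages
```

The total application-visible downtime is `sum(end - start for start, end in outages)`. Its resolution is limited by `interval_s`, and the figure includes client-side DNS caching of the endpoint, which is part of what users actually experience.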

Hence, with respect to cost, it depends entirely on the nature of the user's hosted application and how much downtime it can tolerate. It comes down to the user's appetite for:
> Cost (e.g. Multi-AZ pays off in the long run rather than the short run, where Single-AZ is cheaper up front), and
> Uptime trade-off (e.g. Multi-AZ copes better with the unexpected circumstances above, so at minimum the user gets a "sleep better" assurance, whereas a Single-AZ outage can cause significant downtime for critical "live" apps, if applicable).
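To map "downtime tolerance" onto an SLA, it can help to convert an availability target into a downtime budget and compare it with measured outages. A small sketch; the percentages in the usage note are hypothetical examples, not AWS's published RDS SLA figures.

```python
def downtime_budget_minutes(availability_pct, days=30):
    """Maximum downtime per period allowed by an availability target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100.0)


def within_budget(measured_outage_minutes, availability_pct, days=30):
    """True if the measured downtime fits the target for the period."""
    return measured_outage_minutes <= downtime_budget_minutes(availability_pct, days)
```

For a 30-day month, 99.9% allows about 43.2 minutes of downtime and 99.95% about 21.6 minutes, so a single Single-AZ restore that takes longer than that would already break such a target.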
cloudtechnician (asker)

Thanks, btan, for sharing your comment.

I'm mainly looking for different ways to test high availability: time to restore in case of a failover, RTO, RPO, etc. For example, if my DB instance in a Single AZ goes down, how long would it take to restore from an RDS snapshot? I'd also like to test point-in-time recovery and benchmark all of these.
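Restore time (a lower bound for RTO) and the current RPO can be measured with the boto3 RDS client. This is a minimal sketch, assuming you pass in a client created with `boto3.client("rds")`; the snapshot and instance identifiers are hypothetical. It times from the restore API call until the `db_instance_available` waiter returns, and estimates RPO as the gap between now and the instance's `LatestRestorableTime`.

```python
import time
from datetime import datetime, timezone


def _wait_available(rds, instance_id, clock):
    # boto3 waiter: polls describe_db_instances until the status is "available".
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=instance_id)
    return clock()


def time_snapshot_restore(rds, snapshot_id, target_id, clock=time.monotonic):
    """Seconds from the snapshot restore call until the new instance is available."""
    start = clock()
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=target_id,
        DBSnapshotIdentifier=snapshot_id,
    )
    return _wait_available(rds, target_id, clock) - start


def time_pitr_restore(rds, source_id, target_id, clock=time.monotonic):
    """Seconds for a point-in-time restore to the latest restorable time."""
    start = clock()
    rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_id,
        TargetDBInstanceIdentifier=target_id,
        UseLatestRestorableTime=True,
    )
    return _wait_available(rds, target_id, clock) - start


def current_rpo_seconds(rds, instance_id, now=None):
    """Data a point-in-time restore would lose right now, in seconds."""
    db = rds.describe_db_instances(DBInstanceIdentifier=instance_id)["DBInstances"][0]
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - db["LatestRestorableTime"]).total_seconds()
```

The waiter polls every 30 seconds by default (adjustable via `WaiterConfig`), so timings are only accurate to that granularity. A restored instance gets a new endpoint, so the time to repoint the application must be added to get the full RTO. Repeating each test several times, with different database sizes, gives a more honest benchmark than a single run.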
ASKER CERTIFIED SOLUTION
btan

cloudtechnician (asker)

Thanks, Btan. One last thing: I want to measure performance when the app is connected to RDS in the same AZ, and again when the app is connected to RDS in another AZ. What are the ways to test this, and how do I make sure the benchmarking is done correctly?

Thank you!
btan

I suggest focusing on two areas instead:
- User performance: for a single user, measure from the client app in each of the two AZs with a network monitor (request/response stats, e.g. Wireshark or even Windows Perfmon) and also with application-level stats (especially if using a browser, turn on its developer tools and look at the request/response timings). Then move on to multiple concurrent users and re-measure.
- RDS performance: as mentioned, the RDS stats available are the same in either AZ, so there is no difference there. The stats will differ based on the user performance scenario (single vs multiple users).
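The single-user and concurrent-user measurements can also be scripted from an application host, run once from a host in the same AZ as the RDS instance and once from a host in another AZ. A minimal sketch: `request` is a hypothetical callable you supply that runs one representative query (or API call) end to end, and threads stand in for concurrent users.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_requests(request, n, clock=time.perf_counter):
    """Run `request` n times sequentially; return per-call latencies in ms."""
    latencies = []
    for _ in range(n):
        start = clock()
        request()
        latencies.append((clock() - start) * 1000.0)
    return latencies


def time_concurrent(request, users, per_user, clock=time.perf_counter):
    """Simulate `users` concurrent clients, each issuing `per_user` requests."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(time_requests, request, per_user, clock)
                   for _ in range(users)]
        # Flatten every user's latencies into one list of samples.
        return [ms for f in futures for ms in f.result()]
```

Using the same query, the same instance class and the same number of samples on both hosts keeps the comparison fair; the only variable should be which AZ the application runs in.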

Together these form a baseline if you can get an acceptable median range (from the slowest to the fastest request/response case above). Apologies, as I can only give high-level input here rather than details.
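Latency samples collected in each scenario can be reduced to a baseline (fastest, mean, median, 95th percentile, slowest) and compared. A small sketch; the nearest-rank 95th percentile is one common choice among several, and the percentage comparison assumes the baseline values are non-zero.

```python
import math
import statistics


def summarize(latencies_ms):
    """Fastest, mean, median, nearest-rank p95 and slowest latency in ms."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank percentile
    return {
        "min": ordered[0],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


def compare(baseline, candidate):
    """Percentage change of each statistic, e.g. same-AZ vs cross-AZ."""
    return {k: 100.0 * (candidate[k] - baseline[k]) / baseline[k] for k in baseline}
```

For example, `compare(summarize(same_az_samples), summarize(cross_az_samples))` reports how much slower (or faster) each statistic is when the application talks to RDS across AZs; the sample variable names are hypothetical.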