When using AWS as your public cloud provider, you will eventually need to define the storage requirements for the data you wish to keep there. There are a variety of options to choose from depending on your needs, each with different attributes: temporary storage, permanent storage, highly available object-based storage and even cold archival storage.
This article gives you a high-level overview, with enough information to guide you in selecting the most appropriate storage service for your requirements. Additional information is provided for each option through links to the official AWS documentation.
The remainder of this article is broken down into the following sections:
- Defining your Storage Needs
- AWS Storage Services
- Moving Data into AWS
Defining your Storage needs
To understand what you need from your storage solution, you first need to ascertain which characteristics matter most for the data being stored. Ask yourself the following questions to guide you to the correct service for your storage.
- How critical is this data?
- How sensitive is the data you need to store?
- How often will the data be accessed?
- How large is the data?
- Who requires access to the data?
- How much are you prepared to pay to store the data?
- Where is your data coming from?
AWS Storage Services
AWS provides a number of common services, each with different attributes, to allow you to select the best method of storing your data. The main services and options fall into the categories below:
- EC2 (Elastic Compute Cloud) Storage capabilities
- S3 (Simple Storage Service)
- AWS Glacier
EC2 (Elastic Compute Cloud) Storage Capabilities
When creating your EC2 instances from an AMI (Amazon Machine Image) you will typically have the choice of using either EBS-backed or instance store-backed storage. What does this mean?
Storage in relation to EC2 refers to the local storage volumes for that EC2 host. These include your boot partition and any additional volumes formatted with a file system to store other data, just as you would have with a standard server within your organisation.
EBS (Elastic Block Store) backed storage:
EBS is a storage service offered by AWS and has the following key points:
(Image Source: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)
- Permanent storage - This is one of the most important points: EBS volumes provide a persistent storage option for your instances. If the EC2 instance that the volume is attached to is stopped or even terminated, the data remains on the volume (note that a root volume is deleted on termination by default unless its DeleteOnTermination setting is turned off). This allows you to re-attach the volume to another EC2 instance should this be required. For information on how to attach a volume to an instance click here
- Highly available - AWS automatically replicates each EBS volume 'behind the scenes' within its AZ (Availability Zone), protecting your data should a single component fail
- Low latency - EBS volumes are network-attached volumes located physically within the same AZ as the instance, providing very low latency. You can have more than one volume attached to an instance at a time, however a volume can only be attached to 1 EC2 instance at any time
- Block Storage - EBS allows you to create a file system on top of any volume so that you can use the volume as you would any other block-level storage
- Scalable - With the elasticity of cloud computing on AWS you are easily able to increase or decrease the amount of storage you require on your instances, so you only have the capacity you need and wish to pay for
- Easily backed up - EBS supports the use of snapshots. Snapshots provide an incremental backup of the volume and are stored on AWS S3 (Simple Storage Service). A snapshot can then be used to create a new volume as and when necessary with just a few clicks. For more information, see AWS snapshots and AWS S3
- 3 different volume types - EBS provides 3 different levels of volume types:
- General Purpose (SSD)
- Provisioned IOPS (SSD)
- Magnetic volumes
Further information on all aspects of Elastic Block Store can be found here
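To make the idea of an incremental snapshot more concrete, here is a minimal toy model in Python. It is only a sketch of the concept (each snapshot keeps just the blocks that changed since the previous one); it is not how AWS implements EBS snapshots internally, and the function names and the dictionary-based "volume" are assumptions made for illustration. It also ignores deleted blocks for simplicity.

```python
# Toy model of incremental snapshots. A "volume" is a dict of
# block number -> block contents. All names here are hypothetical.

def restore(snapshots):
    """Rebuild a full volume by replaying snapshots oldest-first."""
    volume = {}
    for snapshot in snapshots:
        volume.update(snapshot)
    return volume


def take_snapshot(volume, previous_snapshots):
    """Return only the blocks that differ from the state held in earlier snapshots."""
    baseline = restore(previous_snapshots)
    return {
        block: data
        for block, data in volume.items()
        if baseline.get(block) != data
    }


volume = {0: "boot", 1: "app-v1", 2: "data-a"}
snapshots = [take_snapshot(volume, [])]             # first snapshot copies every block

volume[1] = "app-v2"                                 # a single block changes
snapshots.append(take_snapshot(volume, snapshots))  # second snapshot stores that block only

restored = restore(snapshots)
print([len(s) for s in snapshots])  # [3, 1]
print(restored == volume)           # True
```

The second snapshot is a third of the size of the first, yet replaying both still reproduces the full volume, which is the property that keeps regular backups cheap.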
Instance backed storage:
Unlike EBS, instance backed storage is not a separate service. It simply refers to the storage system whereby the volumes for the EC2 instance are disks physically and locally attached to the host running your instance. The key points of Instance backed storage are:
- Temporary storage - The key difference between EBS volumes and Instance Store storage is that instance store is only temporary block level storage. Should the drive fail, or the instance be stopped or terminated, then all data on that instance store is deleted! This makes this type of storage unsuitable for critical data or data that cannot be easily reproduced
- Physically attached to host - Instance store backed EC2 instances have their storage volumes physically attached to the underlying host, like the local hard drive of your PC/Laptop
- Less flexible than EBS - You are unable to detach an instance store backed drive and reattach it to another instance as you can with EBS
Further information on Instance Backed Store can be found here
S3 (Simple Storage Service)
S3 is one of the most widely used storage options and has a multitude of use cases, from Disaster Recovery to being a platform for hosting static websites. The key points of S3 are:
- Unlimited Storage capacity - S3 offers you the ability to store an unlimited amount of data which makes this a very viable storage area for backups of data
- Object Based Storage - S3 provides an object based storage service rather than block based storage like EBS
- Highly Available - S3 automatically replicates your stored data across other AZs, giving you 99.999999999% durability of your data and 99.99% availability. This makes it a very dependable service to use for critical data
- Accessible from Internet - It offers the ability to make files stored on S3 publicly available on the Internet, providing each object with a specific URL
- Only pay for what you use - You only pay for the amount of data you use on S3 and there are no setup costs
- Reduced Redundancy Option - There is a reduced cost option called 'Reduced Redundancy' which provides durability of your data to 99.99% instead of the 11 9's of durability offered with their standard storage service.
- Interaction with other AWS Services - S3 works extremely well with other AWS services such as EC2, EBS and CloudFront among others, as well as with some 3rd party vendors
- Data Lifecycle Management - S3 provides an option for introducing lifecycle rules for the retention and deletion of your data. For example, data can be moved to another service (Glacier) or deleted when certain criteria are met, all handled automatically by your lifecycle policies. In addition to this you are able to enable versioning on your objects
- Data Encryption - S3 offers SSE (Server Side Encryption) at rest, with the ability of uploading/downloading data using SSL-encrypted endpoints
- File size restrictions - The largest object you can store in S3 is 5TB (a single PUT upload is limited to 5GB, so larger objects need to use multipart upload)
- Use of Buckets - S3 uses Buckets to manage and organise data, and bucket names are globally unique. By default you can have a maximum of 100 buckets per AWS account, however you can then have as many folders under these buckets as you require. You are able to set permissions and other attributes against these buckets. More information on S3 Buckets can be found here
- Access Controlled - You can control access to your data held within S3 using Bucket policies or IAM (Identity and Access Management) policies. These policies can be granular, for example only allowing certain users read access or write access to a certain bucket. Between Bucket and IAM policies you are able to administer strict access on your S3 Buckets and data. More information on these can be found below
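As an illustration of the access control point, the sketch below builds a read-only bucket policy as a Python dictionary and prints it as JSON. The bucket name, account number and user ARN are hypothetical placeholders, and the helper function is an assumption for this example rather than anything provided by AWS. Nothing is sent to AWS; the output is simply the policy document you could review before applying it.

```python
import json

# Hypothetical bucket name and IAM user ARN, used purely for illustration.
BUCKET = "example-bucket"
READER_ARN = "arn:aws:iam::123456789012:user/example-reader"


def read_only_bucket_policy(bucket, principal_arn):
    """Build a policy letting one principal list a bucket and read its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "AllowListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "AllowReadObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = read_only_bucket_policy(BUCKET, READER_ARN)
print(json.dumps(policy, indent=2))
```

Because no write or delete actions are granted, the named user can browse and download but cannot change the bucket's contents.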
Further information on S3 can be found here
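The durability and availability percentages quoted for S3 are easier to compare once converted into everyday numbers. The short Python sketch below does that arithmetic. It assumes the usual reading of durability as the chance of an object surviving a given year, and the function names and the figure of 10,000,000 objects are chosen purely for illustration.

```python
from fractions import Fraction


def expected_annual_loss(object_count, durability_nines):
    """Expected number of objects lost per year at a durability of N nines."""
    loss_probability = Fraction(1, 10 ** durability_nines)
    return object_count * loss_probability


def downtime_minutes_per_year(availability_percent):
    """Minutes of unavailability allowed per 365-day year.

    Pass the percentage as a string (e.g. "99.99") so it is parsed exactly.
    """
    unavailable = 1 - Fraction(availability_percent) / 100
    return float(unavailable * 365 * 24 * 60)


objects = 10_000_000
standard = expected_annual_loss(objects, 11)  # 99.999999999% durability
reduced = expected_annual_loss(objects, 4)    # 99.99% durability (Reduced Redundancy)

print(standard)                            # 1/10000 of an object per year
print(reduced)                             # 1000 objects per year
print(downtime_minutes_per_year("99.99"))  # 52.56
```

In other words, with 10,000,000 objects in standard storage you would expect to lose a single object roughly once every 10,000 years, whereas the Reduced Redundancy figure implies around 1,000 objects a year, which is why it only suits data that can be easily reproduced.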
AWS Glacier
The Glacier service is similar to S3 in principle, but it is designed to store data that is not frequently accessed, for example data that has to be kept for legal purposes, historical records or any other archives. Key points of Glacier are:
- Cheap - Glacier offers you the ability to store data from as little as $0.01 per GB a month
- Unlimited Storage Capacity - As with S3, Glacier also supports unlimited storage capacity of your data
- Highly Available - Similarly to S3, Glacier also gives 11 9's of durability for its data by replicating it to multiple locations
- Slow Retrieval time - If you require access to the data stored on Glacier then you have to submit a request to access your data; it could then take between 3 and 5 hours for AWS to have your data available to access and download. Glacier is not suitable for data you need to access often or in a timely manner
- Integration with S3 - Glacier provides great integration with S3, specifically with the use of Lifecycle policies that allow the automatic transfer of data between S3 and Glacier for data archiving
- Data Encryption - Glacier supports SSE of your Data and SSL encryption of your data during transfer
- Access Controlled - IAM (Identity and Access Management) allows you to control who has access to your data in Glacier
Further information on Glacier can be found here
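The lifecycle integration between S3 and Glacier described above can be sketched as a rule that archives objects after a number of days and deletes them later. The Python below only builds and prints the rule; the helper name, the "logs/" prefix and the day counts are assumptions for illustration. The dictionary follows the shape that boto3's put_bucket_lifecycle_configuration call accepts, but that call is shown only in a comment and is not executed here.

```python
import json


def archive_then_expire_rule(prefix, archive_after_days, delete_after_days):
    """Build a lifecycle configuration: move to Glacier, then delete."""
    if delete_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


# Hypothetical values: archive logs after 30 days, delete them after a year.
lifecycle = archive_then_expire_rule("logs/", 30, 365)
print(json.dumps(lifecycle, indent=2))

# With boto3 installed and credentials configured, a dict of this shape
# could be applied to a (hypothetical) bucket like so:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=lifecycle)
```

Keeping the rule in a small function makes it easy to check the retention periods with whoever owns the data before anything is applied to a real bucket.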
Moving Data into AWS
There are 2 main ways of moving your data into AWS from your on-premises site, each using a different AWS service:
- AWS Import/Export
- AWS Storage Gateway
AWS Import/Export
The AWS Import/Export service does exactly that: it allows you to import and export data into and out of AWS on physical media. This service is typically used if you have a LARGE amount of data to be transferred that could take a substantial amount of time to transfer over the internet. Key points of Import/Export Service are:
- Speed - This service allows you to physically send your data on physical media to AWS who will then manually import the data into the desired service within your AWS environment. If you have terabytes and terabytes of data then this service could save a significant amount of time as opposed to using traditional Internet upload methods. Similarly, data can be extracted onto physical media back to your onsite premises
- Cost savings - Data transfer out of AWS is charged for, so having vast amounts of data exported using this service could save you significant money compared with the usual data transfer charges
- Service Integration - Import/Export integrates with other services such as S3 and Glacier
- Useful for Cloud Migration - If you are considering migrating your data to the cloud then, as explained in the points above, you could consider using this service to transfer your initial data store from your on-premises data centre and save time and money
Further information on AWS Import/Export can be found here
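To judge whether shipping physical media is worthwhile, it helps to estimate how long the same data would take to upload over your internet link. The Python sketch below does this calculation. The function name, the 50TB data set and the assumption that only 80% of the link's rated speed is usable are all illustrative choices, not figures from AWS.

```python
def upload_days(data_terabytes, link_mbps, utilisation=0.8):
    """Estimate the days needed to upload data over a network link.

    data_terabytes: size of the data set in TB (1 TB = 10**12 bytes)
    link_mbps:      rated link speed in megabits per second
    utilisation:    fraction of the link actually available (assumed 80%)
    """
    if link_mbps <= 0 or not 0 < utilisation <= 1:
        raise ValueError("link speed must be positive and utilisation in (0, 1]")
    total_bits = data_terabytes * 10**12 * 8
    effective_bits_per_second = link_mbps * 10**6 * utilisation
    seconds = total_bits / effective_bits_per_second
    return seconds / 86_400  # seconds in a day


# Hypothetical example: a 50TB data set over three different link speeds.
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbps: {upload_days(50, mbps):7.1f} days")
```

Under these assumptions 50TB over a 100 Mbps link takes roughly 58 days of continuous uploading, which is the kind of result that makes sending disks by courier the faster option.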
AWS Storage Gateway
The AWS Storage Gateway allows you to connect with an on-premise software appliance giving the capability of utilising your own local storage systems with a link to AWS S3 or Glacier for backups and additional storage when needed. Key points of AWS Storage Gateway are:
- 3 different configurations are available:
- Gateway Cached Volumes - Primary data is stored in S3, with frequently accessed data cached locally
- Gateway Stored Volumes - Primary data is stored locally for low latency access, with asynchronous snapshot backups to S3 for disaster recovery purposes
- Gateway - Virtual Tape Library - Allows integration with your backup application through an industry standard iSCSI interface, using S3 for Virtual Tapes and Glacier for the Virtual Tape Shelf
- Secure - Data transferred between your on-premise location and AWS is encrypted over SSL and data is encrypted at rest on S3 and Glacier
- Service Integration - AWS Storage Gateway integrates with other Storage services such as S3, Glacier and EBS providing all the benefits that each of these services have
- Disaster Recovery - Utilising the Cached and Stored Volumes configurations it allows you to readily have EC2 instances and your synchronised data in place and operational on the cloud should your on-premise site fail
Further information on AWS Storage Gateway can be found here
Having all of this storage and the benefits it provides is great, but how much does it cost you? How much is it going to cost to store terabytes of data on S3, or to run 100 EBS volumes all with Provisioned IOPS enabled? These types of questions are likely to be asked by Senior Management when considering your transition to AWS for your storage needs.
A useful tool to help you provide some of these answers is the Simple Monthly Calculator, which is also provided by AWS.
This allows you to select the services you require (left hand side) and enter the capacity you need plus the other elements that contribute to costs. Once you have made your selection and specified your region, it will provide you with the monthly cost. This is a great tool for getting an estimate of your needs.
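For a quick back-of-the-envelope figure before opening the calculator, the storage part of the bill is simply capacity multiplied by a per-GB monthly rate. The Python sketch below shows that arithmetic. The Glacier rate is the $0.01 per GB quoted earlier in this article; the S3 rate is a made-up placeholder and must be replaced with the current price for your region. The function name and the choice of 1TB = 1024GB are also assumptions, and request and data transfer charges are ignored.

```python
from decimal import Decimal

# Price per GB per month in USD.
PRICE_PER_GB_MONTH = {
    "glacier": Decimal("0.01"),      # figure quoted earlier in this article
    "s3_standard": Decimal("0.03"),  # PLACEHOLDER, not a real AWS price
}


def monthly_storage_cost(service, terabytes):
    """Estimate the monthly storage charge in USD for a capacity in whole TB."""
    gigabytes = Decimal(terabytes) * 1024  # assumes 1 TB = 1024 GB
    cost = gigabytes * PRICE_PER_GB_MONTH[service]
    return cost.quantize(Decimal("0.01"))  # round to cents


for service in PRICE_PER_GB_MONTH:
    print(f"{service:<12} 5 TB: ${monthly_storage_cost(service, 5)} per month")
```

Using Decimal rather than floating point keeps the currency arithmetic exact, so 5TB on Glacier comes out at exactly $51.20 per month under these assumptions.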
Further detailed pricing for each storage service can be found on the links below:
Hopefully this information has given you a greater understanding of the storage services offered by AWS and the differences between them, allowing you to make a decision on the service that best suits your needs.
Thank you for taking the time to read my article. If you have any feedback please do leave a comment below.
I look forward to your comments and suggestions.
-AWS Certified Solutions Architect
-AWS Accredited Technical Professional
-AWS Accredited Business Professional
-AWS Accredited TCO and Cloud Economics