AWS S3 - Amazon Simple Storage Service

Amazon Simple Storage Service (S3)

S3 is a managed object storage service: Amazon manages the sizing, infrastructure and durability of the service. There is no setup cost or minimum usage fee.
  • Managed.
  • Highly durable: the standard class offers 99.999999999% durability (11 nines).
  • Highly available: designed for 99.99% availability of objects over a given year (4 nines), although the SLA Amazon gives customers is 99.9%.
  • Objects reside in the region of your choice, and AWS replicates each object across multiple availability zones within the chosen region.
  • Objects are stored in elastic buckets.
  • S3 objects can range from 0 bytes to 5TB.
  • Largest object that can be uploaded in a single PUT is 5GB.
  • For objects larger than 100MB use multi-part upload.
  • 100 buckets per AWS account by default (the limit applies per account, not per region).
  • No need to resize your buckets, since they resize automatically.
  • A bucket can be deleted only if it is empty. Once it is deleted, the bucket name becomes available for reuse.
  • Bucket ownership is not transferable.
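
The size limits above can be turned into a small planning helper. This is only a sketch: it treats GB and TB as binary units, uses a hypothetical default part size of 100MB, and relies on the S3 limit of 10,000 parts per multi-part upload, which is not mentioned in the list above. The function name plan_upload is made up for illustration.

```python
from typing import Tuple

MIB = 1024 ** 2
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB         # largest object S3 will store
MULTIPART_THRESHOLD = 100 * MIB   # above this size, multi-part is recommended
MAX_PARTS = 10_000                # parts allowed in one multi-part upload


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return (a + b - 1) // b


def plan_upload(size_bytes: int, part_size: int = 100 * MIB) -> Tuple[str, int]:
    """Return (method, number_of_upload_requests) for an object of this size."""
    if size_bytes < 0 or size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 0 bytes and 5TB")
    if size_bytes <= MULTIPART_THRESHOLD:
        return ("single_put", 1)
    # Grow the part size when the default would need more than MAX_PARTS parts.
    part_size = max(part_size, ceil_div(size_bytes, MAX_PARTS))
    return ("multipart", ceil_div(size_bytes, part_size))
```

For example, a 1GB object with the default part size is split into 11 parts (ten full parts plus a smaller last one).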

Data Consistency Model
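  • Originally, S3 offered read-after-write consistency for PUTs of new objects and eventual consistency for overwrite PUTs and DELETEs (changes could take some time to propagate).
  • Since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs of objects.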

S3 Buckets

S3 bucket names need to be globally unique because they become part of a URL in the AWS namespace.

Naming conventions:

  • Can be between 3 and 63 characters long.
  • Can contain only lowercase letters, numbers, periods and dashes.
  • Cannot contain underscores, end with a dash, have consecutive periods or use dashes adjacent to periods.
  • Cannot be formatted as an IP address.
  • They have to be DNS compliant.
  • Your bucket cannot be renamed, so be careful about the name you are picking.
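
The naming rules above can be checked before you try to create a bucket. The following is a minimal sketch, not an official validator: the function name bucket_name_problems is hypothetical, and the "must start and end with a letter or number" check is how this sketch interprets DNS compliance.

```python
import re
from typing import List

# Lowercase letters, digits, periods and dashes only; first and last
# character must be a letter or digit (our reading of "DNS compliant").
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name: str) -> List[str]:
    """Return the list of violated rules; an empty list means the name passes."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append("only lowercase letters, digits, periods and dashes; "
                        "must start and end with a letter or digit")
    if ".." in name:
        problems.append("consecutive periods are not allowed")
    if ".-" in name or "-." in name:
        problems.append("dashes cannot be adjacent to periods")
    if _IP_LIKE.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    return problems
```

Note that a name passing these checks can still be rejected by S3 if another account already owns it, since names are globally unique.
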
You can choose any region for your bucket. As mentioned before, AWS replicates your bucket across availability zones within the chosen region, but it does not replicate it across regions. To replicate across regions you need to enable a feature named "Cross-Region Replication" (see screenshot below).


Access Control Mechanisms

S3 offers four access control mechanisms:
  • AWS Identity and Access Management (IAM) policies: these allow you to grant or restrict user access to your buckets or objects. You are working at the user and group level here.
  • Access Control Lists (ACLs): they allow you to selectively add permissions on individual objects.
  • Bucket policies: Can be used to allow or deny permissions across some or all of the objects within an individual bucket.
  • Query string authentication: you can share S3 objects through a URL that is valid for a specified period of time.
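
As an illustration of the bucket policy mechanism, here is a sketch that builds a policy document granting anonymous read access to the objects under a prefix. The bucket name, prefix and statement ID are hypothetical, and the helper public_read_policy is not part of any AWS library; it only produces the JSON you would attach to the bucket.

```python
import json


def public_read_policy(bucket: str, prefix: str = "") -> str:
    """Build a bucket policy allowing anyone to GET objects under `prefix`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPublicRead",       # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",               # any caller, including anonymous
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(public_read_policy("example-bucket", "public/"))
```

Changing "Effect" to "Deny" or narrowing the "Resource" pattern is how the same document restricts access to only some of the objects in the bucket.
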
S3 also offers three options to encrypt content:
  • Server-side encryption using Amazon-managed keys, SSE-S3 (AES-256).
  • Server-side encryption using KMS-managed keys, SSE-KMS.
  • Server-side encryption with customer-provided keys, SSE-C.

Storage Classes

S3 also has four storage classes:
  • Standard storage class
  • Standard infrequent access class (IA): Designed for objects that are accessed infrequently but still need to be returned quickly when they are requested. It has a minimum billable object size of 128KB, and objects must have been stored for at least 30 days before being moved to this storage class. You can upload objects that are smaller than 128KB, but they will incur charges as if they were 128KB. The main advantage of this class is its lower storage cost, but there is a retrieval fee per GB. It is a good fit for backups, large files, etc.
  • Reduced redundancy class: Has the same availability as the Standard class but with reduced durability (99.99%). This class is intended for non-critical, reproducible data. It is about 30% cheaper than the Standard class, at the cost of less redundancy. Preview images and transcoded media are good examples of things you could store in this storage class.
  • Glacier: Very cheap, but used for archival only. It takes 3-5 hours to restore from Glacier. Objects can be put into Glacier only by lifecycle rules, and an object should have been in IA for at least 30 days before being moved to Glacier.
Note that the storage class is selected at the folder or object level, not at the bucket level, which means a single bucket can hold a mixture of storage classes. The next table shows a breakdown of each storage class:
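
To make the IA billing rules concrete, the sketch below computes the billable size of an object and a rough monthly cost. The prices are parameters the caller must supply (no real AWS prices are assumed here), the function names are hypothetical, and KB and GB are treated as binary units.

```python
from typing import Iterable

KB = 1024
GB = 1024 ** 3
MIN_BILLABLE_BYTES = 128 * KB   # smaller objects are charged as 128KB


def ia_billable_bytes(size_bytes: int) -> int:
    """Size that IA charges for: the real size, but never less than 128KB."""
    return max(size_bytes, MIN_BILLABLE_BYTES)


def ia_monthly_cost(object_sizes: Iterable[int], retrieved_bytes: int,
                    storage_price_per_gb: float,
                    retrieval_price_per_gb: float) -> float:
    """Storage charge on billable sizes plus the per-GB retrieval fee."""
    stored = sum(ia_billable_bytes(size) for size in object_sizes)
    storage_cost = stored / GB * storage_price_per_gb
    retrieval_cost = retrieved_bytes / GB * retrieval_price_per_gb
    return storage_cost + retrieval_cost
```

A bucket holding many tiny objects shows why IA suits large files: a 10KB object is billed as 128KB, almost thirteen times its real size.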


S3 Charges 

S3 is charged by:
  • Storage space.
  • Number of requests.
  • Storage management.
  • Data transfer (moving data around).
  • Transfer Acceleration.
Note that transferring data into S3 is free (the PUT requests themselves still count towards the request charge).

Versioning

Versioning allows you to preserve, retrieve and restore every version of every object stored in a bucket. Versioning is configured at the bucket level, and once enabled it can only be suspended, not disabled.
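
The behaviour of a versioned bucket can be pictured with a small in-memory model. This is a toy, not how S3 is implemented: the class VersionedBucket is hypothetical, and it uses list positions as version IDs, whereas real S3 version IDs are opaque strings.

```python
from collections import defaultdict
from typing import Optional


class VersionedBucket:
    """Toy model: each PUT appends a version, each DELETE appends a marker."""

    _DELETE_MARKER = object()

    def __init__(self) -> None:
        self._versions = defaultdict(list)

    def put(self, key: str, body: bytes) -> int:
        self._versions[key].append(body)
        return len(self._versions[key]) - 1   # toy version ID: list position

    def delete(self, key: str) -> None:
        # Nothing is erased; a marker simply becomes the latest version.
        self._versions[key].append(self._DELETE_MARKER)

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        history = self._versions.get(key)
        if not history:
            raise KeyError(key)
        item = history[-1] if version is None else history[version]
        if item is self._DELETE_MARKER:
            raise KeyError(key)
        return item
```

After a delete, a plain get fails, but asking for an earlier version still returns the old content, which is what makes restores possible.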

Lifecycle

This S3 feature allows you to define lifecycle rules for the objects stored in your bucket. You can control when your objects are moved to the IA class, when they are moved to the Glacier storage class, and/or when they are removed after a specified amount of time.
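
A lifecycle rule can be sketched as a plain dictionary. The shape below follows, to the best of my knowledge, the rule format that boto3's put_bucket_lifecycle_configuration accepts; treat that mapping as an assumption and check it against the API reference before use. The helper lifecycle_rule and the rule ID are hypothetical, and the day checks encode the 30-day constraints described in the Storage Classes section.

```python
from typing import Any, Dict


def lifecycle_rule(prefix: str, ia_after_days: int, glacier_after_days: int,
                   expire_after_days: int) -> Dict[str, Any]:
    """Build one rule: Standard -> IA -> Glacier -> delete."""
    if ia_after_days < 30:
        raise ValueError("objects must be stored 30 days before moving to IA")
    if glacier_after_days < ia_after_days + 30:
        raise ValueError("objects must stay 30 days in IA before Glacier")
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "ID": f"archive-{prefix or 'all'}",        # hypothetical rule name
        "Filter": {"Prefix": prefix},              # which objects it covers
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


if __name__ == "__main__":
    # The whole configuration is a dict with a list of such rules.
    print({"Rules": [lifecycle_rule("logs/", 30, 60, 365)]})
```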

Logging

You can enable logging to track the requests made to your bucket; the log records can be stored in the bucket itself.

Event Management

You can configure S3 to trigger notifications to:
  • SNS (Amazon Simple Notification Service)
  • SQS (Amazon Simple Queue Service)
  • An AWS Lambda function.
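
When the target is a Lambda function, S3 delivers the notification as a JSON document with a list of records. The sketch below pulls the bucket, key and event name out of each record. The handler name and the sample event are made up for illustration; object keys arrive URL-encoded, which is why the key is decoded.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def handler(event: Dict[str, Any], context: Any = None) -> List[Tuple[str, str, str]]:
    """Return (bucket, key, event_name) for every record in the notification."""
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])   # keys are URL-encoded
        results.append((bucket, key, record["eventName"]))
    return results


if __name__ == "__main__":
    # Hypothetical, trimmed-down event with only the fields used above.
    sample = {"Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "example-bucket"},
               "object": {"key": "photos/my+cat%281%29.jpg"}},
    }]}
    print(handler(sample))
```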

Cross-region Replication

  • Allows asynchronous copying of objects across buckets in different AWS regions.
  • Both buckets should have versioning enabled.
  • Buckets must be in different regions.
  • S3 must have permissions to replicate objects from the source bucket to the target bucket.
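
The requirements above can be written as a pre-flight check. This is a sketch with made-up types: BucketInfo is a hypothetical descriptor you would fill in yourself, not something returned by AWS, and the permission check is reduced to a boolean.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketInfo:
    # Hypothetical descriptor for this sketch, not an AWS type.
    name: str
    region: str
    versioning_enabled: bool


def replication_blockers(source: BucketInfo, target: BucketInfo,
                         s3_can_replicate: bool) -> List[str]:
    """List what still prevents cross-region replication; empty means ready."""
    blockers = []
    for bucket in (source, target):
        if not bucket.versioning_enabled:
            blockers.append(f"versioning is not enabled on {bucket.name}")
    if source.region == target.region:
        blockers.append("source and target are in the same region")
    if not s3_can_replicate:
        blockers.append("S3 lacks permission to replicate from source to target")
    return blockers
```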

Transfer Acceleration

Accelerates S3 data transfers by making use of optimized network protocols and the Amazon Edge infrastructure. Improvements are typically in the range of 50 to 500% for cross-country transfers of large objects. When you enable this feature, the bucket gets an "s3-accelerate" endpoint. In the end, Transfer Acceleration is about S3 taking advantage of another AWS service: CloudFront.
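
The difference between the regular and the accelerated endpoint is only the host name, as this sketch shows. The helper s3_endpoints is hypothetical; the check on periods reflects the S3 restriction that Transfer Acceleration does not work with bucket names containing periods, which is not stated in the paragraph above.

```python
from typing import Dict


def s3_endpoints(bucket: str) -> Dict[str, str]:
    """Return the standard and the accelerated endpoint for a bucket."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support bucket "
                         "names that contain periods")
    return {
        "standard": f"https://{bucket}.s3.amazonaws.com",
        "accelerated": f"https://{bucket}.s3-accelerate.amazonaws.com",
    }


if __name__ == "__main__":
    for kind, url in s3_endpoints("example-bucket").items():
        print(kind, url)
```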

Requester Pays

You can disable anonymous access to your bucket and restrict it to authenticated requests, where the requesting party is responsible for the fees and charges associated with their requests.

CloudFront

This is a Content Delivery Network (CDN) service. A CDN is a system of distributed servers that delivers web pages and other web content to a user based on the geographic location of the user, the origin of the web page and a content delivery server.

Key Terminology:
  • Edge Location: This is the location where the content will be cached. This is NOT a region or AZ concept.
  • Origin: This is the origin of all the files the CDN will distribute. This could be an S3 bucket, an EC2 instance, an Elastic Load Balancer or Route 53.
  • Distribution: This is the name given to the CDN which consists of a collection of Edge nodes.
  • Web Distribution: typically used for websites.
  • RTMP: Used for Media Streaming.
Notes:
  • Edge locations are not read-only; you can also write to them (for example, put an object).
  • Objects are cached for the life of the TTL (Time to Live).
  • You can clear cached objects, but you will be charged.
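
The TTL behaviour in the notes above can be modelled with a tiny cache. This is a toy illustration of the idea, not of CloudFront internals: the class EdgeCache, its origin callback and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy edge location: serve from cache until the TTL runs out."""

    def __init__(self, origin: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._origin = origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit, origin untouched
        body = self._origin(path)                # miss or expired: ask origin
        self._store[path] = (body, now + self._ttl)
        return body

    def invalidate(self, path: str) -> None:
        """Clear a cached object before its TTL expires."""
        self._store.pop(path, None)
```

Until the TTL expires, users keep receiving the cached copy even if the origin has changed, which is why clearing the cache early exists as a (charged) operation.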

Storage Gateway

AWS Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. This service enables you to securely store data in the AWS cloud for scalable and cost-effective storage.

AWS Storage Gateway's software appliance is available for download as a virtual machine (VM) image that you install on a host in your datacenter. Storage Gateway supports either VMware ESXi or Microsoft Hyper-V. Once you have installed your gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to create the storage gateway option that is right for you.

Four types of storage gateway:
  • File Gateway (NFS): Files are stored as S3 objects in your buckets and accessed through an NFS mount point.
  • Volume Gateway (iSCSI): The volume interface presents your applications with disk volumes using the iSCSI block protocol. Data written to these volumes can be asynchronously backed up as point-in-time snapshots of your volumes and stored in the cloud as EBS snapshots.
    • Stored Volumes: Lets you store your primary data locally, while asynchronously backing up data to AWS.
    • Cached Volumes: Lets you use S3 as your primary storage while retaining frequently accessed data locally in your storage gateway.
  • Tape Gateway (VTL): Lets you leverage your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway.
