
S3

Priority: Tier 2
Exam Domains: Domain 1: Design Secure Architectures; Domain 2: Design Resilient Architectures; Domain 3: Design High-Performing Architectures; Domain 4: Design Cost-Optimized Architectures

Amazon S3 (Simple Storage Service) is an object-based cloud storage service designed for secure, durable, and highly scalable data storage, allowing users to store and retrieve any amount of data from anywhere on the web at a very low cost. It manages data as objects (files) rather than in file systems or data blocks, making it ideal for static content like images, videos, and web pages. S3 offers various storage classes tailored for different access patterns and cost optimization goals, alongside features like versioning, encryption, lifecycle management, and replication to ensure data availability, durability, and compliance.

Learning Objectives

S3 Overview & Fundamentals

Amazon S3 provides secure, durable, and highly scalable object storage in the cloud, managing data as objects rather than traditional file systems.

S3 manages data as objects rather than in file systems or data blocks. This means that S3 is a place where you can store your files. You can upload any file type that you can think of to S3 such as image files, text files, videos, web pages, etc.
S3 buckets are the containers in which you store your files. All AWS accounts share the S3 namespace, so every bucket name must be globally unique. The web address of an object always follows the pattern https://<bucket-name>.s3.<region>.amazonaws.com/<key>, where the region is, for example, us-east-1 and the key includes any folder prefixes within the bucket.
Technical Specs: URL format: https://bucketname.s3.region.amazonaws.com/keyname
S3 works off a key-value store. The key is simply the name of the object. The value is the data itself, which is made up of a sequence of bytes. Additionally, a version ID allows for storing multiple versions of the same object, and metadata is simply data about the data being stored.
S3 is highly available and highly durable. It is built for 99.95% to 99.99% service availability depending on the S3 tier and designed for 11 9's durability (99.999999999%). This high availability and durability are achieved by spreading data across multiple devices and facilities.
Technical Specs: Service Availability: 99.95% to 99.99%; Data Durability: 99.999999999% (11 9s)
S3 objects can be up to 5 terabytes in size. There is unlimited storage for the total volume of data and the number of objects you can store in S3.
Technical Specs: Max object size: 5 TB; Total data volume: Unlimited; Number of objects: Unlimited
S3 provides strong read-after-write consistency. This means that after a successful write (PUT) of an object, all subsequent read (GET) requests will immediately return the latest version of the object.
S3 is not suitable for installing an operating system or a database.
Successful uploads to S3 return an HTTP 200 status code.
Technical Specs: HTTP 200 status code on successful upload

S3 Storage Classes

Amazon S3 offers a range of storage classes designed for different use cases, optimizing for cost, performance, and data access patterns.

Different storage classes are available, from frequently accessed data to archival, with varying costs, retrieval times, and durability/availability characteristics.

S3 Standard

A general-purpose storage class for frequently accessed data. It is the default storage class.
durability: High durability (11 9s)
availability: Designed for 99.99% availability (99.9% availability SLA)
latency: Low latency (millisecond access)
data_distribution: Data copied to >3 Availability Zones (AZs)
retrieval_fee: None
minimum_storage_duration: None
minimum_object_size: None
Use Cases:
  • General-purpose use cases
  • Applications requiring frequent data access
  • Websites
  • Content distribution
  • Mobile and gaming apps
  • Big data and analytics

S3 Intelligent-Tiering

Automatically monitors and moves data to lower-cost tiers based on changing access patterns, without operational overhead or retrieval charges.
latency: Millisecond access latency, instant access
operational_overhead: No operational overhead
retrieval_charges: No retrieval charges
minimum_storage_duration: No minimum storage duration
durability: 11 nines durability
availability: Designed for 99.9% availability
cost_incurred: Small monthly monitoring and automation charges
Use Cases:
  • Unknown or changing data access patterns
  • Digital media applications where some files are accessed frequently and others rarely and unpredictably

Amazon S3 Express One Zone

High-performance storage for the most frequently accessed data, offering consistent single-digit millisecond latency.
latency: Consistent single-digit millisecond request latency
retrieval_speed: Up to 10x faster retrieval than S3 Standard
storage_cost: Higher storage cost per GB
data_retrieval_fee: Low data retrieval fee
durability: 11 9s durability within a single AZ
data_loss_risk: Data loss if the AZ goes down
Use Cases:
  • High-performance storage for most frequently accessed data

S3 Standard Infrequent Access (S3 Standard IA)

Designed for infrequently accessed data that still requires rapid access when needed. Suitable for long-term storage like backups and disaster recovery files.
data_distribution: Data copied to >3 AZs
retrieval_fee: Retrieval fee per GB
minimum_storage_duration: 30 days (charged for full 30 days if deleted, overwritten, or moved before then)
minimum_object_size: 128 KB (objects smaller are charged as 128 KB)
latency: Millisecond access latency
Use Cases:
  • Infrequently accessed data requiring quick retrieval
  • Long-term storage for backups and disaster recovery files

S3 One Zone Infrequent Access (S3 One Zone IA)

Similar to S3 Standard-IA but stores data in a single Availability Zone. Costs 20% less than S3 Standard-IA.
data_distribution: Data stored in a single AZ
data_loss_risk: Data loss if the AZ goes down
latency: Millisecond access latency
retrieval_fee: Retrieval fee per GB
minimum_storage_duration: 30 days
minimum_object_size: 128 KB
Use Cases:
  • Reproducible or less critical data that is accessed infrequently
  • Long-lived, infrequently accessed, non-critical data that can be affordably lost

Amazon S3 Glacier Instant Retrieval

An archival storage class for data requiring fast access, delivering low-cost storage for long-lived data that is rarely accessed but needs millisecond retrieval.
latency: Millisecond latency access
data_distribution: Data copied to >3 AZs
retrieval_fee: Retrieval fee per GB
minimum_storage_duration: 90 days
minimum_object_size: 128 KB
Use Cases:
  • Archival data requiring fast access
  • Critical data that may not need to be accessed frequently but requires instant access when needed

Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier)

Ideal for archived data where retrieval time can vary, offering different retrieval modes.
data_distribution: Data copied to >3 AZs
retrieval_modes: Expedited (1-5 minutes), Standard (3-5 hours), Bulk (5-12 hours)
retrieval_fees: Different retrieval fees for each mode
minimum_storage_duration: 90 days
minimum_object_size: 128 KB
Use Cases:
  • Archival data, where retrieval time can vary
  • Archived data that does not require immediate access but needs flexibility to retrieve large datasets at no cost for backup or disaster recovery use cases

Amazon S3 Glacier Deep Archive

The lowest-cost storage class for long-term storage (compliance, regulatory) and infrequent access. Designed for customers retaining datasets for 7-10 years or longer.
data_distribution: Data copied to >3 AZs
access_modes: Standard (Within 12 hours), Bulk (Within 48 hours)
retrieval_fee: Retrieval fee based on retrieval speed
minimum_storage_duration: 180 days
minimum_object_size: 40 KB
Use Cases:
  • Long-term storage (compliance, regulatory)
  • Infrequent access
  • Data archiving and long-term backup (7-10 years or longer)
  • Meeting stringent regulatory requirements

S3 Lifecycle Management: Overview and Object Transitions

S3 Lifecycle Management automates the process of moving objects between different storage tiers and expiring them, optimizing storage costs and data retention.

S3 Lifecycle Rules automate object movement between S3 storage tiers to optimize storage costs and improve data durability and performance. The key goal is cost optimization.
Lifecycle Management automates moving objects, eliminating the need for manual management. Rules can be applied to an entire S3 bucket, a specific prefix (folder), or objects with a certain tag for granular control.
Transition actions automatically move objects to a different storage class based on object age. Transitions are charged per request. Objects smaller than 128 KB will not transition.
Technical Specs: Charge: Per request; Minimum object size for transition: 128 KB
Expiration actions permanently delete objects after a specified period. This is useful for cleaning up temporary files or old log data.
Objects can transition from more frequent access tiers to less frequent or archival tiers. Once in an IA or Glacier tier, they generally cannot transition back upwards automatically.
Automatic transitions are not supported for objects smaller than 128 KB (restriction added in September 2024). Objects must be stored for at least 30 days in S3 Standard IA or S3 One Zone IA before they can be transitioned out. Transitioning objects out before their minimum storage duration incurs charges for the remainder of that duration.
Technical Specs: Object size restriction: < 128 KB not supported for automatic transitions (as of Sept 2024); Minimum storage in S3 Standard IA/One Zone IA before transition out: 30 days
Lifecycle Management can be used with versioning to move different versions of objects to different storage tiers. This can include transitioning current and non-current versions separately.

Configuring S3 Lifecycle Rules

procedure

A step-by-step guide to setting up automated lifecycle rules for S3 buckets, including transitions and expirations.

Lifecycle rules are configured via the S3 console, allowing you to define actions like transitioning objects to different storage classes or expiring them based on age.

Prerequisites

  • An existing S3 bucket
1

Navigate to the S3 Dashboard and select the desired bucket.

💡 This is the entry point for managing bucket-specific configurations.

2

Go to the 'Management' tab and select 'Lifecycle Rules'.

💡 The 'Management' tab contains options for lifecycle configuration, replication, and other advanced settings.

3

Click 'Create lifecycle rule'.

💡 Initiates the wizard for defining a new lifecycle policy.

4

Provide a 'Lifecycle rule name' (e.g., 'CS demo life cycle rule').

💡 A descriptive name helps in identifying and managing rules.

5

Define the 'Scope' for the rule: apply to all objects, specific prefix, or tags.

💡 Determines which objects within the bucket the rule will affect.

6

Configure 'Transition current versions of objects between storage classes' by selecting a storage class (e.g., S3 Standard-IA, S3 Glacier Deep Archive) and specifying the number of days for transition.

💡 Automates cost optimization by moving data to cheaper tiers as it ages. Multiple transitions can be chained.

7

Configure 'Transition non-current versions of objects between storage classes' similarly, if versioning is enabled and you wish to manage old versions.

💡 Allows separate management of older object versions, further optimizing costs while retaining a recovery history.

8

Configure 'Expire objects' by specifying the number of days after which objects will be permanently deleted.

💡 Ensures data is removed after its retention period, crucial for compliance and cost management.

9

Save the changes to create the lifecycle rule.

💡 Applies the defined policy to the S3 bucket.

S3 Pricing Model

Understanding the S3 pricing model components and considerations is crucial for managing costs effectively.

S3 pricing is based on several components: storage consumed per GB (which varies by storage class), requests and data retrievals, data transfer out of the region, and optional features such as management and analytics, replication, and Transfer Acceleration.
Archive tiers (Glacier) have lower per-GB storage charges, making them cost-effective for long-term archiving. It's important to understand which tiers have retrieval fees, data upload fees, and lifecycle transition fees. Exact pricing values are not required for certification, but relative cost and fee structures are important.

S3 Performance Optimization

Optimizing S3 for performance involves leveraging its architecture for high scalability and throughput, particularly for large objects and high request rates.

S3 is designed for high scalability and low latency. It automatically scales to handle thousands of requests per second. To increase request rate and overall performance, objects should be distributed across multiple S3 prefixes, as each prefix provides the same baseline performance, allowing S3 to partition data and scale horizontally.
Technical Specs: Typical object retrieval latency: 100 to 200 milliseconds; Request rate: Up to 3,500 PUT, COPY, POST, or DELETE requests per prefix per second; Request rate: Up to 5,500 GET or HEAD requests per prefix per second
S3 prefixes are the folders and subfolders within an S3 bucket. The more prefixes used, the faster the performance, enabling high numbers of requests. Spreading reads across different prefixes (directories) can achieve better performance, for example, 2 prefixes can achieve 11,000 requests/second, and 4 prefixes can achieve 22,000 requests/second.
Technical Specs: Request rate increase: 11,000 requests/second with 2 prefixes; 22,000 requests/second with 4 prefixes
Byte-range fetches allow retrieval of specific portions (ranges) of an object, optimizing for large files and partial downloads. They enable requesting only a specific range of bytes from an object using the Range HTTP header in S3 GET object requests. This speeds up downloads by parallelizing fetches of different parts of a large file, and improves resilience by allowing resumption of only the failed parts of an interrupted download. Ideal for applications like video streaming.
Technical Specs: Uses: Range HTTP header in S3 GET object requests
Multipart upload: for uploading very large objects, break them into smaller parts and upload them independently or in parallel. S3 reassembles the parts into a complete object. This increases upload throughput due to parallel uploads, allows pausing and resuming uploads, and improves error recovery by only requiring re-upload of failed parts.
Technical Specs: Required for objects larger than 5 GB; Recommended for objects larger than 100 MB
S3 Transfer Acceleration uses globally distributed AWS edge locations to accelerate file transfers to and from S3, particularly for globally dispersed users. Users upload to the nearest edge location, then data travels over AWS's private backbone network to the S3 bucket. It drastically reduces latency, especially for users far from the S3 bucket's region, and mitigates the impact of variable or slow internet connections over long distances. It must be enabled explicitly on the S3 bucket and incurs additional costs.
When using Server-Side Encryption with KMS (SSE-KMS) to encrypt and decrypt objects in S3, KMS comes with built-in limits that vary by region. Uploading and downloading data counts towards the KMS quota, which is either 5,500, 10,000, or 30,000 requests per second. Currently, a quota increase for KMS cannot be requested.
Technical Specs: KMS quota: 5,500, 10,000, or 30,000 requests per second (region-specific); Quota increase: Not currently supported

S3 Security Features

S3 provides robust security options to protect data at rest and in transit, control access, and enforce immutability.

S3 buckets are private by default, meaning that only the bucket owner has access.
S3 Block Public Access settings guard S3 buckets against unauthorized configuration changes and prevent objects from being made publicly readable. These settings must be disabled on the bucket before files can be made publicly readable.
Object ACLs are used for individual object access control. If bucket ownership is set to not use ACLs, the option to 'Make public using ACL' for objects will be grayed out.
Bucket policies are used for bucket-wide access control, allowing you to grant granular permissions (e.g., s3:GetObject) to specific principals or the public. AWS will not permit creating a public bucket policy while S3 Block Public Access settings are still enabled. To grant public read access, the policy's Resource must have /* appended to the bucket ARN so that it applies to all objects within the bucket.
Technical Specs: JSON Bucket Policy Example: `{"Version": "2012-10-17", "Id": "Policy1645724938586", "Statement": [{"Sid": "PublicReadGetObject", "Effect": "Allow", "Principal": "*", "Action": "s3:GetObject", "Resource": "<BUCKET_ARN>/*"}]}`
S3 Object Lock is used to store objects using a WORM (write once, read many) model to prevent objects from being deleted or modified for a fixed amount of time or indefinitely. It comes with two modes: governance mode and compliance mode.
Governance mode prevents most users from deleting objects but allows some authorized users (with sufficient permissions) to alter the object, its retention settings, or delete it if necessary.
Compliance mode ensures that a protected object version cannot be overwritten or deleted by any user, including the root user in your AWS account, until the retention period expires. This mode is used when absolute immutability is required (e.g., for medical trial results for a minimum of one year).
Technical Specs: Enforces Write-Once-Read-Many (WORM) protection, preventing modification or deletion by any user including the root user until retention expires.
Retention periods protect an object version for a fixed amount of time, after which the object version can be overwritten or deleted unless a legal hold is also placed on it.
Legal holds are a type of S3 Object Lock that prevent an object version from being overwritten or deleted. They do not have an associated retention period and remain in effect until they are explicitly removed.
Glacier Vault Lock is a way of applying a WORM model to Glacier. It allows you to specify controls, such as WORM, in a vault lock policy and lock the policy from future edits once applied.
Enabling MFA Delete on an S3 bucket (in conjunction with Versioning) requires Multi-Factor Authentication for permanent deletion of object versions, adding an extra layer of protection against accidental or malicious deletions.
Technical Specs: Requires Multi-Factor Authentication (MFA) for permanent deletion of object versions.
Encryption in transit uses SSL certificates and TLS to encrypt objects sent to and from the S3 bucket.
Technical Specs: Technologies: SSL certificates, TLS, HTTPS
Encryption at rest, or server-side encryption, refers to encrypting data once it has arrived in S3. There are three types of server-side encryption available.
SSE-S3: S3 manages the encryption keys, using AES 256-bit encryption to encrypt objects at rest.
Technical Specs: Encryption: AES 256-bit; Key management: S3 manages keys
SSE-KMS: Uses AWS KMS keys to encrypt objects. It provides key usage logging in AWS CloudTrail, and KMS supports automatic annual key rotation. This option is recommended for confidential data requiring auditing and annual key rotation.
Technical Specs: Key management: AWS KMS; Auditing: Key usage logging in AWS CloudTrail; Key rotation: Automatic annual rotation
SSE-C: Allows the customer to provide and manage their own encryption keys, while S3 handles the actual encryption of data using the provided keys. This method requires the customer to manually manage and rotate keys, resulting in higher operational overhead.
Technical Specs: Key management: Customer-provided keys
Client-side encryption: Files are encrypted on the client before being uploaded to S3. This requires the user to manage the encryption process themselves, including obtaining a master key (either KMS-managed or a client-side master key) and using an SDK for encryption.
Technical Specs: Key management: User manages master key (KMS-managed or client-side); Tooling: Requires an SDK for encryption
Bucket policies can be used to enforce encryption on S3 PUT requests. A policy can deny any S3 PUT request that does not include encrypted objects, or the encryption parameter (x-amz-server-side-encryption) in the request header.
Technical Specs: Bucket Policy Condition: Deny PUT requests without x-amz-server-side-encryption header.

S3 Data Access & Eventing

S3 offers features for auditing data access, triggering actions based on object events, and providing temporary, secure access to private objects.

Access logs are crucial for security and compliance, recording every request made to an S3 bucket. They capture the IP address of the requester, request time, operation performed, and HTTP status code. Logs are delivered as simple text files and can become very large for busy buckets. They are delivered on a best-effort basis, are not real-time, and may be delivered late. For API access logs, Amazon CloudTrail is an alternative.
Technical Specs: Data captured: IP address, request time, operation, HTTP status code; Log format: Simple text files; Delivery: Best-effort basis, not real-time
Access logs are enabled via the S3 console and require specifying a target S3 bucket. Best practice dictates delivering logs to a different bucket than the source bucket to avoid endless logging loops. The target bucket must be in the same AWS region as the source bucket.
Technical Specs: Target bucket requirement: Must be in the same AWS region as the source bucket
Amazon Athena, a serverless query service, can run SQL queries directly on S3 log data without infrastructure management, making it an effective tool for analyzing log data.
S3 Event Notifications trigger actions when specific events occur in an S3 bucket. Events include object uploads (s3:ObjectCreated:*), object deletions (s3:ObjectRemoved:*), object restores (s3:ObjectRestore:*), and object loss from the RRS storage class (s3:ReducedRedundancyLostObject). Event delivery is eventually consistent, meaning there may be a slight delay in delivery.
Technical Specs: Event types: s3:ObjectCreated:*, s3:ObjectRemoved:*, s3:ObjectRestore:*, s3:ReducedRedundancyLostObject; Delivery: Eventual consistency
Event notifications can be fanned out to multiple destinations including Amazon SNS (for notifications), SQS (for asynchronous processing), AWS Lambda (to run custom code), API Gateway, Kinesis Data Firehose, and Amazon EventBridge. Events can be filtered by prefix and suffix (e.g., .jpg file types), but cannot be filtered by object tags.
Technical Specs: Destinations: Amazon SNS, SQS, AWS Lambda, API Gateway, Kinesis Data Firehose, Amazon EventBridge; Filtering: By prefix and suffix; Limitation: Cannot filter by object tags
Pre-signed URLs grant temporary access to a single object in a private S3 bucket to external users without AWS credentials. An authenticated AWS user generates a special URL, which can then be passed to an external user for uploading (PUT) or downloading (GET) a specified file. The URL's validity is configured for a specific duration (e.g., 1 hour, 15 minutes) and HTTP method. The permissions of the generating user determine access, not the user of the URL. If a bucket is public, a pre-signed URL is not needed.
Technical Specs: Actions: GET for download, PUT for upload; Console expiration limit: 12 hours; Longer durations: Consider CloudFront Signed URLs or Signed Cookies for durations > 7 days

S3 Replication Overview

S3 Replication allows objects to be automatically replicated from one S3 bucket to another, enhancing durability and disaster recovery.

S3 Replication (formerly Cross Region Replication) replicates objects from one bucket to another, which can be in the same or different regions. This process ensures files remain accessible in extreme scenarios where data loss could occur, and is a key component for backing up data.
Versioning must be enabled on both the source and destination buckets for replication to function.
Objects already existing in a bucket are not replicated automatically when replication is first enabled. However, all subsequently updated or new objects will be replicated automatically.
Delete markers are not replicated automatically by default but can be enabled if needed.
AWS offers to create a specific IAM role for S3 replication with necessary permissions such as s3:ListBucket, s3:GetReplicationConfiguration, s3:GetObjectVersionForReplication, s3:ReplicateObject, s3:ReplicateDelete, s3:ReplicateTags, and s3:ObjectOwnerOverrideToBucketOwner.
Technical Specs: IAM Actions for Replication Role: s3:ListBucket, s3:GetReplicationConfiguration, s3:GetObjectVersionForReplication, s3:GetObjectVersionAcl, s3:GetObjectVersionTagging, s3:GetObjectRetention, s3:GetObjectLegalHold, s3:ReplicateObject, s3:ReplicateDelete, s3:ReplicateTags, s3:ObjectOwnerOverrideToBucketOwner

Setting Up S3 Cross-Region Replication

procedure

This procedure outlines how to create an S3 bucket and enable automatic file replication to a different AWS region for disaster recovery and data accessibility.

You will create a new S3 bucket in a different region and configure a replication rule from an existing source bucket to the newly created destination bucket, ensuring objects are automatically copied.

Prerequisites

  • An existing S3 source bucket with versioning enabled (e.g., appconfigprod1)
  • AWS Management Console access
  • Permissions to create S3 buckets and replication rules
1

Navigate to S3 in the AWS console.

💡 This is the entry point for S3 management.

2

Copy the name of the lab-provided source bucket (e.g., `appconfigprod1`). Click 'Create bucket'.

💡 Starting the process of creating the destination bucket.

3

Set the destination bucket's name (e.g., paste `appconfigprod1` and replace it with `appconfigprod2` to ensure global uniqueness). Select a different AWS Region (e.g., `US West (Oregon) us-west-2`).

💡 Bucket names must be globally unique. Replicating to a different region enhances disaster recovery.

4

Under 'Copy settings from existing bucket', click 'Choose bucket' and select the source bucket (`appconfigprod1`). Then click 'Choose bucket'.

💡 This helps in pre-populating some settings, though specific replication rules are configured separately.

5

Leave the rest of the settings as defaults and click 'Create bucket'. (Ignore system tag warnings if they appear).

💡 Completes the destination bucket creation.

6

Click the source bucket (`appconfigprod1`) to open it. Navigate to the 'Management' tab.

💡 Replication rules are configured on the source bucket.

7

In the 'Replication rules' section, click 'Create replication rule'. Click 'Enable Bucket Versioning' if not already enabled.

💡 Versioning is a prerequisite for S3 replication.

8

Under 'Replication rule configuration', enter a 'Replication rule name' (e.g., `CrossRegion`). Under 'Source bucket', select 'Apply to all objects in the bucket'.

💡 Names the rule and defines its scope for replication.

9

Under 'Destination', leave 'Choose a bucket in this account' selected. Click 'Browse S3', select the newly created destination bucket (`appconfigprod2`), and click 'Choose path'. Also, click 'Enable bucket versioning' for the destination.

💡 Specifies the target for replicated objects and ensures versioning is active on the destination.

10

Under 'IAM role', click the dropdown and select 'Create new role'. Click 'Save'.

💡 An IAM role is required to grant S3 permissions to perform replication actions between buckets.

11

When prompted with the 'Replicate existing objects?' popup, choose 'No, do not replicate existing objects' and click 'Submit'.

💡 This lab focuses on new objects; existing objects can be replicated using S3 Batch Replication if needed.

S3 Versioning

S3 Versioning allows for multiple versions of an object to be stored, providing a mechanism for data recovery and version control.

Versioning in S3 allows for multiple versions of an object to be stored within a bucket, enabling version control with files. It acts as a backup tool, ensuring all versions of objects are stored in S3, even if deleted.
Once versioning is enabled on a bucket, it cannot be disabled, only suspended. Objects that existed before versioning was enabled are given a 'null' version ID. Uploading a new file with the same key as an existing object (even with different content) creates a new version while preserving the original.
Versioning is a great way to protect objects from accidental deletion. It supports multi-factor authentication (MFA) Delete to prevent accidental permanent deletion; with MFA Delete enabled, two forms of authentication are required to permanently delete an object version.
Versioning integrates with lifecycle rules, allowing for automated management of different object versions across storage tiers or for expiration.

Creating S3 Buckets and Enabling Versioning (Lab Mode)

procedure

This procedure guides through creating both public and private S3 buckets, uploading files, verifying access, and enabling versioning to manage different object versions.

You will create two S3 buckets, one public and one private, upload a file to each, and verify their respective access permissions. Then, you will enable versioning on the public bucket and upload a new version of the same file to observe versioning in action.

Prerequisites

  • AWS Management Console access
  • Two image files (e.g., cat1.jpg and cat2.jpg) downloaded locally
1

Navigate to S3 in the AWS console. Click 'Create bucket'.

💡 Initiates the bucket creation process.

2

Create a 'Public Bucket': Set 'Bucket name' to `acg-testlab-public-<random>` (where `<random>` is a unique string). Select `US East (N. Virginia) us-east-1` for 'Region'. Under 'Object Ownership', select 'ACLs enabled' and 'Bucket owner preferred'. In 'Block Public Access settings', uncheck 'Block all public access' and acknowledge the warning. Leave other settings as defaults. Click 'Create bucket'.

💡 This configures the bucket to allow public access at a later stage for objects.

3

Create a 'Private Bucket': On the 'Buckets' screen, click 'Create bucket'. Set 'Bucket name' to `acg-testlab-private-<random>`. Select `US East (N. Virginia) us-east-1` for 'Region'. Leave other settings as defaults (Block Public Access should be enabled by default). Click 'Create bucket'.

💡 This bucket will remain private, demonstrating restricted access.

4

Upload a file to the Private Bucket: Select the private bucket. Click 'Upload', then 'Add files'. Upload `cat1.jpg`. Leave other settings as defaults and click 'Upload'. After success, click its name and open the 'Object URL' in a new tab to see an 'Access Denied' error. Note that 'Make public using ACL' is grayed out.

💡 Demonstrates that objects in a private bucket (without ACLs for public access) are inaccessible publicly.

5

Upload a file to the Public Bucket and make it public: Go back to 'Buckets' and select the public bucket. Click 'Upload', then 'Add files'. Upload `cat1.jpg`. Leave other settings as defaults and click 'Upload'. After success, click its name, open 'Object URL' (will show error), then select 'Object actions' > 'Make public using ACL', and click 'Make public'. Open 'Object URL' again; the image should load.

💡 Shows that even in a public-access-enabled bucket, individual objects need to be explicitly made public.

6

Enable Versioning on the Public Bucket: Back on the public bucket page, click the 'Properties' tab. In the 'Bucket Versioning' section, click 'Edit'. Click 'Enable' and 'Save changes'.

💡 Prepares the bucket to store multiple versions of objects.

7

Upload another image to test Versioning: Click the 'Objects' tab. Click 'Upload', 'Add files'. Rename `cat2.jpg` to `cat1.jpg`. Upload this newly renamed `cat1.jpg` image. Click 'Upload'. After success, click its name to view properties, then click the 'Versions' tab.

💡 Uploading a new file with the same name creates a new version, preserving the old one.

8

View the Image Versions: Select 'Object actions' > 'Make public using ACL' for the latest version and click 'Make public'. Open its 'Object URL' to see the new image. Then, on the 'Versions' tab, click the 'null' object (the original version). Open its 'Object URL' to see the original `cat1.jpg` image.

💡 Confirms that both versions of the object are retained and accessible via their unique version IDs.

S3 Static Website Hosting

procedure

Amazon S3 can be used to host static websites efficiently and cost-effectively. This involves configuring an S3 bucket for website hosting and managing access permissions.

To host a static website on Amazon S3, you configure an S3 bucket for website hosting, upload your website content (HTML, CSS, client-side JavaScript, images) to the bucket, and set appropriate public access permissions.

Prerequisites

  • HTML, CSS, client-side JavaScript, and image files for your static website (e.g., index.html, error.html)
  • AWS Management Console access
  • S3 Block Public Access settings must be disabled on the bucket
1

Create an S3 bucket in the `us-east-1` region, ensuring its name begins with `my-bucket-` (e.g., `my-bucket-<ACCOUNT ID>`). When creating the bucket, uncheck all four S3 Block Public Access settings checkboxes to allow public access. If the bucket was created with public access blocked, go to `Bucket` > `Permissions` > `Public access settings` > `Edit` and uncheck all four restrictions.

💡 A globally unique bucket name is required. Disabling Block Public Access is critical to allow public read access for a static website.

aws s3 mb s3://saa-quiz-app-static
2

Upload your website content (e.g., `index.html` and `error.html`) to the S3 bucket. You can find sample code in the provided GitHub repository.

💡 This places the static files that constitute your website into the S3 bucket.

aws s3 cp data/question_bank.json s3://saa-quiz-app-static/data/
aws s3 cp web/static/ s3://saa-quiz-app-static/static/ --recursive
3

Enable Static Website Hosting: On the bucket's `Properties` tab, scroll to 'Static website hosting' and click 'Edit'. Select 'Enable', set 'Index document' to `index.html`, and 'Error document' to `error.html`. Save changes.

💡 This configures S3 to serve content as a website, specifying the default page and an error page.

4

Apply a Bucket Policy to make objects publicly readable: Go to `Bucket` > `Permissions` > `Bucket Policy`. Add a JSON statement that grants `s3:GetObject` permission to everyone (Principal: `*`). Ensure the `Resource` ARN ends with `/*` so the policy applies to all objects within the bucket.

💡 This policy grants the necessary read access for anonymous users to view your website content.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::saa-quiz-app-static/*"
    }
  ]
}

S3 Integrations and Advanced Use Cases

S3 integrates with numerous AWS services, enabling powerful architectures for analytics, content delivery, data migration, and private connectivity.

Amazon Athena is a serverless query service that can directly analyze JSON logs stored in S3 using standard SQL. This is the correct solution for simple, on-demand queries with minimal architecture changes and low operational overhead, as it avoids ETL processes or infrastructure setup.
EC2 instances within a VPC can access S3 buckets privately without internet connectivity by creating a gateway VPC endpoint to the S3 bucket. This eliminates the need for an internet gateway, NAT Gateway, or public IPs, keeping traffic within the AWS network and reducing data transfer costs within the same region.
To increase storage capacity for on-premises SMB file servers and manage data lifecycle, an Amazon S3 File Gateway can be used. It provides an SMB interface with local caching for low latency, while storing files in S3. S3 lifecycle policies can then automate transitions of rarely accessed files to cheaper S3 storage classes like S3 Glacier Deep Archive.
For transmitting and processing large volumes of continuous data like clickstream data, Amazon Kinesis Data Firehose can be used to reliably deliver data from Kinesis Data Streams to an Amazon S3 data lake with minimal operational overhead. This S3 data lake can then be integrated with Amazon Redshift for scalable analytics.
For securely and reliably transferring large amounts of data (e.g., 10TB daily JSON instrumentation data) from on-premises SAN to Amazon S3, AWS DataSync over AWS Direct Connect is the most reliable solution. DataSync is purpose-built for efficient transfers, with built-in encryption and integrity checks. Direct Connect provides a dedicated, reliable, high-bandwidth connection.
While DynamoDB Point-in-Time Recovery only retains data for up to 35 days, for long-term data retention (e.g., 7 years) of user transaction data, AWS Backup can be used to create backup schedules and retention policies for DynamoDB tables. This provides a more operationally efficient solution than manual backups or custom Lambda scripts.
Amazon CloudFront, a Content Delivery Network (CDN), is used to cache content from S3 buckets at edge locations worldwide. This provides low latency access for global users, minimizes load on S3, and offers scalability, performance, and cost efficiency. It supports secure delivery of confidential content via signed URLs or signed cookies.
For static websites using CloudFront with an S3 origin, security can be enhanced by configuring CloudFront with an Origin Access Identity (OAI) to restrict direct S3 bucket access, allowing only CloudFront to retrieve content. Additionally, AWS WAF can be associated with the CloudFront distribution to inspect all incoming requests before they reach the origin.
To add an extra layer of security for sensitive user-submitted information in a CloudFront distribution, a Field-Level Encryption profile can be configured. This encrypts specific sensitive data fields at the edge (before reaching the origin), ensuring the information remains protected throughout the application stack, with decryption restricted to authorized applications with the correct private key.
The best practice for granting EC2 instances or Amazon ECS tasks access to S3 is to create an IAM role with the necessary S3 permissions and attach that role to the EC2 instances via an instance profile, or specify it as the task role ARN in the ECS task definition. This provides temporary credentials automatically and avoids manual management of static credentials.
S3 is commonly used for storing static assets (e.g., `question_bank.json`, `web/static/` files) in web applications, often in conjunction with services like CloudFront for CDN capabilities or Elastic Beanstalk for managed deployments. This is a cost-effective and scalable solution.
Amazon AppFlow offers a fully managed service for easily automating the exchange of data between SaaS vendors and AWS services like Amazon S3. It can transfer up to 100 gibibytes per flow, avoiding execution timeout limits common with custom Lambda functions for large data transfers.
Technical Specs: Max transfer: 100 gibibytes per flow

Glossary

Object-based storage
A storage architecture that manages data as objects rather than in file systems or data blocks, suitable for static files like images, videos, and web pages.
S3 Bucket
A fundamental container in Amazon S3 for storing objects. Bucket names must be globally unique across all AWS accounts.
Key-value store
A data storage paradigm where data is organized as a set of keys and corresponding values. In S3, the key is the object name, and the value is the data itself.
11 9s durability
Refers to 99.999999999% durability, indicating a very low likelihood of data loss. S3 is designed to achieve this level of durability.
Strong read-after-write consistency
A consistency model where, after a successful write operation to a storage system, any subsequent read operation for the same data is guaranteed to return the latest written data.
S3 Prefix
Folders and subfolders within an S3 bucket that contribute to object organization and can be used to optimize performance by distributing requests.
Byte Range Fetches
An S3 optimization technique that allows requesting and retrieving specific portions (ranges of bytes) of an object, often used for large files and partial downloads.
Multi-Part Upload
An S3 optimization technique for uploading large objects by splitting them into smaller parts that can be uploaded independently and in parallel, then reassembled by S3.
S3 Transfer Acceleration
A feature that uses globally distributed AWS edge locations to accelerate file transfers to and from S3 buckets, especially beneficial for users geographically distant from the S3 bucket's region.
S3 Lifecycle Management
A feature that automates the process of moving objects between different S3 storage classes (transition actions) or deleting them (expiration actions) based on predefined rules and object age.
S3 Versioning
A feature that allows keeping multiple versions of an object in the same S3 bucket, protecting against accidental deletions and overwrites by enabling recovery of previous versions.
S3 Object Lock
A feature that enables WORM (write once, read many) protection on S3 objects, preventing them from being deleted or modified for a fixed retention period or indefinitely, with governance and compliance modes.
Glacier Vault Lock
A feature for Amazon S3 Glacier vaults that allows applying WORM (write once, read many) controls through a vault lock policy that, once locked, cannot be changed.
MFA Delete
A security feature that requires Multi-Factor Authentication for the permanent deletion of object versions in an S3 bucket with versioning enabled, providing an extra layer of protection.
Server-Side Encryption (SSE)
Encryption of data at rest, where S3 encrypts an object before saving it to disk and decrypts it when you access it. Available with S3-managed keys (SSE-S3), KMS-managed keys (SSE-KMS), or customer-provided keys (SSE-C).
Origin Access Identity (OAI)
A special CloudFront identity that restricts direct access to an S3 bucket, ensuring that content can only be served via the associated CloudFront distribution.
