AWS Backup is a centralized, managed service that simplifies and automates backup and recovery across AWS services, addressing the complexity and decentralization of traditional IT backup strategies. (source_page: 1)
AWS Backup eliminates the need to manage backups through separate interfaces for different services (e.g., EC2, RDS, DynamoDB), reduces manual configuration errors, and simplifies enforcing consistent backup policies and auditing.
AWS Backup provides a unified solution with a single dashboard for defining, applying, and managing backup policies, retention, and lifecycle across an entire AWS environment. (source_page: 1) Backup and recovery are critical aspects of IT infrastructure, and this centralization makes them easier to manage consistently. (source_page: 3)
AWS Backup offers a single pane of glass for centralized management. It uses 'Backup Plans' for policy-based orchestration, which can be created using AWS-defined templates, custom configurations, or JSON files. Backup Plans define frequency, retention periods, and cross-region replication for disaster recovery. It supports Continuous Backup for near-zero Recovery Point Objective (RPO) and point-in-time recovery for services like RDS and EFS. Restoration Point Testing is available to ensure data recoverability. Resources can be automatically included in backup plans using tag-based backup.
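As a hedged sketch of how such a Backup Plan can be defined programmatically, the AWS CLI accepts a JSON plan document; the plan name, vault name, and cron schedule below are illustrative assumptions, not values from the source:

aws backup create-backup-plan --backup-plan file://backup-plan.json

where backup-plan.json might contain:

{
  "BackupPlanName": "daily-35day-retention",
  "Rules": [{
    "RuleName": "DailyBackups",
    "TargetBackupVaultName": "my-backup-vault",
    "ScheduleExpression": "cron(0 5 * * ? *)",
    "Lifecycle": { "DeleteAfterDays": 35 }
  }]
}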
AWS Backup includes a feature that scans backups for malware using Amazon GuardDuty.
AWS Backup supports a growing list of services across Compute & Storage, Databases, File Systems, and Hybrid Cloud.
AWS Backup Vaults provide secure, encrypted storage for backups, while Vault Lock enforces immutability for compliance and protection. (source_page: 1)
A Backup Vault is a secure, encrypted storage location within your AWS account used to organize and store backups. It acts as a centralized repository for managing and accessing them.
Vault Lock enforces immutability for backups, implementing a "write once, read many" (WORM) model. It prevents deletion or alteration of backup recovery points until their defined retention period expires, even by the AWS account root user. This is crucial for meeting strict regulatory compliance (e.g., financial industry) and protecting against insider threats or ransomware.
These are specialized secure storage locations, isolated from the primary production environment through software and security configurations.
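As an illustrative sketch (the vault name and retention values are assumptions), Vault Lock can be applied from the AWS CLI; once the changeable-for-days window elapses, the lock can no longer be removed:

aws backup put-backup-vault-lock-configuration --backup-vault-name my-backup-vault --min-retention-days 7 --max-retention-days 365 --changeable-for-days 3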
Step-by-step process for creating and configuring backup plans in AWS Backup for EC2 instances and EBS volumes. (source_page: 1, 3)
Defines backup rules, schedules, and resources to be backed up, allowing for automated backups based on templates or custom settings. (source_page: 3)
Prerequisites
- AWS Management Console Login (source_page: 3)
- Existing EC2 instances and EBS volumes (source_page: 3)
1. Access AWS Backup Service
   💡 The entry point for all AWS Backup configurations.
2. Create a Backup Vault
   💡 A secure, encrypted location to store all backups.
3. Configure Backup Vault Details
   💡 To name the vault and optionally enable encryption for enhanced security.
4. Create a Backup Plan
   💡 To define backup rules, schedules, and target resources. Can be created from scratch or using AWS-defined templates.
5. Define Custom Plan Configuration (e.g., 'CS_backup_demo')
   💡 To set specific parameters for the backup strategy.
6. Configure Backup Schedule
   💡 To specify frequency, window, and retention.
7. Enable Continuous Backup (Optional)
   💡 For near-zero RPO and point-in-time recovery for supported services; disables the lifecycle policy if enabled.
8. Configure Retention Period and Cross-Region Replication (Optional)
   💡 To define how long backups are kept and to support disaster recovery strategies.
9. Set Advanced Backup Settings (e.g., Windows VSS)
   💡 To ensure application-consistent backups, particularly for Windows-based instances running Microsoft products. (source_page: 3)
10. Assign Resources to the Backup Plan
   💡 To specify which resources will be protected by this plan.
11. Specify Resource Assignment Name and IAM Role
   💡 An IAM role is required so that AWS Backup can access the target resources on your behalf.
12. Select Resources (All, Specific, or Tag-Based)
   💡 Tag-based filtering ('AWS backup = yes') allows automated inclusion of new resources, ensuring scalability.
Example: A backup plan with a tag filter prod will automatically include new EBS volumes tagged with prod. (source_page: 1)
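A hedged CLI equivalent of this tag-based assignment (the selection name, role ARN, and tag values are illustrative assumptions):

aws backup create-backup-selection --backup-plan-id <plan_id> --backup-selection file://selection.json

where selection.json might contain:

{
  "SelectionName": "tag-based-selection",
  "IamRoleArn": "arn:aws:iam::<account_id>:role/service-role/AWSBackupDefaultServiceRole",
  "ListOfTags": [
    { "ConditionType": "STRINGEQUALS", "ConditionKey": "AWS backup", "ConditionValue": "yes" }
  ]
}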
Amazon RDS provides robust backup and recovery mechanisms, including automated backups, manual snapshots, and advanced features for Aurora. (source_page: 2)
Enabled by default, RDS performs daily backups during a defined backup window. Transaction logs are continuously uploaded to Amazon S3 every 5 minutes. Point-in-time recovery is achieved by combining transaction logs and daily backups, allowing restoration to any second within the retention period, up to the last 5 minutes. The retention period is configurable from 1 day to a maximum of 35 days; setting it to 0 disables automated backups. Default retention is 7 days in the AWS Console and 1 day via RDS API/AWS CLI.
Technical Specs: Frequency: Daily; Transaction Logs: Uploaded every 5 minutes; Point-in-Time Recovery: Up to last 5 minutes; Retention Period: 1-35 days (0 disables); Default Console: 7 days; Default API/CLI: 1 day
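For example, the retention period can be adjusted with the AWS CLI (the instance identifier and the 14-day value are illustrative):

aws rds modify-db-instance --db-instance-identifier <db_instance_id> --backup-retention-period 14 --apply-immediately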
Manual snapshots offer more control over the backup strategy and persist until explicitly deleted, not being subject to automated retention periods. They are useful for compliance and long-term data retention requirements. For multi-instance database clusters, a manual snapshot captures the entire cluster. The first snapshot is a full copy; subsequent snapshots of the same instance are incremental.
Technical Specs: Persistence: Until explicitly deleted; Incremental: Yes (after first full snapshot)
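A manual snapshot can be taken at any time from the CLI (identifiers are placeholders):

aws rds create-db-snapshot --db-instance-identifier <db_instance_id> --db-snapshot-identifier <snapshot_name>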
Aurora supports all default RDS backup and restore capabilities but offers more advanced features. Backtrack allows rewinding the database to any point in time within a configured period (up to 72 hours) without traditional backups, minimizing data loss from user errors. Aurora creates cluster-level backups and uses a Copy on Write protocol for efficient cloning, where new clusters share the original's data volume, and only changed blocks are copied. Continuous Data Protection ensures data is always protected with a user-defined retention period (up to 35 days) without impacting performance.
Technical Specs: Backtrack: Up to 72 hours; Continuous Data Protection Retention: Up to 35 days; Scope: Cluster-level (standard RDS are instance-level)
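As an illustrative sketch, a cluster with Backtrack enabled can be rewound from the CLI (the timestamp is a placeholder; Backtrack must have been enabled with a backtrack window when the cluster was created):

aws rds backtrack-db-cluster --db-cluster-identifier <cluster_id> --backtrack-to 2024-05-17T03:30:00Z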
You can create manual snapshots of your database instances (including read replicas) for point-in-time backups. These are user-defined and can be restored, copied, shared, migrated, upgraded, or exported to Amazon S3.
Automated backups are enabled by default, with RDS automatically taking daily backups. You can restore your database to any point in time within the retention period (typically 7 days, configurable up to 35 days).
Technical Specs: Retention: Typically 7 days, configurable up to 35 days.
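A hedged example of a point-in-time restore from the CLI (identifiers and timestamp are placeholders); note that the restore creates a new instance rather than overwriting the source:

aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier <source_db> --target-db-instance-identifier <restored_db> --restore-time 2024-05-17T03:30:00Z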
For disaster recovery, you can configure automated backups or manual snapshots to be replicated to a different AWS region.
Manual snapshots can be deleted directly. System snapshots cannot; they are tied to the automated backup retention period and are removed when the instance is deleted without retaining automated backups.
Amazon EBS Snapshots are incremental backups of EBS volumes stored in Amazon S3, enabling volume recreation, data sharing, and automated lifecycle management. (source_page: 5)
Snapshots are used for backing up critical workloads, recreating EBS volumes from backups, and sharing and copying data. (source_page: 5) They are stored in Amazon Simple Storage Service (Amazon S3) for durability. (source_page: 5)
AWS Backup can be configured for EBS volume backups. A tag-based approach (e.g., 'AWS backup = yes') ensures only the EBS volume attached to the tagged SQL Server instance is backed up. (source_page: 3)
Technical Specs: Backup Frequency: Hourly (for demo)
Amazon DLM automates the creation, retention, and deletion of EBS snapshots. It uses tags to identify volumes for backup and a lifecycle policy to define backup and retention actions. An IAM role is required to grant necessary permissions. (source_page: 5)
This section details how to manage EBS volumes and snapshots using the AWS Command Line Interface (AWS CLI). (source_page: 5)
CLI commands for creating, copying, and restoring EBS snapshots.
Prerequisites
- AWS CLI configured with necessary permissions
- An existing EBS volume
1. Create an EBS snapshot
   💡 To create a backup of a specific EBS volume.
aws ec2 create-snapshot --volume-id <volume_id> --description "<description>"
2. Copy an EBS snapshot to another region
   💡 For cross-region disaster recovery or data distribution.
aws ec2 copy-snapshot --region <destination_region> --source-region <source_region> --source-snapshot-id <snapshot_id> --description "<description>"
3. Restore an EBS snapshot (create a new volume from it)
   💡 To recover data or launch a new volume with the snapshot's data.
aws ec2 create-volume --size <size> --availability-zone <az> --volume-type <volume_type> --snapshot-id <snapshot_id>
4. Create an IAM Role for DLM
   💡 To grant Amazon Data Lifecycle Manager (DLM) the necessary permissions to manage snapshots.
aws dlm create-default-role
5. Create a Lifecycle Policy (for DLM)
   💡 To automate the creation, retention, and deletion of EBS snapshots based on defined rules and tags.
aws dlm create-lifecycle-policy --description "<description>" --state ENABLED --execution-role-arn <role_arn> --policy-details file://policyDetails.json
6. View a Lifecycle Policy (for DLM)
   💡 To inspect the details of an existing DLM policy.
aws dlm get-lifecycle-policy --policy-id <policy_id>
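As a hedged sketch of the policyDetails.json referenced in step 5 (the tag key/value and schedule are illustrative assumptions), a minimal policy that snapshots tagged volumes daily and retains seven copies might look like:

{
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{ "Key": "AWS backup", "Value": "yes" }],
  "Schedules": [{
    "Name": "DailySnapshots",
    "CopyTags": true,
    "CreateRule": { "Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"] },
    "RetainRule": { "Count": 7 }
  }]
}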
Amazon S3 is a highly scalable, durable, and secure object storage service ideal for data backups, archiving, and disaster recovery. (source_page: 5)
Amazon S3 is widely used for data backups, archiving, and disaster recovery. It can replace traditional tape backup storage solutions. (source_page: 5, 9)
Amazon S3 Glacier is a storage service built specifically for data archiving. It offers high performance, flexible retrieval options, and extremely low-cost cloud storage, making it ideal for long-term data retention and backup. (source_page: 5)
S3 Glacier offers classes tailored for different retrieval needs: Instant Retrieval (milliseconds), Flexible Retrieval (3-5 hours standard, 1-5 minutes expedited, 5-12 hours bulk), and Deep Archive (long-term retention, 1-2 times per year access).
Key concepts for S3 Glacier include: a Vault (container for archives with a unique URI), an Archive (any data stored in Glacier, also with a unique URI), and a Job (used to retrieve an archive or vault inventory, with unique URIs). Notification Configuration allows receiving notifications (e.g., via SNS) upon job completion.
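For illustration, retrieving a vault inventory is done by initiating a job and later fetching its output once notified of completion (the vault name and job ID are placeholders):

aws glacier initiate-job --account-id - --vault-name <vault_name> --job-parameters '{"Type": "inventory-retrieval"}'
aws glacier get-job-output --account-id - --vault-name <vault_name> --job-id <job_id> inventory.json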
S3 Object Lock prevents objects from being deleted or overwritten for a specified time or indefinitely, using retention periods or legal holds. Retention modes include Compliance and Governance. (source_page: 5)
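A minimal CLI sketch (bucket, key, and date are placeholders; Object Lock must be enabled when the bucket is created):

aws s3api create-bucket --bucket <bucket_name> --object-lock-enabled-for-bucket
aws s3api put-object-retention --bucket <bucket_name> --key <object_key> --retention '{"Mode": "GOVERNANCE", "RetainUntilDate": "2026-01-01T00:00:00Z"}'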
S3 Intelligent-Tiering automatically moves objects between Frequent Access, Infrequent Access, and Archive Instant Access tiers based on access patterns, optimizing storage costs. Optional tiers include Archive Access and Deep Archive Access.
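For objects already stored in the Intelligent-Tiering storage class, the optional archive tiers can be enabled per bucket; this sketch (bucket name, configuration ID, and day thresholds are assumptions) turns on both optional tiers:

aws s3api put-bucket-intelligent-tiering-configuration --bucket <bucket_name> --id archive-config --intelligent-tiering-configuration '{"Id": "archive-config", "Status": "Enabled", "Tierings": [{"Days": 90, "AccessTier": "ARCHIVE_ACCESS"}, {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}]}'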
This procedure outlines how to use AWS CLI for synchronizing files with an S3 bucket, enabling versioning, and recovering deleted files. (source_page: 5)
Using AWS CLI to manage S3 buckets for file synchronization, versioning, and recovery of deleted objects.
Prerequisites
- AWS CLI configured
- An S3 bucket
- Sample files for synchronization
1. Activate versioning for your Amazon S3 bucket
   💡 To keep a complete history of every version of every object, enabling recovery from accidental deletions or overwrites.
aws s3api put-bucket-versioning --bucket S3-BUCKET-NAME --versioning-configuration Status=Enabled
2. Sync local folder contents to Amazon S3
   💡 To recursively copy new and updated files from a source directory to a destination.
aws s3 sync files s3://S3-BUCKET-NAME/files/
3. Delete a local file
   💡 To simulate a local deletion that will be mirrored to S3.
rm files/file1.txt
4. Sync with deletion option to remove corresponding S3 file
   💡 To make the S3 bucket an exact mirror of the local folder by deleting S3 files not present locally.
aws s3 sync files s3://S3-BUCKET-NAME/files/ --delete
5. Find the version IDs of the delete marker and the previous object version
   💡 Listing versions shows the delete marker S3 placed on the object and the version ID of the most recent real version, which is needed to recover the file.
aws s3api list-object-versions --bucket S3-BUCKET-NAME --prefix files/file1.txt
6. Download the previous version of the deleted file
   💡 There is no direct command to restore an older version in place, so download it locally first, using the version ID of the last real object version (not the delete marker's).
aws s3api get-object --bucket S3-BUCKET-NAME --key files/file1.txt --version-id VERSION-ID files/file1.txt
7. Re-sync the local folder to Amazon S3
   💡 To upload the recovered file back to S3, creating a new version and making it visible again.
aws s3 sync files s3://S3-BUCKET-NAME/files/
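An alternative to steps 6-7: removing the delete marker itself makes the previous version current again, with no re-upload (the version ID here is the delete marker's, found in step 5):

aws s3api delete-object --bucket S3-BUCKET-NAME --key files/file1.txt --version-id DELETE-MARKER-VERSION-ID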
AWS Storage Gateway is a hybrid storage service connecting on-premises applications to AWS cloud storage for backup, archiving, and disaster recovery. (source_page: 5)
Storage Gateway allows on-premises applications to seamlessly interact with AWS cloud storage. Key use cases include Backup and Archiving (securely storing backups and archives in the cloud), Disaster Recovery (facilitating efficient strategies), Cloud Data Processing, Storage Tiering, and Migration. It supports File, Volume, and Tape storage interfaces. (source_page: 5)
Tape Gateway provides access to a virtual tape library that utilizes Amazon S3 archive tiers for long-term data retention at the lowest cost. (source_page: 5)
Cloud-Based Backups and Archives: Moving backups and archives to the cloud for cost savings and enhanced security. (source_page: 5)
AWS offers services like DataSync and the Snow Family to facilitate secure and efficient data transfer for backup and archiving purposes. (source_page: 5)
AWS DataSync is a managed data transfer service that automates and accelerates data movement between on-premises storage and AWS storage services, as well as between different AWS storage services. Its use cases include data migration, archiving cold data, and data protection. (source_page: 5)
Technical Specs: Connection: Internet or AWS Direct Connect; Agent: AWS DataSync Agent (using NFS protocol)
The AWS Snow Family (Snowball, Snowball Edge, Snowcone, Snowmobile) provides physical devices for transferring massive datasets securely to and from AWS. Key features include a secure enclosure, up to 210 TB of data transfer capacity (parallel shipments for Snowball), simplified logistics, and strong end-to-end encryption. Use cases include data transfer from sensors or machines, data collection in remote locations, and media/entertainment content aggregation. (source_page: 5)
Technical Specs: Data Transfer Capacity: Up to 210 TB (Snowball); Encryption: Strong end-to-end
AWS provides various methods for data recovery from backups, ensuring business continuity. (source_page: 1, 3, 6)
AWS Backup supports Restoration Point Testing to ensure data recoverability.
To restore, select the created AMI from the backup vault, choose 'Action', and then 'Restore'. Restoration parameters (instance type, VPC, subnet, security group, IAM role) can be modified or defaults used. The process typically takes 2-3 minutes, creating a new EC2 instance from the restored AMI. The new instance will have a newly generated Instance ID and changed Public IP Address, but the same Key Pair as the original. Data validation involves logging into the restored instance and verifying data integrity.
Technical Specs: Restore Duration: Approximately 2-3 minutes; New Instance: New ID, new Public IP, original Key Pair
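The same restore can also be started from the CLI; this is a hedged sketch (the ARNs are placeholders, and the required metadata keys vary by resource type):

aws backup start-restore-job --recovery-point-arn <recovery_point_arn> --iam-role-arn <role_arn> --metadata file://restore-metadata.json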
Automated backups in RDS allow restoration to any second within the retention period, up to the last 5 minutes, by combining daily backups and continuously uploaded transaction logs. (source_page: 2)
Technical Specs: Recovery Point Objective: Up to the last 5 minutes; Recovery Granularity: Any second within the retention period
When deleting an RDS instance, you can choose to retain automated backups or create a final snapshot (recommended for production). If not retained, automated backups and point-in-time recovery are lost. Deletion protection must be disabled before deletion. Read replicas can be promoted to standalone primary databases if the original primary is deleted. (source_page: 6)
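An illustrative CLI sequence for this teardown (identifiers are placeholders): first disable deletion protection, then delete while keeping a final snapshot and the automated backups:

aws rds modify-db-instance --db-instance-identifier <db_instance_id> --no-deletion-protection --apply-immediately
aws rds delete-db-instance --db-instance-identifier <db_instance_id> --final-db-snapshot-identifier <final_snapshot_name> --no-delete-automated-backups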
AWS Backup and related services offer features crucial for meeting compliance and enhancing data security. (source_page: 1, 2, 5, 8)
AWS Backup provides a unified view and consistent reporting for audit purposes. (source_page: 1)
The Vault Lock feature offers a powerful defense against ransomware attacks that attempt to delete backups. (source_page: 1)
If database encryption is enabled for an RDS instance, snapshots, backups, and read replicas are automatically encrypted. (source_page: 2) Encryption is enabled by default for data at rest and in transit. (source_page: 6)
Data in Amazon S3 Glacier is automatically encrypted using AES-256, with AWS managing encryption keys. Access is controlled via AWS Identity and Access Management (IAM) policies, vault access policies, and Vault Lock policies. Only vault access policies are modifiable at any time. (source_page: 5)
Technical Specs: Encryption: AES-256
The Reliability pillar's goal is to recover from failures and mitigate disruptions. Its key topics include infrastructure/service failure recovery, and its design principles include testing recovery procedures and automatically recovering from failure. (source_page: 8)
Optimizing backup costs involves selecting appropriate storage, managing retention, and leveraging pricing models. (source_page: 3, 4, 10)
Frequent backups increase storage costs, so backup frequency should be weighed against cost. (source_page: 3)
Amazon S3 is the lowest-cost, durable storage option for retaining database backups for immediate retrieval. (source_page: 4)
Selecting cost-effective resources involves choosing the correct service, type, size, and pricing model for your use case, and planning for data transfers. Examples include using Glacier for archival storage and S3 for frequently accessed storage. (source_page: 10)
For Amazon OpenSearch workloads, in addition to EBS volumes or instance stores, UltraWarm Storage and Cold Storage provide much lower storage costs than hot storage. UltraWarm is for less frequently accessed, read-only data; it combines Amazon S3 with AWS Nitro System-powered nodes for a hot-like query experience without replicas. Cold Storage is for rarely accessed, read-only data such as audit logs that are searched only periodically; it separates compute from storage for even lower costs, and cold data can be attached to UltraWarm nodes when queries are needed. (source_page: 10)
Technical Specs: Hot Storage calculation: Source data * (1 + number of replicas) * (1 + indexing overhead) / (1 - Linux reserved space) / (1 - OpenSearch Service overhead) = minimum storage requirement; Simplified: Source data * (1 + number of replicas) * 1.45 = minimum storage requirement
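Worked example (assumed figures): 1 TB of source data with one replica requires roughly 1 TB * (1 + 1) * 1.45 ≈ 2.9 TB of minimum hot storage.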
Glossary
Backup Plan
A strategy defined in AWS Backup that specifies backup frequency, retention periods, and cross-region replication.
Backup Vault
A secure, encrypted storage location within your AWS account to organize and store backups.
Backup Vault Lock
A feature that enforces immutability (write once, read many) for backups, preventing deletion or alteration until their defined retention period expires.
Continuous Backup
A feature in AWS Backup that supports near-zero Recovery Point Objective (RPO) and point-in-time recovery for services like RDS and EFS.
Point-in-Time Recovery
The ability to restore a database to any specific second within a defined retention period, often achieved by combining daily backups and continuous transaction logs.
Snapshot
A backup of an EBS volume or database instance at a specific point in time, stored incrementally after the first full copy.
Amazon Data Lifecycle Manager (DLM)
An AWS service that automates the creation, retention, and deletion of EBS snapshots based on defined lifecycle policies.
AWS Storage Gateway
A hybrid storage service that connects on-premises applications with AWS cloud storage for backup, archiving, and disaster recovery.
AWS DataSync
A managed data transfer service that automates and accelerates data movement between on-premises storage and AWS storage services, or between different AWS storage services.
AWS Snow Family
A collection of physical devices (Snowball, Snowball Edge, Snowcone, Snowmobile) used for transferring massive datasets securely into and out of AWS.