
DynamoDB

Priority: Tier 3 | Domain 2: Design Resilient Architectures | Domain 3: Design High-Performing Architectures | Domain 4: Design Cost-Optimized Architectures

Amazon DynamoDB is a serverless, fully scalable, NoSQL database service offered by AWS, designed for high-performance applications with single-digit millisecond latency at any scale. It supports Key-Value and Document data models, offers schema flexibility, and handles millions of requests per second with trillions of rows and hundreds of terabytes of data. AWS manages all operational aspects, including server management, patching, and updates.

Learning Objectives

DynamoDB Core Features and Architecture

DynamoDB is a foundational AWS service, offering a highly scalable and managed NoSQL database solution.

DynamoDB is a serverless, fully managed NoSQL database that scales automatically, eliminating the need for server management, patching, or engine version updates.
It utilizes a NoSQL data model, specifically Key-Value and Document, providing schema flexibility.
DynamoDB delivers high performance with single-digit millisecond latency at massive scale. It can handle millions of requests per second and tables with trillions of rows and hundreds of terabytes of data.
Technical Specs: Single-digit millisecond latency; handles millions of requests per second; supports tables with trillions of rows and hundreds of terabytes of data
It provides automatic and synchronous replication across three Availability Zones for durability. For security, it integrates with IAM for fine-grained authorization.
Technical Specs: Automatic and synchronous replication across three Availability Zones
DynamoDB tables contain items.
An item is analogous to a row in a relational database; it is a complete set of attributes and represents a single record. Items can vary in the number and types of attributes, making DynamoDB a flexible, schema-less database.
Attributes are key-value pairs within an item, serving as the columns of a DynamoDB table. Each attribute has a name (case-sensitive, used as a key) and a value (the actual data).
The maximum size for an item is 400 KB. For items exceeding this limit, the recommended approach is to store the item in Amazon S3 and place the S3 object path within DynamoDB.
Technical Specs: Maximum item size: 400 KB
DynamoDB supports Scalar Types (single values), Document Types (complex structures), and Set Types (collections of unique scalar values).
A DynamoDB table is a collection of items. The table is divided into multiple partitions to handle increased throughput and store more data as the table grows. Each partition has multiple replicas distributed across Availability Zones, forming a replication group to improve availability and durability.

DynamoDB Primary Key

The primary key is essential for uniquely identifying each item and for data distribution in DynamoDB.

The primary key uniquely identifies each item in a DynamoDB table.
DynamoDB supports two types of primary keys: Simple Primary Key (Partition Key only) and Composite Primary Key (Partition Key and Sort Key).
The Partition Key's function is to distribute data across partitions. Best practice is to choose a Partition Key with high cardinality and uniformly distributed access, which spreads requests evenly and prevents 'hot partitions' (single partitions overwhelmed by traffic, causing performance degradation). Queries support only exact-match (equals) conditions on the Partition Key.
The Sort Key is used in conjunction with a Partition Key to create a composite primary key. The combination of Partition Key and Sort Key must be unique. Items with the same Partition Key are stored together and sorted by the Sort Key. This enables range queries (>, <, BETWEEN, BEGINS_WITH) and is useful for retrieving a collection of related items in a specific order (e.g., multiple orders for a customer).
An Item Collection refers to a group of items that share the same Partition Key.

DynamoDB Read Consistency Modes

DynamoDB offers two read consistency modes to balance data freshness, performance, and cost.

Users can choose between eventual consistency for higher throughput and lower cost, or strong consistency for the most up-to-date data.

Eventually Consistent Reads

This is the default behavior for GetItem, Query, and Scan operations. Writes go to the primary node, and reads can be served from any of the three replicated nodes. It may return stale data for a short period (milliseconds to a few seconds) if a read occurs before replication is complete. Offers higher read throughput and lower cost. Suitable for applications where immediate consistency is not critical.
default_behavior: Yes
data_staleness: Possible (milliseconds to a few seconds)
performance_cost: Higher read throughput, lower cost
Use Cases:
  • Social media feeds
  • CDN
  • Analytical dashboards

Strongly Consistent Reads

Must be explicitly requested by setting the ConsistentRead parameter to true. It ensures writes/updates are reflected across all replicas before the read occurs, guaranteeing the most up-to-date data. Generally slower and more expensive due to the need for synchronization across all nodes. Suitable for applications requiring absolute data accuracy.
explicit_request_required: Yes (ConsistentRead parameter to true)
data_freshness: Guaranteed most up-to-date data
performance_cost: Generally slower and more expensive
Use Cases:
  • Financial transactions
  • Inventory management
  • Healthcare records

DynamoDB Read/Write Capacity Modes

DynamoDB provides two capacity modes to manage throughput for read and write operations, each suited for different workload patterns.

These modes allow users to optimize for cost or flexibility based on their application's traffic predictability.

Provisioned Capacity Mode

Users specify the desired number of Read Capacity Units (RCUs) and Write Capacity Units (WCUs) beforehand. Billing is based on the provisioned capacity, regardless of actual usage. Autoscaling can be configured to automatically adjust provisioned capacity within defined minimum and maximum limits. This mode is suitable for workloads with predictable traffic patterns where capacity needs can be accurately estimated. CloudWatch alarms are automatically created if autoscaling is enabled for provisioned tables, and these alarms should not be edited or deleted as they affect autoscaling functionality.
configuration: Specify RCUs and WCUs
billing: Charged for provisioned capacity
autoscaling: Supported to adjust capacity within limits
cloudwatch_alarms_for_autoscaling: Automatically created for read/write on provisioned tables (4 read, 4 write) and should not be edited/deleted.

On-Demand Capacity Mode

DynamoDB automatically scales capacity to accommodate the workload, requiring no capacity planning. Billing is pay-per-request pricing, charged only for actual read and write requests. It automatically manages throughput capacity scaling up or down. This mode is suitable for workloads with unpredictable traffic patterns, sudden spikes, or when preferring a hands-off approach to capacity management. It is generally more costly per RCU/WCU than provisioned capacity. On-demand tables are fully managed by AWS.
configuration: Automatic scaling; no capacity planning required
billing: Pay-per-request pricing; charged only for actual requests
scaling: Automatically manages throughput capacity scaling
cost: Generally more costly per RCU/WCU

Read Capacity Unit (RCU) and Write Capacity Unit (WCU) Calculations

Understanding RCUs and WCUs is crucial for optimizing performance and cost in DynamoDB.

RCU measures read throughput. One eventually consistent read of an item up to 4 KB consumes 0.5 RCU; one strongly consistent read consumes 1 RCU. Item size is rounded up to the next 4 KB increment, and each increment costs an additional RCU. When sizing capacity, first convert the expected workload into items read per second.
Technical Specs: Eventual Consistent Read: 0.5 RCU per read for items up to 4 KB; Strongly Consistent Read: 1 RCU per read for items up to 4 KB; RCU consumption scales linearly for every 4 KB increment (rounded up)
WCU measures write throughput. One standard write of an item up to 1 KB consumes 1 WCU. Item size is rounded up to the next 1 KB increment, and each increment costs an additional WCU. When sizing capacity, first convert the expected workload into items written per second.
Technical Specs: Standard Write: 1 WCU per write for items up to 1 KB; WCU consumption scales linearly for every 1 KB increment (rounded up)
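The RCU/WCU arithmetic above can be captured in two small helper functions (the function names are ours, not an AWS API):

```python
import math


def rcus_needed(item_kb, reads_per_sec, strongly_consistent=False):
    """RCUs for a steady read workload: round the item size up to 4 KB
    units; an eventually consistent read costs half as much."""
    units = math.ceil(item_kb / 4)
    per_read = units if strongly_consistent else units * 0.5
    return math.ceil(reads_per_sec * per_read)


def wcus_needed(item_kb, writes_per_sec):
    """WCUs for a steady write workload: round the item size up to 1 KB units."""
    return math.ceil(writes_per_sec * math.ceil(item_kb / 1))


# Worked examples:
# 10 KB items, 80 strongly consistent reads/sec:
#   ceil(10/4) = 3 RCU per read -> 240 RCUs
# Same workload, eventually consistent: 3 * 0.5 = 1.5 per read -> 120 RCUs
# 2.5 KB items, 60 writes/sec: ceil(2.5/1) = 3 WCU per write -> 180 WCUs
```

Note the rounding happens twice: once on item size (per increment) and once on the final total, matching the "rounded up" rule in the specs above.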

DynamoDB Secondary Indexes

Secondary indexes enable more flexible query patterns beyond the primary key, preventing inefficient table scans.

DynamoDB offers two types of secondary indexes, each with distinct characteristics and use cases.

Local Secondary Index (LSI)

LSIs are used to query items sharing the same Partition Key but using a different Sort Key order. They must have the same Partition Key as the base table. Queries against LSIs can be strongly consistent. LSIs must be created at table creation time and cannot be added or modified later. They share read/write capacity with the base table. A limitation is that the total size of all items with the same Partition Key across the base table and LSIs is limited to 10 GB.
partition_key_constraint: Must be same as base table
consistency: Queries can be strongly consistent
creation_time: Must be created at table creation
capacity_sharing: Shares read/write capacity with base table
size_limitation: Total size of all items with same Partition Key across base table and LSIs is limited to 10 GB
Use Cases:
  • Query items with same Partition Key, different Sort Key order

Global Secondary Index (GSI)

GSIs are used to query data using a different Partition Key than the base table, enabling distinct access patterns. They can have a different Partition Key and Sort Key from the base table. Reads are always eventually consistent (a frequent certification exam point). GSIs can be created, updated, or deleted after table creation. They have their own provisioned read/write capacity, separate from the base table, and write throttling on a GSI can throttle writes to the base table. Because a GSI defines its own Partition Key, it can also redirect query traffic away from a hot partition caused by a poorly distributed base-table key.
partition_key_constraint: Can have a different Partition Key and Sort Key from the base table
consistency: Reads are always eventually consistent
creation_time: Can be created, updated, or deleted after table creation
capacity_sharing: Has its own provisioned read/write capacity, separate from the base table
impact_of_throttling: Write throttling in a GSI can affect write operations on the base table
Use Cases:
  • Query data using a different Partition Key
  • Enabling distinct access patterns
  • Preventing hot partitions

Amazon DynamoDB Accelerator (DAX) Deep Dive

Amazon DynamoDB Accelerator (DAX) is a caching service specifically designed to enhance DynamoDB performance.

DAX provides an in-memory cache to reduce read latency for DynamoDB, crucial for read-heavy applications.

Problem DAX Solves

DAX addresses performance bottlenecks caused by high read request volume to DynamoDB, even with its inherent low latency, by introducing a caching layer.

What is Amazon DAX?

DAX is a fully managed, highly available, in-memory caching service specifically designed for Amazon DynamoDB. It delivers up to 10x performance improvement, reducing latency from milliseconds to microseconds. As a managed service, AWS handles cache invalidation, data population, and cluster management. It is API compatible with existing DynamoDB APIs, requiring no application logic modifications. DAX uses a pay-as-you-go capacity model and can lower provisioned DynamoDB capacity needs by offloading read requests. The default TTL is 5 minutes, configurable to balance cache freshness with retrieval speed.
performance_improvement: Up to 10x
latency_reduction: From milliseconds to microseconds
default_ttl: 5 minutes (configurable)
api_compatibility: Compatible with existing DynamoDB APIs
cost_model: Pay-as-you-go

How DAX Works (Architecture)

Applications are configured to communicate with DAX endpoints. Upon a cache hit, DAX returns data immediately from its in-memory cache. If there's a cache miss, DAX fetches data from DynamoDB, stores it in its cache, and then returns it to the application. Implementation involves replacing DynamoDB endpoints with DAX endpoints, without requiring extra code for cache management.
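The read path just described amounts to a read-through cache with a TTL. As a toy model (plain Python dicts standing in for the DAX cluster and the DynamoDB table; this is not the DAX client API):

```python
import time


class ReadThroughCache:
    """Toy model of DAX's item cache: on a miss, fetch from the backing
    store, cache the value with a TTL, and serve later reads from memory
    until the entry expires."""

    def __init__(self, backing_store, ttl_seconds=300):  # DAX default TTL: 5 minutes
        self.store = backing_store
        self.ttl = ttl_seconds
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get_item(self, key):
        entry = self.cache.get(key)
        if entry is not None and time.monotonic() < entry[1]:
            self.hits += 1
            return entry[0]              # cache hit: served from memory
        self.misses += 1
        value = self.store.get(key)      # cache miss: fall through to the table
        self.cache[key] = (value, time.monotonic() + self.ttl)
        return value


table = {"user#1": {"name": "Ada"}}      # stand-in for a DynamoDB table
cache = ReadThroughCache(table, ttl_seconds=300)
cache.get_item("user#1")   # miss: fetched from the table, then cached
cache.get_item("user#1")   # hit: served from the cache
```

In real DAX this logic lives in the cluster, which is why applications only swap endpoints and need no cache-management code of their own.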

DAX Cluster Creation (AWS Console)

To create a DAX cluster, open DynamoDB in the AWS Management Console and select 'Clusters' under 'DAX'. Configuration options include: cluster name and description; node family (T, R, or All Families); cluster size (1 to 11 nodes; AWS recommends 3); network settings (IPv4, IPv6, or Dual Stack); subnet group, VPC, and subnets; access control (security group); Availability Zone (automatic or specific); and IAM permissions (a new or existing role with a read-only or read-write policy scoped to all or specific DynamoDB tables). Encryption at rest and in transit is enabled by default. Parameter group and tagging are optional. There is no free tier for DAX; creating a cluster incurs charges.
cluster_size_range: 1 to 11 nodes (AWS recommends 3)
network_settings: IPv4, IPv6, or Dual Stack
encryption: Enabled by default (at rest and in transit)
cost_model: No free tier; incurs charges upon creation

DAX vs. ElastiCache Comparison


DAX and ElastiCache both provide caching capabilities but are designed for different use cases and levels of flexibility.

Choosing between DAX and ElastiCache depends on the specific caching requirements of your application and its interaction with data sources.

DAX
  • Specialization: Purpose-built, plug-and-play cache built specifically for DynamoDB
  • Use Case: Ideal for read-heavy applications that use only DynamoDB
  • Simplicity / Management: Handles cache invalidation and data consistency automatically
  • Data Handling: Stores individual objects and direct scan/query results; no complex operations
ElastiCache
  • Specialization: General-purpose, highly flexible cache
  • Use Case: Works with DynamoDB or other data sources (RDS, Aurora, etc.); suited to aggregated results or caching data from multiple sources
  • Simplicity / Management: Requires manual cache invalidation and consistency management in application code
  • Data Handling: Can store aggregated results or complex data structures from multiple sources

DynamoDB in Serverless Architecture (API Gateway + Lambda)

DynamoDB is a common choice for serverless backends due to its managed nature and scalability, often integrated with API Gateway and Lambda.

This architecture uses AWS services for a serverless approach. API Gateway handles API requests, Lambda functions process these requests, and DynamoDB serves as the NoSQL database. This option provides better scalability and eliminates server management, though it can be more complex to learn for beginners compared to traditional frameworks.
A critical step is to design the schema for DynamoDB tables, ensuring it supports all necessary app functionality and queries. This involves defining and collecting data for participants, user scores, and votes.
Lambda functions must be linked to DynamoDB, ensuring they can perform necessary read and write operations. This includes configuring the DynamoDB table with the correct schema and setting up IAM roles and permissions for Lambda functions to access DynamoDB.
A Lambda function interacts with DynamoDB using the boto3 library for Python. For a visitor counter, the Lambda function would retrieve and update the count in a DynamoDB table configured with on-demand pricing. IAM permissions between the Lambda function and the DynamoDB table are crucial for successful operation.
Technical Specs: DynamoDB on-demand pricing for visitor counter
This Python code for a Lambda function updates a visitor count in a DynamoDB table named 'VisitorCounter'. It uses the `update_item` operation to atomically increment the 'CounterValue' for an item identified by 'CounterName' = 'TotalVisitors'. It returns a success message.
This Lambda function fetches the current visitor count, increments it, updates it in DynamoDB, and then returns the new `updatedValue` in the response body. It uses `get_item` to retrieve the current count and `update_item` for the increment.
Scanning an entire DynamoDB table can consume a lot of read capacity and cause large data transfers, leading to higher costs and slower performance. Instead of full table scans, using queries (based on primary key values) is more efficient. For large tables, pagination should be used to retrieve data in smaller portions. Projections can also be used to retrieve only specific attributes, further reducing data transfer.
This improved Python Lambda function performs a `scan` operation with `ProjectionExpression` to retrieve only specified attributes ('Artist', 'Song', 'Country', 'imageUrl') and uses pagination (`ExclusiveStartKey`) to retrieve data in smaller portions. This optimizes performance and cost compared to a full, unpaginated scan.

DynamoDB Migration Considerations

When migrating to AWS, DynamoDB is a strong candidate for a 'Serverless-First' architecture due to its cost-effectiveness and scalability for unpredictable workloads.

This architecture adheres to a scale-to-zero serverless philosophy with pay-per-use pricing. It uses AWS Lambda for compute, Amazon API Gateway for the API, and Amazon DynamoDB for the database (on-demand mode).
The estimated cost for 50 users is $3-15/month, with DynamoDB contributing about $0.75. This is the lowest cost option for unpredictable traffic.
Technical Specs: Cost: $3-15/month (50 users); DynamoDB: $0.75/month
Benefits include lowest cost for unpredictable traffic, automatic scaling from 0 to 10,000 users, zero server management, and high availability built-in.
Technical Specs: Automatic scaling from 0 to 10,000 users
Drawbacks include potential cold starts (2-5 seconds for the first request), moderate migration complexity, vendor lock-in to AWS Lambda/DynamoDB, the need for upfront design of DynamoDB query patterns, and requiring a rewrite for session management.
Technical Specs: Cold starts (2-5 seconds first request); Moderate migration complexity (3/5)
This architecture is recommended when traffic is unpredictable/bursty, there's an extremely tight budget ($5-10/month), the team is comfortable with serverless architecture, and vendor lock-in is acceptable.

Exam Tips

Glossary

Attribute Name
Each attribute in DynamoDB has a name, which is used as a key to store the associated value. Attribute names are case-sensitive.
Attribute Value
The actual data stored in the attribute. DynamoDB supports various data types, including String, Number, Binary, Boolean, etc.
Item
A complete set of attributes, which represents a single record or item in the DynamoDB table. Items can vary in the number and types of attributes they contain, making DynamoDB a flexible, schema-less database.
Primary Key
DynamoDB tables must have a primary key, which uniquely identifies each item in the table. It can be a Partition Key or a Composite Primary Key (Partition Key + Sort Key).
Partition Key (Hash Key)
A single attribute that is used as the primary key. It determines the partition (physical storage) where the data is stored. It distributes data across partitions.
Sort Key (Range Key)
An additional attribute used in conjunction with the partition key to uniquely identify items within a partition. The combination of the partition key and sort key is known as the composite primary key. Items with the same Partition Key are stored together and sorted by the Sort Key.
Eventually Consistent Reads
The default read behavior where data may be stale for a short period (milliseconds to a few seconds) as writes are replicated across all nodes. Offers higher read throughput and lower cost.
Strongly Consistent Reads
Reads that are explicitly requested to ensure the most up-to-date data is returned, as writes/updates are reflected across all replicas before the read occurs. Generally slower and more expensive.
Read Capacity Unit (RCU)
A measure of read operations per second. For items up to 4 KB, an eventually consistent read consumes 0.5 RCU and a strongly consistent read consumes 1 RCU. Scales linearly per 4 KB increment (rounded up).
Write Capacity Unit (WCU)
A measure of write operations per second. For items up to 1 KB, a standard write consumes 1 WCU. Scales linearly per 1 KB increment.
Local Secondary Index (LSI)
A type of secondary index that shares the same Partition Key as the base table but uses a different Sort Key. Queries can be strongly consistent. Must be created at table creation.
Global Secondary Index (GSI)
A type of secondary index that can have a different Partition Key and Sort Key from the base table. Reads are always eventually consistent. Can be created/modified after table creation and has its own capacity.
Amazon DynamoDB Accelerator (DAX)
A fully managed, highly available, in-memory caching service specifically designed for Amazon DynamoDB to deliver up to 10x performance improvement by reducing read latency from milliseconds to microseconds.

Key Takeaways

Content Sources

  • Introduction to Amazon DynamoDB
  • Implementation
  • AWS_MIGRATION_PLAN
  • Timing
  • Building the API