Learn: Scalability
Concept-focused guide for Scalability (no answers revealed).
~9 min read

Overview
Welcome! In this session, we’re diving deep into the real-world principles and patterns behind building scalable systems. You’ll walk away with a strong grasp of vertical vs. horizontal scaling, partitioning and sharding, replication strategies, caching, microservices organization, and asynchronous patterns—plus practical ways to reason through architectural trade-offs. We’ll use logical breakdowns, practical steps, and generic worked examples to help you confidently approach system scalability challenges.
Concept-by-Concept Deep Dive
1. Vertical vs. Horizontal Scalability
What it is:
Scalability describes how a system can handle increased load. Vertical scalability (“scaling up”) means upgrading existing resources—like adding more CPU, memory, or disk to a single machine. Horizontal scalability (“scaling out”) involves adding more machines or nodes to share the load.
Subtopics:
- Vertical Scaling:
  - Easy to implement for smaller systems.
  - Limited by hardware constraints; eventually, you can't add more capacity.
  - No change in application logic, but may require downtime to upgrade.
- Horizontal Scaling:
  - Adds servers or instances to distribute workload.
  - Requires a distributed architecture (stateless services, shared-nothing databases).
  - Supports virtually unlimited scaling, but demands more complex coordination (see the sketch below).
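To make scaling out concrete, here is a minimal sketch in Python of the idea that enables it: because the services are stateless, a simple dispatcher can rotate requests across any number of identical nodes. The worker addresses are made up for illustration.

```python
import itertools

# Hypothetical pool of identical, stateless application servers.
# Scaling out means appending another address to this list;
# scaling up means replacing one entry with a bigger machine.
WORKERS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

_rotation = itertools.cycle(WORKERS)

def pick_worker() -> str:
    """Return the next worker in round-robin order.

    Statelessness is what makes this valid: any worker can serve
    any request, so adding nodes adds capacity.
    """
    return next(_rotation)

for request_id in range(6):
    print(f"request {request_id} -> {pick_worker()}")
```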
Step-by-Step Reasoning:
- Identify bottlenecks (CPU, memory, I/O).
- Assess if performance can be improved by upgrading the existing machine (vertical) or by distributing work (horizontal).
- Consider fault tolerance: horizontal scaling often improves resilience.
Common Misconceptions:
- Believing vertical scaling is always easier—often true at first, but hits hard limits.
- Assuming horizontal scaling is only for huge companies—it’s valuable even for moderate growth and availability.
2. Partitioning and Sharding
What it is:
Partitioning divides a dataset or workload into smaller, more manageable pieces. Sharding is a type of horizontal partitioning in databases, distributing rows across multiple servers.
Components:
- Sharding Strategies (see the sketch after this list):
  - Hash-Based: Uses a hash of a key (like user ID) to assign data to a shard. Spreads keys uniformly, though a single very hot key can still overload its shard.
  - Range-Based: Assigns data based on value ranges (e.g., users A–F on one shard). Easier to query contiguous data, but can lead to uneven load.
  - Directory-Based: Uses a lookup service to map data to shards; flexible, but the directory itself must stay fast and available.
- Hotspots:
  - Occur when a shard receives disproportionate traffic (e.g., one user or product is extremely popular).
  - Mitigated by careful shard-key choice and rebalancing strategies.
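The hash-based and range-based strategies can be sketched in a few lines of Python. This is an illustration, not a production router; the shard count, ranges, and key formats are all assumptions.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size

def hash_shard(user_id: str) -> int:
    """Hash-based: a stable hash spreads keys evenly across shards.

    A stable hash (not Python's per-process randomized hash()) is
    essential so a key always maps to the same shard.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Hypothetical alphabetical split for range-based sharding.
RANGES = [("a", "f"), ("g", "m"), ("n", "s"), ("t", "z")]

def range_shard(username: str) -> int:
    """Range-based: contiguous keys stay together, which helps range
    queries but can concentrate load on one shard (a hotspot).
    """
    first = username[0].lower()
    for shard, (low, high) in enumerate(RANGES):
        if low <= first <= high:
            return shard
    return 0  # fallback for keys outside the defined ranges

print(hash_shard("user-42"))  # deterministic shard number in 0..3
print(range_shard("alice"))   # 0, since "a" falls in the a-f range
```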
Calculation Recipe:
- Estimate anticipated data and request distribution.
- Choose a sharding key that minimizes hotspots and balances load.
- Plan for future re-sharding or migration if usage patterns shift.
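As a generic worked example of this recipe (every number below is assumed for illustration):

```python
import math

# Assumed inputs from capacity estimation.
peak_requests_per_sec = 50_000  # anticipated peak load
per_shard_capacity = 8_000      # requests/sec one shard can serve
target_utilization = 0.70       # headroom for spikes and growth

# Shards needed so each stays under 70% of its capacity at peak.
shards = math.ceil(peak_requests_per_sec / (per_shard_capacity * target_utilization))
print(shards)  # 9 shards, roughly 5,560 req/s each at peak
```

Re-running the same arithmetic as traffic forecasts change is how you plan for the re-sharding mentioned above.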
Common Misconceptions:
- Thinking sharding is only about data size; it’s also about balancing access patterns.
- Assuming hash-based always solves hotspots—skewed keys can still cause imbalance.
3. Replication Strategies
What it is:
Replication involves copying data across multiple servers for fault tolerance, availability, and performance. Methods vary in consistency, latency, and complexity.
Subtopics:
- Synchronous Replication:
  - Writes are acknowledged only when all replicas confirm.
  - Guarantees strong consistency, but increases write latency.
- Asynchronous Replication:
  - Writes are acknowledged as soon as the primary completes, with replicas catching up later.
  - Lower write latency, but writes not yet replicated can be lost if the primary fails, and replicas may serve stale reads (eventual consistency).
- Read Replicas:
  - Used to offload read queries.
  - Writes go to the primary; replicas lag slightly behind.
Reasoning for Use:
- Use synchronous where consistency is critical.
- Use asynchronous or read replicas to improve performance and scale reads.
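The trade-off shows up clearly in a toy model. The class below is a sketch with hypothetical names; it models asynchronous replication as a pending queue that replicas drain later, so reads from replicas can be stale until replication catches up.

```python
import random
from collections import deque

class ReplicatedStore:
    """Toy primary/replica store illustrating replication lag."""

    def __init__(self, num_replicas: int = 2):
        self.primary: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = [{} for _ in range(num_replicas)]
        self._pending: deque[tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value            # acknowledged immediately
        self._pending.append((key, value))   # replicas catch up later

    def read(self, key: str) -> str | None:
        # Offload reads to a randomly chosen replica; may be stale.
        return random.choice(self.replicas).get(key)

    def apply_replication(self) -> None:
        # Simulates the asynchronous catch-up process.
        while self._pending:
            key, value = self._pending.popleft()
            for replica in self.replicas:
                replica[key] = value

store = ReplicatedStore()
store.write("user:1", "Ada")
print(store.read("user:1"))  # None -- replicas haven't caught up yet
store.apply_replication()
print(store.read("user:1"))  # "Ada"
```

Synchronous replication would instead apply the pending change to every replica before acknowledging the write, trading latency for consistency.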
Common Misconceptions:
- Believing more replicas always means better performance—write latency, replication lag, and network costs must be considered.
- Assuming eventual consistency is always “good enough”—some applications (like financial transactions) require strong consistency.
4. Caching Patterns and Strategies
What it is:
Caching stores frequently accessed data in fast storage (memory) to reduce load on slower backend systems (like databases).
Types of Caching:
- Read-Through Cache: Application talks only to the cache; on a miss, the cache itself fetches from the DB and stores the result.
- Write-Through Cache: Writes go to both cache and DB simultaneously.
- Write-Back/Write-Behind Cache: Writes go to cache first, then asynchronously to DB.
- Cache Aside (Lazy Loading): Application checks the cache first; on a miss, it reads the DB itself and populates the cache (sketched below).
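Of these, cache-aside is the pattern applications most often implement themselves. Here is a minimal sketch, with a hypothetical `db_lookup` standing in for the real database call:

```python
cache: dict[str, str] = {}

def db_lookup(key: str) -> str:
    """Hypothetical stand-in for a slow database query."""
    return f"value-for-{key}"

def get(key: str) -> str:
    """Cache-aside: check the cache first, populate it on a miss."""
    if key in cache:
        return cache[key]       # hit: skip the database entirely
    value = db_lookup(key)      # miss: go to the backend
    cache[key] = value          # populate so the next read is fast
    return value

print(get("user:1"))  # miss -> hits the "database"
print(get("user:1"))  # hit  -> served from memory
```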
Eviction Policies:
- LRU (Least Recently Used): Removes least recently used items.
- LFU (Least Frequently Used): Removes least accessed items.
- TTL (Time-to-Live): Expires items after a set time.
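LRU is simple enough to sketch with the standard library's `OrderedDict`; the capacity and keys below are arbitrary:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry
    once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict[str, str] = OrderedDict()

    def get(self, key: str) -> str | None:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")         # "a" becomes most recently used
cache.put("c", "3")    # evicts "b", the least recently used
print(cache.get("b"))  # None
```

For function results, Python's built-in `functools.lru_cache` applies the same policy without any custom code.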
Reasoning:
- For read-heavy workloads, caching can dramatically reduce database hits.
- For write-heavy workloads, ensure cache coherence and invalidate or update entries to avoid serving stale data.
Common Misconceptions:
- Assuming cache is always up-to-date—stale data is a risk unless carefully managed.
- Over-caching can waste resources or cause eviction of important data.
5. Microservices Organization and Communication
What it is:
Microservices break applications into small, independently deployable services. Scalability and maintainability depend on how you split functionality and on how the services communicate.
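As a minimal illustration of inter-service communication (the service name, URL, and payload shape are hypothetical), an order service might call an inventory service over HTTP, so that each service's data stays private behind its own API:

```python
import json
import urllib.request

# Hypothetical internal endpoint owned by the inventory service.
INVENTORY_URL = "http://inventory-service.internal/api/stock"

def check_stock(product_id: str) -> bool:
    """The order service asking the inventory service for availability.

    The two services share only this HTTP contract, never a database,
    so each can be scaled and deployed independently.
    """
    with urllib.request.urlopen(f"{INVENTORY_URL}/{product_id}", timeout=2) as resp:
        payload = json.load(resp)  # e.g. {"available": true}
    return payload.get("available", False)
```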