Learn: Scalability

A concept-focused guide to the principles behind scalable systems.

~9 min read

Overview

Welcome! In this session, we’re diving deep into the real-world principles and patterns behind building scalable systems. You’ll walk away with a strong grasp of vertical vs. horizontal scaling, partitioning and sharding, replication strategies, caching, microservices organization, and asynchronous patterns—plus practical ways to reason through architectural trade-offs. We’ll use logical breakdowns, practical steps, and generic worked examples to help you confidently approach system scalability challenges.


Concept-by-Concept Deep Dive

1. Vertical vs. Horizontal Scalability

What it is:
Scalability describes how a system can handle increased load. Vertical scalability (“scaling up”) means upgrading existing resources—like adding more CPU, memory, or disk to a single machine. Horizontal scalability (“scaling out”) involves adding more machines or nodes to share the load.

Subtopics:

  • Vertical Scaling:
    • Easy to implement for smaller systems.
    • Limited by hardware constraints; eventually, you can't add more capacity.
    • No change in application logic, but may require downtime to upgrade.
  • Horizontal Scaling:
    • Adds servers or instances to distribute workload.
    • Requires a distributed architecture (stateless services, shared-nothing databases).
    • Supports virtually unlimited scaling, but demands more complex coordination (a minimal sketch follows this list).
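
To make the contrast concrete, here is a minimal sketch of the horizontal idea: stateless workers behind a round-robin dispatcher, where adding capacity just means adding workers to the pool. The `Worker` class and request strings are purely illustrative.

```python
import itertools

class Worker:
    """Stands in for one stateless server; in practice, a separate host."""
    def __init__(self, name):
        self.name = name

    def handle(self, request):
        return f"{self.name} handled {request!r}"

# Scaling out means adding workers to the pool; no single node has to grow.
pool = [Worker(f"node-{i}") for i in range(3)]
dispatcher = itertools.cycle(pool)  # simplest possible round-robin balancer

for request in ["GET /a", "GET /b", "GET /c", "GET /d"]:
    print(next(dispatcher).handle(request))
```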

Step-by-Step Reasoning:

  • Identify bottlenecks (CPU, memory, I/O); a quick sampling sketch follows this list.
  • Assess if performance can be improved by upgrading the existing machine (vertical) or by distributing work (horizontal).
  • Consider fault tolerance: horizontal scaling often improves resilience.
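
As a starting point for step one, here is one quick way to sample CPU and memory pressure on a single host. It assumes the third-party psutil package; any monitoring agent gives you equivalent numbers.

```python
import psutil  # third-party: pip install psutil

# Sample utilization; sustained CPU near 100% suggests a CPU-bound
# bottleneck, while a high memory percentage points to RAM pressure.
cpu = psutil.cpu_percent(interval=1)   # averaged over one second
mem = psutil.virtual_memory().percent
print(f"CPU: {cpu}%  Memory: {mem}%")
```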

Common Misconceptions:

  • Believing vertical scaling is always easier—often true at first, but hits hard limits.
  • Assuming horizontal scaling is only for huge companies—it’s valuable even for moderate growth and availability.

2. Partitioning and Sharding

What it is:
Partitioning divides a dataset or workload into smaller, more manageable pieces. Sharding is a type of horizontal partitioning in databases, distributing rows across multiple servers.

Components:

  • Sharding Strategies:

    • Hash-Based: Uses a hash of a key (like a user ID) to assign data to a shard. Spreads keys uniformly, but a single very popular key can still create a hotspot (both strategies are sketched after this list).
    • Range-Based: Assigns data based on value ranges (e.g., users A–F on one shard). Easier to query contiguous data, but can lead to uneven load.
    • Directory-Based: Uses a lookup service to map data to shards. Flexible for rebalancing, but the directory itself must stay available and fast.
  • Hotspots:

    • Occur when a shard receives disproportionate traffic (e.g., one user or product is extremely popular).
    • Mitigated by careful sharding and rebalancing strategies.
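
A minimal sketch of the two most common strategies, assuming four shards and illustrative key ranges (the function names and boundaries are not from any particular database):

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def hash_shard(key: str) -> int:
    """Hash-based: spreads keys uniformly, but one hot key still maps to
    a single shard. md5 keeps the mapping stable across runs (Python's
    built-in hash() is randomized per process)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

def range_shard(key: str) -> int:
    """Range-based: contiguous keys stay together, at the risk of
    uneven load if one range is much busier than the others."""
    first = key[0].upper()
    if first <= "F":
        return 0
    if first <= "M":
        return 1
    if first <= "S":
        return 2
    return 3

for user in ["alice", "bob", "sam", "zoe"]:
    print(f"{user}: hash -> shard {hash_shard(user)}, range -> shard {range_shard(user)}")
```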

Calculation Recipe:

  • Estimate anticipated data volume and request distribution (a worked estimate follows this list).
  • Choose a sharding key that minimizes hotspots and balances load.
  • Plan for future re-sharding or migration if usage patterns shift.
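
For instance, a back-of-the-envelope shard count might look like the following; every number here is an assumption for illustration.

```python
import math

rows = 500_000_000     # anticipated rows (assumption)
bytes_per_row = 200    # average row size in bytes (assumption)
target_shard_gb = 50   # comfortable data size per shard, leaving headroom

total_gb = rows * bytes_per_row / 1e9
shards = math.ceil(total_gb / target_shard_gb)
print(f"~{total_gb:.0f} GB total -> at least {shards} shards")
# In practice, over-provision the shard count so re-sharding stays rare.
```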

Common Misconceptions:

  • Thinking sharding is only about data size; it’s also about balancing access patterns.
  • Assuming hash-based always solves hotspots—skewed keys can still cause imbalance.

3. Replication Strategies

What it is:
Replication involves copying data across multiple servers for fault tolerance, availability, and performance. Methods vary in consistency, latency, and complexity.

Subtopics:

  • Synchronous Replication:
    • Writes are acknowledged only after all replicas (or a required quorum) confirm.
    • Guarantees strong consistency, but increases write latency.
  • Asynchronous Replication:
    • Writes are acknowledged as soon as the primary completes, with replicas catching up later.
    • Lower latency, but recent writes can be lost if the primary fails before replicas catch up (eventual consistency).
  • Read Replicas:
    • Used to offload read queries.
    • Writes go to the primary; replicas lag slightly behind.

Reasoning for Use:

  • Use synchronous where consistency is critical.
  • Use asynchronous replication or read replicas to improve performance and scale reads (a routing sketch follows this list).
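
The read-replica pattern often comes down to a small routing decision in the data layer. A minimal sketch, where the `Router` class and its write-detection rule are both illustrative simplifications:

```python
import random

class Database:
    """Stands in for a connection to one database server."""
    def __init__(self, name):
        self.name = name

class Router:
    """Sends writes to the primary and spreads reads over replicas,
    accepting that replicas may lag slightly behind."""
    WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def execute(self, sql: str):
        is_write = sql.lstrip().upper().startswith(self.WRITE_PREFIXES)
        target = self.primary if is_write else random.choice(self.replicas)
        print(f"{target.name} <- {sql}")

router = Router(Database("primary"), [Database("replica-1"), Database("replica-2")])
router.execute("INSERT INTO orders VALUES (1)")  # goes to the primary
router.execute("SELECT * FROM orders")           # goes to a replica
```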

Common Misconceptions:

  • Believing more replicas always means better performance—write latency, replication lag, and network costs must be considered.
  • Assuming eventual consistency is always “good enough”—some applications (like financial transactions) require strong consistency.

4. Caching Patterns and Strategies

What it is:
Caching stores frequently accessed data in fast storage (memory) to reduce load on slower backend systems (like databases).

Types of Caching:

  • Read-Through Cache: The cache sits in front of the DB; on a miss, the cache itself fetches the data and stores it before returning.
  • Write-Through Cache: Writes go to both cache and DB simultaneously.
  • Write-Back/Write-Behind Cache: Writes go to cache first, then asynchronously to DB.
  • Cache Aside (Lazy Loading): The application checks the cache, and on a miss reads the DB and populates the cache itself (sketched after this list).
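
Cache-aside is the easiest pattern to show end to end. In this minimal sketch, plain dictionaries stand in for a cache like Redis and for the backing database:

```python
cache = {}                              # stands in for, e.g., Redis
database = {"user:1": {"name": "Ada"}}  # stands in for the backing DB

def get_user(key: str):
    """Cache-aside: check the cache, fall back to the DB on a miss,
    then populate the cache so the next read is a hit."""
    if key in cache:
        return cache[key]          # hit: served from fast storage
    value = database.get(key)      # miss: query the slow store
    if value is not None:
        cache[key] = value         # lazy-load into the cache
    return value

print(get_user("user:1"))  # first call misses and fills the cache
print(get_user("user:1"))  # second call is served from the cache
```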

Eviction Policies:

  • LRU (Least Recently Used): Removes the least recently used items (a minimal implementation follows this list).
  • LFU (Least Frequently Used): Removes least accessed items.
  • TTL (Time-to-Live): Expires items after a set time.
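
LRU is simple enough to implement directly. Here is a minimal version using Python's `OrderedDict`; production caches such as Redis use cheaper approximations of the same idea.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: recently used keys move to the end of the
    ordered dict; the front (least recently used) is evicted when full."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" becomes most recently used
cache.put("c", 3)        # capacity exceeded: "b" is evicted
print(cache.get("b"))    # None
```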

Reasoning:

  • For read-heavy workloads, caching can dramatically reduce database hits.
  • For write-heavy workloads, ensure cache coherency and avoid serving stale data.

Common Misconceptions:

  • Assuming cache is always up-to-date—stale data is a risk unless carefully managed.
  • Over-caching can waste resources or cause eviction of important data.

5. Microservices Organization and Communication

What it is:
Microservices break applications into small, independently deployable services. Scalability and maintainability depend on how you split functionality and communicate.
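
One recurring communication choice is synchronous calls versus asynchronous messaging. The sketch below fakes two services in one process with an in-memory queue; in a real system each would be its own deployable, communicating through a broker such as RabbitMQ or Kafka (named here only as examples).

```python
import queue
import threading

# Two "services" faked in one process, decoupled by a queue. The producer
# does not wait for the consumer, which is what lets each side scale and
# fail independently.
orders = queue.Queue()

def order_service():
    # Publish an event and return immediately; no waiting on downstream work.
    orders.put({"order_id": 42, "item": "book"})

def shipping_service():
    event = orders.get()  # consume whenever capacity allows
    print(f"shipping order {event['order_id']}")

threading.Thread(target=shipping_service).start()
order_service()
```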
