Design WhatsApp: System Design Interview Guide
WhatsApp delivers 100+ billion messages per day across 2+ billion users, with end-to-end encryption and 99.99% delivery success.
Designing WhatsApp forces you to combine real-time bidirectional messaging, durable offline delivery, end-to-end encryption, presence and typing indicators, and group fan-out. It is the canonical chat system design problem and a favorite at FAANG.
Asked at: Meta (which owns WhatsApp), Google, Amazon, Slack, Discord, Snap, and Microsoft (Teams). It is the standard real-time messaging interview question.
Why this question is asked
Design WhatsApp tests whether you understand persistent connections (WebSocket or XMPP), message durability, the difference between presence and message delivery, fan-out to groups, and how end-to-end encryption changes the server's responsibilities. Bonus points if you can explain how WhatsApp ran with about 50 engineers while Facebook Messenger had hundreds.
Requirements
Always clarify these in the first 5 minutes of the interview. Do not start drawing boxes until both lists are agreed.
Functional requirements
- Users send and receive 1:1 messages in real time
- Group chats up to 1024 members with the same delivery guarantees
- Delivery receipts: sent, delivered, read
- Presence and typing indicators
- Media messages (image, video, audio, document) up to 100 MB
- Voice and video calls (out of scope for this writeup but worth mentioning)
- Offline message queuing with delivery on reconnect
- End-to-end encryption for all message content
Non-functional requirements
- Message delivery latency under 500 ms at the 99th percentile when both parties online
- 99.99% delivery success including offline queueing
- Server cannot read message content (E2E encryption)
- Scale to 2B users with sub-second connection setup
- Mobile-first, must work on flaky networks
- Battery efficient on mobile clients
Back-of-envelope scale estimates
Show your math. Pulling numbers from thin air signals you have not thought about the load.
- Total users: 2B. From WhatsApp's public reporting. Assume 60% are daily active, or about 1.2B DAU.
- Messages per day: 100B. 100B+ messages per day is a publicly cited figure. That averages to roughly 80 messages per DAU (100B / 1.2B ≈ 83).
- Messages per second (peak): 5M. 100B per day / 86,400 seconds ≈ 1.16M messages per second on average; a 4x peak factor for prayer times, evenings, and New Year gives about 4.6M, rounded up to 5M.
- Concurrent persistent connections: 500M. Roughly 40% of DAU online at peak (0.4 × 1.2B ≈ 480M). Each holds a WebSocket or XMPP-style connection.
- Storage per message: 200 bytes, plus media stored separately. 100B messages per day × 200 bytes = 20 TB per day, or roughly 7 PB per year of message metadata. Because messages are deleted after delivery, this is write volume rather than a growing archive. Media is much larger but stored in object storage with a TTL.
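The same arithmetic as a small Python function, so the inputs can be varied during the interview. Every default is one of the assumptions listed above (2B users, 60% DAU, 4x peak, 40% online at peak, 200 bytes per message), not a measured value.

```python
SECONDS_PER_DAY = 86_400


def estimate(total_users=2e9, dau_ratio=0.6, msgs_per_day=100e9,
             peak_factor=4, online_ratio=0.4, bytes_per_msg=200):
    """Reproduce the back-of-envelope numbers; every input is an assumption."""
    dau = total_users * dau_ratio
    avg_mps = msgs_per_day / SECONDS_PER_DAY
    bytes_per_day = msgs_per_day * bytes_per_msg
    return {
        "dau": dau,
        "msgs_per_dau": msgs_per_day / dau,
        "avg_msgs_per_sec": avg_mps,
        "peak_msgs_per_sec": avg_mps * peak_factor,
        "concurrent_connections": dau * online_ratio,
        "bytes_per_day": bytes_per_day,
        "bytes_per_year": bytes_per_day * 365,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>24}: {value:,.0f}")
```

Changing one input (say, a 6x peak factor for New Year's Eve) immediately shows which downstream number moves.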
High-level architecture
Clients hold a persistent connection (originally XMPP, now a proprietary binary protocol) to a fleet of Connection Servers. When a client sends a message, its connection server hands it to a Message Router. The router looks up the recipient's connection in a session store sharded by user_id. If the recipient is online, the router pushes the message over that connection; if offline, the message is queued in an Offline Message Store (also sharded by user_id) and delivered on reconnect, and a push notification (APNs, FCM) wakes the recipient's app. End-to-end encryption (Signal Protocol) is handled client-side, so the server only sees ciphertext blobs. Media (images, videos) is uploaded to a Media Service that returns a URL; the URL and the media decryption key travel inside the encrypted message. Group chats use the same router with a fan-out step: the router looks up the group's members, pushes a copy to each online member, and queues a copy for each offline one.
In a real interview, sketch this on the whiteboard before diving into any single box.
Core components
Walk through each service. The interviewer wants to hear what each one owns, not just the names.
Connection Server (Chat Gateway)
Holds millions of persistent connections per node. Originally a customized ejabberd (an open-source XMPP server that is itself written in Erlang), later heavily rewritten and tuned in Erlang for higher concurrency. Routes incoming messages to the Message Router and pushes outgoing messages to the client.
Session Store
A sharded KV store mapping user_id to the connection server holding that user's active connection. Read on every message; written on every connect and disconnect. Backed by Redis or a custom in-memory store.
Message Router
Receives the ciphertext blob, looks up the recipient in the session store, and either pushes over the live connection or stores in the offline queue. For groups, fans out to all members.
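A minimal sketch of the router's decision for a 1:1 message, assuming plain dictionaries stand in for the session store, the per-gateway push channel, the offline store, and the push-notification gateway. The class and field names are hypothetical. What it shows is that the router only needs the recipient's user_id and an opaque blob; it never decrypts anything.

```python
from dataclasses import dataclass, field


@dataclass
class MessageRouter:
    # Hypothetical in-memory stand-ins for the real services.
    sessions: dict = field(default_factory=dict)   # user_id -> connection server id
    outbound: dict = field(default_factory=dict)   # server id -> [(user_id, blob)] to push
    offline: dict = field(default_factory=dict)    # user_id -> [blob] awaiting reconnect
    wake_ups: list = field(default_factory=list)   # user_ids handed to the push gateway

    def route(self, recipient: str, blob: bytes) -> str:
        """Deliver an opaque ciphertext blob; the router never inspects its contents."""
        server = self.sessions.get(recipient)
        if server is not None:
            # Online: hand the blob to the connection server holding the session.
            self.outbound.setdefault(server, []).append((recipient, blob))
            return "pushed"
        # Offline: queue for reconnect and ask the push gateway to wake the app.
        self.offline.setdefault(recipient, []).append(blob)
        self.wake_ups.append(recipient)
        return "queued"
```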
Offline Message Store
Sharded by user_id. Stores messages for users not currently connected. On reconnect, the connection server drains the queue and pushes everything in order. Messages are deleted from the server after delivery (WhatsApp is store-and-forward, not a durable archive).
Media Service
Handles uploads and downloads of images, videos, voice notes, and documents. Media is encrypted client-side with a per-message symmetric key (the key is encrypted in the E2E payload). Media is stored in object storage with a 30-day TTL.
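The sketch below shows a descriptor a sender could embed in the E2E payload and the integrity check the recipient runs after download. It is an illustration under assumptions: MediaDescriptor and its field names are hypothetical, and the ciphertext is a placeholder where a real client would use an authenticated cipher from a vetted crypto library. Only the hashing, key generation, and constant-time comparison are real standard-library calls.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaDescriptor:
    """What the sender puts inside the E2E-encrypted message (field set is illustrative)."""
    media_id: str      # hypothetical id returned by the Media Service
    media_key: bytes   # per-message symmetric key; never sent to the server in the clear
    sha256: bytes      # SHA-256 of the encrypted blob as uploaded
    size_bytes: int


def describe_upload(media_id: str, ciphertext: bytes, media_key: bytes) -> MediaDescriptor:
    return MediaDescriptor(media_id, media_key, hashlib.sha256(ciphertext).digest(), len(ciphertext))


def verify_download(desc: MediaDescriptor, blob: bytes) -> bool:
    """Reject truncated or tampered blobs before handing them to the decryptor."""
    if len(blob) != desc.size_bytes:
        return False
    return hmac.compare_digest(hashlib.sha256(blob).digest(), desc.sha256)


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    ciphertext = b"opaque bytes from a real cipher"  # placeholder, not real encryption
    desc = describe_upload("media-123", ciphertext, key)
    print(verify_download(desc, ciphertext), verify_download(desc, ciphertext[:-1] + b"X"))
```

Checking the hash before decrypting means blobs can be served from any cache or CDN without trusting it: a swapped or truncated blob fails the check.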
Presence Service
Tracks online/offline/typing/last-seen state. Updates flow over the same persistent connection but are rate-limited. Presence is eventually consistent; a 1-2 second lag is acceptable.
Push Notification Gateway
When a message is queued for an offline user, fires a wake-up notification via APNs (iOS) or FCM (Android). The notification does not contain message content (it cannot, given E2E encryption).
Data model
Pick the right store per table. Justify each choice with the access pattern, not by reflex.
- users: user_id (PK), phone_number (UNIQUE), registration_id (E2E identity), device_keys[], last_seen_at. Sharded by user_id hash. Device keys are public keys used for the Signal Protocol.
- sessions: user_id (PK), device_id, connection_server, connected_at. In-memory only. Wiped on disconnect. Used by the Message Router for live lookup.
- offline_messages: user_id (PK partition), message_id (sortable, e.g. ULID), ciphertext_blob, sender_id, received_at. Sharded by user_id. Append-only. Deleted after delivery. Bounded retention (typically 30 days) for users who never reconnect.
- groups: group_id (PK), members[], admins[], created_at, name. Sharded by group_id. Members list cached aggressively because every group message reads it.
- media_objects: media_id (PK), object_url, sha256, size_bytes, uploaded_at. Metadata in SQL. Bytes in object storage (S3-equivalent). 30-day TTL.
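The offline_messages table relies on message_id sorting by time. Below is a minimal ULID generator in Python following the public ULID layout (48-bit millisecond timestamp, 80 random bits, Crockford base32, 26 characters); new_ulid is a hypothetical helper, not a specific library's API.

```python
import os
import time

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford base32, in ascending ASCII order


def new_ulid(ts_ms=None, randomness=None) -> str:
    """26-char ULID: 48-bit Unix-ms timestamp followed by 80 random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = int.from_bytes(os.urandom(10), "big")  # 80 bits
    if not (0 <= ts_ms < 2**48 and 0 <= randomness < 2**80):
        raise ValueError("timestamp or randomness out of range")
    value = (ts_ms << 80) | randomness  # 128-bit integer
    # 26 chars x 5 bits = 130 bits; emit the most significant 5-bit group first.
    return "".join(CROCKFORD32[(value >> (5 * i)) & 31] for i in range(25, -1, -1))
```

Because the first 10 characters encode the timestamp and the alphabet is in ascending ASCII order, string order matches creation order across milliseconds. Within one millisecond the order is random; the ULID spec's monotonic mode (incrementing the random part) addresses that if strict per-recipient ordering matters.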
Deep dives
These are the conversations the interviewer is steering you toward. Practice each one until you can talk through it without notes.
Persistent connections at 500M concurrent
A single connection server in WhatsApp's published architecture holds 1 to 2 million persistent connections. With 500M concurrent users, that is 250 to 500 servers just for the chat gateway. The trick is Erlang's lightweight processes: each connection is a process with a tiny memory footprint (a few KB), and the BEAM scheduler handles millions of them on one box. A thread-per-connection server, as in many classic JVM designs, would need far more hardware because every OS thread reserves its own stack. Load balancers in front are TCP-level (Layer 4), not HTTP (Layer 7), because the connections are long-lived and the load balancer does not need to inspect traffic.
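A quick sizing helper for the gateway fleet. The 2M connections per node comes from the paragraph above; the 50% headroom (so surviving nodes can absorb a failed node's reconnects) and roughly 10 KB per connection (process plus socket buffers) are assumptions for illustration.

```python
import math


def gateway_fleet(concurrent=500e6, max_conns_per_node=2e6, headroom=0.5, kb_per_conn=10):
    """Size the chat-gateway fleet.

    headroom: fraction of each node's capacity kept free so the survivors can absorb
    the reconnect storm when a node dies (assumed policy, not a published figure).
    kb_per_conn: assumed memory per connection (process heap plus socket buffers).
    """
    usable_per_node = max_conns_per_node * (1 - headroom)
    nodes = math.ceil(concurrent / usable_per_node)
    ram_gb_per_node = max_conns_per_node * kb_per_conn / 1e6  # KB -> GB, decimal units
    return nodes, ram_gb_per_node
```

With no headroom the result is the 250 nodes quoted above; keeping half of each node free doubles it to 500, the upper end of the range, with about 20 GB of connection state per fully loaded node.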
Message routing and offline queueing
When a message arrives at the gateway, the Message Router does a session lookup keyed by recipient user_id. If the recipient is connected, the router sends the message directly to the connection server that holds the recipient's session. If not, the router writes it to the offline message queue (sharded by recipient user_id). On reconnect, the recipient's connection server drains the queue in order. The recipient's client acknowledges each message to the server, which deletes the queued copy only after that ack arrives and then relays a delivered receipt to the sender. If an ack is lost, the message is sent again, so this gives at-least-once delivery; the client deduplicates by message_id so each message is displayed once.
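A minimal model of the ack-driven queue and client-side deduplication, assuming in-memory dicts in place of the sharded store. OfflineQueue and RecipientClient are hypothetical names. The server-side rule is to delete only on ack; the client-side rule is to ack every delivery but display each message_id once.

```python
class OfflineQueue:
    """Per-recipient store-and-forward queue; a message is deleted only after an ack."""

    def __init__(self):
        self._pending = {}  # user_id -> {message_id: blob}; dicts keep insertion order

    def enqueue(self, user_id, message_id, blob):
        self._pending.setdefault(user_id, {})[message_id] = blob

    def drain(self, user_id):
        """Everything not yet acked, in arrival order (safe to call on every reconnect)."""
        return list(self._pending.get(user_id, {}).items())

    def ack(self, user_id, message_id):
        self._pending.get(user_id, {}).pop(message_id, None)


class RecipientClient:
    """Deduplicates redeliveries so at-least-once delivery displays each message once."""

    def __init__(self):
        self.seen = set()
        self.inbox = []

    def receive(self, message_id, blob):
        if message_id not in self.seen:
            self.seen.add(message_id)
            self.inbox.append(blob)
        return message_id  # always ack, even for a duplicate, so the server stops resending
```

If the ack for a message is lost, the next drain resends it; the client acks again without showing a duplicate, which is what makes at-least-once delivery safe.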
End-to-end encryption and the server's role
WhatsApp uses the Signal Protocol. Each user has a long-term identity key, a signed pre-key, and a batch of one-time pre-keys; only the public halves are uploaded to the server, and the private keys never leave the device. To start a session, the sender fetches the recipient's pre-key bundle, derives a shared secret (the X3DH key agreement), and encrypts the message. The server only sees the ciphertext. For groups, WhatsApp uses sender keys: the first time a member sends to a group, it distributes its sender key to every other member over the existing pairwise sessions, and from then on it encrypts each group message once and lets the server fan out that single ciphertext. Because the server cannot read content, it cannot offer content-based features like server-side search or content moderation; WhatsApp pushes that work to the client.
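Sender keys are built on the symmetric-key ratchet from the Signal Double Ratchet specification: each step derives a message key and the next chain key from the current chain key with HMAC-SHA256, using the single-byte constants 0x01 and 0x02 that the spec suggests. The sketch below models only that chain for one group sender; SenderKeyChain is a hypothetical name, and real sender-key messages also carry a signature from a per-sender signing key, which is omitted here.

```python
import hashlib
import hmac


def kdf_ck(chain_key):
    """One symmetric-ratchet step: (next_chain_key, message_key) via HMAC-SHA256."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


class SenderKeyChain:
    """Hypothetical sender-key state for one (group, sender) pair.

    The sender distributes the initial chain key to each member once over pairwise
    sessions; afterwards each group message costs one encryption, not one per member.
    """

    def __init__(self, chain_key):
        self.chain_key = chain_key
        self.iteration = 0

    def next_message_key(self):
        # Overwrite the old chain key so a later compromise cannot recover past keys.
        self.chain_key, message_key = kdf_ck(self.chain_key)
        self.iteration += 1
        return self.iteration, message_key
```

Every member holding the same initial chain key derives the same sequence, so the sender encrypts once per group message. Because each step overwrites the chain key, a key stolen later does not reveal earlier message keys.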
Group fan-out for 1024-member groups
A message to a 1024-member group has to reach every member. The Message Router pulls the group member list from the cached groups table, then dispatches a copy to each online member's session and queues one for each offline member. With sender keys, every copy is the same ciphertext blob; the sender key is rotated when membership changes. Every group message reads the member list, which is why that list is aggressively cached. For very large broadcast groups (a separate feature), the design switches to a pub-sub model where members opt in to a topic and the message is dropped onto the topic for batch fan-out.
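A sketch of the fan-out planning step, assuming the session store can be queried as a user_id to connection-server map. Grouping recipients by connection server is an assumption of this sketch rather than a documented WhatsApp detail: it turns up to 1023 per-member pushes into one batched call per gateway, each carrying the same sender-key ciphertext. plan_group_fanout is a hypothetical name.

```python
from collections import defaultdict


def plan_group_fanout(sender, members, sessions):
    """Split a group's recipients into per-gateway push batches and an offline list.

    sessions maps user_id -> connection server id; members missing from it are offline.
    """
    batches = defaultdict(list)
    offline = []
    for member in members:
        if member == sender:
            continue  # do not echo the message back to its sender
        server = sessions.get(member)
        if server is None:
            offline.append(member)  # goes to the offline queue plus a push notification
        else:
            batches[server].append(member)
    return dict(batches), offline
```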
Trade-offs to discuss
Every senior interviewer expects you to surface at least 3 of these. Pick the decisions, state the alternatives, and justify your choice.
Erlang vs Go or Java for the gateway
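Erlang's lightweight processes win at millions of concurrent connections per node. A thread-per-connection Java server needs substantially more hardware for the same load; Go's goroutines narrow the gap but share memory instead of getting Erlang's isolated per-process heaps. The trade-off is hiring: the Erlang talent pool is small. WhatsApp accepted that trade-off.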
Custom protocol vs XMPP
XMPP is verbose (XML), chatty, and not great for mobile battery. WhatsApp started on XMPP, then switched to a binary proprietary protocol over the same persistent connection. Binary encoding saves bytes on the wire, and fewer bytes mean less time with the cellular radio powered up, which is where the battery savings on the device come from.
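To make the byte savings concrete, the sketch below encodes the same message as an XMPP-style XML stanza and as a token-based binary frame. The token table and frame layout are hypothetical; WhatsApp's actual wire format is proprietary and not reproduced here. The XML version also pays a base64 tax, since raw ciphertext cannot go into XML text.

```python
import base64
import struct

# Hypothetical token table: frequent tag/attribute strings become one byte each.
TOKENS = {"message": 1, "chat": 2}


def xml_stanza(to, msg_id, ciphertext):
    # XML is text, so binary ciphertext must be base64-encoded (about 4/3 the size).
    body = base64.b64encode(ciphertext).decode("ascii")
    return (f'<message to="{to}" id="{msg_id}" type="chat">'
            f"<body>{body}</body></message>").encode("utf-8")


def binary_stanza(to, msg_id, ciphertext):
    # Layout: tag token, type token, then length-prefixed fields in a fixed order,
    # so attribute names never go over the wire.
    to_b, id_b = to.encode("utf-8"), msg_id.encode("utf-8")
    return (struct.pack("!BB", TOKENS["message"], TOKENS["chat"])
            + struct.pack("!B", len(to_b)) + to_b
            + struct.pack("!B", len(id_b)) + id_b
            + struct.pack("!I", len(ciphertext)) + ciphertext)
```

For a 64-byte ciphertext addressed to "bob@example.net" with id "m1", the binary frame is 89 bytes against 161 for the XML stanza.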
Server-stored history vs store-and-forward
Server-stored history (like iMessage in iCloud) means users can see their full chat history on a new device. Store-and-forward means the server forgets after delivery. WhatsApp picked store-and-forward for privacy and storage cost. The cost is that chat history transfers between devices are slow and require an explicit backup.
Push notifications with content vs without
If the server could read messages, it could include preview text in the push notification. With E2E encryption, the notification only says "You have a new message"; the client decrypts the message and updates the lock-screen preview itself. iOS supports this through Notification Service Extensions, and Android through data messages that the app handles before showing a notification.
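A sketch of content-free push payloads, assuming a hypothetical msg_id custom key that tells the woken app which message to fetch over its own connection. The aps dictionary, alert body, and mutable-content flag follow Apple's documented APNs payload format, and the message/token/data shape follows the FCM HTTP v1 API; everything else is illustrative.

```python
def apns_payload(msg_id):
    """iOS: generic alert text; mutable-content lets a Notification Service
    Extension modify the alert on the device after decrypting locally."""
    return {
        "aps": {"alert": {"body": "You have a new message"}, "mutable-content": 1},
        "msg_id": msg_id,  # hypothetical custom key, placed outside "aps"
    }


def fcm_payload(device_token, msg_id):
    """Android: FCM HTTP v1 data-only message (no "notification" block),
    so the app itself builds the notification; data values must be strings."""
    return {"message": {"token": device_token, "data": {"msg_id": str(msg_id)}}}
```

Neither payload carries plaintext; the server only learns that some message exists for this device, which it already knows from routing.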
Centralized presence vs gossip
A centralized presence service is simpler but a single hot path. Gossip removes that hot path but converges slowly and multiplies traffic at this scale. WhatsApp uses a sharded presence service and rate-limits typing indicators. Last seen is updated lazily (every minute, not every second).
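A sketch of the two throttles this trade-off relies on, assuming one presence shard owns each user's state. PresenceThrottle is a hypothetical name, the 3-second typing window is an assumption, and the 60-second last-seen window mirrors the figure above. A real service would also write last_seen_at unconditionally on disconnect, which this sketch leaves out.

```python
class PresenceThrottle:
    """Forward typing events at most once per typing_interval per user and persist
    last_seen_at at most once per last_seen_interval; both windows are assumptions."""

    def __init__(self, typing_interval=3.0, last_seen_interval=60.0):
        self.typing_interval = typing_interval
        self.last_seen_interval = last_seen_interval
        self._typing_sent = {}      # user_id -> time of last forwarded typing event
        self._last_seen_saved = {}  # user_id -> time last_seen_at was last written

    @staticmethod
    def _allow(table, interval, user_id, now):
        last = table.get(user_id)
        if last is not None and now - last < interval:
            return False  # inside the window: drop or coalesce
        table[user_id] = now
        return True

    def forward_typing(self, user_id, now):
        return self._allow(self._typing_sent, self.typing_interval, user_id, now)

    def write_last_seen(self, user_id, now):
        return self._allow(self._last_seen_saved, self.last_seen_interval, user_id, now)
```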
How WhatsApp actually does it
WhatsApp famously ran on a tiny engineering team (about 50 engineers serving 900M users in 2015, the year after the Facebook acquisition) by leaning hard on Erlang and heavily tuned FreeBSD servers. The chat protocol is a proprietary binary protocol derived from XMPP. End-to-end encryption uses the Signal Protocol developed by Open Whisper Systems. Media is stored in a custom blob store. The Erlang choice came from WhatsApp's hire of an ejabberd contributor in the early days.