Imagine having just a few seconds to reach millions with a critical message. This scenario would make even the most seasoned engineer break out in a cold sweat.
As someone who has spent countless hours architecting and optimizing low-latency data pipelines and real-time communication systems, I can't help but be in awe of the engineering required to pull off such a feat.
Today we’ll discuss how Duolingo sent 4 million push notifications in just 6 seconds.
We will analyze the technical details and the architecture behind how Duolingo prepared its systems to deliver those notifications at the exact moment its Super Bowl commercial aired.
The Problem
As part of its Super Bowl marketing campaign, language learning platform Duolingo needed to send out 4 million mobile push notifications precisely when its 5-second ad aired during the commercial break.
Their previous best was 500,000 notifications in 60 seconds. And that caused a few crashes too. Yikes!
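To put the gap in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted in this article. It assumes the 6-second headline result as the target window; the variable names are mine, not Duolingo's.

```python
# Figures quoted in the article. The 6-second window is the headline result,
# used here as the target purely for illustration.
PREVIOUS_NOTIFICATIONS = 500_000
PREVIOUS_SECONDS = 60
TARGET_NOTIFICATIONS = 4_000_000
TARGET_SECONDS = 6


def throughput(notifications: int, seconds: float) -> float:
    """Average notifications per second over the whole window."""
    return notifications / seconds


previous_rate = throughput(PREVIOUS_NOTIFICATIONS, PREVIOUS_SECONDS)
target_rate = throughput(TARGET_NOTIFICATIONS, TARGET_SECONDS)
speedup = target_rate / previous_rate

print(f"previous best: {previous_rate:,.0f} notifications/s")
print(f"target:        {target_rate:,.0f} notifications/s")
print(f"required gain: {speedup:.0f}x")
```

In other words, the system had to go from roughly 8,000 notifications per second to well over 600,000 per second, an increase of about 80x in average throughput.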
The Challenges
Unpredictable Timing: Duolingo couldn't predict exactly when their ad would air, so the notification delivery had to be triggered manually.
Massive Scale: Sending 4 million push notifications in a short time window required carefully architecting the system to scale.
Duplicate Prevention: Duolingo needed to ensure each user received only one notification, even if more than one person triggered the process independently.
Integration with Third-Party Platforms: To deliver the notifications, Duolingo had to integrate with Google's Firebase Cloud Messaging (FCM) and Apple Push Notification Service (APNS).
The Solution
Architecture Design
They built an asynchronous, event-driven architecture on AWS to handle the push notification delivery by leveraging AWS services like API Gateway, ECS, SQS, DynamoDB, S3, and CloudWatch.
Source: QCon London
API Gateway: Served as the entry point for the process, triggering the notification delivery.
Python Components in ECS: Responsible for processing the notification requests and interfacing with the queuing and data storage services.
SQS Queues: For deduplication, a FIFO (First-In-First-Out) queue was used, and a regular queue was used for publishing notifications.
DynamoDB and S3: Stored user and device data required for the notification delivery.
CloudWatch: Provided observability and monitoring for the entire system.
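To make the deduplication role of the FIFO queue concrete, here is a minimal in-memory sketch of how a 5-minute deduplication window behaves when two people trigger the same campaign. It is a toy model, not Duolingo's code and not the SQS client: `FifoQueueModel`, `send`, and `CAMPAIGN_ID` are hypothetical names. With real SQS, the equivalent idea is sending with the same `MessageDeduplicationId` to a FIFO queue.

```python
from dataclasses import dataclass, field

DEDUP_WINDOW_SECONDS = 5 * 60  # SQS FIFO queues deduplicate over 5 minutes


@dataclass
class FifoQueueModel:
    """Toy in-memory stand-in for a FIFO queue's deduplication behaviour."""

    messages: list = field(default_factory=list)
    _seen: dict = field(default_factory=dict)  # dedup id -> time first accepted

    def send(self, body: str, dedup_id: str, now: float) -> bool:
        """Accept the message unless dedup_id was accepted within the window."""
        first_seen = self._seen.get(dedup_id)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_SECONDS:
            return False  # duplicate trigger: silently dropped
        self._seen[dedup_id] = now
        self.messages.append(body)
        return True


queue = FifoQueueModel()
CAMPAIGN_ID = "superbowl-push"  # hypothetical deduplication id

# Two engineers hit the trigger 2.5 seconds apart.
first = queue.send("start campaign", CAMPAIGN_ID, now=0.0)
second = queue.send("start campaign", CAMPAIGN_ID, now=2.5)

print(first, second, len(queue.messages))  # True False 1
```

Only the first trigger results in a message, so downstream workers start the campaign once no matter how many people press the button inside the window.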
Scaling and Performance
To handle the massive scale:
Autoscaling: Duolingo manually provisioned 5,000 notification worker instances a few hours before the Super Bowl by modifying the autoscaling group (ASG) and ECS task.
Data Prefetching: They also set up 20 "interim worker" instances to prefetch user and device data from S3 and store it in memory for faster access.
Batching and Deduplication: SQS FIFO queues deduplicate messages within a 5-minute window, but without batching they handle only about 300 messages per second. So, Duolingo used a second, regular SQS queue to trigger publishing push notifications. Since regular SQS queues have an in-flight limit of roughly 120,000 messages (a count, not a per-second rate), engineers used data batching to support the required publication rate.
Stress Testing: Duolingo conducted three push notification delivery tests with 1 million users to ensure they could successfully scale the architecture and to find and address any performance bottlenecks.
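The batching idea can be sketched in a few lines: group many user IDs into one queue message so that the number of messages stays small even when the number of users is huge. This is an illustration under my own assumptions, not Duolingo's implementation; `BATCH_SIZE`, the `user_ids` payload field, and the sample IDs are all hypothetical.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


BATCH_SIZE = 500  # hypothetical; the article does not state the real value

# Small sample so the sketch runs instantly.
user_ids = (f"user-{i}" for i in range(1_050))
messages = [json.dumps({"user_ids": batch}) for batch in batched(user_ids, BATCH_SIZE)]
print(len(messages))  # 3 messages: 500 + 500 + 50 users

# Ceiling division: queue messages needed for 4 million users at this batch size.
messages_needed = -(-4_000_000 // BATCH_SIZE)
print(messages_needed)  # 8000
```

With a batch size like this, 4 million users fit into a few thousand queue messages, far below an in-flight limit of about 120,000 messages.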
Integration with Third-Party Platforms
Duolingo also contacted Google and Apple to understand any rate limits on their notification delivery platforms. Fortunately, there were no specific limits, which allowed Duolingo to focus on scaling its internal systems.
Results
On the day of the Super Bowl, the Duolingo team was able to:
Publish 95% of notifications in 3.9 seconds
Publish 99% of notifications in 5.7 seconds
This was achieved by leveraging the asynchronous, event-driven architecture and careful scaling and performance optimizations.
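For readers unfamiliar with how figures like "95% in 3.9 seconds" are derived, here is a small sketch of a nearest-rank percentile over publish latencies. The sample data is synthetic and purely hypothetical; it is not Duolingo's measurement data, and the article does not say which percentile method was used.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[int], pct: int) -> int:
    """Smallest value such that at least pct% of the values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds between the trigger and each publish:
# 100 synthetic samples at 50 ms, 100 ms, ..., 5000 ms.
publish_latencies_ms = [i * 50 for i in range(1, 101)]

p95 = percentile_nearest_rank(publish_latencies_ms, 95)
p99 = percentile_nearest_rank(publish_latencies_ms, 99)
print(f"p95 = {p95} ms, p99 = {p99} ms")
```

Reading the result the same way as the numbers above: 95% of the synthetic notifications were published within `p95` milliseconds of the trigger, and 99% within `p99`.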
Lessons Learned
Importance of Stress Testing: The three delivery tests with 1 million users validated the end-to-end process at scale and gave the team a chance to address performance bottlenecks before the live event.
Vendor Integrations: While Duolingo couldn't control the performance of Google's and Apple's notification platforms, proactively understanding their capabilities and limitations helped the team design a robust internal system.
Observability and Monitoring: Using CloudWatch for observability was crucial in ensuring the system performed as expected and enabled quick troubleshooting during the live event.
Asynchronous, Event-Driven Architecture: Duolingo's decision to use an asynchronous, event-driven approach with queues and batching proved to be the right choice for handling the massive spike in traffic and delivering notifications at scale.
Overall, Duolingo's experience showcases the importance of careful architecture design, performance optimization, and thorough testing when building systems that need to handle extreme traffic spikes and time-sensitive events.