How Canva Tests 300 Designs/Second Without Looking
Learn how Canva's engineering team reinvented search testing while keeping 200 million users' data private.
When you're handling 30 billion designs from 200 million users who create nearly 300 new designs every second, how do you make search better without ever looking at what people are searching for?
That's the puzzle Canva's engineering team had to solve.
The strategic challenge
Privacy isn't just a checkbox for Canva - it's a core value embedded in their "Be a Good Human" principle. This commitment means engineers can't view users' designs or analyze their search patterns, making traditional search improvement methods impossible.
The standard playbook for search optimization typically involves deep analysis of user search patterns, countless hours studying what makes searches successful or unsuccessful, and rigorous testing of improvements using real user data.
But Canva had to throw out this entire playbook. They wanted to build a world-class search engine while treating user content like a black box—completely invisible and untouchable.
Without access to real data, engineers spent days testing simple improvements, search quality couldn't be adequately measured, and the development process had become a bottleneck.
So, how did they solve this problem?
They changed the entire approach to search development. Instead of peeking at user searches or analyzing real designs, Canva made a bold move: they created an entirely synthetic world of designs and search behaviors that would mirror actual usage while maintaining absolute privacy.
They built a complete user behavior simulation without using actual user data. But this wasn't just any simulation. Using GPT-4, they crafted realistic documents, presentations, and social media posts that captured the complexity and diversity of real-world usage patterns.
This new approach solved three critical challenges:
The privacy paradox: Engineers could experiment freely without touching sensitive user data. Every test design, every query, and every interaction was synthetic yet realistic enough to yield meaningful results.
The scale problem: The system could generate and test thousands of scenarios in minutes, something impossible with traditional approaches. Engineers could run more than 300 evaluations in the same time it previously took to complete just one test.
The quality assurance challenge: Most importantly, these synthetic tests proved remarkably accurate at predicting real-world performance. When changes showed promise in the artificial environment, they consistently delivered similar improvements in production.
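The article doesn't show Canva's test harness, but the scale claim above maps naturally onto a parallel offline evaluation loop. Here is a minimal sketch under stated assumptions: `search`, `TestCase`, and the toy index are hypothetical stand-ins, not Canva's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class TestCase:
    query: str
    expected_id: str  # the synthetic design this query should surface

def search(query: str) -> list[str]:
    """Hypothetical search function returning ranked design IDs.
    Stand-in: a trivial lookup over synthetic titles."""
    index = {
        "q4 numbers": ["doc-q4-report", "doc-q4-draft"],
        "budget deck": ["doc-budget", "doc-q4-report"],
    }
    return index.get(query, [])

def evaluate(case: TestCase) -> bool:
    """A test passes if the expected design appears in the top 3 results."""
    return case.expected_id in search(case.query)[:3]

cases = [
    TestCase("q4 numbers", "doc-q4-report"),
    TestCase("budget deck", "doc-budget"),
]

# Run the whole suite in parallel: because every test case is synthetic,
# thousands of them can be evaluated without touching user data.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(evaluate, cases))

pass_rate = sum(results) / len(results)
```

Because nothing in the loop depends on production data, the same suite can run on every commit rather than waiting for an online experiment window.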
Another approach could have been to analyze aggregate user behavior data and leverage AI to enhance the search experience, without accessing individual search queries or content.
Technical implementation
Building this privacy-preserving testing environment wasn't just a matter of asking GPT-4 to generate some random data. The engineering team had to solve several intricate puzzles to make this work in practice.
The first challenge was to create realistic test cases.
Canva's team had to generate designs that reflected real-world use patterns. They fed GPT-4 with carefully crafted prompts about business presentations, marketing materials, and social media posts.
They also had to teach GPT-4 about the nuances of how people actually search. Someone looking for last quarter's financial report might search for "Q4 numbers," "financial update December," or even "budget meeting deck from last month."
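The article doesn't reproduce Canva's actual prompts, but the query-variant idea above can be sketched as a prompt builder. The function name and wording are illustrative assumptions; the actual API call is shown only in a comment so the example stays self-contained.

```python
def build_query_variant_prompt(doc_title: str, doc_type: str) -> list[dict]:
    """Build a chat prompt asking GPT-4 for realistic search queries
    a user might type when looking for this synthetic design."""
    system = (
        "You generate realistic search queries. Users rarely type exact "
        "titles: they use shorthand ('Q4 numbers'), partial dates "
        "('financial update December'), or related concepts "
        "('budget meeting deck from last month')."
    )
    user = (
        f"Document type: {doc_type}\n"
        f"Title: {doc_title}\n"
        "Give 5 queries a real user might type to find this document."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_query_variant_prompt("Q4 2023 Financial Report",
                                      "presentation")
# With the OpenAI Python client, this would be sent as, e.g.:
# client.chat.completions.create(model="gpt-4", messages=messages)
```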
But GPT-4 wasn't always the perfect collaborator. Sometimes, it refused to create very long titles (a blessing in disguise—it turns out humans rarely use super-long titles, either!).
Other times, it would get creative with misspellings in ways that didn't match real-world patterns.
The team constantly refined their prompts and validation approaches to keep the synthetic data realistic.
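The validation rules themselves aren't spelled out in the article; the following is a hedged sketch of the kind of sanity checks that could keep generated output realistic. The thresholds and function names are invented for illustration.

```python
MAX_TITLE_WORDS = 12  # assumed cap: humans rarely use very long titles

def is_realistic_title(title: str) -> bool:
    """Reject synthetic titles that don't look like something a person
    would actually name a design."""
    words = title.split()
    if not words or len(words) > MAX_TITLE_WORDS:
        return False
    # Reject obviously machine-like output, e.g. heavy word repetition.
    if len(set(w.lower() for w in words)) < len(words) / 2:
        return False
    return True

def misspell_rate(original: str, noisy: str) -> float:
    """Fraction of words the generator altered when simulating typos --
    useful for rejecting variants with unrealistically heavy misspelling."""
    orig, noise = original.split(), noisy.split()
    changed = sum(a != b for a, b in zip(orig, noise))
    return changed / max(len(orig), 1)
```

Checks like these can run on every generated batch, so unrealistic samples are discarded before they ever enter the test corpus.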
(Fig. Examples of test cases used to measure precision)
One particularly clever innovation was their handling of precision testing. Instead of merely checking whether a search returned the correct result, they created entire "families" of related documents - draft versions, templates, and similar content - to verify that search could rank and prioritize results just as it would need to in production.
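The article doesn't detail the scoring, but the family-ranking check could look roughly like this. The preference order (final above draft above template) and the document IDs are hypothetical.

```python
def family_rank_correct(ranked_ids: list[str],
                        preferred_order: list[str]) -> bool:
    """Check that within one document family, search ranks the canonical
    version above its draft and template siblings."""
    # Position of each family member in the search results.
    positions = {doc: ranked_ids.index(doc)
                 for doc in preferred_order if doc in ranked_ids}
    # Every family member must be retrieved...
    if len(positions) != len(preferred_order):
        return False
    # ...and appear in the preferred order (e.g. final > draft > template).
    ranks = [positions[doc] for doc in preferred_order]
    return ranks == sorted(ranks)

# A synthetic family: the final report should outrank its draft and template.
family = ["q4-report-final", "q4-report-draft", "q4-report-template"]
results = ["q4-report-final", "unrelated-doc", "q4-report-draft",
           "q4-report-template"]
```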
The results spoke for themselves. Engineers could test changes in minutes instead of days, and they discovered and fixed issues they never could have found before.
One example? They identified that their staging environment was costing more than production due to an overlooked configuration - something that might have gone unnoticed for months in the old system.
What were the results?
Canva's team can now process over 1000 test cases in less than 10 minutes and perform more than 300 offline evaluations in the same 2-3 day period it once took to run a single online experiment.
Notably, the improvements that showed promise in their synthetic testing environment consistently delivered similar gains in the real world.
This approach allowed Canva to accelerate development while improving privacy, as the engineering team could make bold search improvements without compromising their commitment to user data protection.
Learn more about it here.