From 115s to 4s: How Dropbox Extracts Intelligence from Files Instantly
Discover how Dropbox rebuilt its file processing system to convert 300 file types into instant answers, process 2.5 billion requests daily, and cut response times from 115 seconds to just 4 seconds.
File search has come a long way from grep and Ctrl+F. As tech teams, we've all been there—combing through last quarter's sprint retrospectives, scanning meeting recordings for that key decision, or parsing JSON logs for that one error message.
Yes, we've made huge progress with Elasticsearch and vector embeddings, but let's be real—most search tools still give us documents instead of actual answers.
Dropbox took a different approach to this age-old problem: making files answer questions directly.
The enterprise challenge
A typical enterprise faces two critical problems. Information is scattered across thousands of files—documents, videos, presentations, and contracts.
Finding specific information means either manually searching through files, asking colleagues who might remember where the information is, or using basic search functions that match words but miss context.
All these approaches drain productivity and make decision-making difficult.
For perspective, Dropbox processes an exabyte of data daily across its platform—equivalent to streaming Netflix's entire content library 3,500 times over.
The challenge was clear: How do you make billions of files "intelligent" enough to answer questions while keeping costs manageable and response times under 5 seconds?
The engine behind Dropbox's AI
Architecture that scales
At the heart of this system is Riviera, an intelligent processing framework that orchestrates complex file transformations through a plugin architecture.
Each plugin runs in an isolated container, ensuring security and reliability while handling sensitive enterprise data.
The system maintains a conversion graph, mapping out all possible transformation paths between file types.
For instance, when processing a CAD file for AI analysis, Riviera might convert CAD to PDF for visual representation, extract text and metadata from the PDF, and generate AI-ready embeddings from the extracted content.
This multi-step pipeline approach is what allows Riviera to handle 300 different file types while processing 2.5 billion requests daily.
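To make the conversion graph concrete, here is a minimal sketch of the idea: formats are nodes, plugin conversions are edges, and finding a pipeline is a shortest-path search. The format names and the tiny graph below are illustrative stand-ins, not Riviera's actual plugin registry.

```python
from collections import deque

# Hypothetical conversion graph: each format maps to the formats a
# plugin can produce from it. Riviera's real graph spans 300 file
# types; this sketch covers only the CAD example from the text.
CONVERSIONS = {
    "cad": ["pdf"],
    "pdf": ["text", "image"],
    "image": ["text"],
    "text": ["embedding"],
}

def find_conversion_path(src, dst):
    """Breadth-first search for the shortest chain of conversions."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no conversion pipeline exists

print(find_conversion_path("cad", "embedding"))
# → ['cad', 'pdf', 'text', 'embedding']
```

Because the graph is data, adding a new file type is just a matter of registering its edges—the search then discovers pipelines through it automatically.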
Intelligent resource management
What makes Riviera genuinely innovative is its approach to resource optimization. Rather than treating each conversion as a standalone task, Riviera creates a directed graph of all possible conversion paths, identifies the most efficient route for each file type, and caches intermediate states for reuse while prioritizing high-demand conversion paths.
Consider video content processing: Riviera creates a cascade of reusable transformations, from video to audio extraction, to audio transcription, then to formatted text, and finally to semantic embeddings.
Each stage is cached and reusable, meaning popular content only needs to be processed once, dramatically reducing server load and response times.
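The caching pattern behind that cascade can be sketched in a few lines: each stage is wrapped so its output is keyed by a hash of its input, and repeat requests skip straight to the cached result. The stage functions below are toy stand-ins for the real audio-extraction, transcription, and embedding plugins.

```python
import hashlib

# Keyed by (content hash, stage name) — a stand-in for Riviera's
# shared cache of intermediate conversion states.
cache = {}

def cached_stage(stage_name, fn):
    """Wrap a conversion stage so repeat requests reuse earlier work."""
    def run(content):
        key = (hashlib.sha256(content).hexdigest(), stage_name)
        if key not in cache:
            cache[key] = fn(content)  # only computed on a cache miss
        return cache[key]
    return run

# Toy stand-ins for the video pipeline stages described above.
extract_audio = cached_stage("audio",      lambda v: v + b"|audio")
transcribe    = cached_stage("transcript", lambda a: a + b"|text")
embed         = cached_stage("embedding",  lambda t: t + b"|vec")

def video_to_embedding(video_bytes):
    return embed(transcribe(extract_audio(video_bytes)))

video_to_embedding(b"meeting.mp4")  # full pipeline runs once
video_to_embedding(b"meeting.mp4")  # second request is served from cache
```

A useful property of caching per stage rather than per pipeline: a request that only needs the transcript—say, for captioning—reuses the same cached audio extraction that the embedding pipeline produced.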
Enterprise-grade security and performance
Riviera's container-based architecture provides scalability and enterprise-grade security. Each conversion runs in its own isolated environment, preventing potential security breaches from affecting the broader system.
Riviera supports files up to multiple gigabytes in size, processes thousands of concurrent requests in parallel, delivers sub-second response times for cached conversions, and scales automatically based on demand patterns.
The path to AI enhancement
The genius of Riviera's design becomes apparent in how it enables AI features. By treating AI processing as just another type of conversion, Dropbox created a scalable foundation for intelligence through three distinct layers.
The Content Transformation Layer converts any file type to processable text while maintaining formatting and structural information.
Above this, the Semantic Processing Layer generates embeddings for content understanding and maps relationships between different content pieces.
Finally, the Intelligence Layer handles summarization and Q&A requests, leveraging cached transformations to provide source-attributed responses.
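The three layers above compose naturally, since each one consumes the previous layer's output. The sketch below shows that composition with stubbed implementations; the function names and return shapes are illustrative assumptions, not Dropbox's actual interfaces.

```python
def content_transformation(raw_file: bytes) -> str:
    """Layer 1: convert a file to processable text (stubbed decode)."""
    return raw_file.decode("utf-8", errors="ignore")

def semantic_processing(text: str) -> list:
    """Layer 2: generate an embedding (toy character-based vector)."""
    return [ord(c) / 255 for c in text[:8]]

def intelligence_layer(question: str, embedding: list, source: str) -> dict:
    """Layer 3: answer a question, attributing the response to its source."""
    return {
        "question": question,
        "answer": f"(answer derived from {source})",
        "evidence": embedding,
    }

text = content_transformation(b"Q3 revenue grew 12%")
vec = semantic_processing(text)
answer = intelligence_layer("How did revenue change?", vec, "q3-report.pdf")
```

The key architectural point survives even in this toy version: because each layer's output is cacheable, a Q&A request over an already-processed file only touches the final layer.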
(The high-level architecture of their file previews surface, with new machine learning components highlighted)
This layered approach enabled Dropbox to achieve dramatic performance improvements—from 115 seconds to 4 seconds for complex queries.
It's not just about speed; it's about creating a sustainable, scalable architecture that can handle enterprise-level demands while maintaining security and reliability.
Dropbox's AI impact in numbers
The results are compelling:
Processing costs per summary dropped by 93%, while query costs decreased by 64%
Response times improved from 115 seconds to just 4 seconds at the 75th percentile
Complex queries across multiple files now return answers faster than a typical web search
For enterprise teams, this means:
Leaders can now get instant summaries of hour-long video meetings without watching the recording.
Engineers can extract precise answers from thousands of pages of technical documentation in seconds, while analysts can rapidly compare information across multiple quarterly reports with unprecedented accuracy.
Most importantly, teams can verify facts across vast document repositories without the traditional manual cross-referencing that consumes hours of valuable time.
Learn more about it here.