How to Manage Content from 1 Billion Users
An overview of Instagram's content recommendation engine and LinkedIn's content moderation engine.
Instagram has well over a billion monthly active users, while LinkedIn boasts more than 950 million users from across the globe.
At such a massive scale, everything becomes a challenge.
That's especially true when you want users to keep coming back to the platform: you can't let anything negatively impact their experience.
In this article, we’ll see how two social media giants tackle two of the biggest content management challenges at this scale.
1. How to recommend the most relevant content to users in real time, and
2. How to prioritize content that needs human review.
Instagram’s real-time recommendation engine
Meta's Explore recommendations system on Instagram is designed to offer users relevant real-time content. It consists of four main stages: retrieval, first-stage ranking, second-stage ranking, and final reranking.
1. Retrieval:
In the retrieval stage, the system selects relevant content, also known as candidates, for users from a large pool of available options. It retrieves these candidates using various sources, including heuristics and machine learning techniques. Item and user embeddings generated by a Two Tower neural network, along with users' interaction history, are key inputs at this stage.
Candidate sources can be based on heuristics (e.g., trending posts) as well as more sophisticated ML approaches. Retrieval sources can also be real-time (capturing the most recent interactions) or pre-generated (capturing long-term interests).
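To make retrieval concrete, here's a minimal sketch of Two Tower-style candidate retrieval. Everything in it is an illustrative assumption rather than Instagram's actual code: the brute-force nearest-neighbor lookup stands in for an approximate nearest neighbor (ANN) index, and the source-merging helper simply unions a real-time source with a pre-generated one.

```python
# Minimal sketch of Two Tower-style candidate retrieval.
# Names, shapes, and the brute-force lookup are illustrative assumptions.
import numpy as np

def retrieve_candidates(user_embedding: np.ndarray,
                        item_embeddings: np.ndarray,
                        item_ids: list[str],
                        k: int = 500) -> list[str]:
    """Return the k items whose embeddings best match the user embedding."""
    # Dot-product similarity; production systems use an ANN index
    # instead of scoring every item.
    scores = item_embeddings @ user_embedding
    top_k = np.argsort(-scores)[:k]
    return [item_ids[i] for i in top_k]

def gather_candidates(user_embedding: np.ndarray,
                      sources: list[tuple[np.ndarray, list[str]]]) -> set[str]:
    """Union candidates from several sources, e.g. a real-time source built
    from recent interactions and a pre-generated long-term-interest source."""
    candidates: set[str] = set()
    for item_embeddings, item_ids in sources:
        candidates.update(retrieve_candidates(user_embedding, item_embeddings, item_ids))
    return candidates
```

The appeal of this setup is that item embeddings can be computed and indexed offline, so serving only requires one user-tower forward pass and a fast similarity lookup.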
2. First-stage ranking:
After retrieving candidates, a lightweight model called the first-stage ranker narrows down the selection. This stage employs a Two Tower neural network trained to predict the output of the second-stage model. By doing this, the first-stage ranker distills the knowledge of the heavyweight second-stage model into a much smaller, cheaper-to-serve model.
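Under the assumption that the first-stage ranker scores candidates with a simple user-item dot product and is trained to mimic the second-stage model's outputs, the idea looks roughly like this (the function names and the mean-squared-error loss are illustrative choices, not Meta's published details):

```python
# Illustrative sketch of first-stage ranking via a distilled Two Tower model.
import numpy as np

def first_stage_scores(user_embedding: np.ndarray,
                       candidate_embeddings: np.ndarray) -> np.ndarray:
    # A dot product over (user, candidate) embeddings is cheap enough
    # to score thousands of retrieved candidates.
    return candidate_embeddings @ user_embedding

def distillation_loss(student_scores: np.ndarray,
                      teacher_scores: np.ndarray) -> float:
    # Train the lightweight "student" to reproduce the heavyweight
    # second-stage "teacher" scores (MSE used here purely for illustration).
    return float(np.mean((student_scores - teacher_scores) ** 2))

def select_top_candidates(candidate_ids: list[str],
                          scores: np.ndarray,
                          k: int = 100) -> list[str]:
    # Keep only the best-scoring candidates for the expensive second stage.
    order = np.argsort(-scores)[:k]
    return [candidate_ids[i] for i in order]
```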
3. Second-stage ranking:
The second-stage ranker utilizes a heavier neural network called the multi-task multi-label (MTML) model. Using rich user-item interaction features, it predicts the probability of different user engagement events, such as clicks and likes. The system then combines these probabilities, each weighted by how much the corresponding engagement event matters, into an expected value for each candidate.
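In other words, the final score for a candidate can be a weighted sum of predicted event probabilities, along the lines of the sketch below. The event names and weights are invented for illustration; a negative weight on a "see fewer posts like this" style signal is one way such a formula can penalize unwanted content.

```python
# Sketch of combining MTML predictions into a single expected-value score.
# Event names and weights are made up for illustration.
ENGAGEMENT_WEIGHTS = {
    "click": 1.0,
    "like": 2.0,
    "comment": 4.0,
    "share": 6.0,
    "see_less": -20.0,   # predicted negative feedback lowers the score
}

def expected_value(predicted_probabilities: dict[str, float]) -> float:
    """Weighted sum of predicted engagement probabilities for one candidate."""
    return sum(
        ENGAGEMENT_WEIGHTS[event] * probability
        for event, probability in predicted_probabilities.items()
        if event in ENGAGEMENT_WEIGHTS
    )

# Example: a candidate likely to be liked but rarely shared.
score = expected_value(
    {"click": 0.30, "like": 0.12, "comment": 0.02, "share": 0.01, "see_less": 0.005}
)
```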
4. Final reranking:
In the last stage, the system fine-tunes the recommendations based on various rules and criteria, such as content integrity and diversity. This ensures that the final recommendations are engaging, high-quality, and diverse. Several techniques, including filtering, downranking, and shuffling, help achieve the desired results.
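As a concrete (and hypothetical) example, a final reranking pass might filter out candidates flagged by integrity checks and cap how many items a single author can contribute, so the feed doesn't get dominated by one account:

```python
# Illustrative reranking pass: integrity filtering plus a simple diversity rule.
# The specific rules are placeholders, not Instagram's actual heuristics.
from dataclasses import dataclass

@dataclass
class Candidate:
    item_id: str
    author_id: str
    score: float
    integrity_flagged: bool = False

def rerank(candidates: list[Candidate], max_per_author: int = 2) -> list[Candidate]:
    # 1. Filtering: drop anything that fails integrity checks.
    eligible = [c for c in candidates if not c.integrity_flagged]
    # 2. Diversity: cap how many items any one author contributes.
    eligible.sort(key=lambda c: c.score, reverse=True)
    per_author: dict[str, int] = {}
    final: list[Candidate] = []
    for candidate in eligible:
        count = per_author.get(candidate.author_id, 0)
        if count < max_per_author:
            final.append(candidate)
            per_author[candidate.author_id] = count + 1
    return final
```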
These four stages work together to offer personalized and engaging content recommendations to millions of users on Instagram's Explore platform daily.
Find more details on the implementation here.
While Instagram uses ML to find and reward good content, LinkedIn showcases a way to find and penalize harmful content.
LinkedIn’s Content Moderation Engine
LinkedIn's content review queue receives hundreds of thousands of items every week. Content that potentially violates their Professional Community Policies goes into this queue for assessment by human reviewers and automated systems.
AI Models for Content Scoring
LinkedIn uses a set of XGBoost machine learning models to predict the probability of a piece of content being violative or clear. These models are trained on past human-labeled data and use real-time signals to make predictions. They are designed to maintain high precision while detecting policy-violating content efficiently.
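LinkedIn doesn't publish its feature set or hyperparameters, but at its core this kind of scorer looks something like the sketch below, with invented features and a toy training set:

```python
# Minimal sketch of training a content-scoring model with XGBoost.
# Features and data are invented; real models train on large human-labeled corpora.
import numpy as np
import xgboost as xgb

# Hypothetical features per item: report count, author trust score, text signal.
X_train = np.array([
    [0, 0.9, 0.05],
    [3, 0.2, 0.80],
    [1, 0.6, 0.30],
    [5, 0.1, 0.95],
])
y_train = np.array([0, 1, 0, 1])  # 1 = human reviewers labeled the content violative

model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    eval_metric="logloss",
)
model.fit(X_train, y_train)

# P(violative) for a newly queued item built from real-time signals.
p_violative = model.predict_proba(np.array([[2, 0.4, 0.6]]))[0, 1]
```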
Model Hosting and Decision Layer
The machine learning models are hosted and made available for consumption through a dedicated scoring layer built on top of ProML, LinkedIn's machine learning productivity platform. This layer fetches the required features from different data sources, runs inference on the models, and prepares the results for the content review queue.
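ProML's internals aren't covered here, but conceptually the scoring layer reduces to a function like this, where the feature store, model, and queue interfaces are hypothetical stand-ins:

```python
# Conceptual sketch of a scoring layer: fetch features, run inference,
# hand the score to the review queue. All interfaces here are hypothetical.
def score_content(content_id: str, feature_store, model, review_queue) -> float:
    features = feature_store.fetch(content_id)            # gather signals from data sources
    p_violative = model.predict_proba([features])[0, 1]   # run model inference
    review_queue.submit(content_id, score=p_violative)    # prepare result for the queue
    return p_violative
```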
Intelligent Review Queue Prioritization
The content review queue uses the AI model scores to intelligently prioritize the review of items. High priority is given to items with a greater probability of being policy-violating, while low priority is assigned to items that are likely non-violative. This dynamic prioritization helps human reviewers focus on content that needs immediate attention.
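A minimal way to picture this is a max-heap keyed on the model's P(violative), as in the sketch below; a production system would use a distributed, persistent queue rather than an in-process heap:

```python
# Sketch of score-based prioritization using a max-heap keyed on P(violative).
import heapq

class ReviewQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[float, str]] = []

    def submit(self, content_id: str, score: float) -> None:
        # heapq is a min-heap, so push the negated score to pop highest-risk first.
        heapq.heappush(self._heap, (-score, content_id))

    def next_item(self) -> str:
        # Human reviewers always see the most-likely-violative item first.
        _, content_id = heapq.heappop(self._heap)
        return content_id
```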
Continuous Score Updates
The AI model scores for each item in the review queue are updated continuously as new member reports come in. This ensures that prioritization decisions are based on all the information available at that moment rather than on a single snapshot in time.
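One common way to implement this kind of continuous re-scoring (a general pattern, not necessarily LinkedIn's design) is to push a fresh entry whenever a new report arrives and lazily discard stale entries when items are pulled for review:

```python
# Re-score on every new member report; stale heap entries are skipped at pop time.
# Model, feature store, and data layout are hypothetical.
import heapq

latest_scores: dict[str, float] = {}
heap: list[tuple[float, str]] = []

def on_member_report(content_id: str, model, feature_store) -> None:
    features = feature_store.fetch(content_id)      # now includes the new report
    score = model.predict_proba([features])[0, 1]
    latest_scores[content_id] = score
    heapq.heappush(heap, (-score, content_id))      # newest score wins at pop time

def pop_next_for_review() -> str | None:
    while heap:
        neg_score, content_id = heapq.heappop(heap)
        if latest_scores.get(content_id) == -neg_score:
            return content_id                       # entry reflects the latest score
        # otherwise the entry was superseded by a later report; skip it
    return None
```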
Their new content review prioritization framework reduces the burden on human reviewers by making auto-decisions on around 10% of all queued content at high precision levels. It also reduces the average time taken to detect policy-violating content by about 60%, resulting in fewer unique members being exposed to violative content.
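Auto-decisions like these usually come down to confidence thresholds tuned for high precision; the cutoffs below are purely illustrative:

```python
# Hypothetical routing thresholds; real values are tuned to hit precision targets.
AUTO_REMOVE_THRESHOLD = 0.99   # act automatically only when the model is very sure
AUTO_CLEAR_THRESHOLD = 0.01

def route(p_violative: float) -> str:
    if p_violative >= AUTO_REMOVE_THRESHOLD:
        return "auto_remove"
    if p_violative <= AUTO_CLEAR_THRESHOLD:
        return "auto_clear"
    return "human_review"      # everything in between goes to prioritized human review
```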
Find more details here.