Rili | Synchronized Codelab

Talk to Celebrities Powered by AI

Project Overview

Rili AI is an AI-powered social platform enabling users to interact with celebrities through voice, video, and chat experiences. The project focused on building scalable APIs for chat, video, and AI features while integrating text, voice, and video AI services with independent module scaling architecture.

Industry : AI Social Media / Celebrity Interaction

Location : Belgium

Year : 2024

Technologies Used

Python

AI Celeb

AI Engine

AI Services

Django

React Native

Django Rest Framerwork

PostgreSQL

Modular APIs

Challenges

Faced integration issues with multiple AI services for text, voice, and video features. Complex API orchestration creating latency bottlenecks during celebrity interactions. Scaling independent modules while maintaining real-time responsiveness across user base. Ensuring seamless AI processing without service outages during peak celebrity live sessions.

API response delays breaking real-time chat/video experiences.
Manual AI service coordination for every interaction type error-prone.
Frequent deployments causing module conflicts and feature downtime.
Supporting independent scaling across chat, voice, video AI modules simultaneously.

Solutions

1. MICROSERVICES ARCHITECTURE

Broke down the monolith into independent services for chat, voice, and video AI modules. Each service scales separately based on demand—video peaks during live sessions while chat handles steady traffic. Enabled fault isolation so a failure in one module does not impact others.

2. API GATEWAY ORCHESTRATION

Implemented a single entry point to route requests intelligently to the appropriate microservice. Introduced caching for frequent responses and rate limiting, reducing end-to-end latency by up to 70%. Handles authentication, logging, and payload transformation for seamless AI service integration.

3. KUBERNETES AUTOMATION

Used Kubernetes for container orchestration to automate deployments, scaling, and rollouts. Horizontal Pod Autoscaler (HPA) scales pods during traffic spikes, while rolling updates ensure zero downtime. Integrated with CI/CD pipelines for rapid and reliable releases across all modules.

Results

Achieved 70% latency reduction with end to end response times dropping from 2.5s to 750ms during peak celebrity interactions. Delivered 99.9% uptime through Kubernetes automation and Istio handling 10x traffic spikes without outages. Unlocked 50% cost savings via independent scaling and Redis caching video modules only scaled during live sessions. Key metrics showed 5,000 concurrent users (up from 800), <0.1% error rate (down from 12%), 2x faster AI processing via async queues, and seamless support for 100k+ monthly active users.