Chatwit | Synchronized Codelab

AI Chatbot for Smarter Website Engagement

Project Overview

Chatwit.ai is an AI chatbot application; the engagement was delivered in a Senior Backend Engineer role. Key features include website scraping, embedding generation with pre-trained transformer models, and a vector store (e.g., FAISS) for fast similarity search. Queries are answered by a retrieval-augmented generation (RAG) pipeline that passes the retrieved content to models such as GPT-3 and GPT-4 to produce human-like responses.
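
To make the retrieval step concrete, here is a minimal sketch of similarity search over embeddings in plain Python. It is not the project's code: the function names (cosine_similarity, top_k), the three-dimensional vectors, and the sample passages are all made up for illustration, and a production system would use a vector store such as FAISS rather than a linear scan.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          store: List[Tuple[str, Sequence[float]]],
          k: int = 2) -> List[str]:
    """Return the k passages whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy store: (passage, embedding). Real embeddings have hundreds of dimensions.
store = [
    ("passage about pricing", [0.9, 0.1, 0.0]),
    ("passage about office location", [0.0, 0.2, 0.9]),
    ("passage about annual plans", [0.8, 0.3, 0.1]),
]

# The retrieved passages would then be placed in the LLM prompt as context.
context = top_k([1.0, 0.2, 0.0], store, k=2)
```

In a RAG pipeline, the passages returned by top_k are what the language model sees alongside the user's question, which is why noisy scraped data degrades answers directly.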

Industry: Technology, Information and Internet

Location: Ahmedabad

Year: 2023

Technologies Used

RAG Pipeline

AI Chatbot

LLM Integration

AI Backend

Node.js

React Native

Django REST Framework

PostgreSQL

Express.js

Challenges

Dynamic website scraping failed on JavaScript-heavy pages, causing incomplete data extraction. Embedding generation overloaded the CPU during high-volume processing without GPU acceleration. FAISS vector store queries slowed down as the index grew beyond 1M embeddings. RAG pipeline latency spikes broke real-time chat responsiveness under concurrent load.

Puppeteer timeouts and anti-bot detection blocked 30% of scrape attempts.
Transformer models (BERT) exhausted memory during batch embedding jobs.
Vector search accuracy dropped on noisy scraped data, producing irrelevant responses.
GPT-4 API rate limits were hit and costs spiked during peak query volumes.

Solutions

1. DYNAMIC SCRAPING OPTIMIZATION

Traditional scraping pipelines were failing against modern anti-bot systems, resulting in low success rates. Implemented a hybrid Puppeteer + Playwright architecture with stealth plugins, along with retry logic and proxy rotation. This increased scraping reliability from 70% to 98% across protected targets.
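
The retry logic and proxy rotation described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: fetch_with_retries, ScrapeError, and the injected fetch callable are hypothetical names, and the actual browser automation (Puppeteer or Playwright) is left behind the fetch parameter so the control flow can be shown on its own.

```python
import itertools
import time
from typing import Callable, Sequence


class ScrapeError(Exception):
    """Raised by the (hypothetical) fetch callable when a page cannot be scraped."""


def fetch_with_retries(url: str,
                       fetch: Callable[[str, str], str],
                       proxies: Sequence[str],
                       max_attempts: int = 4,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Try fetch(url, proxy), rotating proxies and backing off between failures."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    proxy_cycle = itertools.cycle(proxies)
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(proxy_cycle)  # each attempt uses the next proxy in the list
        try:
            return fetch(url, proxy)
        except ScrapeError as err:
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    raise ScrapeError(f"all {max_attempts} attempts failed for {url}") from last_error
```

Passing sleep as a parameter is a design choice for testability: unit tests can record the delays instead of actually waiting.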

2. EMBEDDING ACCELERATION

Embedding generation using BERT and Sentence Transformers was too slow for production workloads. Migrated to GPU-accelerated inference using ONNX Runtime and introduced batch processing queues. Reduced embedding latency from 5 seconds to 500 milliseconds per document, significantly improving throughput.
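
As a rough illustration of the batching idea, the sketch below groups documents into fixed-size batches before handing them to an embedding function. The names batched and embed_documents, the batch size of 32, and the injected embed_batch callable are assumptions for this example; in the system described above, that callable would wrap the GPU-backed ONNX Runtime model, which is not shown here.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch


def embed_documents(docs: Iterable[str],
                    embed_batch: Callable[[List[str]], List[List[float]]],
                    batch_size: int = 32) -> List[List[float]]:
    """Embed documents batch by batch, preserving input order."""
    vectors: List[List[float]] = []
    for batch in batched(docs, batch_size):
        # One model call per batch amortizes per-call overhead and
        # keeps peak memory bounded by batch_size.
        vectors.extend(embed_batch(batch))
    return vectors
```

Bounding the batch size is also what addresses the memory exhaustion noted under Challenges: memory use scales with the batch, not with the whole job.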

3. RAG PIPELINE EFFICIENCY

High LLM usage costs and latency were impacting scalability. Implemented LangChain caching and query rewriting to reduce redundant GPT-4 calls by 60%. Additionally, introduced async processing using BullMQ to decouple retrieval from generation, ensuring real-time response performance.
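
The caching idea can be sketched without any framework. The example below is a simplified stand-in, not LangChain's API: ResponseCache and normalize_query are hypothetical names, the cache is an in-memory dictionary, and the "rewriting" shown is only whitespace and case normalization, which is far simpler than LLM-based query rewriting. The point is that equivalent queries map to one cache key, so the expensive model call runs once.

```python
import hashlib
import re
from typing import Callable, Dict


def normalize_query(query: str) -> str:
    """Lowercase and collapse whitespace so trivially different queries match."""
    return re.sub(r"\s+", " ", query.strip().lower())


class ResponseCache:
    """Caches generated answers keyed by a hash of the normalized query."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate  # the expensive call, e.g. an LLM request
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of times the expensive call actually ran

    def answer(self, query: str) -> str:
        key = hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(query)
        return self._store[key]
```

A production cache would also need expiry and a shared backend across workers; both are omitted here to keep the sketch short.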

Results

Achieved a 98% scraping success rate across dynamic sites with the hybrid Puppeteer/Playwright setup, which bypasses anti-bot measures.
Reduced embedding generation time from 5s to 500ms per document using GPU-accelerated ONNX inference and batch queues.
Scaled FAISS to handle 10M+ embeddings with sub-50ms query latency via sharded HNSW indexing and periodic reindexing.
Delivered real-time RAG responses in under 1s end-to-end by cutting GPT-4 calls 60% through LangChain caching, asynchronous BullMQ processing, and distilled models.
Raised concurrent user capacity from 500 to 5,000 while cutting API costs 70%.
Reached 92% overall system accuracy, with 4x embedding compression preserving response relevance for the production chatbot deployment.
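
One part of sharded vector search that is easy to show in isolation is the merge step: each shard returns its own nearest neighbors, and the global top-k is selected from the combined candidates. The sketch below assumes each shard has already been searched (for example by a per-shard FAISS index, not shown) and returned (distance, document id) pairs; merge_shard_results and the sample hits are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (distance, document id); smaller distance is better


def merge_shard_results(shard_results: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine per-shard hits and keep the k globally closest ones."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nsmallest(k, all_hits)


# Illustrative per-shard results, as if each shard returned its local top 2.
shard_a = [(0.10, "a1"), (0.40, "a2")]
shard_b = [(0.05, "b1"), (0.30, "b2")]
merged = merge_shard_results([shard_a, shard_b], k=3)
```

For the merged result to be exact, each shard must return at least k candidates, since in the worst case all of the global top-k live in a single shard.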