Notion Scales Vector Search 10x, Cuts Costs 90% in Two Years

Details

Notion launched AI Q&A in November 2023, using vector search for semantic retrieval. By February 2026 it had scaled the underlying infrastructure 10x, to millions of workspaces, while cutting costs by 90%.
The work involved the Notion engineering team and spanned the initial pod-based clusters, a serverless migration, the Turbopuffer vector database, Ray on Anyscale for embeddings, and tools such as Apache Spark, Kafka, and Airflow.
Key upgrades include a serverless architecture that decouples storage from compute (50% savings in May 2024), a migration to Turbopuffer (60% cut in search costs, with p50 latency improved to 50-70 ms), Page State hashing (70% data reduction), and Ray for unified GPU/CPU pipelines and self-hosted models.
At launch, onboarding was bottlenecked at hundreds per day; a 600x capacity increase had cleared the waitlist by April 2024. The architecture shifted from sharded pods to generation-based indexing and then to simpler designs, unlike the earlier Postgres re-sharding.
Turbopuffer is built on object storage for cost efficiency. Ray allows flexible use of open-source models without API dependencies and promises a reduction of more than 90% in embeddings costs. The system also supports integrations such as Slack and Google Drive.
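The source does not describe how Page State hashing works internally. As a rough illustration of the general idea (skip re-embedding a page whose content has not changed), here is a minimal sketch in Python. The names `PageStateStore`, `needs_embedding`, and `pages_to_embed` are hypothetical, and the in-memory dictionary stands in for whatever persistent store a real pipeline would use.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable SHA-256 digest of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class PageStateStore:
    """Hypothetical store mapping page IDs to the hash of their last embedded text."""

    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}

    def needs_embedding(self, page_id: str, text: str) -> bool:
        """Record the new hash and report whether the text changed since last time."""
        new_hash = content_hash(text)
        if self._hashes.get(page_id) == new_hash:
            return False  # unchanged content: skip the embedding call
        self._hashes[page_id] = new_hash
        return True


def pages_to_embed(store: PageStateStore, updates: list[tuple[str, str]]) -> list[str]:
    """Filter a batch of (page_id, text) updates down to pages that changed."""
    return [page_id for page_id, text in updates if store.needs_embedding(page_id, text)]


if __name__ == "__main__":
    store = PageStateStore()
    print(pages_to_embed(store, [("a", "hello"), ("b", "world")]))
    print(pages_to_embed(store, [("a", "hello"), ("b", "world!")]))
```

In this sketch the second batch triggers work only for the page whose text actually changed, which is the kind of saving a hash-based change check is meant to capture. How Notion's implementation achieves its reported 70% figure is not stated in the source.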

Impact

Notion's optimizations set a benchmark for scaling AI infrastructure: large efficiency gains achieved without dedicated ML teams, by relying on managed services such as Anyscale and Turbopuffer. This enables broader AI adoption in productivity tools and pressures competitors such as OpenAI and Google Workspace to match that cost-performance in RAG systems. The ability to swap models also positions Notion to iterate quickly as embedding models advance.