Bringing AI agents to life – so they can truly perceive, listen, understand, and engage in a deeply human way.
Human conversations have changed the world since the beginning of time – preventing wars, inspiring revolutions, and sparking love. We’re bringing the power, magic, and ease of human conversation to human-to-machine interaction.
Last year, we introduced the world’s fastest Conversational Video Interface (CVI), which allowed developers to build incredible real-time conversational video experiences like celebrity digital twins at Delphi or AI interviewers at Mercor.
Today, we’re taking it even further.
Introducing the next evolution of CVI – a complete, emotionally intelligent operating system. It lets you build AI agents that truly see, listen, understand, and engage in real-time, face-to-face interactions, all powered by our new family of groundbreaking models.
Phoenix-3, Raven-0, and Sparrow-0 work together to make AI video conversations feel truly alive. By combining human-like perception, a transformer-based turn-taking engine, and full-face rendering, the system lets developers build a new level of engaging video interactions that feel like talking to a human.
CVI is more than just a tool—it’s a new way for humans and AI to communicate. Whether it’s assisting in a doctor’s office, guiding a mental health conversation, roleplaying sales scenarios, or elevating customer service, the use cases are infinite. Once you see it in action, it’s clear: AI isn’t just responding anymore, it’s thinking, reacting, and changing how we work and interact with machines.
Try talking to Charlie, our live demo on the homepage, to get a feel for what you can build.
A family of models that gets the art of conversation
Real conversation is more than an exchange of words – it’s presence, timing, and unspoken meaning. Each of our three new models plays a critical role in making AI interactions feel human.

Phoenix-3 beta: Full-face rendering model
Phoenix-3 is our groundbreaking Gaussian-diffusion rendering model that brings human-like expressiveness to digital interactions. Unlike traditional systems that focus solely on lip movements, Phoenix-3 animates the entire face: eyebrows, cheeks, eyes, and mouth, capturing the full range of human expressions.
- Full-face animation: Generates natural, continuous facial movements, ensuring every micro-expression and muscle movement is authentically represented.
- Dynamic emotion control: Adapts expressions in real-time based on conversational context, allowing for both automatic emotional responses and explicit emotion settings.
- Hyper-realistic expressions: Ensures that facial expressions align naturally with speech patterns, creating fluid and engaging interactions.
By capturing intricate details, like the transition from a neutral expression to a smile, Phoenix-3 delivers a significantly more immersive and realistic user experience, making digital interactions feel genuinely human.
Raven-0: Perception model
Raven-0 is a first-of-its-kind perception system that doesn’t just see; it understands. Unlike traditional vision systems that recognize static objects and ‘discrete’ emotions, Raven-0 processes continuous visual input, tracks movement, and interprets human interactions in real time. This lets AI perceive context, intent, and emotion, making digital interactions more natural and intuitive.
- Continuous visual processing: Tracks motion, gestures, and eye contact dynamically, allowing AI to respond with real-time awareness
- Emotional intelligence: Reads facial expressions, micro-reactions, and body language to detect user sentiment and engagement
- Action monitoring: Watches for specific gestures, objects, or behaviors, triggering custom actions or automated responses in real time
- Multi-channel awareness: Tracks multiple participants (coming soon), screens, and background elements for a comprehensive understanding
With Raven-0, AI gains true situational awareness and emotional intelligence, making interactions more fluid, responsive, and human-like.
Sparrow-0: Turn-taking model
Conversations aren’t just about what’s said—they’re about when to speak and when to listen. Traditional AI often interrupts at the wrong moment or leaves awkward pauses, making interactions feel unnatural.
Sparrow-0 changes that. Built with a transformer-based turn-taking engine, it understands rhythm, intent, and pacing, ensuring seamless, human-like dialogue. Instead of simply detecting silence, Sparrow-0 adapts to conversational flow in real time, responding naturally and never cutting in at the wrong moment.
- Conversational awareness: Detects tone, pacing, and semantic meaning to determine the perfect response timing
- Turn sensitivity & control: Captures subtle cues in human speech, respecting pauses and adapting to different conversation styles dynamically and manually
- Actionable timing intelligence: Dynamically adjusts response latency based on speech patterns, making AI feel more human
- Optimized for speed: Delivers sub-600ms response times, ensuring real-time, uninterrupted conversations
With Sparrow-0, AI no longer just reacts—it listens, waits, and responds at the right moment, making every interaction feel natural and effortless.
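To make the contrast concrete, here is a minimal sketch (in Python, with invented names and thresholds) of the naive silence-based turn detection described above. Anything beyond spotting a pause, such as weighing semantics, tone, and pacing, is what a turn-taking model like Sparrow-0 adds on top.

```python
# Illustrative baseline only: the naive silence-threshold turn-taking that
# the text contrasts with Sparrow-0. Function and parameter names here are
# invented for the sketch, not part of any real API.

def naive_turn_end(frame_energies, silence_threshold=0.01, min_silent_frames=30):
    """Return True once the trailing audio frames stay below an energy
    threshold for long enough, i.e. 'the user stopped making sound'."""
    if len(frame_energies) < min_silent_frames:
        return False
    tail = frame_energies[-min_silent_frames:]
    return all(e < silence_threshold for e in tail)

# A mid-sentence thinking pause trips the detector, which is exactly the
# failure mode described above: the agent cuts in at the wrong moment.
speech = [0.5] * 50 + [0.0] * 30       # talking, then a pause to think
assert naive_turn_end(speech) is True  # fires on a pause, not a real turn end
```

Because raw silence carries no information about whether the speaker is finished, a model that also reads rhythm and semantic completeness can wait through a pause and still respond quickly when the turn genuinely ends.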
Experience CVI in action: Meet Charlie
In our demo, you’ll meet Charlie, an AI agent that feels less like a chatbot and more like a friend you just met. Unlike typical AI assistants, Charlie doesn’t just execute tasks; he engages in thoughtful, lifelike dialogue, understanding context, intent, and nuance. Whether you’re debugging code, strategizing your next chess move, or refining your fashion style, he doesn’t just offer quick answers; he collaborates, reasons, and problem-solves with you in real time.
With the ability to search the internet, analyze your screen, and generate images seamlessly, Charlie is deeply interactive, responding to what you see and do. And for those who want to see the magic behind the curtain, dev mode logs every event and interaction between you and Charlie. It serves as a blueprint for extending the Tavus interaction layer, enabling agentic actions, function calling, and real-world utility beyond conversation.

Get building
With a simple API, developers can embed real-time, emotionally intelligent AI assistants into their applications in minutes. CVI is built for low-latency, real-time video and supports natural conversation flow, emotional adaptability, and full-face rendering out of the box. Whether for AI-powered coaching, customer support, or interactive sales training, Tavus CVI makes building human-like AI effortless.
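As a rough sketch of what getting started can look like, the snippet below builds a request for creating a conversation. The endpoint URL, header, and field names (`replica_id`, `persona_id`, `conversation_name`) are assumptions based on Tavus’s public documentation and may not match the current API; check the official reference before relying on them.

```python
# Hypothetical sketch: starting a CVI conversation over HTTP.
# Endpoint and field names below are assumptions, not a guaranteed contract.
import json
import urllib.request

TAVUS_API_URL = "https://tavusapi.com/v2/conversations"  # assumed endpoint

def build_conversation_request(api_key, replica_id, persona_id):
    """Build the HTTP request that asks CVI to spin up a live conversation."""
    payload = {
        "replica_id": replica_id,      # which digital human to render
        "persona_id": persona_id,      # behavior/LLM configuration for the agent
        "conversation_name": "demo",
    }
    return urllib.request.Request(
        TAVUS_API_URL,
        data=json.dumps(payload).encode(),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Sending the request would return a conversation URL to embed or join:
# resp = urllib.request.urlopen(build_conversation_request(KEY, REPLICA, PERSONA))
```

The response would typically contain a URL for the live video session, which you can embed in your application’s front end.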
The future of AI conversations starts now.
AI isn’t just responding anymore. It’s perceiving, reasoning, and evolving. For the first time, AI video conversations feel human. And this is just the beginning.
Try the demo. Start building. Welcome to the next era of AI interaction.