Bringing AI agents to life – so they can truly perceive, listen, understand, and engage in a deeply human way.
Human conversations have changed the world since the beginning of time – preventing wars, inspiring revolutions, and sparking love. We’re bringing the power, magic, and ease of human conversation to human-to-machine interaction.
Last year, we introduced the world’s fastest Conversational Video Interface (CVI), which allowed developers to build incredible real-time conversational video experiences like celebrity digital twins at Delphi or AI interviewers at Mercor.
Today, we’re taking it even further.
Introducing the next evolution of CVI – a complete, emotionally intelligent operating system. It lets you build AI agents that truly see, listen, understand, and engage in real-time, face-to-face interactions, all powered by our new family of groundbreaking models.
Phoenix-3, Raven-0, and Sparrow-0 work together to make AI video conversations feel truly alive. By combining human-like perception, a transformer-based turn-taking engine, and full-face rendering, the system lets developers build a new level of engaging video interactions that feel like talking to a human.
CVI is more than just a tool—it’s a new way for humans and AI to communicate. Whether it’s assisting in a doctor’s office, guiding a mental health conversation, roleplaying sales scenarios, or elevating customer service, the use cases are infinite. Once you see it in action, it’s clear: AI isn’t just responding anymore, it’s thinking, reacting, and changing how we work and interact with machines.
Try talking to Charlie, our live demo on the homepage, to get a feel for what you can build.
A family of models that gets the art of conversation
Real conversation is more than an exchange of words – it’s presence, timing, and unspoken meaning. Each of our three new models plays a critical role in making AI interactions feel human.

Phoenix-3 beta: Full-face rendering model
Phoenix-3 is our groundbreaking Gaussian-diffusion rendering model that brings human-like expressiveness to digital interactions. Unlike traditional systems that focus solely on lip movements, Phoenix-3 animates the entire face: eyebrows, cheeks, eyes, and mouth, capturing the full range of human expressions.
- Full-face animation: Generates natural, continuous facial movements, ensuring every micro-expression and muscle movement is authentically represented.
- Dynamic emotion control: Adapts expressions in real-time based on conversational context, allowing for both automatic emotional responses and explicit emotion settings.
- Hyper-realistic expressions: Ensures that facial expressions align naturally with speech patterns, creating fluid and engaging interactions.
By capturing intricate details, like the transition from a neutral expression to a smile, Phoenix-3 delivers a significantly more immersive and realistic user experience, making digital interactions feel genuinely human.
Raven-0: Perception model
Raven-0 is a first-of-its-kind perception system that doesn’t just see; it understands. Unlike traditional vision systems that recognize static objects and ‘discrete’ emotions, Raven-0 processes continuous visual input, tracks movement, and interprets human interactions in real time. This lets AI perceive context, intent, and emotion, making digital interactions more natural and intuitive.
- Continuous visual processing: Tracks motion, gestures, and eye contact dynamically, allowing AI to respond with real-time awareness
- Emotional intelligence: Reads facial expressions, micro-reactions, and body language to detect user sentiment and engagement
- Action monitoring: Watches for specific gestures, objects, or behaviors, triggering custom actions or automated responses in real time
- Multi-channel awareness: Tracks multiple participants (coming soon), screens, and background elements for a comprehensive understanding
With Raven-0, AI gains true situational awareness and emotional intelligence, making interactions more fluid, responsive, and human-like.
Sparrow-0: Turn-taking model
Conversations aren’t just about what’s said—they’re about when to speak and when to listen. Traditional AI often interrupts at the wrong moment or leaves awkward pauses, making interactions feel unnatural.
Sparrow-0 changes that. Built with a transformer-based turn-taking engine, it understands rhythm, intent, and pacing, ensuring seamless, human-like dialogue. Instead of simply detecting silence, Sparrow-0 adapts to conversational flow in real time, responding naturally and never cutting in at the wrong moment.
- Conversational awareness: Detects tone, pacing, and semantic meaning to determine the perfect response timing
- Turn sensitivity & control: Captures subtle cues in human speech, respecting pauses and adapting to different conversation styles dynamically and manually
- Actionable timing intelligence: Dynamically adjusts response latency based on speech patterns, making AI feel more human
- Optimized for speed: Delivers sub-600ms response times, ensuring real-time, uninterrupted conversations
With Sparrow-0, AI no longer just reacts—it listens, waits, and responds at the right moment, making every interaction feel natural and effortless.
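To make the contrast concrete, here is a minimal sketch (in Python, with invented names and thresholds) of the naive silence-based turn detection described above. Anything beyond spotting a pause, such as weighing semantics, tone, and pacing, is what a turn-taking model like Sparrow-0 adds on top.

```python
# Illustrative baseline only: the naive silence-threshold turn-taking that
# the text contrasts with Sparrow-0. Function and parameter names here are
# invented for the sketch, not part of any real API.

def naive_turn_end(frame_energies, silence_threshold=0.01, min_silent_frames=30):
    """Return True once the trailing audio frames stay below an energy
    threshold for long enough, i.e. 'the user stopped making sound'."""
    if len(frame_energies) < min_silent_frames:
        return False
    tail = frame_energies[-min_silent_frames:]
    return all(e < silence_threshold for e in tail)

# A mid-sentence thinking pause trips the detector, which is exactly the
# failure mode described above: the agent cuts in at the wrong moment.
speech = [0.5] * 50 + [0.0] * 30       # talking, then a pause to think
assert naive_turn_end(speech) is True  # fires on a pause, not a real turn end
```

Because raw silence carries no information about whether the speaker is finished, a model that also reads rhythm and semantic completeness can wait through a pause and still respond quickly when the turn genuinely ends.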
Experience CVI in action: Meet Charlie
In our demo, you’ll meet Charlie, an AI agent that feels less like a chatbot and more like a friend you just met. Unlike typical AI assistants, Charlie doesn’t just execute tasks; he engages in thoughtful, lifelike dialogue, understanding context, intent, and nuance. Whether you’re debugging code, strategizing your next chess move, or refining your fashion style, he doesn’t just offer quick answers; he collaborates, reasons, and problem-solves with you in real time.
With the ability to search the internet, analyze your screen, and generate images seamlessly, Charlie is deeply interactive, responding to what you see and do. And for those who want to see the magic behind the curtain, dev mode logs every event and interaction between you and Charlie. It serves as a blueprint for extending the Tavus interaction layer, enabling agentic actions, function calling, and real-world utility beyond conversation.

Get building
With a simple API, developers can embed real-time, emotionally intelligent AI assistants into their applications in minutes. CVI is built for low-latency, real-time video and supports natural conversation flow, emotional adaptability, and full-face rendering out of the box. Whether for AI-powered coaching, customer support, or interactive sales training, Tavus CVI makes building human-like AI effortless.
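As a rough sketch of what getting started can look like, the snippet below builds a request for creating a conversation. The endpoint URL, header, and field names (`replica_id`, `persona_id`, `conversation_name`) are assumptions based on Tavus’s public documentation and may not match the current API; check the official reference before relying on them.

```python
# Hypothetical sketch: starting a CVI conversation over HTTP.
# Endpoint and field names below are assumptions, not a guaranteed contract.
import json
import urllib.request

TAVUS_API_URL = "https://tavusapi.com/v2/conversations"  # assumed endpoint

def build_conversation_request(api_key, replica_id, persona_id):
    """Build the HTTP request that asks CVI to spin up a live conversation."""
    payload = {
        "replica_id": replica_id,      # which digital human to render
        "persona_id": persona_id,      # behavior/LLM configuration for the agent
        "conversation_name": "demo",
    }
    return urllib.request.Request(
        TAVUS_API_URL,
        data=json.dumps(payload).encode(),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Sending the request would return a conversation URL to embed or join:
# resp = urllib.request.urlopen(build_conversation_request(KEY, REPLICA, PERSONA))
```

The response would typically contain a URL for the live video session, which you can embed in your application’s front end.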
The future of AI conversations starts now.
AI isn’t just responding anymore. It’s perceiving, reasoning, and evolving. For the first time, AI video conversations feel human. And this is just the beginning.
Try the demo. Start building. Welcome to the next era of AI interaction.