Product

Introducing the evolution of Conversational Video Interface – now with Emotional Intelligence

By Julia Szatar · 5 min read · March 6, 2025
Bringing AI agents to life – so they can truly perceive, listen, understand, and engage in a deeply human way. 

Human conversations have changed the world since the beginning of time – preventing wars, inspiring revolutions, and sparking love. We’re bringing the power, magic, and ease of human conversation to human-to-machine interaction.

Last year, we introduced the world’s fastest Conversational Video Interface (CVI), which allowed developers to build incredible real-time conversational video experiences like celebrity digital twins at Delphi or AI interviewers at Mercor.  

Today, we’re taking it even further.

Introducing the next evolution of CVI – a complete operating system that is emotionally intelligent. It lets you build AI Agents that truly see, listen, understand, and engage in real-time, face-to-face interactions, all powered by our family of new groundbreaking models.

Phoenix-3, Raven-0, and Sparrow-0 work together to make AI video conversations feel truly alive. By combining human-like perception, a transformer-based turn-taking engine, and full-face rendering, the system lets developers build engaging video interactions that feel like talking to a human.

CVI is more than just a tool—it’s a new way for humans and AI to communicate. Whether it’s assisting in a doctor’s office, guiding a mental health conversation, roleplaying sales scenarios, or elevating customer service, the use cases are infinite. Once you see it in action, it’s clear: AI isn’t just responding anymore, it’s thinking, reacting, and changing how we work and interact with machines.

Try talking to Charlie, our live demo on the homepage, to get a feel for what you can build.

A family of models that gets the art of conversation

Real conversation is more than an exchange of words – it’s presence, timing, and unspoken meaning. Each of our three new models plays a critical role in making AI interactions feel human.

Phoenix-3 beta: Full-face rendering model

Phoenix-3 is our groundbreaking Gaussian-diffusion rendering model that brings human-like expressiveness to digital interactions. Unlike traditional systems that focus solely on lip movements, Phoenix-3 animates the entire face: eyebrows, cheeks, eyes, and mouth, capturing the full range of human expressions.

  • Full-face animation: Generates natural, continuous facial movements, ensuring every micro-expression and muscle movement is authentically represented.
  • Dynamic emotion control: Adapts expressions in real-time based on conversational context, allowing for both automatic emotional responses and explicit emotion settings.
  • Hyper-realistic expressions: Ensures that facial expressions align naturally with speech patterns, creating fluid and engaging interactions.

By capturing intricate details, like the shift from a neutral expression to a happy one, Phoenix-3 delivers a significantly more immersive and realistic user experience, making digital interactions feel genuinely human.

Raven-0: Perception model

Raven-0 is a first-of-its-kind perception system that doesn’t just see, it understands. Unlike traditional vision systems that recognize static objects and ‘discrete’ emotions, Raven-0 processes continuous visual input, tracks movement, and interprets human interactions in real time. This enables AI to perceive context, intent, and emotion, making digital interactions more natural and intuitive.

  • Continuous visual processing: Tracks motion, gestures, and eye contact dynamically, allowing AI to respond with real-time awareness
  • Emotional intelligence: Reads facial expressions, micro-reactions, and body language to detect user sentiment and engagement
  • Action monitoring: Watches for specific gestures, objects, or behaviors, triggering custom actions or automated responses in real time
  • Multi-channel awareness: Tracks multiple participants (coming soon), screens, and background elements for a comprehensive understanding

With Raven-0, AI gains true situational awareness and emotional intelligence, making interactions more fluid, responsive, and human-like.
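To make the "action monitoring" idea above concrete, here is a minimal sketch of how an application might route perception events (gestures, objects, sentiment) to custom handlers. The event shape and handler API are invented for illustration; they are not Raven-0's actual interface.

```python
# Illustrative only: a hypothetical callback pattern for acting on
# perception events like those Raven-0 emits. Event kinds, labels, and
# the router API are assumptions, not Tavus's real interface.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PerceptionEvent:
    kind: str          # e.g. "gesture", "object", "sentiment"
    label: str         # e.g. "raised_hand", "id_card", "confused"
    confidence: float  # detector confidence in [0, 1]


class PerceptionRouter:
    """Routes incoming perception events to app-defined handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[PerceptionEvent], None]]] = {}

    def on(self, kind: str, handler: Callable[[PerceptionEvent], None]) -> None:
        """Register a handler for one event kind."""
        self._handlers.setdefault(kind, []).append(handler)

    def dispatch(self, event: PerceptionEvent) -> None:
        """Fan an event out to every handler registered for its kind."""
        for handler in self._handlers.get(event.kind, []):
            handler(event)


router = PerceptionRouter()
router.on("gesture", lambda e: print(f"user gesture: {e.label}"))
router.dispatch(PerceptionEvent("gesture", "raised_hand", 0.92))
```

In a real integration, the dispatch call would be driven by the perception stream rather than invoked manually, but the pattern (register a handler, react when the model flags a gesture or object) is the same.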

Sparrow-0: Turn-taking model

Conversations aren’t just about what’s said—they’re about when to speak and when to listen. Traditional AI often interrupts at the wrong moment or leaves awkward pauses, making interactions feel unnatural.

Sparrow-0 changes that. Built with a transformer-based turn-taking engine, it understands rhythm, intent, and pacing, ensuring seamless, human-like dialogue. Instead of simply detecting silence, Sparrow-0 adapts to conversational flow in real time, responding naturally and never cutting in at the wrong moment.

  • Conversational awareness: Detects tone, pacing, and semantic meaning to determine the perfect response timing
  • Turn sensitivity & control: Captures subtle cues in human speech, respecting pauses and adapting to different conversation styles, both automatically and through manual controls
  • Actionable timing intelligence: Dynamically adjusts response latency based on speech patterns, making AI feel more human
  • Optimized for speed: Delivers sub-600ms response times, ensuring real-time, uninterrupted conversations

With Sparrow-0, AI no longer just reacts—it listens, waits, and responds at the right moment, making every interaction feel natural and effortless.
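To illustrate why detecting silence alone is not enough, here is a toy heuristic (not the Sparrow-0 model, whose logic is learned, not hand-written) that weighs both pause length and whether the utterance sounds finished before taking a turn:

```python
# Toy sketch contrasting silence-only turn detection with a heuristic
# that also considers semantic completeness. The thresholds and filler
# list are illustrative assumptions, not Sparrow-0 internals.
def should_respond(pause_ms: int, transcript: str) -> bool:
    """Decide whether the agent should take its turn.

    A silence-only detector would just check `pause_ms > 700`; this
    sketch also asks whether the utterance sounds finished, so a
    mid-sentence pause ("So what I was, um") doesn't trigger an
    interruption.
    """
    words = transcript.strip().lower().split()
    if not words:
        return False
    trails_off = words[-1] in {"um", "uh", "so", "and", "but", "like"}
    ends_cleanly = transcript.strip().endswith((".", "?", "!"))
    if trails_off:
        return pause_ms > 2000  # give the speaker room to continue
    if ends_cleanly:
        return pause_ms > 300   # respond quickly after a finished thought
    return pause_ms > 1200      # ambiguous: wait a bit longer


print(should_respond(400, "What do you think?"))  # True: clean question, short pause
print(should_respond(900, "So what I was, um"))   # False: speaker is mid-thought
```

A learned model like Sparrow-0 replaces these hand-tuned rules with tone, pacing, and semantic signals, but the underlying trade-off (respond fast after a finished thought, hold back mid-thought) is the same one this sketch makes explicit.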

Experience CVI in action: Meet Charlie

In our demo, you’ll meet Charlie, an AI agent that feels less like a chatbot and more like a friend you just met. Unlike typical AI assistants, Charlie doesn’t just execute tasks; he engages in thoughtful, lifelike dialogue, understanding context, intent, and nuance. Whether you’re debugging code, strategizing your next chess move, or refining your fashion style, he doesn’t just offer quick answers; he collaborates, reasons, and problem-solves with you in real time.

With the ability to search the internet, analyze your screen, and generate images seamlessly, Charlie is deeply interactive, responding to what you see and do. And for those who want to see the magic behind the curtain, dev mode logs every event and interaction between you and Charlie, serving as a blueprint for extending the Tavus interaction layer, enabling agentic actions, function calling, and real-world utility beyond just conversation.

Get building 

With a simple API, developers can embed real-time, emotionally intelligent AI assistants into their applications in minutes. Built for low-latency, real-time video, it supports natural conversation flow, emotional adaptability, and full-face rendering out of the box. Whether for AI-powered coaching, customer support, or interactive sales training, Tavus CVI makes building human-like AI effortless.
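As a rough sketch of what "a simple API" can look like in practice, the snippet below builds a request to start a CVI conversation. The endpoint, header, and field names (`replica_id`, `persona_id`, `conversation_name`) are assumptions modeled on Tavus's public docs, not a verbatim copy; check the official API reference before relying on them.

```python
# Hypothetical sketch of starting a CVI conversation over HTTP.
# Endpoint and payload fields are assumptions; consult the Tavus API
# reference for the authoritative schema.
import json
import os
import urllib.request

API_URL = "https://tavusapi.com/v2/conversations"


def build_conversation_request(replica_id: str, persona_id: str) -> urllib.request.Request:
    """Assemble the POST request that asks Tavus to spin up a live session."""
    payload = {
        "replica_id": replica_id,          # which rendered face to use
        "persona_id": persona_id,          # conversational behavior bundle
        "conversation_name": "demo",       # illustrative label
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": os.environ.get("TAVUS_API_KEY", "<your-key>"),
        },
        method="POST",
    )


req = build_conversation_request("r123", "p456")
print(req.full_url)  # → https://tavusapi.com/v2/conversations
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would, under these assumptions, return a conversation URL you can embed in your application's video UI.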

The future of AI conversations starts now.

AI isn’t just responding anymore. It’s perceiving, reasoning, and evolving. For the first time, AI video conversations feel human. And this is just the beginning.

Try the demo. Start building. Welcome to the next era of AI interaction.

Research initiatives

The team is at the forefront of AI video research and pushes model updates every two weeks based on the latest research and customer needs.
