Research preview

Hummingbird, a leap in lip sync

Hummingbird-0 delivers unmatched lip sync accuracy, identity preservation, and video quality.



Hummingbird-0

This best-in-class zero-shot lip sync model, available in a research preview via API, outperforms all open- and closed-source models on the market.

As a research artifact of our Phoenix-3 full-face replica rendering model, it delivers market-leading realism instantly. Hummingbird can power video editing workflows and high-volume video creation.

Fal

Hummingbird-0 is available to developers on the Fal platform.

Build apps with lip syncing that's actually good

Instant Video Creation

This zero-shot model generates realistic lip movement for any face and voice, no training required. Great for influencers and UGC.

Content at Scale

Turn one video into thousands of versions with fresh lip-synced audio, ready for marketing, training, and localization at scale.
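As a rough illustration of the one-video-to-many pattern, the sketch below pairs a single source video with a list of replacement audio tracks, producing one job per track. It is a minimal sketch and not the Hummingbird-0 API: the `LipSyncJob` class, the `build_jobs` helper, the `video_url` and `audio_url` field names, and the example URLs are all hypothetical, and nothing is sent over the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipSyncJob:
    """One lip sync request: a source video paired with a replacement audio track."""
    video_url: str  # hypothetical field name, not taken from the real API
    audio_url: str  # hypothetical field name, not taken from the real API


def build_jobs(video_url, audio_urls):
    """Pair a single source video with each audio track, skipping duplicate tracks."""
    seen = set()
    jobs = []
    for audio_url in audio_urls:
        if audio_url in seen:
            continue  # the same track twice would only produce an identical video
        seen.add(audio_url)
        jobs.append(LipSyncJob(video_url=video_url, audio_url=audio_url))
    return jobs


jobs = build_jobs(
    "https://example.com/source.mp4",
    [
        "https://example.com/audio/en.wav",
        "https://example.com/audio/es.wav",
        "https://example.com/audio/en.wav",  # duplicate, skipped
    ],
)
print(len(jobs))
```

Each job could then be submitted to whichever lip sync endpoint you use; deduplicating up front avoids paying for identical renders.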

Video Editing

Build editing workflows into any video platform. Users can edit dialogue in existing footage, avoiding reshoots or heavy post-production.

Integrate with Video Generation

Create an end-to-end AI film studio platform. Enrich videos generated by Sora, Veo, and Kling with lip-synced dialogue.

Best-in-class performance

Hummingbird outperforms rivals in key evaluations and delivers the most natural lip syncing on the market.

Natural Lip Synchronization

Lips move naturally with each sound, with no awkward delays, so it actually feels like the person is speaking. This realism drives engagement, especially in personalized or localized content.

Exceptional Identity Preservation

Faces and speaking style stay true to the original speaker. Videos look authentic and personal, not uncanny or off-brand, keeping viewers engaged.

Superior Visual Quality

Every frame looks sharp, natural, and glitch-free—so viewers stay focused on the message. Videos feel polished and real, even at scale.

Bring lip sync to every use case with our APIs

AI Reshoots

CGI Editing

B2B Content

AI Workflows

Localization

Influencer UGC

Hummingbird-0 benchmarking
Read the research
Each column compares performance across Hummingbird-0 and leading competitors using the same diverse set of benchmark videos and metrics.

| Metric | Hummingbird | Leading Competitor 1 | Leading Competitor 2 |
| --- | --- | --- | --- |
| Lip Sync: LSE-D score (lower is better) | 6.7365 | 7.0446 | 7.4605 |
| Identity Preservation: ArcFace score (higher is better) | 0.8352 | 0.7834 | 0.3356 |
| Visual Quality: FID score (lower is better) | 63.9248 | 95.6702 | 133.5371 |

Lip Sync (LSE-D): Measures how closely the model aligns mouth movements to spoken audio. Lower scores indicate more accurate and natural sync.

Identity Preservation (ArcFace): Evaluates how well the model maintains the speaker's facial features and overall look throughout the video. Higher scores indicate a closer match to the original speaker.

Visual Quality (FID): Assesses overall image quality and realism, including presence of visual artifacts. Lower scores mean cleaner, more photorealistic outputs.
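To put the scores reported above in perspective, the short sketch below computes Hummingbird's margin over Leading Competitor 1 on each metric, as a fraction of the competitor's score. The `relative_margin` helper is an illustrative name introduced here, and the percentages are simple arithmetic on the published numbers rather than an official figure; these metrics are not necessarily linear in perceived quality, so treat the output as a rough guide.

```python
# Scores copied from the benchmark above: (Hummingbird, Leading Competitor 1, lower_is_better).
SCORES = {
    "LSE-D": (6.7365, 7.0446, True),
    "ArcFace": (0.8352, 0.7834, False),
    "FID": (63.9248, 95.6702, True),
}


def relative_margin(hummingbird, competitor, lower_is_better):
    """Return the improvement over the competitor as a fraction of the competitor's score."""
    diff = competitor - hummingbird if lower_is_better else hummingbird - competitor
    return diff / competitor


margins = {name: relative_margin(*row) for name, row in SCORES.items()}

for name, margin in margins.items():
    print(f"{name}: {margin:.1%} better than Leading Competitor 1")
```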

Unleash easy-to-use lip sync APIs