Wan 2.6 Video: Key Features
Multimodal Reference Generation
Generate videos from text, images, audio, or short reference clips (up to 5 seconds). The appearance and voice characteristics of the reference are preserved, keeping characters and performances consistent.
Native Audio-Visual Synchronization & Lip-Sync
Built-in audio-visual alignment provides precise lip-sync for dialogue, natural vocal expression, and improved rendering of music and singing, for realistic spoken and sung performances.
Intelligent Multi-Shot Scheduling
Interprets both natural-language and professional shot-breakdown prompts to compose multi-shot narratives within a single 15-second sequence while maintaining visual consistency across shots.
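As a rough illustration of a shot-breakdown prompt, the sketch below assembles a per-shot timeline and checks that it fits in the 15-second sequence. The `Shot` class, the `build_shot_prompt` helper, and the "Shot N [start-end]" layout are hypothetical and are not a documented Wan 2.6 prompt format; only the 15-second limit comes from the description above.

```python
from dataclasses import dataclass

MAX_SEQUENCE_S = 15.0  # sequence length stated in the feature description


@dataclass
class Shot:
    """One shot in the breakdown (hypothetical structure)."""
    description: str
    duration_s: float


def build_shot_prompt(shots: list[Shot], max_total_s: float = MAX_SEQUENCE_S) -> str:
    """Join shots into one timed prompt, rejecting breakdowns that run too long."""
    total = sum(shot.duration_s for shot in shots)
    if total > max_total_s:
        raise ValueError(f"shots total {total:g}s, limit is {max_total_s:g}s")

    lines = []
    start = 0.0
    for index, shot in enumerate(shots, start=1):
        end = start + shot.duration_s
        # Assumed layout: "Shot N [start-end]: description"
        lines.append(f"Shot {index} [{start:g}s-{end:g}s]: {shot.description}")
        start = end
    return "\n".join(lines)


prompt = build_shot_prompt([
    Shot("Wide shot of a harbor at dawn", 5),
    Shot("Close-up of a sailor coiling rope", 4),
    Shot("Tracking shot as the boat leaves the dock", 6),
])
print(prompt)
```

Validating the total duration before submitting a prompt is a design choice of this sketch; how the model itself handles over-long breakdowns is not stated in the source.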
High-Quality Short-Form Output & Flexible Exports
Produces cinematic 15-second 1080p HD videos at 24 fps and exports to MP4, MOV, or WebM in 16:9, 9:16, or 1:1 aspect ratios, optimized for TikTok, Reels, Shorts, and ad platforms.
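The output figures above translate into concrete frame counts and pixel dimensions. The short sketch below works them out, assuming that "1080p" means the shorter side of the frame is 1080 pixels for every aspect ratio; the source does not confirm the exact dimensions, and `frame_size` is an illustrative helper rather than part of any Wan 2.6 API.

```python
from fractions import Fraction

DURATION_S = 15    # clip length from the description
FPS = 24           # frame rate from the description
SHORT_SIDE = 1080  # assumption: "1080p" fixes the shorter side at 1080 px


def frame_size(aspect: str, short_side: int = SHORT_SIDE) -> tuple[int, int]:
    """Return (width, height) in pixels for an aspect ratio such as '16:9'."""
    w, h = (int(part) for part in aspect.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:
        # Landscape or square: height is the shorter side.
        return int(short_side * ratio), short_side
    # Portrait: width is the shorter side.
    return short_side, int(short_side / ratio)


total_frames = DURATION_S * FPS
sizes = {aspect: frame_size(aspect) for aspect in ("16:9", "9:16", "1:1")}

print(total_frames)  # 360
for aspect, (width, height) in sizes.items():
    print(f"{aspect}: {width}x{height}")
```

Under that assumption, a full-length clip is 360 frames, at 1920x1080 for 16:9, 1080x1920 for 9:16, and 1080x1080 for 1:1.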