JXP-Wan2.6 Key Features

Reference-Based Identity & Voice Consistency

Generate videos from photos or reference clips while preserving the visual identity and voice of people or characters across multiple shots and scenes.

Intelligent Multi-Shot Storytelling

Describe a scene with shot-level or natural-language prompts, and the model automatically plans and assembles multiple shots into a coherent, cinematic sequence of 1 to 15 seconds with stable scene and character continuity.

Native Audio-Visual Synchronization & Realistic Voices

Produce high-quality native audio, music, and sound effects with precise lip-sync and stable multi-person dialogue for natural, expressive results.

1080P Cinematic Output & Flexible Formats

Export videos of up to 15 seconds in 1080P, with a choice of aspect ratios (16:9, 9:16, 1:1) and formats (MP4, MOV, WebM) suited to platforms like YouTube, TikTok, and Instagram.
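
As a rough illustration of what the three aspect ratios mean in pixels, the sketch below derives frame dimensions for each ratio. It assumes that "1080P" means the shorter side of the frame is 1080 pixels, which this page does not state explicitly. The helper name frame_size is hypothetical and is not part of any JXP-Wan2.6 API.

```python
from fractions import Fraction

# Assumption: "1080P" means the shorter side of the frame is 1080 pixels.
SHORT_SIDE = 1080


def frame_size(aspect_ratio: str, short_side: int = SHORT_SIDE) -> tuple[int, int]:
    """Return (width, height) in pixels for a ratio string such as "16:9".

    Hypothetical helper for illustration only.
    """
    w, h = (int(part) for part in aspect_ratio.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:
        # Landscape or square: the height is the short side.
        return int(short_side * ratio), short_side
    # Portrait: the width is the short side.
    return short_side, int(short_side / ratio)


for ar in ("16:9", "9:16", "1:1"):
    print(ar, frame_size(ar))
```

Under that assumption, 16:9 suits landscape platforms such as YouTube, 9:16 suits vertical feeds such as TikTok, and 1:1 suits square posts.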

Scalable Model Options & Commercial Rights

Choose between a high-performance 14B-parameter model and an efficient 5B-parameter model that runs on consumer-grade GPUs. All outputs include commercial usage rights for ads, broadcasts, and products.