Video diffusion, but feels like image diffusion
FramePack is a next-frame prediction neural network structure that generates videos progressively. With just 6GB VRAM, you can create high-quality one-minute videos.
Only 6GB VRAM needed to generate one-minute videos (1800 frames), perfect for laptop GPUs
Compresses the input context to a fixed length, making the generation workload invariant to video length (sketched below)
See generated frames in real-time before the entire video is complete
Control video actions and scene changes with concise, descriptive prompts
FramePack is designed to work efficiently even on laptop GPUs, making video generation accessible to more users. The progressive generation approach provides visual feedback before the entire video is complete.
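To make the fixed-length claim concrete, here is a minimal, self-contained sketch (illustration only; FramePack's actual packing schedule lives in the repository). Older frames get half the token budget of the frame before them, so the total context is bounded by a geometric series no matter how many frames exist:
import numpy as np

def pack_context(frames, base_tokens=1024):
    """Pack a frame history into a roughly fixed token budget (toy version)."""
    packed = []
    for age, frame in enumerate(reversed(frames)):  # newest frame first
        tokens = base_tokens // (2 ** age)
        if tokens == 0:
            break  # frames older than this contribute nothing
        # Stand-in for real patchify/downsampling: average-pool to `tokens` values.
        flat = frame.reshape(-1)
        idx = np.linspace(0, len(flat), tokens + 1, dtype=int)
        packed.append(np.array([flat[a:b].mean() for a, b in zip(idx[:-1], idx[1:])]))
    return np.concatenate(packed)

# Context length is ~constant whether the video has 8 frames or 1800:
short = pack_context([np.random.rand(64, 64) for _ in range(8)])
long_ = pack_context([np.random.rand(64, 64) for _ in range(1800)])
print(len(short), len(long_))  # both stay below 2 * base_tokens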
On Windows, run update.bat to update (important! Otherwise you may be using an older version with potential bugs), then run run.bat to start the program.
On Linux:
# We recommend using an independent Python 3.10 environment
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
# Start the GUI
python demo_gradio.py
Supports --share, --port, --server, and other parameters.
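For example, to create a public share link on a specific port (the port number here is just an illustration):
python demo_gradio.py --share --port 7860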
Concise prompts typically work best. Follow this format:
The man dances powerfully, striking sharp poses and gliding smoothly across the reflective floor.
Prompt: "The girl dances gracefully, with clear movements, full of charm."
Prompt: "The girl skateboarding, repeating the endless spinning and dancing and jumping on a skateboard, with clear movements, full of charm."
Prompt: "The young man writes intensely, flipping papers and adjusting his glasses with swift, focused movements."
FramePack is efficient. A minimum of 6GB VRAM on an NVIDIA RTX 30XX series or newer GPU is required. Faster GPUs like the RTX 4090 provide better performance (~1.5-2.5 sec/frame). It's designed to work well even on consumer laptops.
Yes, FramePack is flexible and supports generating videos at various resolutions. Higher resolutions will require more GPU memory and take longer, but the efficient design makes it feasible on standard hardware.
FramePack stands out for its memory efficiency (it runs on 6GB VRAM) and its quality consistency over long sequences. Unlike many models that degrade quickly, FramePack maintains quality using anti-drifting techniques, and its fixed-length context keeps per-frame cost constant (O(1)) regardless of video length.
While highly efficient, FramePack isn't designed for true real-time generation at full quality yet. However, with optimizations like TeaCache on high-end GPUs (e.g., RTX 4090), it reaches near-real-time speeds (~1.5 sec/frame), making it usable for applications where slight latency is acceptable.
Yes, FramePack is designed for interoperability. It can be integrated into existing AI workflows, combined with image generation tools, video editors, or used via its adaptable API for custom applications.
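As a rough illustration of that kind of integration (generate_video below is a placeholder stub standing in for FramePack; the real entry point in the repository is the Gradio demo, not a pip-installable API), a pipeline step might look like:
import numpy as np
import imageio.v3 as iio

def generate_video(image, prompt, num_frames=30):
    # Stub: returns dummy frames shaped like a real model's output would be.
    return np.repeat(image[None], num_frames, axis=0)

image = iio.imread("input.png")  # e.g. output of an image-generation tool
frames = generate_video(image, "The girl dances gracefully, with clear movements.")
iio.imwrite("output.mp4", frames, fps=30)  # hand the clip off to a video editor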
FramePack excels at generating consistent, long-form video content from images or existing videos. It's ideal for narrative sequences, educational material, visual effects, and creative projects where maintaining visual quality over time is crucial. Its flexible scheduling adapts to various needs.
Just one image and a simple prompt to generate smooth, high-quality video content
Get Started with FramePack