Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia1,2,
David Bourgin2,
Krishna Kumar Singh2,
Yuheng Li2,
Yan Kang2,
Zhan Xu2,
Niraj K. Jha1,
Yuchen Liu2
1Princeton University, 2Adobe Research
Main Paper
Figure 1 (b)
VINs
256 frames@16fps
Fluffy Persian cat in pearl necklace, studio, burgundy background, portrait
VINs
256 frames@16fps
Back view of a young woman dressed in a yellow jacket, walking in the forest
VINs
256 frames@16fps
Shark swimming in the clear Caribbean ocean
Figure 1 (c)
A fat rabbit wearing a purple robe walking through a fantasy landscape
Full Attention
256 frames@16fps
Autoregressive
256 frames@16fps
VINs
256 frames@16fps
Figure 6
Two Pandas Discussing an Academic Paper
Autoregressive
256 frames@16fps
Spectral Blending
256 frames@16fps
VINs
256 frames@16fps
Happy dog wearing a yellow turtleneck, studio, portrait
Autoregressive
256 frames@16fps
Spectral Blending
256 frames@16fps
VINs
256 frames@16fps
Figure 7
A drone flying over a snowy forest
Autoregressive
256 frames@16fps
Spectral Blending
256 frames@16fps
VINs
256 frames@16fps
Cherry blossoms swing in front of ocean view
Autoregressive
256 frames@12fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Figure 9
VINs
128 frames@24fps
Bird's eye view of dense fir forests and a transparent lake in the middle in Dolomites
VINs
192 frames@24fps
A glass bead falling into water with a huge splash. Sunset in the background
VINs
256 frames@16fps
Happy Corgi playing in the park, golden hour, 4K
Supplementary/Appendix
Figure 14
Fluffy Persian cat in pearl necklace, studio, burgundy background, portrait
OpenSora v1.2
256 frames
Mochi-1
256 frames
HunyuanVideo
256 frames
VINs
256 frames
Happy Corgi playing in the park, golden hour, 4K
OpenSora v1.2
256 frames
Mochi-1
256 frames
HunyuanVideo
256 frames
VINs
256 frames
Figure 15
Shark swimming in the clear Caribbean ocean
OpenSora v1.2
Mochi-1
HunyuanVideo
VINs
256 frames
Back view of a young woman dressed in a yellow jacket, walking in the forest
OpenSora v1.2
Mochi-1
HunyuanVideo
VINs
Figure 16
Confused grizzly bear trying to learn calculus
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Back view of a young woman dressed in a yellow jacket walking in the forest
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Figure 17
A raccoon dressed in suit playing the trumpet
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
A dog wearing a superhero outfit with red cape flying through the sky
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Figure 18
A shark swimming in the clear Caribbean ocean
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
A fat rabbit wearing a purple robe walking through a fantasy landscape
Full
256 frames@16fps
Autoreg.
256 frames@16fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Figure 19
A drone flying over a snowy forest
Full
256 frames@16fps
Autoreg.
256 frames@16fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@16fps
VINs
256 frames@16fps
Yellow flowers swinging in the wild
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Figure 20
Campfire at night in a snowy forest with starry sky
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Cherry blossoms swing in front of ocean view
Full
256 frames@16fps
Autoreg.
256 frames@12fps
ST2V
256 frames@8fps
FreeNoise
256 frames@16fps
Spectral Blending
256 frames@12fps
VINs
256 frames@16fps
Figure 21
Fluffy Persian cat in pearl necklace, studio, burgundy background, portrait
VINs with global tokens
256 frames@16fps
VINs without global tokens
256 frames@16fps
Smiling elderly gentleman with rimmed glasses, in a tweed jacket, studio, portrait
VINs with global tokens
256 frames@16fps
VINs without global tokens
256 frames@16fps
Two pandas reading an academic paper
VINs with global tokens
256 frames@16fps
VINs without global tokens
256 frames@16fps
Distinguished poodle wearing tweed vest, studio, portrait
VINs with global tokens
256 frames@16fps
VINs without global tokens
256 frames@16fps