Emu Video's primary function is to generate high-quality videos from text prompts using diffusion models.
According to human raters, Emu Video has been rated higher in both quality and faithfulness to the prompt than other tools, including Make-A-Video and Imagen Video.
Emu Video uses a two-step process: it first generates an image from the text prompt, then generates a video conditioned on both that image and the same prompt. This approach is efficient and requires only two diffusion models.
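The two-step flow can be sketched roughly as follows. This is only an illustration of the control flow, not Emu Video's actual code or API: the function names (`generate_video`, `toy_text_to_image`, `toy_image_text_to_video`) and the string-based "image" and "frame" placeholders are all hypothetical stand-ins for the two diffusion models.

```python
from typing import Callable, List

# Hypothetical placeholder types: real models would produce pixel tensors.
Image = str
Video = List[str]


def generate_video(
    prompt: str,
    text_to_image: Callable[[str], Image],
    image_text_to_video: Callable[[Image, str], Video],
) -> Video:
    """Two-step generation: text -> image, then (image, text) -> video."""
    # Step 1: the first model turns the prompt into a single image.
    first_image = text_to_image(prompt)
    # Step 2: the second model sees BOTH the image and the original prompt.
    return image_text_to_video(first_image, prompt)


# Toy stand-ins (assumptions for illustration only, not real diffusion models).
def toy_text_to_image(prompt: str) -> Image:
    return f"image({prompt})"


def toy_image_text_to_video(image: Image, prompt: str) -> Video:
    # Produce three labelled "frames" so the data flow is visible.
    return [f"frame{i}|{image}|{prompt}" for i in range(3)]


video = generate_video("a dog surfing", toy_text_to_image, toy_image_text_to_video)
print(video[0])
```

Passing the two models in as callables keeps the sketch independent of any particular library; the point is only that the prompt is used twice, once per step.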
Emu Video is limited to producing videos that are 512px in resolution, 4 seconds long, and run at 16 frames per second.
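As a quick worked example of what these limits imply, the frame count per clip follows directly from the duration and frame rate stated above. The variable names are illustrative only.

```python
# Figures taken from the stated limits; names are illustrative.
DURATION_S = 4   # clip length in seconds
FPS = 16         # frames per second

total_frames = DURATION_S * FPS   # frames in one clip
frame_interval_s = 1 / FPS        # time between consecutive frames

print(total_frames, frame_interval_s)
```

So each clip consists of 64 frames, spaced 0.0625 seconds apart.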
Emu Video was developed by AI researchers at Meta, with contributions from multiple collaborators.