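Lumiere is a text-to-video diffusion model designed to generate videos with realistic, diverse, and coherent motion. It is built on a Space-Time U-Net architecture that generates the entire temporal duration of a video in a single pass through the model. Lumiere supports tasks such as text-to-video generation, image-to-video generation, stylized generation, video inpainting, and video editing.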
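A key idea behind the Space-Time U-Net is that it downsamples the video along time as well as space. The model therefore works on a compact space-time representation of the whole clip, rather than synthesizing distant keyframes and filling the gaps with temporal super-resolution. The toy sketch below illustrates only this idea and is not Lumiere's implementation, which uses learned neural network layers rather than fixed pooling. The helpers `spacetime_downsample` and `encoder_shapes` are hypothetical. The first average-pools a single-channel video, stored as nested lists of shape (T, H, W), by a factor of 2 along every axis. The second tracks how the shape shrinks across encoder levels.

```python
def spacetime_downsample(video):
    """Hypothetical helper: average-pool a (T, H, W) video by 2 along time, height, and width."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % 2 or H % 2 or W % 2:
        raise ValueError("all dimensions must be even in this toy example")
    out = []
    for t in range(0, T, 2):
        frame = []
        for y in range(0, H, 2):
            row = []
            for x in range(0, W, 2):
                # Gather the 2x2x2 space-time block and average it.
                block = [video[t + dt][y + dy][x + dx]
                         for dt in (0, 1) for dy in (0, 1) for dx in (0, 1)]
                row.append(sum(block) / 8)
            frame.append(row)
        out.append(frame)
    return out


def encoder_shapes(t, h, w, levels):
    """Hypothetical helper: (T, H, W) shapes when each level halves time and space."""
    shapes = [(t, h, w)]
    for _ in range(levels):
        t, h, w = t // 2, h // 2, w // 2
        shapes.append((t, h, w))
    return shapes


# Toy video: 8 frames of 4x4 pixels, where every pixel in frame t has value t.
video = [[[float(t) for _ in range(4)] for _ in range(4)] for t in range(8)]
small = spacetime_downsample(video)
print(len(small), len(small[0]), len(small[0][0]))  # 4 2 2
print(encoder_shapes(16, 64, 64, 3))
```

Because time is halved along with height and width, the coarsest level of such an encoder still covers the full clip, just at a lower temporal resolution. This is what allows a single pass to reason about motion across the whole video.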