Dreamix: A New Diffusion Model for Video Editing

The model is capable of generating and editing videos from image and text prompts.

A team of AI developers has unveiled Dreamix, a new method for text-based motion and appearance editing of videos. The approach, the first diffusion-based method of its kind, combines low-resolution spatio-temporal information from the original video with newly synthesized high-resolution information aligned with a guiding text prompt, allowing users to create videos from image and text inputs.

To improve motion editability, the team proposed a mixed objective that jointly fine-tunes with full temporal attention and with temporal attention masking (sketched below). The developers also introduced a new framework for image animation that transforms an image into a coarse video through simple image processing operations and then animates it with the general video editor.
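The team hasn't released code, but the mixed objective can be sketched roughly as follows. In this illustrative PyTorch snippet, the model interface (set_temporal_attention, add_noise, num_timesteps) is hypothetical; the idea it captures is that each fine-tuning step denoises either the ordered clip with temporal attention on, or the frames as independent images with temporal attention masked:

```python
import torch
import torch.nn.functional as F

def mixed_finetune_step(model, video, text_emb, optimizer, mask_prob=0.5):
    """One step of the mixed objective: randomly fine-tune either with
    full temporal attention (learning the clip's motion) or with temporal
    attention masked (treating frames as independent images, preserving
    appearance without overfitting to the source motion)."""
    optimizer.zero_grad()
    # Sample a diffusion timestep and corrupt the video (forward process).
    t = torch.randint(0, model.num_timesteps, (video.shape[0],), device=video.device)
    noise = torch.randn_like(video)
    noisy_video = model.add_noise(video, noise, t)

    # Randomly toggle temporal attention for this step (hypothetical flag).
    model.set_temporal_attention(enabled=torch.rand(()).item() >= mask_prob)

    # Standard epsilon-prediction denoising loss.
    pred_noise = model(noisy_video, t, text_emb)
    loss = F.mse_loss(pred_noise, noise)
    loss.backward()
    optimizer.step()
    return loss.item()
```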

"Our method supports multiple applications by application-dependent pre-processing, converting the input content into a uniform video format. For image-to-video, the input image is duplicated and transformed using perspective transformations, synthesizing a coarse video with some camera motion. For subject-driven video generation, the input is omitted - finetuning alone takes care of fidelity," commented the team. "This coarse video is then edited using our general Dreamix Video Editor: we first corrupt the video by downsampling followed by adding noise. We then apply the finetuned text-guided video diffusion model, which upscales the video to the final spatio-temporal resolution."
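A rough sketch of that pipeline, again in illustrative PyTorch, might look like the following. Here make_coarse_video stands in for the image-to-video pre-processing (using simple affine shifts in place of the perspective transformations the team describes), and finetuned_model.sample_from is a hypothetical interface to the fine-tuned diffusion model; only the corruption recipe of downsampling followed by added noise comes from the quote above:

```python
import torch
import torch.nn.functional as F

def make_coarse_video(image, num_frames=16, max_shift=0.05):
    """Image-to-video pre-processing: duplicate the input (C, H, W) image and
    apply gradually increasing shifts (a stand-in for the perspective
    transformations described above) to fake some camera motion."""
    c, h, w = image.shape
    frames = []
    for i in range(num_frames):
        s = (i / (num_frames - 1)) * max_shift
        theta = torch.tensor([[[1.0, 0.0, s], [0.0, 1.0, s]]])
        grid = F.affine_grid(theta, (1, c, h, w), align_corners=False)
        frames.append(F.grid_sample(image[None], grid, align_corners=False)[0])
    return torch.stack(frames)  # coarse clip of shape (T, C, H, W)

def dreamix_edit(finetuned_model, video, text_emb, scale=4, noise_level=0.6):
    """Corrupt the video (downsample, then add noise), then let the fine-tuned
    text-guided diffusion model re-synthesize it at full resolution."""
    t, c, h, w = video.shape
    # 1) Downsampling destroys high-resolution detail while keeping the
    #    coarse spatio-temporal structure of the original video.
    low = F.interpolate(video, size=(h // scale, w // scale), mode="bilinear",
                        align_corners=False)
    coarse = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
    # 2) Added noise places the clip partway along the forward diffusion process.
    noisy = coarse + noise_level * torch.randn_like(coarse)
    # 3) The fine-tuned model denoises from that intermediate point, guided by
    #    the text prompt (hypothetical sampling interface).
    return finetuned_model.sample_from(noisy, text_emb, start_noise=noise_level)
```

Intuitively, the noise level trades fidelity to the source against freedom to follow the prompt: the more the input is corrupted, the more of the synthesized high-resolution detail comes from the model rather than the original footage.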

Learn more here. Also, don't forget to join our 80 Level Talent platform, our Reddit page, and our Telegram channel, and follow us on Instagram and Twitter, where we share breakdowns, the latest news, awesome artworks, and more.
