ModelScope is a new open-source AI model that generates short videos from text prompts.
Anyone can now try out the text-to-video system online.
- ModelScope comes from DAMO Vision Intelligence Lab, a research unit of Chinese e-commerce giant Alibaba.
- Now available on Hugging Face, the system creates two-second videos from text prompts, though its creations are far from perfect and often unsettling (a code sketch for trying it locally follows this list).
- The "text2video" diffusion model was trained on LAION-5B, ImageNet, and WebVid.
- The training data included many images and videos taken from Shutterstock, causing the stock photo site's logo to appear in some of its generations.
- In September, Meta announced Make-A-Video, which generates brief video clips from text prompts as well as from images or similar videos. It is not yet available to the public.
- Several months later, Runway introduced what it describes as the "first publicly available text-to-video model on the market." The company, which co-developed Stable Diffusion, says its Gen-2 model creates original three-second videos.
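
For readers who would rather run the model locally than through the Hugging Face demo, here is a minimal sketch using the diffusers library. The checkpoint name `damo-vilab/text-to-video-ms-1.7b`, the prompt, and the parameter values are assumptions based on the publicly documented example; check the model card on Hugging Face for the current API.

```python
# Minimal sketch: running ModelScope's text-to-video model locally with
# Hugging Face diffusers. Assumes the checkpoint is published as
# "damo-vilab/text-to-video-ms-1.7b" (verify on the model card) and that
# a CUDA GPU is available.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,  # half precision to fit consumer GPUs
    variant="fp16",
)
pipe = pipe.to("cuda")

# Any short text prompt works; expect roughly two seconds of video.
prompt = "A panda eating bamboo on a rock"
result = pipe(prompt, num_inference_steps=25)

# Depending on the diffusers version, the output frames may be a flat list
# of arrays or nested one level deeper (result.frames[0]).
frames = result.frames
video_path = export_to_video(frames)  # writes an .mp4 and returns its path
print(video_path)
```

Generation takes on the order of a minute on a consumer GPU, and the output quality matches the caveats above: short, low-resolution clips that are often uncanny.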