In the fast-moving world of artificial intelligence, developers regularly launch new tools for mainstream users and content creators alike. Not long ago, OpenAI introduced Sora and Google launched Vids, which can generate videos from text.
Stepping up the game, Microsoft has now unveiled VASA-1, an AI video generator. The tool can craft a video of a person from a single photo, and the samples showcased on Microsoft's official website demonstrate remarkable results.
Named for the "visual affective skills" (VAS) it aims to reproduce, VASA-1 is a top-of-the-line model by Microsoft that generates finely detailed facial expressions. It can bring a wide spectrum of emotions and feelings to the face in a photo.
Put simply, one can generate videos with various expressions from a single static picture, paired with a speech audio clip that drives the animation. The tool models the movement of facial muscles, lips, and nose, the tilt of the head, and other factors to craft the video. Some of these groundbreaking videos have been made public.
Currently, Microsoft VASA-1 can create videos at a resolution of up to 512×512 pixels and up to 40 frames per second. The company claims that the videos VASA-1 produces are as expressive as a person is in real life.
Source: aajtak
Microsoft has presented VASA-1 as a research demonstration and has made it clear that it has no plans to release a product or an API to the public, citing the potential for misuse.
VASA-1 is akin to OpenAI's Sora in that both tools excel at generating realistic videos. While Sora produces complex scenes with backgrounds and objects, VASA-1 focuses primarily on human expressions. Neither tool is currently available to the public.