Meta is releasing a new multimodal AI model


Meta is releasing a new multimodal AI model, called ImageBind, as an open-source tool.

While still in the early stages, ImageBind acts as a framework for eventually creating complex scenes and environments from one or several inputs, such as a text or image prompt.

For example, if fed a picture of a beach, ImageBind could identify the sound of waves. Similarly, if given a photo of a tiger along with the sound of a waterfall, the system could produce a video of both.

  • The model currently works with six types of data: text, visual (image/video), audio, depth, thermal (temperature), and motion (from inertial measurement units).
  • Its approach is comparable to how humans gather information through multiple senses, letting it relate inputs across the different data modes.

Meta says the model gives machines a "holistic understanding" that links objects in a photo to their corresponding sound, 3D structure, temperature, and motion.
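
In practice, that "holistic understanding" comes from a single joint embedding space: each modality is encoded into vectors that can be compared directly, so a photo and a sound of the same thing land near each other. The minimal sketch below is based on the example usage in Meta's open-source ImageBind repository; the file paths are placeholders, and import paths may vary slightly between versions of the repo.

    import torch
    from imagebind import data
    from imagebind.models import imagebind_model
    from imagebind.models.imagebind_model import ModalityType

    # Placeholder inputs -- swap in your own files.
    text_list = ["a dog", "a car", "a bird"]
    image_paths = ["dog_image.jpg", "car_image.jpg", "bird_image.jpg"]
    audio_paths = ["dog_audio.wav", "car_audio.wav", "bird_audio.wav"]

    device = "cuda:0" if torch.cuda.is_available() else "cpu"

    # Load the pretrained ImageBind model.
    model = imagebind_model.imagebind_huge(pretrained=True)
    model.eval()
    model.to(device)

    # Preprocess each modality with the repo's helper transforms.
    inputs = {
        ModalityType.TEXT: data.load_and_transform_text(text_list, device),
        ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
        ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
    }

    # Encode everything into the shared embedding space.
    with torch.no_grad():
        embeddings = model(inputs)

    # Cross-modal similarity: which caption best matches each image,
    # and which sound best matches each image.
    print(torch.softmax(
        embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T, dim=-1))
    print(torch.softmax(
        embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, dim=-1))

A notable design choice here is that ImageBind aligns the other modalities to image embeddings rather than requiring paired training data for every modality combination, which is what lets it compare, say, audio against text even though those two were never directly trained together.
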

  • While Meta hasn't released it as a product, ImageBind's applications could include enhancing search functionality for photos and videos or creating mixed-reality environments. Meta plans to expand ImageBind's data modes to other senses in the future.
  • Meta's research paper on ImageBind is publicly available.
