Definition
A multimodal AI model can process, and in some cases generate, more than one type of data, such as text, images, audio, and video. For example, it can analyze a photo of a defective component and describe the problem in text, or read a scanned document and extract structured information from it. Multimodality enables applications such as automated visual quality control in factories.
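To make the photo-inspection example concrete, the sketch below shows one way a single request could combine a text question with an image, and how a text reply could be turned into structured data. It is a minimal illustration, not the API of any real model or vendor: the payload layout, the field names (`parts`, `media_type`, `data_base64`), and the helpers `build_inspection_request` and `parse_inspection_reply` are all hypothetical, and the reply is assumed to be JSON with `defective` and `description` keys.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class InspectionResult:
    """Structured form of the model's answer (hypothetical schema)."""
    defective: bool
    description: str


def build_inspection_request(image_bytes: bytes, question: str) -> dict:
    """Bundle a text question and an image into one request payload.

    The layout is an assumption made for this sketch; real services
    define their own request formats.
    """
    return {
        "parts": [
            {"type": "text", "text": question},
            {
                "type": "image",
                "media_type": "image/png",  # assumed image format
                # Binary image data is base64-encoded so it can travel as text.
                "data_base64": base64.b64encode(image_bytes).decode("ascii"),
            },
        ]
    }


def parse_inspection_reply(reply_text: str) -> InspectionResult:
    """Parse a JSON text reply into an InspectionResult.

    Assumes the model was asked to answer as JSON with the keys
    "defective" and "description".
    """
    data = json.loads(reply_text)
    return InspectionResult(
        defective=bool(data["defective"]),
        description=str(data["description"]),
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real photo of a component.
    request = build_inspection_request(b"placeholder-image-bytes",
                                       "Is this component defective?")
    print([part["type"] for part in request["parts"]])

    # Hand-written example reply, not actual model output.
    sample_reply = '{"defective": true, "description": "Crack near the mounting hole."}'
    print(parse_inspection_reply(sample_reply))
```

The point of the sketch is only that both modalities travel in one request and that the text answer can feed a downstream system, such as a quality-control log.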