MLLM RSS

AI, AI Models, Artificial Cognition, Digital Minds, Generative AI, Kosmos-1, MLLM, Multimodal Large Language Model - March 09, 2023

Language Is Not All You Need: Aligning Perception with Language Models

A big convergence of language, multimodal perception, action, and world modeling is a key step toward general artificial cognition. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). We train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that Kosmos-1 achieves impressive performance on (i)...
