MLLM

David Shapiro, Max Tegmark, MLLM -

Patreon: https://www.patreon.com/daveshap
LinkedIn: https://www.linkedin.com/in/dave-shap-automator/
Consulting: https://www.daveshap.io/Consulting
GitHub: https://github.com/daveshap
Medium: https://medium.com/@dave-shap

Papers referenced:
Language Models Represent Space and Time - https://arxiv.org/abs/2310.02207v1
Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model - https://arxiv.org/abs/2306.05720
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task - https://arxiv.org/abs/2210.13382
Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain - https://arxiv.org/abs/2309.01660
Boosting Theory-of-Mind Performance in Large Language Models via Prompting - https://arxiv.org/abs/2304.11490
Tiny and Efficient Model for the Edge Detection Generalization - https://arxiv.org/abs/2308.06468
ART: Automatic multi-step reasoning and tool-use for large language models - https://arxiv.org/abs/2303.09014

Not referenced, but hilarious because Max Tegmark has the best paper titles:
Omnigrok: Grokking Beyond Algorithmic Data - https://arxiv.org/abs/2210.01117v2

AGI, AI, Artificial Cognition, MLLM -

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan

ChatGPT is attracting a cross-field interest as it provides a language interface with remarkable conversational competency and reasoning capabilities across many domains. However, since ChatGPT is trained with languages, it is currently not capable of processing or generating images from the visual world. At the same time, Visual Foundation Models, such as Visual Transformers or Stable Diffusion, although showing great visual understanding and generation capabilities, they are only experts on specific tasks with one-round fixed inputs and outputs....

AGI, Artificial Cognition, MLLM, Sebastien Bubeck -

CSAIL lectures with Sebastien Bubeck (April 6, 2023)

The new wave of AI systems, ChatGPT and its more powerful successors, exhibit extraordinary capabilities across a broad swath of domains. In light of this, we discuss whether artificial INTELLIGENCE has arrived.

AI, AI Models, Ilya Sutskever, MLLM -

Asked Ilya Sutskever (Chief Scientist of OpenAI) about:
- time to AGI
- leaks and spies
- what's after generative models
- post AGI futures
- working with MSFT and competing with Google
- difficulty of aligning superhuman AI

Timestamps
00:00 Time to AGI
05:57 What’s after generative models?
10:57 Data, models, and research
15:27 Alignment
20:53 Post AGI Future
26:56 New ideas are overrated
36:22 Is progress inevitable?
41:27 Future Breakthroughs

MLLM, Multimodal Large Language Model -

This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI. OpenAI has released GPT-4, the latest edition of its flagship large language model (LLM). And though few details are available, what we do know is that it will be a “multimodal” LLM, according to a Microsoft executive who spoke at a company event last week. Basically, multimodal LLMs combine text with other kinds of information, such as images, videos, audio, and other sensory data. Multimodality can solve some of the problems of the current generation of LLMs. Multimodal language models will also unlock new...
