AI Heroes

Emanuele Aiello, PhD Researcher, Politecnico di Torino

Emanuele is a PhD researcher specialising in Multimodal Deep Learning and Generative Models. His work has been published at top-tier conferences such as NeurIPS.
Recently, Emanuele expanded his horizons with a Research Internship at Meta AI in Menlo Park. This experience sharpened his practical skills and offered a closer look at real-world AI challenges.
With a strong academic background and a dash of industry experience, Emanuele is on a quest to push the boundaries of the machine learning frontier.


Session

12-01
12:10
40min
Multimodal Large Language Models can Generate Images
Emanuele Aiello, PhD Researcher, Politecnico di Torino

Multimodal Large Language Models (MLLMs) are becoming a buzzword in the machine learning community, thanks to their ability to seamlessly handle a wide variety of input modalities. In essence, these models leverage the knowledge of Large Language Models (LLMs) to bridge text with images, video, speech, and other modalities. This talk aims to unfold the vibrant world of MLLMs, showcasing how they're revolutionising the field of multimodal deep learning.
The spotlight of our discourse will be the journey of MLLMs from multimodal understanding to multimodal generative capabilities. We'll delve into the recently published Joint Autoregressive Mixture (JAM) framework. This framework combines text and image generation models into a single model, bringing to the table a unified architecture that excels in high-quality interleaved mixed-modal generation. Through a blend of theory, real-world applications, and a sneak peek into the future, this talk explores how the world of machine learning is being reshaped by the advent of MLLMs and what this means for the broader pursuit of Artificial General Intelligence.

Tech Stage