ICONIP2025 Tutorial 1


Mathematical Theories of Deep Foundation Models

Date: November 20, 2025
Time: 10:00–12:00
Location: Auditorium

Tutorial slides: Main presentation slides.

Appendix I: Optimization theory of neural networks for estimating Gaussian single-index models.
Appendix II: Computational complexity of in-context learning (ICL) and its test-time feature learning.
Appendix III: Theory of chain-of-thought reasoning.

Tutorial Organizer

Taiji Suzuki (The University of Tokyo / RIKEN AIP)

Abstract

This lecture explains the mathematical theory for understanding the learning capabilities of deep foundation models. While the development of deep foundation models has been driven by scaling laws, a theoretical understanding of the learning principles behind those scaling laws is becoming increasingly important.
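For concreteness, a commonly cited parametric form of the scaling law (an illustrative sketch, not a result specific to this tutorial) writes the expected test loss as a power law in the number of parameters N and the number of training tokens D:

\[
  \mathcal{L}(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\]

where E is the irreducible loss and A, B, \alpha, \beta > 0 are empirically fitted constants.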

Generalization is essential for biological intelligence, as organisms must adapt to changing environments and select appropriate actions. To achieve superior generalization, it is necessary to acquire compressed representations that avoid rote memorization, making representation learning and feature learning fundamental. It has been theoretically shown that deep learning naturally achieves feature learning through its deep structure, thereby gaining various advantages in generalization. This is particularly crucial for diffusion models and Transformers.
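As a concrete illustration of this line of theory (and the setting of Appendix I), consider the Gaussian single-index model, a standard benchmark in the feature-learning literature; the exact formulation in the tutorial may differ in detail:

\[
  y = \sigma_*(\langle w_*, x \rangle) + \varepsilon, \qquad x \sim \mathcal{N}(0, I_d),
\]

where w_* \in \mathbb{R}^d is an unknown direction and \sigma_* is a link function. Feature learning here means that the first-layer weights of a trained network align with w_*, which fixed-feature (kernel) methods cannot exploit; this alignment is the source of the generalization advantage.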

However, due to the non-convexity of the loss function, it is not obvious whether appropriate features can actually be acquired by stochastic gradient descent; this lecture also covers the theoretical guarantees for this process. Furthermore, feature learning matters not only during pre-training but also at test time, which will be demonstrated concisely using in-context learning as an example. Finally, as part of the theory of test-time inference, the principles by which chain-of-thought reasoning and reinforcement learning improve learning efficiency will be introduced.
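As a minimal sketch of test-time feature learning via in-context learning (the standard in-context regression picture, not necessarily the exact formulation used in the lecture): given a prompt of labeled examples, the model predicts the label of a new query without any parameter update,

\[
  (x_1, y_1, \ldots, x_n, y_n, x_{n+1}) \;\longmapsto\; \hat{y}_{n+1} \approx \big\langle \hat{w}\big(\{(x_i, y_i)\}_{i=1}^{n}\big), \, x_{n+1} \big\rangle,
\]

where \hat{w} is an estimator computed implicitly inside the forward pass, for instance a least-squares or gradient-descent step on the in-context examples.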

Short bio of the organizer

Taiji Suzuki is currently a full Professor in the Department of Mathematical Informatics at the University of Tokyo. He also serves as the team leader of the Deep Learning Theory Team at RIKEN AIP. He received his Ph.D. degree in information science and technology from the University of Tokyo in 2009. He worked as an assistant professor in the Department of Mathematical Informatics at the University of Tokyo from 2009 to 2013, as an associate professor in the Department of Mathematical and Computing Science at Tokyo Institute of Technology from 2013 to 2017, and then as an associate professor in the Department of Mathematical Informatics at the University of Tokyo from 2017 to 2024.

He has served as an area chair of premier conferences such as NeurIPS, ICML, ICLR, and AISTATS, as a program chair of ACML 2019, and as an associate editor of the Annals of Statistics. He received the Outstanding Paper Award at ICLR 2021, the MEXT Young Scientists’ Prize, and the Outstanding Achievement Award from the Japan Statistical Society in 2017. His research interests include deep learning theory, nonparametric statistics, high-dimensional statistics, and stochastic optimization. In particular, he works mainly on deep learning theory from several perspectives, such as representation ability, generalization ability, and optimization. He has also worked on stochastic optimization for accelerating large-scale machine learning, including variance reduction methods, Nesterov’s acceleration, federated learning, and non-convex noisy optimization.