Michael Gastpar

I am a Professor in the Information Processing Group at the School of Computer and Communication Sciences, EPFL. From 2003 to 2011, I was at the University of California, Berkeley, where I earned tenure in 2008. My research interests are centered around Information Theory and Signal Processing. I am grateful for the generous support of the ERC Starting Grant "ComCom" (2011-2016) and of the Swiss National Science Foundation.


Research Group


Recent Research Results (see also arxiv/scholar/ieeexplore)

Which Algorithms Have Tight Generalization Bounds?
Michael Gastpar, Ido Nachum, Jonathan Shafer, Thomas Weinberger. (NeurIPS 2025 Spotlight), San Diego, U.S.A., December 2025. (Arxiv version)

What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains
Chanakya Ekbote, Ashok Vardhan Makkuva, Marco Bondaschi, Nived Rajaraman, Michael Gastpar, Jason D. Lee, Paul Pu Liang. (NeurIPS 2025 Spotlight), San Diego, U.S.A., December 2025. (Arxiv version)

zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression
Saibo Geng, Nathan Ranchin, Yunzhen Yao, Maxime Peyrard, Chris Wendler, Michael Gastpar, Robert West. (NeurIPS 2025), San Diego, U.S.A., December 2025. (Arxiv version)

The Conditional Regret-Capacity Theorem for Batch Universal Prediction
Marco Bondaschi, Michael Gastpar. IEEE Information Theory Workshop (ITW 2025), Sydney, Australia, Sept-Oct 2025. (Arxiv version)

Characterising high-order interdependence via entropic conjugation
Fernando E Rosas, Aaron J Gutknecht, Pedro AM Mediano, Michael Gastpar. Communications Physics, 8(1):347, 2025. (Journal)

Sibson alpha-mutual information and its variational representations
Amedeo Roberto Esposito, Michael Gastpar, and Ibrahim Issa. IEEE Transactions on Information Theory, 2025. (Journal, Arxiv version)
A generalized information measure with many virtues and useful properties.
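In one standard form (for finite alphabets; the paper discusses several equivalent representations), Sibson's mutual information of order α reads
    I_α(X;Y) = α/(α-1) · log Σ_y ( Σ_x P_X(x) P_{Y|X}(y|x)^α )^(1/α),
recovering Shannon's mutual information as α → 1 and maximal leakage as α → ∞.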

Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective
Yunzhen Yao, Lie He, and Michael Gastpar. Forty-second International Conference on Machine Learning (ICML 2025), Vancouver, Canada, July 2025. (Arxiv version)

Model non-collapse: Minimax bounds for recursive discrete distribution estimation
Millen Kanabar, Michael Gastpar. IEEE International Symposium on Information Theory (ISIT 2025), Ann Arbor, MI, USA, June 2025. (Arxiv version)

Attention with Markov: A Curious Case of Single-layer Transformers
Ashok Vardhan Makkuva, Marco Bondaschi, Alliot Nagle, Adway Girish, Hyeji Kim, Martin Jaggi, and Michael Gastpar. The Thirteenth International Conference on Learning Representations (ICLR 2025 Spotlight), Singapore, April 2025. (Open Review version)

From Markov to Laplace: How Mamba In-Context Learns Markov Chains
Marco Bondaschi, Nived Rajaraman, Xiuying Wei, Kannan Ramchandran, Razvan Pascanu, Caglar Gulcehre, Michael Gastpar, Ashok Vardhan Makkuva. ICLR 2025 Workshop XAI4Science, Singapore, April 2025. (Arxiv version)

Alpha-NML Universal Predictors
Marco Bondaschi and Michael Gastpar. IEEE Transactions on Information Theory, 71(2):1171-1183, 2025. (Journal)
These predictors interpolate between the Laplace estimator and the Normalized Maximum Likelihood (NML) estimator.
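As a reminder of the two endpoints (the interpolating family itself is defined in the paper): for a binary sequence x^n containing k ones, the Laplace estimator predicts the next symbol to be 1 with probability (k+1)/(n+2), while the NML distribution assigns to the full sequence the probability
    q_NML(x^n) = max_θ p_θ(x^n) / Σ_{y^n} max_θ p_θ(y^n),
i.e., the maximized likelihood normalized by the Shtarkov sum.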

Lower Bounds on the Bayesian Risk via Information Measures
Amedeo Roberto Esposito, Adrien Vandenbroucque, and Michael Gastpar. Journal of Machine Learning Research (JMLR), 25(340):1-45, 2024. (Journal)
Impossibility results for estimation problems via f-divergences, Rényi divergence, and Maximal Leakage.


Research (Selected)

Generalization Error Bounds Via Rényi-, f-Divergences and Maximal Leakage
Amedeo Roberto Esposito, Michael Gastpar, and Ibrahim Issa. IEEE Transactions on Information Theory, 67(8):4986-5004, August 2021. (Journal, Arxiv version)
New classes of concentration bounds on the generalization error. For example, we show that the probability that the generalization error is larger than η is bounded by exp(-n η + L(D→A)), where the size of the training set D is n, the output of the learning algorithm is A, and L denotes Sibson's mutual information of order infinity.
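In symbols (under the assumptions stated in the paper), P[ generalization error > η ] ≤ exp( -n η + L(D→A) ), where for finite alphabets the maximal leakage is L(D→A) = log Σ_a max_d P_{A|D}(a|d).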

Quantifying high-order interdependencies via multivariate extensions of the mutual information
Fernando E Rosas, Pedro AM Mediano, Michael Gastpar, Henrik J Jensen. Physical Review E, 100(3):032305, 2019. (Journal)
Proposal of a compact multivariate information measure, capturing synergy and redundancy.
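In brief, the proposed measure (the O-information) of n variables X_1, …, X_n can be written as
    Ω(X_1,…,X_n) = (n-2) H(X_1,…,X_n) + Σ_{j=1}^n [ H(X_j) - H(X_1,…,X_n without X_j) ];
positive values indicate redundancy-dominated and negative values synergy-dominated interdependence.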

Remote source coding under Gaussian noise: Dueling roles of power and entropy power
Krishnan Eswaran, Michael Gastpar. IEEE Transactions on Information Theory, 65(7):4486-4498, August 2019. (Journal, Arxiv version)
New converse bounds for the CEO problem with a general source observed in Gaussian noise.

The sampling rate-distortion tradeoff for sparsity pattern recovery in compressed sensing
Galen Reeves, Michael Gastpar. IEEE Transactions on Information Theory, 58(5):3065-3092, May 2012. (Journal)
Information theory of compressed sensing.

Compute-and-forward: Harnessing interference through structured codes
Bobak Nazer, Michael Gastpar. IEEE Transactions on Information Theory, 57(10):6463-6486, October 2011. (Journal)
New ways in cooperative communications.

Uncoded transmission is exactly optimal for a simple Gaussian "sensor" network
Michael Gastpar. IEEE Transactions on Information Theory, 54(11):5247-5251, November 2008. (Journal)
An intricate network for which a full and exact analysis can be given.

Computation over multiple-access channels
Bobak Nazer, Michael Gastpar. IEEE Transactions on Information Theory, 53(10):3498-3516, October 2007. (Journal)
Computation and Communication are two sides of one coin.

Cooperative strategies and capacity theorems for relay networks
Gerhard Kramer, Michael Gastpar, Piyush Gupta. IEEE Transactions on Information Theory, 51(9):3037-3063, September 2005. (Journal)
Information-theoretic principles for networks with many relaying nodes.

To code, or not to code: Lossy source-channel communication revisited
Michael Gastpar, Bixio Rimoldi, Martin Vetterli. IEEE Transactions on Information Theory, 49(5):1147-1158, May 2003. (Journal)
All instances where uncoded transmission is as good as the best error-control code.


Alumni PhD (see also Mathematics Genealogy)


Education and Employment

Since 2011: Professor, EPFL

2003-2011: Assistant Professor, then Associate Professor (with tenure), University of California, Berkeley

1999-2002: PhD thesis, EPFL

1997-1998: MS, Univ. Illinois

1992-1997: ETH Zurich


Courses

I teach, or have taught, COM-102 "Advanced Information, Computation, Communication II" (2021-), EE-205 "Signals and Systems" (2015-19), COM-404 "Information Theory and Coding" (2016), COM-406 "Foundations of Data Science" (2017-19, 2021, 2023-), COM-421 "Statistical Neuroscience" (2012, 2014), COM-510 "Advanced Digital Communication" (2011-15), and COM-621 "Advanced Topics in Information Theory" (2021, 2023).
Previously, at the University of California, Berkeley, I taught EE-120 "Signals and Systems" (2003-05), EE-121 "Communications" (2005), EE-123 "Signal Processing" (2007), EE-224A "Communications" (2010), EE-225A "Signal Processing" (2006-08), EE-226A "Random Processes" (2009), EE-290S "Advanced Information Theory" (2004, 2006), and MCB-262 "Theoretical Neuroscience" (2003, 05, 07, 09).


Program Committees

I have served as Technical Program Co-Chair for the IEEE International Symposium on Information Theory in 2010 (Austin, TX, USA) and in 2021 (Melbourne, Australia).