Generative AI

A rigorous deep dive into the architectures, mathematics, and code defining the future of Artificial Intelligence.

Join the Google Classroom

Enroll to access course materials, assignments, and announcements.

Class code: 2pxf2bro
Prof. Fabrizio Silvestri
Course Instructor

Ali Ghasemi
Teaching Assistant

Course Syllabus

Part I: Foundations (4 Lectures)

  • Probability Theory & Linear Algebra
  • Optimization & Information Theory
  • Deep Learning Architectures & Attention
  • CLIP, Contrastive Learning & Autoencoders
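
To give a taste of Lecture 03's core operation, here is a minimal sketch of single-head scaled dot-product attention in the spirit of Vaswani et al. (2017). The function name and tensor shapes are illustrative simplifications, not the course's actual lab code.

```python
# Minimal single-head scaled dot-product attention (illustrative sketch).
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k) tensors; returns (batch, seq_len, d_k)."""
    # Query-key similarities, scaled by sqrt(d_k) to keep softmax gradients stable.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        # Disallowed positions (e.g. future tokens in a decoder) get -inf before softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # one attention distribution per query
    return weights @ v                       # weighted sum of value vectors

# Toy self-attention: 2 sequences of 5 tokens with 16-dim embeddings.
x = torch.randn(2, 5, 16)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([2, 5, 16])
```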
Part II: Generative AI for Images (4 Lectures)

  • VQ-VAE & Generative Adversarial Networks
  • Normalizing Flows (Continuous Flows, Neural ODEs)
  • Diffusion Models (DDPM, Score-Based)
  • Diffusion Architectures (LDM, DiT, Adapters)

💻 Hands-on Labs

  • Lab 1: VAE & GAN Implementation
  • Lab 2: Diffusion Models: DDPM Training & Sampling
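
As a preview of Lab 2, the sketch below shows the simplified DDPM training loss of Ho et al. (2020): draw a random timestep, noise the clean input in closed form, and regress the injected noise. `eps_model` is a hypothetical stand-in for the U-Net trained in the lab.

```python
# Sketch of the simplified DDPM objective; `eps_model` is a placeholder network.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative product: alpha_bar_t

def ddpm_loss(eps_model, x0):
    """L_simple = E[ || eps - eps_theta(x_t, t) ||^2 ]."""
    b = x0.size(0)
    t = torch.randint(0, T, (b,))                        # one random timestep per sample
    ab = alphas_bar[t].view(b, *([1] * (x0.dim() - 1)))  # broadcast to x0's shape
    eps = torch.randn_like(x0)                           # the noise the model must predict
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps       # closed-form forward noising
    return torch.mean((eps - eps_model(x_t, t)) ** 2)
```

Sampling then runs the learned reverse chain from pure Gaussian noise; the lab covers both directions.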
Part III: Generative AI for Text (5 Lectures)

  • NLP Foundations (Tokenization, Embeddings)
  • LLM Architecture (GPT, LLaMA)
  • LLM Architecture & Scaling Laws
  • Alignment (RLHF, DPO, ORPO, LoRA)
  • Retrieval Augmented Generation (RAG)

💻 Hands-on Labs

  • Lab 3: NanoGPT: Building a Micro-GPT
  • Lab 4: LLM Applications: RAG & Agents
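
A toy sketch of Lab 4's retrieve step: embed the query and the passages, score by cosine similarity, and hand the top-k passages to the LLM as context. The `embed` function below is a deliberately crude hashed bag-of-words stand-in so the sketch runs on its own; a real pipeline uses a trained bi-encoder (e.g. DPR) with an ANN index (e.g. HNSW) instead of a brute-force scan.

```python
# Toy dense retrieval for a RAG pipeline; `embed` is a crude stand-in encoder.
import numpy as np

def embed(texts):
    """Hashed bag-of-words vectors, L2-normalized; purely illustrative."""
    vecs = np.zeros((len(texts), 256))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            vecs[i, hash(tok) % 256] += 1.0
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)

def retrieve(query, passages, k=2):
    scores = embed(passages) @ embed([query])[0]  # cosine similarity of unit vectors
    top = np.argsort(-scores)[:k]                 # indices of the k best passages
    return [passages[i] for i in top]             # contexts to prepend to the prompt

docs = ["Diffusion models denoise step by step.",
        "RAG grounds generation in retrieved passages.",
        "LoRA adds low-rank adapters to frozen weights."]
print(retrieve("how does retrieval augmented generation work?", docs))
```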
Part IV: Frontiers & Advanced Topics (3 Lectures)

  • Agentic AI (ReAct, Reflexion, Tree of Thoughts, Tool Use)
  • JEPA (I-JEPA, V-JEPA 2, LLM-JEPA)
  • Multimodal LLMs (CLIP, LLaVA, GPT-4o, Gemini)

💻 Hands-on Labs

  • Lab 4: LLM Applications: RAG & Agents
  • Lab 5: Multimodal: VLM Applications & Fine-tuning
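
Since CLIP threads through both Lecture 04 and the multimodal lectures, here is a minimal sketch of its symmetric contrastive (InfoNCE) objective over a batch of paired embeddings. The embeddings here are random placeholders; in CLIP they come from the image and text encoders, and the temperature is learned rather than fixed.

```python
# Symmetric contrastive loss in the style of CLIP (illustrative sketch).
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Matched (image_i, text_i) pairs are positives; all other batch pairs are negatives."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # (B, B) cosine-similarity logits
    targets = torch.arange(img.size(0))   # the diagonal holds the correct pairings
    # Cross-entropy in both directions: image -> text and text -> image.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

print(clip_style_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```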

Tentative Schedule

Date   | Day | Type     | Topic                                                  | Content
-------|-----|----------|--------------------------------------------------------|--------
Part I: Foundations (Ch 1–7)
Feb 25 | Wed | Lecture  | 01: Foundations I                                      | Probability Theory, Linear Algebra.
Feb 27 | Fri | Lecture  | 02: Foundations II                                     | Optimization, Information Theory.
Mar 04 | Wed | Lecture  | 03: Deep Learning & Attention                          | DL Architectures, CNNs, Transformers.
Mar 06 | Fri | Lecture  | 04: CLIP & Autoencoders                                | Contrastive Learning, VAEs, ELBO.
Part II: Generative AI for Images (Ch 8–13)
Mar 11 | Wed | Lecture  | 05: VQ-VAE & GANs                                      | Vector Quantized Models, Adversarial Training.
Mar 13 | Fri | Lab      | 01: VAE & GAN Lab                                      | VAE implementation, GAN training.
Mar 18 | Wed | Lecture  | 06: Normalizing Flows                                  | Invertible Networks, Continuous Flows, Neural ODEs.
Mar 20 | Fri | Lecture  | 07: Diffusion Models                                   | DDPM, Score-Based Models, U-Net.
Mar 25 | Wed | Lab      | 02: Diffusion Lab                                      | DDPM & Latent Diffusion Training.
Mar 27 | Fri | Exercise | 08: Comprehensive Vision AI Review                     | Exercises on Foundations, VAEs, GANs, Diffusion.
Part III: Generative AI for Text (Ch 14–19)
Apr 01 | Wed | Lecture  | 09: NLP Foundations                                    | Tokenization, Embeddings, RNNs.
Apr 03 | Fri | Holiday  | Easter Break                                           | No class.
Apr 08 | Wed | Lecture  | 10: LLM Architecture                                   | GPT, LLaMA, Inference Optimization.
Apr 10 | Fri | Lecture  | 10: LLM Architecture (cont.)                           | Scaling Laws, KV Cache, GQA, Inference.
Apr 15 | Wed | Lecture  | 11: Alignment                                          | RLHF, DPO, ORPO, PEFT.
Apr 17 | Fri | Lab      | 03: NanoGPT Lab                                        | Building a Micro-GPT from Scratch.
Apr 22 | Wed | Lecture  | 11/12: Alignment (Part II) + RAG & Agentic AI (Part I) | Close RLHF/DPO/ORPO/LoRA; open RAG (Lewis 2020, DPR, pipeline).
Apr 24 | Fri | Lecture  | 12: RAG & Agentic AI (Part II)                         | Review of NLP, LLMs, Alignment, RAG.
Apr 29 | Wed | Lab      | 04: LLM Applications Lab                               | RAG & Agent Implementation.
Part IV: Frontiers & Advanced Topics (Ch 20–24)
May 01 | Fri | Holiday  | Labor Day                                              | No class.
May 06 | Wed | Lecture  | 15: JEPA                                               | LeCun's non-generative bet. EBMs, collapse & anti-collapse (BYOL, DINO, VICReg), I-JEPA, V-JEPA 2, LLM-JEPA, theory & outlook.
May 08 | Fri | Exercise | 16: Deep Generative Modeling - Theory & Practice       | VAE, GAN, Diffusion, Transformer Exercises.
May 13 | Wed | Lecture  | 17: Multimodal LLMs                                    | CLIP recap, Flamingo, LLaVA, GPT-4o, Gemini, Chameleon, audio/video/3D, evaluation.
May 15 | Fri | Lab      | 05: Multimodal Lab                                     | VLM Applications & Fine-tuning.
May 20 | Wed | Exercise | 18: Exercises I                                        | Foundations, Images & Diffusion Review.
May 22 | Fri | Lecture  | Invited Lecture (PhD Students)                         | Guest research lecture by PhD students.
May 27 | Wed | Exercise | 19: Final Comprehensive Exercise                       | Exam-style review across all course topics.

Important Papers

Canonical references, in IEEE style, for every paper introduced in the lecture decks or notes. Peer-reviewed venues are preferred over preprints; where a paper appeared both in a peer-reviewed venue and on arXiv, the conference or journal is cited and the arXiv ID is given in parentheses.

Part I — Foundations

  1. C. E. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
  2. H. Robbins and S. Monro, "A Stochastic Approximation Method," The Annals of Mathematical Statistics, vol. 22, no. 3, pp. 400–407, 1951.
  3. F. Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain," Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.
  4. K. Hornik, M. Stinchcombe, and H. White, "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, vol. 2, no. 5, pp. 359–366, 1989.
  5. D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," in Proc. ICLR, 2015.
  6. I. Loshchilov and F. Hutter, "Decoupled Weight Decay Regularization," in Proc. ICLR, 2019.
  7. A. Vaswani et al., "Attention Is All You Need," in Advances in NeurIPS, 2017.
  8. D. Bahdanau, K. Cho, and Y. Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate," in Proc. ICLR, 2015.
  9. J. Su et al., "RoFormer: Enhanced Transformer with Rotary Position Embedding," Neurocomputing, vol. 568, Art. no. 127063, 2024 (arXiv:2104.09864).
  10. T. Dao et al., "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness," in Advances in NeurIPS, 2022.
  11. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proc. CVPR, 2016, pp. 770–778.
  12. O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Proc. MICCAI, 2015, pp. 234–241.
  13. S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in Proc. ICML, 2015, pp. 448–456.
  14. J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer Normalization," arXiv:1607.06450, 2016.

Part II — Generative AI for Images

  1. D. P. Kingma and M. Welling, "Auto-Encoding Variational Bayes," in Proc. ICLR, 2014.
  2. I. Goodfellow et al., "Generative Adversarial Nets," in Advances in NeurIPS, 2014.
  3. I. Higgins et al., "β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework," in Proc. ICLR, 2017.
  4. A. van den Oord, O. Vinyals, and K. Kavukcuoglu, "Neural Discrete Representation Learning," in Advances in NeurIPS, 2017.
  5. L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density Estimation Using Real NVP," in Proc. ICLR, 2017.
  6. D. P. Kingma and P. Dhariwal, "Glow: Generative Flow with Invertible 1×1 Convolutions," in Advances in NeurIPS, 2018.
  7. D. Rezende and S. Mohamed, "Variational Inference with Normalizing Flows," in Proc. ICML, 2015, pp. 1530–1538.
  8. J. Sohl-Dickstein et al., "Deep Unsupervised Learning Using Nonequilibrium Thermodynamics," in Proc. ICML, 2015, pp. 2256–2265.
  9. J. Ho, A. Jain, and P. Abbeel, "Denoising Diffusion Probabilistic Models," in Advances in NeurIPS, 2020.
  10. Y. Song and S. Ermon, "Generative Modeling by Estimating Gradients of the Data Distribution," in Advances in NeurIPS, 2019.
  11. A. Nichol and P. Dhariwal, "Improved Denoising Diffusion Probabilistic Models," in Proc. ICML, 2021, pp. 8162–8171.
  12. P. Dhariwal and A. Nichol, "Diffusion Models Beat GANs on Image Synthesis," in Advances in NeurIPS, 2021.
  13. J. Ho and T. Salimans, "Classifier-Free Diffusion Guidance," in NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021 (arXiv:2207.12598).
  14. W. Peebles and S. Xie, "Scalable Diffusion Models with Transformers," in Proc. ICCV, 2023, pp. 4195–4205.
  15. K. He et al., "Masked Autoencoders Are Scalable Vision Learners," in Proc. CVPR, 2022, pp. 16000–16009.
  16. M. Heusel et al., "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (FID)," in Advances in NeurIPS, 2017.

Part III — Generative AI for Text

  1. T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient Estimation of Word Representations in Vector Space," in Workshop at ICLR, 2013.
  2. S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  3. J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. NAACL, 2019, pp. 4171–4186.
  4. A. Radford et al., "Improving Language Understanding by Generative Pre-Training," OpenAI Tech. Rep., 2018.
  5. T. Brown et al., "Language Models Are Few-Shot Learners (GPT-3)," in Advances in NeurIPS, 2020.
  6. H. Touvron et al., "LLaMA: Open and Efficient Foundation Language Models," arXiv:2302.13971, 2023.
  7. H. Touvron et al., "LLaMA 2: Open Foundation and Fine-Tuned Chat Models," arXiv:2307.09288, 2023.
  8. J. Kaplan et al., "Scaling Laws for Neural Language Models," arXiv:2001.08361, 2020.
  9. J. Hoffmann et al., "Training Compute-Optimal Large Language Models (Chinchilla)," in Advances in NeurIPS, 2022.
  10. N. Shazeer, "Fast Transformer Decoding: One Write-Head Is All You Need (MQA)," arXiv:1911.02150, 2019.
  11. J. Ainslie et al., "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints," in Proc. EMNLP, 2023 (arXiv:2305.13245).
  12. L. Ouyang et al., "Training Language Models to Follow Instructions with Human Feedback (InstructGPT)," in Advances in NeurIPS, 2022.
  13. J. Schulman et al., "Proximal Policy Optimization Algorithms," arXiv:1707.06347, 2017.
  14. J. Schulman et al., "Trust Region Policy Optimization," in Proc. ICML, 2015, pp. 1889–1897.
  15. R. Rafailov et al., "Direct Preference Optimization: Your Language Model Is Secretly a Reward Model," in Advances in NeurIPS, 2023.
  16. J. Hong, N. Lee, and J. Thorne, "ORPO: Monolithic Preference Optimization without Reference Model," in Proc. EMNLP, 2024.
  17. Z. Shao et al., "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (GRPO)," arXiv:2402.03300, 2024.
  18. E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," in Proc. ICLR, 2022.
  19. T. Dettmers et al., "QLoRA: Efficient Finetuning of Quantized LLMs," in Advances in NeurIPS, 2023.
  20. P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," in Advances in NeurIPS, 2020.
  21. K. Guu et al., "REALM: Retrieval-Augmented Language Model Pre-training," in Proc. ICML, 2020.
  22. V. Karpukhin et al., "Dense Passage Retrieval for Open-Domain Question Answering," in Proc. EMNLP, 2020, pp. 6769–6781.
  23. O. Khattab and M. Zaharia, "ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT," in Proc. SIGIR, 2020, pp. 39–48.
  24. K. Santhanam et al., "ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction," in Proc. NAACL, 2022.
  25. T. Formal, B. Piwowarski, and S. Clinchant, "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking," in Proc. SIGIR, 2021, pp. 2288–2292.
  26. G. Izacard and É. Grave, "Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering (FiD)," in Proc. EACL, 2021, pp. 874–880.
  27. A. Asai et al., "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection," in Proc. ICLR, 2024 (arXiv:2310.11511).
  28. D. Edge et al., "From Local to Global: A Graph RAG Approach to Query-Focused Summarization," arXiv:2404.16130, 2024.
  29. Y. A. Malkov and D. A. Yashunin, "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 42, no. 4, pp. 824–836, 2020.
  30. N. Thakur et al., "BEIR: A Heterogeneous Benchmark for Zero-Shot Evaluation of Information Retrieval Models," in Advances in NeurIPS Datasets and Benchmarks, 2021.
  31. N. Muennighoff et al., "MTEB: Massive Text Embedding Benchmark," in Proc. EACL, 2023, pp. 2014–2037.
  32. N. Kandpal et al., "Large Language Models Struggle to Learn Long-Tail Knowledge," in Proc. ICML, 2023, pp. 15696–15707.
  33. N. F. Liu et al., "Lost in the Middle: How Language Models Use Long Contexts," Transactions of the ACL, vol. 12, pp. 157–173, 2024.
  34. F. Cuconasu et al., "The Power of Noise: Redefining Retrieval for RAG Systems," in Proc. SIGIR, 2024, pp. 719–729.
  35. G. Trappolini, F. Cuconasu, S. Filice, Y. Maarek, and F. Silvestri, "Redefining Retrieval Evaluation in the Era of LLMs," in Proc. EACL, 2026, pp. 8359–8375.
  36. S. Es, J. James, L. Espinosa-Anke, and S. Schockaert, "RAGAS: Automated Evaluation of Retrieval Augmented Generation," in Proc. EACL: System Demonstrations, 2024, pp. 150–158.
  37. S. Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models," in Proc. ICLR, 2023 (arXiv:2210.03629).
  38. N. Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning," in Advances in NeurIPS, 2023 (arXiv:2303.11366).
  39. S. Yao et al., "Tree of Thoughts: Deliberate Problem Solving with Large Language Models," in Advances in NeurIPS, 2023 (arXiv:2305.10601).
  40. T. Schick et al., "Toolformer: Language Models Can Teach Themselves to Use Tools," in Advances in NeurIPS, 2023 (arXiv:2302.04761).
  41. G. Wang et al., "Voyager: An Open-Ended Embodied Agent with Large Language Models," Transactions on Machine Learning Research (TMLR), 2024.
  42. C. E. Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?," in Proc. ICLR, 2024 (arXiv:2310.06770).
  43. S. Zhou et al., "WebArena: A Realistic Web Environment for Building Autonomous Agents," in Proc. ICLR, 2024.
  44. K. Greshake et al., "Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection," in Proc. ACM Workshop on AI and Security (AISec), 2023, pp. 79–90.

Part IV — Frontiers (JEPA, Multimodal)

  1. Y. LeCun, "A Path Towards Autonomous Machine Intelligence," OpenReview preprint, Version 0.9.2, 2022.
  2. M. Assran et al., "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (I-JEPA)," in Proc. CVPR, 2023, pp. 15619–15629.
  3. A. Bardes, J. Ponce, and Y. LeCun, "VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning," in Proc. ICLR, 2022.
  4. J. Zbontar et al., "Barlow Twins: Self-Supervised Learning via Redundancy Reduction," in Proc. ICML, 2021, pp. 12310–12320.
  5. J.-B. Grill et al., "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL)," in Advances in NeurIPS, 2020.
  6. M. Caron et al., "Emerging Properties in Self-Supervised Vision Transformers (DINO)," in Proc. ICCV, 2021, pp. 9650–9660.
  7. T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)," in Proc. ICML, 2020, pp. 1597–1607.
  8. Y. Tian, X. Chen, and S. Ganguli, "Understanding Self-Supervised Learning Dynamics Without Contrastive Pairs," in Proc. ICML, 2021, pp. 10268–10278.
  9. R. Shwartz-Ziv, R. Balestriero, K. Kawaguchi, T. G. J. Rudner, and Y. LeCun, "An Information Theory Perspective on Variance-Invariance-Covariance Regularization," in Advances in NeurIPS, 2023 (arXiv:2303.00633).
  10. E. Littwin, O. Saremi, M. Advani, C. Huang, P. Nakkiran, J. Susskind, and V. Thilak, "How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self-Distillation Networks," in Advances in NeurIPS, 2024 (arXiv:2407.03475).
  11. M. Assran et al., "V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning," arXiv:2506.09985, 2025.
  12. H. Huang, Y. LeCun, and R. Balestriero, "LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures," arXiv:2509.14252, 2025.
  13. A. Radford et al., "Learning Transferable Visual Models from Natural Language Supervision (CLIP)," in Proc. ICML, 2021, pp. 8748–8763.
  14. M. Tsimpoukelli et al., "Multimodal Few-Shot Learning with Frozen Language Models," in Advances in NeurIPS, 2021.
  15. J.-B. Alayrac et al., "Flamingo: A Visual Language Model for Few-Shot Learning," in Advances in NeurIPS, 2022.
  16. J. Li, D. Li, S. Savarese, and S. Hoi, "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models," in Proc. ICML, 2023, pp. 19730–19742.
  17. H. Liu, C. Li, Q. Wu, and Y. J. Lee, "Visual Instruction Tuning (LLaVA)," in Advances in NeurIPS, 2023.
  18. H. Liu et al., "Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)," in Proc. CVPR, 2024, pp. 26296–26306 (arXiv:2310.03744).
  19. Chameleon Team (Meta), "Chameleon: Mixed-Modal Early-Fusion Foundation Models," arXiv:2405.09818, 2024.
  20. A. Brohan et al., "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control," in Proc. CoRL, 2023, pp. 2165–2183.
  21. D. Driess et al., "PaLM-E: An Embodied Multimodal Language Model," in Proc. ICML, 2023, pp. 8469–8488.
  22. A. Radford et al., "Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)," in Proc. ICML, 2023, pp. 28492–28518.
  23. Z. Borsos et al., "AudioLM: A Language Modeling Approach to Audio Generation," IEEE/ACM Trans. Audio, Speech and Language Processing, vol. 31, pp. 2523–2533, 2023.
  24. X. Yue et al., "MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI," in Proc. CVPR, 2024, pp. 9556–9567.
  25. Y. Li et al., "Evaluating Object Hallucination in Large Vision-Language Models (POPE)," in Proc. EMNLP, 2023, pp. 292–305.