Date Lecture Readings Logistics
Tue 01/17/23 Lecture #1:
  • Course introduction
  • Logistics
  • Transformers
Presenter(s):
  • Arman
[ slides ]
Main readings:
  • Attention Is All You Need (2017) [link]

Thu 01/19/23 Lecture #2:
  • Transfer learning
  • Pre-training
  • Pre-trained transformers
Presenter(s):
  • Arman
[ slides ]
Main readings:
  • ELMo: Deep Contextualized Word Representations [link]
  • ULMFiT: Universal Language Model Fine-tuning for Text Classification [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [link]
Optional readings:
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach [link]

Tue 01/24/23 Lecture #3:
  • Transfer learning
  • Pre-training
Presenter(s):
  • Kejian
  • Huangrui
[ slides | slides 2 ]
[ questions form ]
Main readings:
  • Language Models are Unsupervised Multitask Learners (GPT-2) [link]
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5) [link]
Optional readings:
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension [link]
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [link]

Thu 01/26/23 Lecture #4:
  • Model Architecture and Training Objectives
Presenter(s):
  • Zhangir
[ slides ]
[ questions form ]
Main readings:
  • UL2: Unifying Language Learning Paradigms (2022) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (2022) [link]

Tue 01/31/23 Lecture #5:
  • Scaling Laws for Language Models
  • Tips on choosing a project [Arman]
Presenter(s):
  • Ruixiao
  • Chenxi
[ slides | slides 2 ]
[ questions form ]
Main readings:
  • Scaling Laws for Neural Language Models (2020) [link]
  • Training Compute-Optimal Large Language Models (2022) [link]
Optional readings:
  • Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers (2021) [link]

[Tips on choosing projects]

[Projects Doc]

Th 02/02/23 Lecture #6:
  • LLMs
  • Power of scale
Presenter(s):
  • Arman
  • Ziqing
[ slides | slides 2 ]
[ questions form ]
Main readings:
  • Language Models are Few-Shot Learners (GPT-3; 2020) [link]
  • PaLM: Scaling Language Modeling with Pathways (2022) [link]

Tue 2/7/23 Lecture #7:
  • Prompting and Few-shot learning
Presenter(s):
  • Ayla
  • Hailey
[ slides | slides 2 ]
[ questions form ]
Main readings:
  • It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners (2020) [link]
  • Making Pre-trained Language Models Better Few-shot Learners (2021) [link]

Thu 2/9/23 Lecture #8:
  • Prompting and ICL
  • Why ICL works
Presenter(s):
  • Yujie
  • David

[ questions form ]
Main readings:
  • Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? (2022) [link]
  • Data Distributional Properties Drive Emergent In-Context Learning in Transformers (2022) [link]
Optional readings:
  • What learning algorithm is in-context learning? Investigations with linear models (2022) [link]
  • Transformers as Algorithms: Generalization and Stability in In-context Learning (2023) [link]

Finalize project teams

Tue 2/14/23 Lecture #9:
  • Instruction tuning
Presenter(s):
  • Huangrui
  • Hyoungseob

[ questions form ]
Main readings:
  • Training language models to follow instructions with human feedback (2022) [link]
  • Scaling Instruction-Finetuned Language Models (2022) [link]

Thu 2/16/23 Lecture #10:
  • Parameter efficient fine-tuning
Presenter(s):
  • Zhangir
  • Hailey
[ slides ]
[ questions form ]
Main readings:
  • Towards a Unified View of Parameter-Efficient Transfer Learning (2021) [link]
  • Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning (2022) [link]
Optional readings:
  • Parameter-Efficient Transfer Learning for NLP (2019) [link]
  • Prefix-Tuning: Optimizing Continuous Prompts for Generation (2021) [link]

Tue 2/21/23 Lecture #11:
  • Chain of thought reasoning
  • Emergence
Presenter(s):
  • Ayla
  • Ziqing

[ questions form ]
Main readings:
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022) [link]
  • Emergent Abilities of Large Language Models (2022) [link]

Thu 2/23/23 Lecture #12:
  • Efficient transformers
Presenter(s):
  • Kejian
  • Ruixiao

[ questions form ]
Main readings:
  • Longformer: The Long-Document Transformer (2020) [link]
  • Big Bird: Transformers for Longer Sequences (2020) [link]
Optional readings:
  • Efficient Transformers: A Survey [link]

Tue 2/28/23 Lecture #13:
  • Efficient transformers
Presenter(s):
  • David
  • Yujie
Main readings:
  • Efficiently Modeling Long Sequences with Structured State Spaces (2021) [link]
  • Simplifying S4 [link]
Optional readings:
  • On the Parameterization and Initialization of Diagonal State Space Models (2022) [link]
  • Hungry Hungry Hippos: Towards Language Modeling with State Space Models (2022) [link]
  • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022) [link]

Thu 3/2/23 Lecture #14:
  • Memory
Presenter(s):
  • Arman
[ slides ]
[ questions form ]
Main readings:
  • Memorizing Transformers (2022) [link]
Optional readings:
  • Training Language Models with Memory Augmentation (2022) [link]

Project proposals due on 3/4

Tue 3/7/23 Lecture #15:
  • Guest Lecture
  • Scott Yih: Efficient and Scalable NLP through Retrieval-Augmented Language Models
[ slides ]
Main readings:
  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (2020) [link]
  • Improving language models by retrieving from trillions of tokens (2021) [link]
  • REPLUG: Retrieval-Augmented Black-Box Language Models [link]
  • Retrieval-Augmented Multimodal Language Modeling [link]

Thu 3/9/23 Lecture #16:
  • Iterative methods
Presenter(s):
  • Hailey

[ questions form ]
Main readings:
  • PEER: A Collaborative Language Model (2022) [link]
Optional readings:
  • Generating Sequences by Learning to Self-Correct (2022) [link]

3/14 - 3/26 Spring break - No classes

Tue 3/28/23 Lecture #17:
  • Societal and ethical considerations, bias, safety in AI/NLP
Presenter(s):
  • Kejian

[ questions form ]
Main readings:
  • On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? (2021) [link]
Optional readings:
  • Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP (2021) [link]

4/1 Progress report due

Thu 3/30/23 Lecture #18:
  • Mixture of experts and sparse models
Presenter(s):
  • Ayla

[ questions form ]
Main readings:
  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021) [link]
Optional readings:
  • Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (2022) [link]

Tue 4/4/23 Lecture #19:
  • Data
  • Spurious biases
  • Dataset difficulty
Presenter(s):
  • Yujie
[ slides ]
[ questions form ]
Main readings:
  • Competency Problems: On Finding and Removing Artifacts in Language Data (2021) [link]
Optional readings:
  • Understanding Dataset Difficulty with V-Usable Information (2021) [link]

Thu 4/6/23 Lecture #20:
  • Multi-modal models
Presenter(s):
  • Zhangir
[ slides ]
Main readings:
  • Flamingo: a Visual Language Model for Few-Shot Learning (2022) [link]
Optional readings:
  • CM3: A Causal Masked Multimodal Model of the Internet (2022) [link]

Tue 4/11/23 Lecture #21:
  • Decoding methods for natural language generation
Presenter(s):
  • Ruixiao

[ questions form ]
Main readings:
  • NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints (2021) [link]
Optional readings:
  • NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics (2022) [link]

Thu 4/13/23 Lecture #22:
  • Training LLMs with human feedback
  • AI Alignment
Presenter(s):
  • David

[ questions form ]
Main readings:
  • A General Language Assistant as a Laboratory for Alignment (2021) [link]
Optional readings:
  • Learning to Summarize from Human Feedback (2020) [link]

Tue 4/18/23 Lecture #23:
  • AI Alignment
Presenter(s):
  • Ziqing

[ questions form ]
Main readings:
  • Fine-tuning language models to find agreement among humans with diverse preferences (2022) [link]
Optional readings:
  • Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback (2022) [link]

Thu 4/20/23 Lecture #24:
  • LLMs for Code
Presenter(s):
  • Huangrui

[ questions form ]
Main readings:
  • CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis (2022) [link]
Optional readings:
  • InCoder: A Generative Model for Code Infilling and Synthesis (2022) [link]

Tue 4/25/23 Lecture #25:
Main readings:
  • LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [link]
  • Efficiently Scaling Transformer Inference [link]

Thu 4/27/23 Lecture #26:
  • Final presentations for projects
Presenter(s):
  • All students

4/27 Final presentations; 5/10 Final project report due