Introduction

In this course, we’ll read and discuss the latest language modeling and representation learning methods in natural language processing. Topics include prominent deep learning architectures such as transformers, self-supervised learning and transfer learning, contrastive learning, large language models and the power of scale, emergent properties of large language models, parameter-efficient fine-tuning methods, learning from few training examples and task instructions, methods for making large language models more efficient, applications to other fields, and other recent topics in contemporary NLP.

Learning Resources

Textbook

No required textbook. But if you are interested in textbooks or book chapters:

We will be reading research papers from premier conferences in the field, e.g., ACL, EMNLP, NAACL, ICLR, NeurIPS, ICML, …

Communication

We use Canvas and email for main announcements. For questions about the course, discussions about the material, and facilitating project discussions between students, we will mainly use Slack.


Grading

This is a seminar-level course, so instead of exams, grades will be based on leading and participating in class discussions and on a final project.

Paper presentation and discussions (40%)

  • 20%: Paper presentation
    • Each student will lead the presentation of up to 4 sessions (depending on the size of the class). The students will be encouraged to think of themselves as the author of the paper presenting it at a conference venue. The purpose of this is to discuss the main insights and findings of the paper and connect the paper with other papers and lectures discussed in class. The presenter is also encouraged to prepare a few discussion points/questions after the presentation.
  • 10%: Active participation
    • Each student, when not presenting, will engage in discussions about the paper. They will act as audience or reviewers of the paper. They will discuss strengths, weaknesses, and possible extensions/solutions.
  • 10%: Turn in questions and occasional quiz sheets
    • Quizzes will be based on small group discussions. They will be distributed occasionally in some (not all) of the sessions and must be turned in by the day after the class.

Class project (60%)

Students must complete a final research project on a topic of their choice related to the class. Students can team up with other students, but the team size is limited to 2 students. (In rare cases, depending on the scope of the proposed project, a group of 3 may also be allowed.)

  • 10%: proposal
    • Students should submit a 1-2 page proposal for their project by week 4-5. The proposal should state and motivate the problem and position the proposed project within related work. The project should propose either novel research, a novel investigation of existing methods, an extension of prior work for a specific purpose, or a new application. The proposal should also include a brief description of the approach as well as the experimental plan (e.g., baselines, datasets, etc.) to validate the effectiveness of the approach.
  • 10%: project progress report
    • 2-3 page document due by week 10-11 (around the time of mid-term). It should describe the project goal and related work, initial results, and the plan for continuing the project.
  • 10%: code 
    • Your project code should be clean and readable, with clear instructions for running it, and the results should be fully reproducible.
  • 10%: final project presentation
    • We will dedicate the final session of the class to presentations. Depending on the size of the class, this can be either a poster presentation or an oral presentation. We may need to extend the class time to fully accommodate all presentations.
  • 20%: final project report
    • 6-8 page conference-format report (e.g., ACL/EMNLP) detailing the project motivation, related work, proposed approach, results, and discussion. You can think of this as a conference paper. Negative results will not be penalized, but they should be accompanied by a detailed analysis of why the proposed methods didn’t work and should provide some additional insights into the problem.