Course Objectives

  • Understand Transformer architectures and foundational ML concepts.
  • Learn about Large Language Models and recent advances in AI.
  • Gain experience implementing AI models to address practical industry problems.
  • Develop competencies in building interactive web demos.

Lecture Format

Each week includes instructor-led lectures and interactive student presentations:

  1. Instructor Lectures: Basic principles and industry insights.
  2. Paper Reviews: Starting in the third week, students present selected papers to build deeper comprehension and critical understanding.

Tentative Schedule

Part 1 (Pre Mid-term)                               Part 2 (Post Mid-term)
#  Topic                                            #   Topic
1  Introduction & Course Information                9   Large Language Models (LLMs)
2  ML Basics & Embeddings Introduction              10  Large Vision Language Models (LVLMs)
3  Transformer Architectures & Attention Mechanism  11  High-Resolution, High-Performing LVLMs
4  Vision Transformers - Principles & Applications  12  LVLMs for Document AI
5  Image-to-Text Technologies (Part 1)              13  Video Understanding with LLMs
6  Image-to-Text: Real-World Applications (Part 2)  14  Reasoning in LLMs and LVLMs
7  Interactive Gradio Web Demo Development          15  Final Project: LVLM In-Context Learning Applications
8  Mid-Term Project: Document Parsing Demo

Grading Policy

  • Class Participation (20%): Attendance (14%), First Presentation (5%), Additional Presentations (+1% each)
  • Mid-term Project (40%): Practical AI Demo Project
  • Final Project (40%): Advanced Vision-Language Demo Project

Instructor

Geewook Kim
Email: gwkim.rsrch@gmail.com
Homepage: https://geewook.kim
Q&A hours: After each class session

Lectures

Here are the lecture slides with notes.