Course Objectives

  • Understand Transformer architectures and foundational ML concepts.
  • Learn about Large Language Models and recent advances in AI.
  • Gain experience implementing AI models to address practical industry problems.
  • Develop competencies in building interactive web demos.

Lecture Format

Each week includes instructor-led lectures and interactive student presentations:

  1. Instructor Lectures: Basic principles and industry insights.
  2. Paper Reviews: Starting in the third week, students present selected papers to deepen comprehension and build critical understanding.

Tentative Schedule

Part 1 (Pre Mid-term)

  1. Introduction & Course Information
  2. ML Basics & Embeddings Introduction
  3. Transformer Architectures & Attention Mechanism
  4. Vision Transformers - Principles & Applications
  5. Image-to-Text Technologies (Part 1)
  6. Image-to-Text: Real-World Applications (Part 2)
  7. Interactive Gradio Web Demo Development
  8. Mid-Term Project: Document Parsing Demo

Part 2 (Post Mid-term)

  9. LLM Era and API Utilization
  10. Large Language Models - In-Context Learning
  11. LLMs - Fine-tuning & Advanced Usage
  12. Large Vision-Language Models - LLaVA (Part 1)
  13. Large Vision-Language Models - Applications (Part 2)
  14. Large Vision-Language Models - Implementation (Part 3)
  15. Final Project: LVLM-based Document Parsing Applications

Grading Policy

  • Class Participation (20%): Attendance (14%), First Presentation (5%), Additional Presentations (+1% each)
  • Mid-term Project (40%): Practical AI Demo Project
  • Final Project (40%): Advanced Vision-Language Demo Project

Instructor

Geewook Kim
Email: gwkim.rsrch@gmail.com
Homepage: https://geewook.kim
Q&A hours: After each class session

Lectures

The lecture slides with notes are provided below.