Course Objectives
- Understand Transformer architectures and foundational ML concepts.
- Learn about Large Language Models and recent advancements in AI.
- Gain experience implementing AI models to address practical industry problems.
- Develop competencies in building interactive web demos.
Lecture Format
Each week includes instructor-led lectures and interactive student presentations:
- Instructor Lectures: Basic principles and industry insights.
- Paper Reviews: Starting in the third week, students present selected papers to build deeper comprehension and critical understanding.
Tentative Schedule
# | Topic (Part 1: Pre Mid-term) | # | Topic (Part 2: Post Mid-term) |
---|---|---|---|
1 | Introduction & Course Information | 9 | Large Language Models (LLMs) |
2 | ML Basics & Embeddings Introduction | 10 | Large Vision Language Models (LVLMs) |
3 | Transformer Architectures & Attention Mechanism | 11 | High-Resolution, High-Performing LVLMs |
4 | Vision Transformers - Principles & Applications | 12 | LVLMs for Document AI |
5 | Image-to-Text Technologies (Part 1) | 13 | Video Understanding with LLMs |
6 | Image-to-Text: Real-World Applications (Part 2) | 14 | Reasoning in LLMs and LVLMs |
7 | Interactive Gradio Web Demo Development | 15 | Final Project: LVLM In-Context Learning Applications |
8 | Mid-Term Project: Document Parsing Demo | | |
Grading Policy
- Class Participation (20%): Attendance (14%), First Presentation (5%), Additional Presentations (+1% each)
- Mid-term Project (40%): Practical AI Demo Project
- Final Project (40%): Advanced Vision-Language Demo Project
Instructor
Geewook Kim
Email: gwkim.rsrch@gmail.com
Homepage: https://geewook.kim
Q&A hours: After each class session
Lectures
Here are the lecture slides with notes.