Course Objectives
- Understand Transformer architectures and foundational ML concepts.
- Learn about Large Language Models and recent advances in AI.
- Gain experience implementing AI models to address practical industry problems.
- Develop competencies in building interactive web demos.
Lecture Format
Each week includes instructor-led lectures and interactive student presentations:
- Instructor Lectures: Basic principles and industry insights.
- Paper Reviews: Starting in the third week, students present selected papers to build deeper comprehension and critical understanding.
Tentative Schedule
No. | Part 1 (Pre Mid-term) Topic | No. | Part 2 (Post Mid-term) Topic |
---|---|---|---|
1 | Introduction & Course Information | 9 | LLM Era and API Utilization |
2 | ML Basics & Embeddings Introduction | 10 | Large Language Models - In-Context Learning |
3 | Transformer Architectures & Attention Mechanism | 11 | LLMs - Fine-tuning & Advanced Usage |
4 | Vision Transformers - Principles & Applications | 12 | Large Vision-Language Models - LLaVA (Part 1) |
5 | Image-to-Text Technologies (Part 1) | 13 | Large Vision-Language Models - Applications (Part 2) |
6 | Image-to-Text: Real-World Applications (Part 2) | 14 | Large Vision-Language Models - Implementation (Part 3) |
7 | Interactive Gradio Web Demo Development | 15 | Final Project: LVLM-based Document Parsing Applications |
8 | Mid-Term Project: Document Parsing Demo | | |
Grading Policy
- Class Participation (20%): Attendance (14%), First Presentation (5%), Additional Presentations (+1% each)
- Mid-term Project (40%): Practical AI Demo Project
- Final Project (40%): Advanced Vision-Language Demo Project
Instructor
Geewook Kim
Email: gwkim.rsrch@gmail.com
Homepage: https://geewook.kim
Q&A hours: After each class session
Lectures
Here are the lecture slides with notes.