Course Objectives

Understand Transformer architectures and foundational ML concepts.
Learn about Large Language Models and recent advancements in AI.
Gain experience implementing AI models to address practical industry problems.
Develop competencies in building interactive web demos.

Topics and Schedule

Part 1 (Pre Mid-term)		Part 2 (Post Mid-term)
#	Topic	#	Topic
1	Introduction & Course Information	9	Large Language Models (LLMs)
2	ML Basics & Embeddings Introduction	10	Large Vision Language Models (LVLMs)
3	Transformer Architectures & Attention Mechanism	11	High-Resolution, High-Performing LVLMs
4	Vision Transformers - Principles & Applications	12	LVLMs for Document AI
5	Image-to-Text Technologies (Part 1)	13	Video Understanding with LLMs
6	Image-to-Text: Real-World Applications (Part 2)	14	Reasoning in LLMs and LVLMs
7	Interactive Gradio Web Demo Development	15	Final Project: LVLM In-Context Learning Applications
8	Mid-Term Project: Document Parsing Demo

Lecture Format

Each week includes instructor-led lectures and interactive student presentations:

Instructor Lectures: Basic principles and industry insights.
Paper Reviews: Starting in the third week, students present selected papers to build deeper comprehension and critical understanding.

Grading Policy

Class Participation (20%): Attendance (14%), First Presentation (5%), Additional Presentations (+1% each)
Mid-term Project (40%): Practical AI Demo Project
Final Project (40%): Advanced Vision-Language Demo Project
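
As a rough illustration of how the weights above combine, the following sketch computes an overall percentage. The function name, its parameters, the use of fractional scores between 0 and 1, and the cap of participation at 20% are all assumptions made for this example; they are not stated in the policy itself.

```python
def course_grade(attendance, gave_first_presentation, extra_presentations,
                 midterm, final):
    """Return an overall course percentage (0-100).

    Hypothetical helper, not an official grading formula.
    attendance, midterm, final: fraction of available credit earned, in [0, 1].
    gave_first_presentation: True if the first presentation was delivered.
    extra_presentations: number of additional presentations (+1% each).
    """
    participation = 14 * attendance                   # Attendance (14%)
    participation += 5 if gave_first_presentation else 0  # First Presentation (5%)
    participation += 1 * extra_presentations          # Additional Presentations (+1% each)
    # ASSUMPTION: participation cannot exceed its stated 20% weight.
    participation = min(participation, 20)

    # Mid-term Project (40%) and Final Project (40%)
    return participation + 40 * midterm + 40 * final


if __name__ == "__main__":
    # Full attendance, first presentation, one extra presentation,
    # and 90% / 80% on the two projects.
    print(course_grade(1.0, True, 1, 0.9, 0.8))
```

Under these assumptions, full attendance plus the first presentation and one additional presentation reaches the full 20% participation share.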

Instructor

Geewook Kim
Email: gwkim.rsrch@gmail.com
Homepage: https://geewook.kim
Q&A hours: After each class session

Lectures

Here are the lecture slides with notes.