Research

My research interests span machine learning, from its underlying principles to its practical applications. I specialize in building robust, generalizable machine learning systems for real-world use.

Recent Projects

Highlighted Open-Source Projects

I have built several open-source repositories as part of my work in machine learning and artificial intelligence. If you have any questions or feedback, feel free to open an issue or email me. Your input is greatly appreciated!

List of Publications

Please see my Google Scholar or Semantic Scholar for an up-to-date list.

(C: Peer-Reviewed International Conference Papers, W: Peer-Reviewed Workshop Papers, O: Other Publications/arXiv Preprints)

[C16] G. Paik, G. Kim, and J. Lim, “MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models”, Findings of the Association for Computational Linguistics (ACL Findings), 2025 (to appear).
[C15] S. Park and G. Kim (co-first and corresponding author), “Evaluating Multimodal Generative AI with Korean Educational Standards”, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2025. Paper / GitHub
[C14] S. Lee, G. Kim (co-first author), J. Kim (co-first author), H. Lee, H. Chang, S. Park, and M. Seo, “How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?”, Proceedings of the Thirteenth International Conference on Learning Representations (ICLR), 2025. Paper
[C13] G. Kim and M. Seo, “On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning”, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024. Paper / Slide / Poster / GitHub
[C12] S. Lee, S. Kim, S. Park, G. Kim, and M. Seo, “Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation”, Findings of the Association for Computational Linguistics (ACL Findings), 2024. Paper / GitHub
[C11] Y. Okamoto, Y. Baek, G. Kim, R. Nakao, D. Kim, M. Yim, S. Park, and B. Lee, “CREPE: Coordinate-Aware Cost-Efficient Document Parsing End-to-End Model”, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2024. Paper
[O3] HyperCLOVA X Team, “HyperCLOVA X Technical Report”, arXiv preprint, 2024. Paper
[C10] G. Kim, H. Lee, D. Kim, H. Jung, S. Park, Y. Kim, S. Yun, T. Kil, B. Lee, and S. Park, “Cream: Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models”, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Paper / Slide / Poster / GitHub
[C9] D. Kim, Y. Kim, D. Kim, Y. Lim, G. Kim, and T. Kil, “SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap”, Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023. Paper / GitHub
[C8] D. Kim, T. Hong, M. Yim, Y. Kim, and G. Kim (corresponding author), “On Web-based Visual Corpus Construction for Visual Document Understanding”, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2023. Paper / GitHub
[W4] G. Kim, S. Yokoo (co-first author), S. Seo, A. Osanai, Y. Okamoto, and Y. Baek, “On Text Localization in End-to-End OCR-Free Document Understanding Transformer Without Text Localization Supervision”, Proceedings of the International Conference on Document Analysis and Recognition Workshops, 2023. Paper / Slide
[C7] G. Kim, T. Hong, M. Yim, J. Nam, J. Park, J. Yim, W. Hwang, S. Yun, D. Han, and S. Park, “OCR-Free Document Understanding Transformer”, Proceedings of the European Conference on Computer Vision (ECCV), 2022. Paper / Slide / Poster / GitHub / PyPI Package
[W3] G. Kim, W. Hwang, M. Seo, and S. Park, “Semi-Structured Query Grounding for Document-Oriented Databases with Deep Retrieval and Its Application to Receipt and POI Matching”, Proceedings of the AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services, 2022. Paper
[C6] W. Hwang, H. Lee, J. Yim, G. Kim, and M. Seo, “Cost-effective End-to-end Information Extraction for Semi-structured Document Images”, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. Paper
[W2] M. Naito, S. Yokoi, G. Kim, and H. Shimodaira, “Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings”, Proceedings of the ACL-IJCNLP 2021 Student Research Workshop, 2021. Paper
[C5] S. Park, G. Kim, J. Lee, J. Cha, J. Kim, and H. Lee, “Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model”, Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020. Paper / GitHub
[O2] M. Mizutani, A. Okuno, G. Kim, and H. Shimodaira, “Stochastic Neighbor Embedding of Multimodal Relational Data for Image-Text Simultaneous Visualization”, arXiv preprint, 2020. Paper
[C4] G. Kim, A. Okuno, K. Fukui, and H. Shimodaira, “Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities”, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019. Paper / GitHub / Slide
[C3] G. Kim, K. Fukui, and H. Shimodaira, “Segmentation-free Compositional n-gram Embedding”, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019. Paper / GitHub
[C2] A. Okuno, G. Kim, and H. Shimodaira, “Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability”, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. Paper / GitHub
[C1] J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, “What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis”, Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), 2019. Paper / GitHub
- Selected as an oral presentation : 4.3% (187/4303)
[O1] G. Kim, A. Okuno, and H. Shimodaira, “Embedding Words into Pseudo-Euclidean Space”, Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing (in Japanese), 2019. Paper
- Selected to receive both Young Researcher Award and Best Poster Award
[W1] G. Kim, K. Fukui, and H. Shimodaira, “Word-like Character n-gram Embedding”, Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, 2018. Paper / GitHub

Invited Talks / Seminars / Teaching

I am enthusiastic about delivering guest lectures and invited talks at universities and other institutions. Please feel free to reach out via email.

(Regular Course) “AI Engineering in Production”, University of Seoul. Spring 2025. Course Link
(Seminar) “HyperCLOVA X Vision: Open Your Eyes, CLOVA X!”, TEAM NAVER CONFERENCE DAN 24. Nov. 2024. Session Link / Slide
(Seminar) “HyperCLOVA X Vision: Open Your Eyes, CLOVA X!”, NAVER ENGINEERING DAY. Oct. 2024.
(Invited Talk) “Vision-Language Models for Context-Rich Image Understanding Tasks”, University of Seoul. Apr. 2024. Slide / Session Link
(Seminar) “Fine-Grained Evaluation of Vision-Language Models through VLM as a Judge”, NAVER Tech Meetup. Feb. 2024. Slide
(Invited Talk) “Recent Advances in Document AI”, Korea University. Mar. 2023.
(Invited Talk) “Recent Advances in Document AI”, Kookmin University. Dec. 2022.
(Invited Talk) “OCR-Free Document Understanding Transformer”, Microsoft. Nov. 2022. Slide
(Seminar) “Identifying a store from a receipt image”, Developer Conference DEVIEW. 2021. Video / Slide / Session Link
(Invited Talk) “Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities”, Michinoku Communication Science Seminar, Tohoku University. May 2019. Session Link

Selected Honors & Awards

Young Researcher Award of the Twenty-fifth Annual Meeting of the Association for Natural Language Processing. 2019.
Seiwa International Students Scholarship. 2019.
Korea-Japan Joint Government Scholarship. 2013–2018.
- Admission and tuition fees, and living costs covered for a year of preliminary education and four years of Bachelor’s studies

Academic Service

Serving as a reviewer for:
- (Journal) IEEE TPAMI (from 2025), IEEE Access (from 2023)
- (Conference) CVPR (from 2024), ICCV (from 2025), Industry Track of NAACL/EMNLP/ACL/COLING (from 2022), ACL ARR (from 2024), NeurIPS 2024 Workshop on Video-Language Models, etc.

Industry Experience / Employment

Technical Leader and Applied Research Scientist at NAVER Cloud Corp., Korea (May 2023 – Present)
- Conducting research and software engineering for developing multimodal solutions and products leveraging Large Language Models (HyperCLOVA Web Page)
- Leading and managing multiple R&D projects, including Cream and HyperCLOVA X Vision – VLM
Part-Time Lecturer at University of Seoul, Korea (Mar 2025 – Present)
- Teaching “AI Engineering in Production” (산업AI공학, Course Code: 91.035)
- Course Web Page: https://geewook.kim/lecture/uos25spring-91035
Applied Research Scientist at NAVER Corp., Korea (Apr. 2020-Apr. 2023)
- Worked on research and software engineering for Document AI family of solutions and products (CLOVA OCR, Web Demo)
- Managed and led several R&D projects, e.g., Donut and Webvicob
Shimodaira Lab. (Statistics and Machine Learning), Kyoto University, Japan (Apr. 2017-Mar. 2020)
- Worked on several representation learning related projects
- Advisor : Prof. Hidetoshi Shimodaira
Mathematical Statistics Team, RIKEN Center for Advanced Intelligence Project, Japan (Sep. 2017-Feb. 2020)
- Worked on several representation learning related projects as a research part-timer / trainee
- Advisor : Prof. Hidetoshi Shimodaira
Research Internship, CLOVA OCR Team, NAVER Corp., Korea (Aug. 2018-Oct. 2018 and Aug. 2019-Sep. 2019)
- Worked on several OCR related projects as a research intern
- Advisor : Dr. Hwalsuk Lee
Engineering Internship, Recruit Holdings Co., Ltd., Japan (Feb. 2017-Mar. 2017)
- Worked on a recommender system related project as an engineering intern
Engineering Internship, ABEJA, Inc., Japan (Aug. 2016-Sep. 2016)
- Worked on a data visualization related project as an engineering intern
Planning Internship, SoftBank Group Corp., Japan (Aug. 2015-Sep. 2015)
- Worked on market research as an intern

Education

Ph.D. in Artificial Intelligence, Kim Jaechul Graduate School of AI, Korea Advanced Institute of Science and Technology, Korea (Aug. 2023-)
- Major : Artificial Intelligence
- Laboratory : Language & Knowledge Lab.
- Advisor : Prof. Minjoon Seo
Master of Informatics, Graduate School of Informatics, Kyoto University, Japan (Apr. 2018-Mar. 2020)
- Major : Systems Science
- Laboratory : Shimodaira Lab. (Statistics and Machine Learning)
- Advisor : Prof. Hidetoshi Shimodaira
Bachelor of Engineering, School of Informatics and Mathematical Science, Kyoto University, Japan (Apr. 2014-Mar. 2018)
- Major : Informatics and Mathematical Science (Applied Mathematics and Physics Course)
- Laboratory : Shimodaira Lab. (Statistics and Machine Learning)
- Advisor : Prof. Hidetoshi Shimodaira