Research

My research interests span machine learning, from its underlying principles to its practical applications. I specialize in building robust, generalizable machine learning systems for real-world use.

Selected Projects

Highlighted Open Source Projects

I have built several open-source repositories as part of my work in machine learning and artificial intelligence. If you have any questions or feedback, feel free to open an issue or email me. Your input is greatly appreciated!

List of Publications

Please see my Google Scholar or Semantic Scholar for an up-to-date list.

(C: Peer-Reviewed International Conference Papers, W: Peer-Reviewed Workshop Papers, O: Other Publications/arXiv Preprints)

  • [O5] S. Park and G. Kim (co-first and corresponding author), “Evaluating Multimodal Generative AI with Korean Educational Standards”, arXiv preprint, 2024 (to appear).
  • [O4] S. Lee, G. Kim (co-first author), J. Kim (co-first author), H. Lee, H. Chang, S. Park, and M. Seo, “How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?”, arXiv preprint, 2024. Paper
  • [C13] G. Kim and M. Seo, “On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning”, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 (to appear). Paper
  • [C12] S. Lee, S. Kim, S. Park, G. Kim, and M. Seo, “Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation”, Findings of the Association for Computational Linguistics (ACL Findings), 2024. Paper / GitHub
  • [C11] Y. Okamoto, Y. Baek, G. Kim, R. Nakao, D. Kim, M. Yim, S. Park, and B. Lee, “CREPE: Coordinate-Aware Cost-Efficient Document Parsing End-to-End Model”, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2024. Paper
  • [O3] HyperCLOVA X Team, “HyperCLOVA X Technical Report”, arXiv preprint, 2024. Paper
  • [C10] G. Kim, H. Lee, D. Kim, H. Jung, S. Park, Y. Kim, S. Yun, T. Kil, B. Lee, and S. Park, “Cream: Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models”, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Paper / Slide / Poster / GitHub
  • [C9] D. Kim, Y. Kim, D. Kim, Y. Lim, G. Kim, and T. Kil, “SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap”, Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023. Paper / GitHub
  • [C8] D. Kim, T. Hong, M. Yim, Y. Kim, and G. Kim (corresponding author), “On Web-based Visual Corpus Construction for Visual Document Understanding”, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2023. Paper / GitHub
  • [W4] G. Kim, S. Yokoo (co-first author), S. Seo, A. Osanai, Y. Okamoto, and Y. Baek, “On Text Localization in End-to-End OCR-Free Document Understanding Transformer Without Text Localization Supervision”, Proceedings of the International Conference on Document Analysis and Recognition Workshops, 2023. Paper / Slide
  • [C7] G. Kim, T. Hong, M. Yim, J. Nam, J. Park, J. Yim, W. Hwang, S. Yun, D. Han, and S. Park, “OCR-Free Document Understanding Transformer”, Proceedings of the European Conference on Computer Vision (ECCV), 2022. Paper / Slide / Poster / GitHub / PyPI
  • [W3] G. Kim, W. Hwang, M. Seo, and S. Park, “Semi-Structured Query Grounding for Document-Oriented Databases with Deep Retrieval and Its Application to Receipt and POI Matching”, Proceedings of the AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services, 2022. Paper
  • [C6] W. Hwang, H. Lee, J. Yim, G. Kim, and M. Seo, “Cost-effective End-to-end Information Extraction for Semi-structured Document Images”, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. Paper
  • [W2] M. Naito, S. Yokoi, G. Kim, and H. Shimodaira, “Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings”, Proceedings of the ACL-IJCNLP 2021 Student Research Workshop, 2021. Paper
  • [C5] S. Park, G. Kim, J. Lee, J. Cha, J. Kim, and H. Lee, “Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model”, Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020. Paper / GitHub
  • [O2] M. Mizutani, A. Okuno, G. Kim, and H. Shimodaira, “Stochastic Neighbor Embedding of Multimodal Relational Data for Image-Text Simultaneous Visualization”, arXiv preprint, 2020. Paper
  • [C4] G. Kim, A. Okuno, K. Fukui, and H. Shimodaira, “Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities”, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019. Paper / GitHub / Slide
  • [C3] G. Kim, K. Fukui, and H. Shimodaira, “Segmentation-free Compositional n-gram Embedding”, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019. Paper / GitHub
  • [C2] A. Okuno, G. Kim, and H. Shimodaira, “Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability”, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. Paper / GitHub
  • [C1] J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, “What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis”, Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), 2019. Paper / GitHub
    • Selected for oral presentation: 4.3% (187/4303)
  • [O1] G. Kim, A. Okuno, and H. Shimodaira, “Embedding Words into Pseudo-Euclidean Space”, Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing (in Japanese), 2019. Paper
  • [W1] G. Kim, K. Fukui, and H. Shimodaira, “Word-like Character n-gram Embedding”, Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, 2018. Paper / GitHub

Invited Talks / Seminars / Guest Lectures

I am enthusiastic about delivering guest lectures and invited talks at universities and other institutions. Please feel free to reach out via email.

  • [T9] “HyperCLOVA X Vision: Open Your Eyes, CLOVA X!”, TEAM NAVER CONFERENCE DAN 24. Nov. 2024. Session Link / Slide
  • [T8] “HyperCLOVA X Vision: Open Your Eyes, CLOVA X!”, NAVER ENGINEERING DAY. Oct. 2024.
  • [T7] “Vision-Language Models for Context-Rich Image Understanding Tasks”, University of Seoul. Apr. 2024. Slide / Session Link
  • [T6] “Fine-Grained Evaluation of Vision-Language Models through VLM as a Judge”, NAVER Tech Meetup. Feb. 2024. Slide
  • [T5] “Recent Advances in Document AI”, Korea University. Mar. 2023.
  • [T4] “Recent Advances in Document AI”, Kookmin University. Dec. 2022.
  • [T3] “OCR-Free Document Understanding Transformer”, Microsoft. Nov. 2022. Slide
  • [T2] “Identifying a store from a receipt image”, Developer Conference DEVIEW. 2021. Video / Slide / Session Link
  • [T1] “Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities”, Michinoku Communication Science Seminar, Tohoku University. May 2019. Session Link

Selected Honors & Awards

  • Young Researcher Award of the Twenty-fifth Annual Meeting of the Association for Natural Language Processing. 2019.
  • Seiwa International Students Scholarship. 2019.
  • Korea-Japan Joint Government Scholarship. 2013–2018.
    • Covered admission and tuition fees and living costs for one year of preliminary education and four years of Bachelor’s studies

Academic Service

  • Reviewer for: NAACL 2022 Industry Track, EMNLP 2022 Industry Track, ACL 2023 Industry Track, NAACL 2024 Industry Track, COLING 2025 Industry Track, NAACL 2025 Industry Track, ACL 2024 (ARR 2024 Feb), ARR 2024 Apr, EMNLP 2024 (ARR 2024 June), ARR 2024 Aug, NAACL 2025 (ARR 2024 Oct), IEEE Access, NeurIPS 2024 Workshop on Video-Language Models, etc.

Industry Experience

  • Technical Leader and Applied Research Scientist at NAVER Cloud Corp., Korea (May 2023-)
    • Working on LLM-based multimodal solutions and products through research and software engineering (Web Page)
    • Managing and leading several R&D projects, e.g., Cream and HyperCLOVA X Vision (VLM)
  • Applied Research Scientist at NAVER Corp., Korea (Apr. 2020-Apr. 2023)
    • Worked on research and software engineering for Document AI family of solutions and products (CLOVA OCR, Web Demo)
    • Managed and led several R&D projects, e.g., Donut and Webvicob
  • Shimodaira Lab. (Statistics and Machine Learning), Kyoto University, Japan (Apr. 2017-Mar. 2020)
  • Mathematical Statistics Team, RIKEN Center for Advanced Intelligence Project, Japan (Sep. 2017-Feb. 2020)
    • Worked on several representation-learning projects as a part-time researcher / trainee
    • Advisor: Prof. Hidetoshi Shimodaira
  • Research Internship, CLOVA OCR Team, NAVER Corp., Korea (Aug. 2018-Oct. 2018 and Aug. 2019-Sep. 2019)
    • Worked on several OCR-related projects as a research intern
    • Advisor: Dr. Hwalsuk Lee
  • Engineering Internship, Recruit Holdings Co., Ltd., Japan (Feb. 2017-Mar. 2017)
    • Worked on a recommender-system project as an engineering intern
  • Engineering Internship, ABEJA, Inc., Japan (Aug. 2016-Sep. 2016)
    • Worked on a data-visualization project as an engineering intern
  • Planning Internship, SoftBank Group Corp., Japan (Aug. 2015-Sep. 2015)
    • Worked on market research as an intern

Education

Last updated on 24.07.06