Research

My research interests lie in machine learning, covering both its underlying principles and its practical applications. I specialize in building robust, generalizable machine learning systems, including vision and language models, for real-world applications.

Publications

Please see my Google Scholar or Semantic Scholar profile for a full list.

(C: Peer-Reviewed International Conference Papers, W: Peer-Reviewed Workshop Papers, O: Other Publications/arXiv Preprints)

  • [C11] Y. Okamoto, Y. Baek, G. Kim, R. Nakao, D. Kim, M. Yim, S. Park, and B. Lee, “CREPE: Coordinate-Aware Cost-Efficient Document Parsing End-to-End Model”, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2024 (to appear). Paper
  • [W5] S. Lee, S. Kim, S. Park, G. Kim, and M. Seo, “Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation”, ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models, 2024 (to appear). Paper / GitHub
  • [O3] HyperCLOVA X Team, “HyperCLOVA X Technical Report”, arXiv preprint, 2024. Paper
  • [C10] G. Kim, H. Lee, D. Kim, H. Jung, S. Park, Y. Kim, S. Yun, T. Kil, B. Lee, and S. Park, “Cream: Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models”, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. Paper / Slide / Poster / GitHub
  • [C9] D. Kim, Y. Kim, D. Kim, Y. Lim, G. Kim, and T. Kil, “SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap”, Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023. Paper / GitHub
  • [C8] D. Kim, T. Hong, M. Yim, Y. Kim, and G. Kim, “On Web-based Visual Corpus Construction for Visual Document Understanding”, Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2023. Paper / GitHub
  • [W4] G. Kim, S. Yokoo, S. Seo, A. Osanai, Y. Okamoto, and Y. Baek, “On Text Localization in End-to-End OCR-Free Document Understanding Transformer Without Text Localization Supervision”, Proceedings of the International Conference on Document Analysis and Recognition Workshops, 2023. Paper / Slide
  • [C7] G. Kim, T. Hong, M. Yim, J. Nam, J. Park, J. Yim, W. Hwang, S. Yun, D. Han, and S. Park, “OCR-Free Document Understanding Transformer”, Proceedings of the European Conference on Computer Vision (ECCV), 2022. Paper / Slide / Poster / GitHub / PyPI Package
  • [W3] G. Kim, W. Hwang, M. Seo, and S. Park, “Semi-Structured Query Grounding for Document-Oriented Databases with Deep Retrieval and Its Application to Receipt and POI Matching”, Proceedings of the AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services, 2022. Paper
  • [C6] W. Hwang, H. Lee, J. Yim, G. Kim, and M. Seo, “Cost-effective End-to-end Information Extraction for Semi-structured Document Images”, Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. Paper
  • [W2] M. Naito, S. Yokoi, G. Kim, and H. Shimodaira, “Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings”, Proceedings of the ACL-IJCNLP 2021 Student Research Workshop, 2021. Paper
  • [C5] S. Park, G. Kim, J. Lee, J. Cha, J. Kim, and H. Lee, “Scale down Transformer by Grouping Features for a Lightweight Character-level Language Model”, Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020. Paper / GitHub
  • [O2] M. Mizutani, A. Okuno, G. Kim, and H. Shimodaira, “Stochastic Neighbor Embedding of Multimodal Relational Data for Image-Text Simultaneous Visualization”, arXiv preprint, 2020. Paper
  • [C4] G. Kim, A. Okuno, K. Fukui, and H. Shimodaira, “Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities”, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019. Paper / GitHub / Slide
  • [C3] G. Kim, K. Fukui, and H. Shimodaira, “Segmentation-free Compositional n-gram Embedding”, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2019. Paper / GitHub
  • [C2] A. Okuno, G. Kim, and H. Shimodaira, “Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability”, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. Paper / GitHub
  • [C1] J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, “What is wrong with scene text recognition model comparisons? dataset and model analysis”, Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), 2019. Paper / GitHub
    • Selected as an oral presentation: 4.3% (187/4303)
  • [O1] G. Kim, A. Okuno, and H. Shimodaira, “Embedding Words into Pseudo-Euclidean Space”, Proceedings of the 25th Annual Meeting of the Association for Natural Language Processing (in Japanese), 2019. Paper
  • [W1] G. Kim, K. Fukui, and H. Shimodaira, “Word-like Character n-gram Embedding”, Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, 2018. Paper / GitHub

Invited Talks / Guest Lectures

  • “Vision-Language Models for Context-Rich Image Understanding Tasks”, University of Seoul. Apr. 2024. Slide / Session Link
  • “Fine-Grained Evaluation of Vision-Language Models through VLM as a Judge”, NAVER Tech Meetup. Feb. 2024. Slide
  • “Recent Advances in Document AI”, Korea University. Mar. 2023.
  • “Recent Advances in Document AI”, Kookmin University. Dec. 2022.
  • “OCR-Free Document Understanding Transformer”, Microsoft. Nov. 2022. Slide
  • “Identifying a store from a receipt image”, Developer Conference DEVIEW. 2021. Video / Slide / Session Link
  • “Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities”, Michinoku Communication Science Seminar, Tohoku University. May 2019. Session Link

Selected Honors, Awards & Services

  • Reviewer for IEEE Access; Industry Track at NAACL 2022, EMNLP 2022, ACL 2023, and NAACL 2024; ACL Rolling Review (ARR) 2024.
  • Young Researcher Award of the Twenty-fifth Annual Meeting of the Association for Natural Language Processing. 2019.
  • Seiwa International Students Scholarship. 2019.
  • Korea-Japan Joint Government Scholarship. 2013–2018.
    • Covered admission fees, tuition fees, and living costs for one year of preliminary education and four years of Bachelor’s studies

Career

  • Technical Leader and Applied Research Scientist at NAVER Cloud Corp. (May 2023–present)
    • Working on LLM-based solutions and products through research and software engineering (Web Page)
    • Managing and leading several R&D projects, e.g., Cream
  • Applied Research Scientist at NAVER Corp. (Apr. 2020–Apr. 2023)
    • Worked on research and software engineering for the Document AI family of solutions and products (CLOVA OCR, Web Demo)
    • Managed and led several R&D projects, e.g., Donut and Webvicob
  • Shimodaira Lab. (Statistics and Machine Learning), Kyoto University (Apr. 2017–Mar. 2020)
  • Mathematical Statistics Team, RIKEN Center for Advanced Intelligence Project (Sep. 2017–Feb. 2020)
    • Worked on several representation-learning projects as a part-time researcher / trainee
    • Advisor: Prof. Hidetoshi Shimodaira
  • CLOVA OCR Team, NAVER Corp. (Aug. 2018–Oct. 2018 and Aug. 2019–Sep. 2019)
    • Worked on several OCR-related projects as a research intern
    • Advisor: Dr. Hwalsuk Lee

Education

Last updated on 24.01.26