Two papers from the Natural Language Processing Lab (advisor: Professor 김학수) have been accepted for publication at The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING), a prestigious international conference in artificial intelligence and natural language processing (a BK21-recognized venue).
Paper #1: "Title-based Extractive Summarization via MRC Framework" (김홍진, Ph.D. student, Department of Artificial Intelligence)
[Abstract] Existing studies on extractive summarization have primarily focused on scoring and selecting summary sentences independently. However, these models are limited to sentence-level extraction and tend to select highly generalized sentences while overlooking the overall content of a document. To effectively consider the semantics of a document, in this study, we introduce a novel machine reading comprehension (MRC) framework for extractive summarization (MRCSUM) by setting a query as the title. Our framework enables MRCSUM to consider the semantic coherence and relevance of summary sentences in relation to the overall content. In particular, when a title is not available, we generate a TITLE-LIKE QUERY, which is expected to achieve the same effect as a title. Our TITLE-LIKE QUERY consists of the topic and keywords to serve as information on the main topic or theme of the document. We conduct experiments in both Korean and English languages, evaluating the performance of MRCSUM on datasets comprising both long and short summaries. Our results demonstrate the effectiveness of MRCSUM in extractive summarization, showcasing its ability to generate concise and informative summaries with or without explicit titles. Furthermore, our MRCSUM outperforms existing models by capturing the essence of the document content and producing more coherent summaries.
Link: https://aclanthology.org/2024.lrec-main.1406/
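To make the framing in the abstract concrete, the toy sketch below treats the title as an MRC-style query and scores document sentences against it with bag-of-words cosine similarity, keeping the top-scoring sentences in document order. This is a simplified, hypothetical stand-in for query-based sentence selection, not the paper's MRCSUM model; all function names and the example document are made up.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def mrc_style_extract(title: str, sentences: list[str], k: int = 1) -> list[str]:
    """Score each sentence against the title-as-query and keep the top k,
    preserving document order (a toy analogue of title-conditioned extraction)."""
    query = Counter(title.lower().split())
    scored = [(cosine(query, Counter(s.lower().split())), i)
              for i, s in enumerate(sentences)]
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [sentences[i] for _, i in top]

doc = [
    "The model reads the whole document at once.",
    "Extractive summarization selects salient sentences from a document.",
    "Training finished in two hours.",
]
print(mrc_style_extract("extractive summarization of documents", doc, k=1))
```

In the paper's setting, when no title exists, a generated title-like query (topic plus keywords) would take the place of the `title` argument here.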
Paper #2: "Bridging the Code Gap: A Joint Learning Framework Across Medical Coding Systems" (정근영, integrated M.S.-Ph.D. student, Department of Artificial Intelligence; 선주오, Ph.D. student, Department of Artificial Intelligence)
[Abstract] Automated Medical Coding (AMC) is the task of automatically converting free-text medical documents into predefined codes according to a specific medical coding system. Although deep learning has significantly advanced AMC, the class imbalance problem remains a significant challenge. To address this issue, most existing methods consider only a single coding system and disregard the potential benefits of reflecting the relevance between different coding systems. To bridge this gap, we introduce a Joint learning framework for Across Medical coding Systems (JAMS), which jointly learns different coding systems through multi-task learning. It learns various representations using a shared encoder and explicitly captures the relationships across these coding systems using the medical code attention network, a modification of the graph attention network. In the experiments on the MIMIC-IV ICD-9 and MIMIC-IV ICD-10 datasets, connected through General Equivalence Mappings, JAMS improved the performance consistently regardless of the backbone models. This result demonstrates its model-agnostic characteristic, which is not constrained by specific model structures. Notably, JAMS significantly improved the performance of low-frequency codes. Our analysis shows that these performance gains are due to the connections between the codes of the different coding systems.
Link: https://aclanthology.org/2024.lrec-main.227/
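The abstract above describes JAMS as a shared encoder with per-system heads, linked through General Equivalence Mappings (GEM). As a rough, non-authoritative sketch of the cross-system idea only, the toy below replaces learned classifiers with keyword rules and shows how a code predicted in one coding system can support its mapped counterpart in the other; the encoder, rules, and mapping entries are all hypothetical, and this is not the paper's implementation.

```python
def encode(text: str) -> set[str]:
    """Shared 'encoder': bag-of-words features consumed by both heads."""
    return set(text.lower().split())

# Hypothetical per-system "heads" (keyword rules standing in for classifiers).
ICD9_RULES = {"401.9": {"hypertension"}, "250.00": {"diabetes"}}
ICD10_RULES = {"I10": {"hypertension"}, "E11.9": {"diabetes"}}
GEM = {"401.9": "I10", "250.00": "E11.9"}  # GEM-like ICD-9 -> ICD-10 mapping

def predict(text: str, rules: dict[str, set[str]]) -> set[str]:
    feats = encode(text)
    return {code for code, kws in rules.items() if kws & feats}

def joint_predict(text: str) -> tuple[list[str], list[str]]:
    """Predict in both systems, then propagate evidence across the mapping:
    a code found in one system reinforces its counterpart in the other."""
    icd9 = predict(text, ICD9_RULES)
    icd10 = predict(text, ICD10_RULES)
    icd10 |= {GEM[c] for c in icd9 if c in GEM}          # 9 -> 10 transfer
    icd9 |= {c for c, t in GEM.items() if t in icd10}    # 10 -> 9 transfer
    return sorted(icd9), sorted(icd10)

print(joint_predict("patient with hypertension and diabetes"))
```

The cross-system transfer step is where a low-frequency code in one system could, in principle, borrow signal from a better-supported mapped code in the other, which is the intuition the abstract gives for JAMS's gains on rare codes.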