The extraction of a brief summary from scientific documents using machine learning methods
Gulden Murzabekova, Galiya Mukhamedrakhimova, Zhazira Taszhurekova, Yerbol Yerbayev, Zhanagul Doumcharieva, Valentina Makhatova, Moldir Tolganbaeva, Sandugash Serikbayeva
Abstract
This study proposes a machine learning-based approach for automatic summarization of scientific documents using a fine-tuned DistilBART model a lightweight and efficient version of the bidirectional and auto-regressive transformers (BART) architecture. The model was trained on a large corpus of 12,540 scientific articles (2015–2023) collected from the arXiv repository, enabling it to effectively capture domain-specific terminology and structural patterns. The proposed pipeline integrates advanced text preprocessing techniques, including tokenization, stopword removal, and stemming, to enhance the quality of semantic representation. Experimental evaluation demonstrates that the fine-tuned DistilBART achieves high summarization performance, with ROUGE-2=0.472 and ROUGE-L=0.602, outperforming baseline transformer-based models. Unlike conventional approaches, the method shows strong applicability beyond academic research, including automated indexing of technical documentation, metadata extraction in digital libraries, and real-time text processing in embedded natural language processing (NLP) systems. The results highlight the potential of transformer-based summarization to accelerate scientific knowledge discovery and improve the efficiency of information retrieval across various domains.
Keywords
Auto-regressive decoder; Bidirectional and auto-regressive transformers; DistilBART; Encoder; Natural language processing; Text extraction method
DOI:
https://doi.org/10.11591/eei.v14i6.10660
Refbacks
There are currently no refbacks.
This work is licensed under a
Creative Commons Attribution-ShareAlike 4.0 International License .
<div class="statcounter"><a title="hit counter" href="http://statcounter.com/free-hit-counter/" target="_blank"><img class="statcounter" src="http://c.statcounter.com/10241695/0/5a758c6a/0/" alt="hit counter"></a></div>
Bulletin of EEI Stats
Bulletin of Electrical Engineering and Informatics (BEEI) ISSN: 2089-3191 , e-ISSN: 2302-9285 This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU) .