A Hierarchical Representation Model Based on Longformer and
Transformer for Extractive Summarization
Shihao Yang, Shaoru Zhang, Ming Fang, Fengqin Yang and Shuhua Liu *
School of Information Science and Technology, Northeast Normal University, Changchun 130117, China;
yangsh861@nenu.edu.cn (S.Y.); zhangsr030@nenu.edu.cn (S.Z.); fangm000@nenu.edu.cn (M.F.);
yangfq147@nenu.edu.cn (F.Y.)
* Correspondence: liush129@nenu.edu.cn
Abstract: Automatic text summarization compresses a document while preserving the main ideas of the original text; it comprises extractive summarization and abstractive summarization. Extractive summarization extracts important sentences from the original document to serve as the summary, so the document representation method is crucial to the quality of the generated summary. To represent documents effectively, we propose Long-Trans-Extr, a hierarchical document representation model for extractive summarization that uses Longformer as the sentence encoder and Transformer as the document encoder. The advantage of Longformer as the sentence encoder is that it can take long documents of up to 4096 tokens as input while adding relatively little computation. We evaluate Long-Trans-Extr on three benchmark datasets: CNN (Cable News Network), DailyMail, and the combined CNN/DailyMail. It achieves 43.78 (Rouge-1) and 39.71 (Rouge-L) on CNN/DailyMail, and 33.75 (Rouge-1), 13.11 (Rouge-2), and 30.44 (Rouge-L) on CNN. These results are highly competitive and, moreover, show that our model performs particularly well on long documents such as those in the CNN corpus.
Keywords: extractive summarization; transformer; longformer; deep learning
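As a minimal illustration of the architecture summarized in the abstract, consider the following sketch. It assumes the HuggingFace transformers Longformer and PyTorch's built-in Transformer encoder; the class name LongTransExtr and the choice of pooling sentence vectors at sentence-boundary tokens are illustrative assumptions, not the authors' released code.

# Sketch of the Long-Trans-Extr architecture: Longformer as sentence
# encoder over the full document, Transformer as document encoder over
# sentence vectors. Names and pooling strategy are illustrative.
import torch
import torch.nn as nn
from transformers import LongformerModel

class LongTransExtr(nn.Module):
    def __init__(self, hidden=768, doc_layers=2, heads=8):
        super().__init__()
        # Sentence encoder: Longformer accepts inputs of up to 4096 tokens.
        self.sent_encoder = LongformerModel.from_pretrained(
            "allenai/longformer-base-4096")
        # Document encoder: standard Transformer layers over sentence vectors.
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                           batch_first=True)
        self.doc_encoder = nn.TransformerEncoder(layer, num_layers=doc_layers)
        self.scorer = nn.Linear(hidden, 1)  # one extraction score per sentence

    def forward(self, input_ids, attention_mask, cls_positions):
        # Give global attention to sentence-boundary tokens (a common
        # choice; the paper may configure this differently).
        global_mask = torch.zeros_like(input_ids)
        batch_idx = torch.arange(input_ids.size(0)).unsqueeze(1)
        global_mask[batch_idx, cls_positions] = 1
        tokens = self.sent_encoder(input_ids=input_ids,
                                   attention_mask=attention_mask,
                                   global_attention_mask=global_mask
                                   ).last_hidden_state        # (B, T, H)
        # One vector per sentence, taken at its boundary token.
        sents = tokens[batch_idx, cls_positions]              # (B, S, H)
        sents = self.doc_encoder(sents)                       # (B, S, H)
        return torch.sigmoid(self.scorer(sents)).squeeze(-1)  # (B, S)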
1. Introduction
Since Luhn [1] initiated automatic summarization research in 1958, great progress has been made in this field. Text summarization can be divided into two categories: abstractive and extractive summarization. Abstractive summarization [2] refines the ideas and concepts of the original text on the basis of understanding its semantic meaning, so as to reconstruct it semantically. Although closer to how human beings summarize, abstractive summarization still faces a great challenge in producing a coherent, grammatical, and general summary of the original text, owing to the limitations of natural language generation technology. Extractive summarization, in contrast, extracts key sentences from a document to generate a summary: the input document is first encoded, a score is then computed for each sentence, the sentences are ranked by score, and those with the highest scores are selected to form the summary.
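A minimal sketch of this score-rank-select pipeline follows; the scores below are placeholders, whereas in practice they come from a trained model such as the one proposed here.

# Generic extractive pipeline: rank sentences by score, keep the top k,
# and restore document order. Scores are placeholder values.
def extract_summary(sentences, scores, k=2):
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])  # restore original document order
    return [sentences[i] for i in chosen]

print(extract_summary(
    ["Storm hits coast.", "Officials urge caution.",
     "Schools close early.", "Local team wins."],
    [0.9, 0.4, 0.7, 0.1]))
# -> ['Storm hits coast.', 'Schools close early.']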
This study focuses on extractive summarization, since it not only produces semantically and grammatically correct sentences from news articles but is also faster than abstractive summarization. At present, both abstractive and extractive summarization methods have difficulty processing long texts, owing to the computational complexity of the encoder network. Recent studies have shown that Transformer [3] outperforms LSTM [4] in natural language processing, both in experimental results and in computational complexity. However, even Transformer, despite its capacity for parallel computation, is unable to handle long texts, which has restricted text summarization methods to short texts. A long text is usually handled in one of two ways: (1) discard the part that exceeds the length limit directly, which is simple to implement but greatly harms the quality of the final summary; or (2) divide the long text into