This thesis investigates recent techniques for transfer learning and their influence on machine summarization systems. A current trend in Natural Language Processing (NLP) is to pre-train extensive language models and subsequently adapt them to problems in various task domains. Since these techniques have rarely been investigated in the context of text summarization, this thesis develops a workflow to integrate and evaluate pre-trained language models in neural text summarization. Based on news articles of the CNN / DailyMail dataset and the CopyNet summarization model, the conducted experiments show that transfer learning can have a positive impact on summarizing texts. Further findings suggest that datasets with fewer training examples are more likely to benefit from transfer learning. On the other hand, however, this work demonstrates that the components of text summarization models limit the abilities of state-of-the-art transfer learning techniques. This thesis is aimed at readers in the field of machine learning who are interested in the state of the art in transfer learning for NLP and its influence on the generation of summaries.
Summarizing is the ability to write a brief account of the essential content of a text. Humans use their literacy to understand the overall meaning of the text, identify its crucial parts, and write the summary in their own words. Driven by the rise of the World Wide Web, however, the amount of publicly available text has grown sharply. This overwhelming volume of material exceeds the limits of human capacity to process the available data. In this context, automatic summarization systems have great potential to compress texts and to help users focus on the essentials.
Two types of approaches to automatic summarization can be distinguished. Extractive methods aim to identify the crucial information in a written text and copy these parts verbatim as the summary [15, 80]. As a result, the longer word sequences in such a summary usually do not form fluent, easily readable text. To counter this, abstractive methods aim to express summaries in coherent and fluent prose [75, 61]. However, teaching a computer to summarize with abstractive methods is a complicated task, which calls for a brief introduction to the field of Natural Language Processing (NLP).
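The extractive idea can be illustrated with a toy baseline that scores each sentence by the frequency of its words and copies the top-scoring sentences verbatim. The function name and scoring scheme below are illustrative assumptions for this sketch, not a method evaluated in this thesis:

```python
from collections import Counter

def extractive_summary(text, n=1):
    """Toy extractive baseline: copy the n highest-scoring sentences verbatim."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    # Score a sentence by the average corpus frequency of its words.
    freqs = Counter(w.lower() for s in sentences for w in s.split())
    score = lambda s: sum(freqs[w.lower()] for w in s.split()) / len(s.split())
    return ". ".join(sorted(sentences, key=score, reverse=True)[:n]) + "."

doc = "The cat sat on the mat. The cat ran away. Dogs bark loudly."
print(extractive_summary(doc))  # The cat sat on the mat.
```

Because the output is copied verbatim from the input, such a summary is grammatical at the sentence level, yet several copied sentences in a row may read disjointedly, which is exactly the weakness abstractive methods address.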
Understanding and processing natural language is still a challenging task in artificial intelligence. Computational techniques are applied to understand individual characters and words, the connections between words, and the overall picture of a text. On top of this, computers cannot interpret words as they are but have to convert them into numerical representations. Fortunately, developments in the field of machine learning show promising results in learning machine-readable word representations [58, 64].
In the field of text summarization, recent abstractive methods profit from the rise of deep neural networks. First, neural approaches learn deep representations to understand the overall meaning of a text and to identify its crucial parts. The second challenge in abstractive text summarization is to write the summary as a fluent sequence of words. Neural approaches face this challenge by writing new sequences word by word and are thus categorized as sequence-to-sequence models. This type of model has recently been addressed with neural networks by separating the reading (encoding) of words from the writing (decoding) of words.
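The separation of encoding and decoding can be sketched in a few lines. The recurrence, dimensions, and untrained random weights below are toy assumptions standing in for a learned model; the point is only the word-by-word structure:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb, hid = 50, 8, 16            # toy sizes (arbitrary assumptions)

E = rng.normal(size=(vocab, emb))      # embedding table: word id -> vector
W_enc = rng.normal(size=(emb + hid, hid))  # encoder recurrence weights
W_dec = rng.normal(size=(emb + hid, hid))  # decoder recurrence weights
W_out = rng.normal(size=(hid, vocab))      # projection back to the vocabulary

def encode(src_ids):
    """Read (encode) the source word-by-word into a single state vector."""
    h = np.zeros(hid)
    for t in src_ids:
        h = np.tanh(np.concatenate([E[t], h]) @ W_enc)
    return h

def decode(h, steps=4):
    """Write (decode) new words one at a time, greedily."""
    out, prev = [], 0                  # assume id 0 is a start-of-summary token
    for _ in range(steps):
        h = np.tanh(np.concatenate([E[prev], h]) @ W_dec)
        prev = int(np.argmax(h @ W_out))   # pick the most likely next word
        out.append(prev)
    return out

summary_ids = decode(encode([3, 17, 42, 5]))
print(len(summary_ids))  # 4 generated token ids
```

In a trained model, the encoder state would summarize the article and the decoder would emit summary words conditioned on it; attention and copying mechanisms, as in CopyNet, extend exactly this skeleton.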
From another point of view, a summarization system is optimized for the objective of a single task. A question answering system, for instance, is only applicable to understanding and answering questions. In terms of human behavior, this workflow would correspond to learning literacy and task-specific skills from scratch for every new task. Consequently, the human ability to transfer what has been learned is essential to solving new and unseen tasks in natural language.
In order to reuse previously learned knowledge, transfer learning methods share beneficial information across multiple tasks. In other fields of machine learning, such as computer vision, transfer learning for image classification with ImageNet has become a widespread workflow [7, 70]. For natural language, word embeddings compress the sparse input data and capture the meaning of words in a dense representation [58, 64]. Even though these representations cannot overcome the obstacle of learning task- and domain-specific knowledge from scratch, word embeddings provide a better starting point for the initial layer of neural networks.
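The contrast between sparse and dense word representations can be made concrete in a small example. The vocabulary and the hand-set embedding values below are toy assumptions; real embeddings are learned from large corpora, e.g. with word2vec or GloVe [58, 64]:

```python
import numpy as np

vocab = ["king", "queen", "apple"]   # toy vocabulary (illustrative)
one_hot = np.eye(len(vocab))         # sparse: one dimension per word

# Dense embeddings in two dimensions (hand-set toy values, not learned).
dense = np.array([
    [0.90, 0.80],    # king
    [0.85, 0.82],    # queen
    [-0.70, 0.10],   # apple
])

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(one_hot[0], one_hot[1]))                      # 0.0 — one-hot vectors carry no similarity
print(cos(dense[0], dense[1]) > cos(dense[0], dense[2]))  # True — dense vectors do
```

Every pair of one-hot vectors is orthogonal, so "king" is as unrelated to "queen" as to "apple"; the dense vectors place related words close together, which is precisely the information a pre-trained embedding layer contributes.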
Going one step further, recent approaches in NLP transfer deep neural networks to multiple text classification tasks [65, 68, 19]. The large-scale models are trained once and subsequently adapted to several tasks with varying objectives, such as question answering or natural language inference (NLI). This process is referred to as sequential transfer learning. Since approaches in sequential transfer learning incorporate a deep language understanding across multiple tasks, they have great potential to address the challenges of text summarization.
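The two stages of sequential transfer learning can be sketched as follows. A fixed random projection stands in for a genuinely pre-trained encoder, and the downstream task, labels, and head-fitting procedure are toy assumptions; only the train-once, adapt-later structure mirrors the process described above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: "pre-training" — in reality a large model trained once on a big
# corpus; here a fixed random projection serves as an illustrative stand-in.
W_pre = rng.normal(size=(8, 16))

def encode(x):
    """Frozen pre-trained encoder, reused unchanged across downstream tasks."""
    return np.tanh(x @ W_pre)

# Stage 2: adaptation — fit only a small task-specific head on top of the
# frozen representations (a toy binary classification task).
X = rng.normal(size=(64, 8))             # toy task inputs
y = (X[:, 0] > 0).astype(float)          # toy binary labels
H = encode(X)                            # pre-trained features
head, *_ = np.linalg.lstsq(H, y, rcond=None)  # train only the new head

acc = np.mean(((H @ head) > 0.5) == y)
print(acc > 0.5)  # True — the reused encoder gives a usable starting point
```

Only the small head is trained for the new objective, while the encoder's knowledge is carried over, which is what makes the approach attractive for data-hungry tasks such as summarization.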