{"id":21109,"date":"2019-07-08T08:29:16","date_gmt":"2019-07-08T06:29:16","guid":{"rendered":"https:\/\/www.inovex.de\/blog\/?p=16431"},"modified":"2022-11-24T10:37:43","modified_gmt":"2022-11-24T09:37:43","slug":"text-summarization-seq2seq-neural-networks","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/","title":{"rendered":"Summarizing Long Texts with Seq2Seq Neural Networks"},"content":{"rendered":"<p>This blog post describes my master\u2019s thesis &#8220;Abstractive Summarization for Long Texts&#8221;. We\u2019ve extended existing state-of-the-art sequence-to-sequence (Seq2Seq) neural networks to process documents across content windows. By shifting the objective towards learning inter-window transitions, we circumvent the limitation of existing models, which can summarize documents only up to a certain length. With our windowing model, we are able to process arbitrarily long texts during inference. We evaluate on CNN\/Dailymail <a href=\"https:\/\/github.com\/abisee\/cnn-dailymail\">[1]<\/a> and WikiHow <a href=\"https:\/\/arxiv.org\/abs\/1810.09305\">[2]<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The-standard-architecture-for-abstractive-summarization-in-a-nutshell\"><\/span>The standard architecture for abstractive summarization in a nutshell<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: left;\">In a typical recurrent encoder-decoder Seq2Seq model used for abstractive summarization, a recurrent neural network (RNN) is trained to map an input sequence \\(x_1, x_2,\\ldots,x_{T_x}\\) of word vectors to an output sequence \\(y_1,\\ldots,y_{T_y}\\) which is not necessarily of the same 
length. Each of the words \\(x_1, x_2,\\ldots,x_{T_x}\\) comes from a fixed vocabulary \\(\\mathcal{V}\\) of size \\(\\left| \\mathcal{V} \\right|\\). The Seq2Seq model directly models the probability of the summary as a product of per-step conditional probabilities,<\/p>\n<p style=\"text-align: center;\">\\(P(y_1,\\ldots,y_{T_y}|x; \\theta) = \\prod_{t=1}^{T_y}P(y_t|y_{t-1},y_{t-2},\\ldots,y_1,c_t; \\theta)\\).<\/p>\n<p style=\"text-align: left;\">Given a set of \\(N\\) training examples \\(\\{x^{(i)}, y^{(i)}\\}_{i=1}^N\\), the usual training objective is to maximize the log-likelihood, or equivalently to minimize the negative log-likelihood, of the training data,<\/p>\n<p style=\"text-align: center;\">\\(\\tilde{\\theta}_{MLE} = \\arg\\max_{\\theta} \\mathcal{L}_{mle}(\\theta)\\)<\/p>\n<p>where<\/p>\n<p style=\"text-align: center;\">\\(\\mathcal{L}_{mle}(\\theta) = \\sum_{i=1}^N \\sum_{t=1}^{T_y^{(i)}} \\log P(y_t^{(i)}|y_{t-1}^{(i)},y_{t-2}^{(i)},\\ldots,y_1^{(i)},c_t^{(i)}; \\theta)\\)<\/p>\n<p>A typical encoder variant is a bidirectional RNN, which yields an encoded representation \\(h_j = [\\overrightarrow{h}_j ; \\overleftarrow{h}_j]\\) for each input symbol. The decoder is usually trained to act as a conditional language model, which models the probability of the next target word conditioned on the input sequence and the target history. At each decoding step, it uses attention to focus on the parts of the document that are relevant for the next prediction. It forms an attention context vector \\(c_t = \\sum_{j=1}^{T_x} \\alpha_{tj}h_j\\) with attention weights \\(\\alpha_{tj} =\\frac{\\exp(e_{tj})}{\\sum_{k=1}^{T_x} \\exp(e_{tk})}\\) and energies \\(e_{tj} =a(s_{t-1}, h_j)\\), where \\(s_{t-1}\\) is the previous decoder hidden state. 
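<\/p>\n<p>As a toy illustration with made-up numbers (assumed here for illustration, not taken from the thesis): for \\(T_x = 2\\) with energies \\(e_{t1} = 0\\) and \\(e_{t2} = \\ln 3\\), the softmax yields \\(\\alpha_{t1} = \\frac{1}{1+3} = 0.25\\) and \\(\\alpha_{t2} = \\frac{3}{1+3} = 0.75\\), so that \\(c_t = 0.25\\,h_1 + 0.75\\,h_2\\). The context vector is thus dominated by the second input position.<\/p>\n<p>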
In the attention variant of Luong et al. <a href=\"https:\/\/arxiv.org\/abs\/1508.04025\">[3]<\/a>, the energies are computed from the current decoder state \\(s_t\\) with the dot-product alignment \\(a(s_t, h_j) = s_t^\\top h_j\\).<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-16443 aligncenter\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/attention-300x142.png\" alt=\"\" width=\"600\" height=\"285\" \/><strong>Attention on bidirectionally encoded hidden states forming the context vector<\/strong><\/p>\n<p>With Luong attention, the decoder hidden state is updated by \\(s_t = \\tilde{f}(s_{t-1}, y_{t-1})\\). The conditional distribution of the next word over the vocabulary \\(\\mathcal{V}\\) is<\/p>\n<p style=\"text-align: center;\">\\(P_{\\mathcal{V}}(y_t = k|y_{t-1},y_{t-2},\\ldots,y_1,c_t) = \\frac{\\exp(l_{tk})}{\\sum_{k'=1}^{\\left|\\mathcal{V}\\right|} \\exp(l_{tk'})}\\),<\/p>\n<p>where the logits (unnormalized scores) are calculated as \\(l_t = W_l \\tanh([c_t;s_t]) + b_l\\).<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/pointer.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16446\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/pointer-300x99.png\" alt=\"Schematic including the pointer generator.\" width=\"600\" height=\"198\" \/><\/a><strong>Decoder with pointer-generator produces a combined distribution to predict the next word<\/strong><\/p>\n<p>The model is still constrained to generate words from a fixed vocabulary \\(\\mathcal{V}\\); out-of-vocabulary words cannot be produced. To fix this, we use a pointer-generator network <a href=\"https:\/\/arxiv.org\/abs\/1704.04368\">[4]<\/a>, which additionally allows copying from the source. A generation probability \\(p_{gen}\\), learned end-to-end, is calculated at each decoding step \\(t\\). It serves as a soft switch between sampling from \\(P_{\\mathcal{V}}\\) and copying from the input sequence. 
The conditional probability for a word \\(k\\) over the extended vocabulary \\(\\tilde{\\mathcal{V}}\\) is<\/p>\n<p style=\"text-align: center;\">\\(P_{\\tilde{\\mathcal{V}}}(y_t=k|y_{t-1},y_{t-2},\\ldots,y_1,c_t) = p_{gen} P_{\\mathcal{V}}(k) + (1-p_{gen}) \\sum_{j:x_j=k} \\alpha_{tj}\\),<\/p>\n<p>where \\(\\tilde{\\mathcal{V}}\\) is defined as the union of \\(\\mathcal{V}\\) and all words in the source document, and \\(P_{\\mathcal{V}}(k) = 0\\) for words \\(k\\) outside \\(\\mathcal{V}\\).<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Limitation-of-the-standard-Seq2Seq-model\"><\/span>Limitation of the standard Seq2Seq model<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>When summarizing an article with the standard model, we have to truncate the document to the maximum input length \\(T_x\\) that we trained on. For CNN\/Dailymail, this is typically around \\(400\\) words.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/normal.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16442\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/normal-300x181.png\" alt=\"A text with highlighted words for the summarization.\" width=\"600\" height=\"362\" \/><\/a><strong>Maximum attention visualization for standard model<\/strong><\/p>\n<p>An alternative would be to summarize longer texts in chunks. However, this limits the coherence of the final summary, as semantic information cannot flow between chunks. On top of that, finding the right break points is non-trivial, as we have to ensure that phrases which are at least locally semantically coherent end up in the same chunk.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Solution-Windowed-attention-and-inter-window-transition-learning\"><\/span>Solution: Windowed attention and inter-window transition learning<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We extend the standard recurrent Seq2Seq model with pointer-generator to process text across content windows. Attention is performed only at the window level. 
A decoder shared across all windows of a document links the attended fragments, as it can preserve semantic information from previous windows. The main idea is to transform the learning objective into a local decision problem. The model slides only in the forward direction and processes information from the current window based on the history accumulated over previous windows. Because the model learns local transitions, this capability can be exploited during inference to process arbitrarily long texts.<\/p>\n<p>The window size \\(\\texttt{ws}\\) determines the number of words processed by attention. The step size \\(\\texttt{ss}\\) specifies how many words the window advances every time it slides forward. The key point is to determine when to slide the window. In our dynamic approach, the decoder learns to generate a symbol (\\(\\rightarrow\\)) to signal saturation of the current window. However, when learning transitions (saturation points) end-to-end, we require some form of supervision during training to steer the model towards generating this transition symbol.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/overview.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16440\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/overview-300x210.png\" alt=\"Schematic including the decoder.\" width=\"600\" height=\"420\" \/><\/a><strong>The decoder uses the \\(\\rightarrow\\) symbol to signal when it wants to move to the next window.<\/strong><\/p>\n<p>We construct the supervision via an unsupervised mapping heuristic between source and reference summary sentences. Each reference summary sentence \\(S_{r_i}\\) is scored against every source sentence \\(D_j\\) to determine the similarity between source and ground-truth summary sentences. 
Two sentences are compared using cosine similarity,<\/p>\n<p style=\"text-align: center;\">\\(Sim(S_{r_i}, D_j) = \\frac{a(S_{r_i}) \\cdot a(D_j)}{\\left|\\left|a(S_{r_i})\\right|\\right| \\cdot \\left|\\left|a(D_j)\\right|\\right|}\\),<\/p>\n<p>where \\(a(\\cdot)\\) aggregates the word embeddings \\(w\\) of a sentence by simple summation, i.e. \\(a(S_{r_i}) = \\sum_{w \\in S_{r_i}} w\\). Using these pseudo links between source and summary sentences, we can reconstruct the hypothetical windows that most likely inspired each ground-truth sentence. Each target sentence receives a window number. Whenever the window number changes between consecutive sentences, we insert the \\(\\rightarrow\\) symbol after the respective sentence in the ground-truth summary.<\/p>\n<p>Back to the example. We can see that we are now able to process the entire document. The model emits \\(\\rightarrow\\) at some point, thereby signaling saturation of the first window. It then continues decoding the summary from the second window.<\/p>\n<p><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/dynamic.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16441\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/dynamic-300x257.png\" alt=\"Better prediction using maximum attention visualization.\" width=\"600\" height=\"513\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><strong>Maximum attention visualization for windowing model<\/strong><\/p>\n<p>As an ultimate stress test, we apply our model to the Wikipedia entry about Lionel Messi with a document length of \\(18{,}136\\) words, which drastically exceeds the training boundary for the windowing model of \\(T_x=1{,}160\\).<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/messisumm.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-16439\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/messisumm-300x157.png\" alt=\"Predicted summarization of an 
extremely long text marked in different colors.\" width=\"600\" height=\"315\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/messisumm-300x157.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/messisumm-768x403.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/messisumm-400x210.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/messisumm-360x189.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/06\/messisumm.png 992w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><strong>Predicted summary for very long document (Wikipedia entry about Lionel Messi, extracted 2019\/06\/07) with windowing model<\/strong><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Results-measured-by-ROUGE-on-Validation-Sets\"><\/span>Results measured by ROUGE on Validation Sets<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The windowing models are often slightly worse compared to the standard model in terms of performance measured by ROUGE <a 
href=\"https:\/\/www.aclweb.org\/anthology\/W04-1013\">[5]<\/a>. For CNN\/Dailymail, this is due to the information bias towards the first few sentences and the highly extractive nature of the summaries. That&#8217;s also why the LEAD-3 baseline, which simply takes the first three sentences of the document as the summary, achieves the best performance. For WikiHow and in general, the standard model achieves higher scores due to its flexibility: it can attend to the whole document at each decoding step, while a sliding windowing model has access only to a small content window. We would need an evaluation dataset with longer documents to demonstrate the benefit of the windowing model in terms of performance. Alternatively, we can restrict the setup for existing datasets: when comparing a windowing model (WIKI_II) that has \\(\\texttt{ws}=200\\) and is trained to slide to document lengths up to \\(740\\) with a standard model (WIKI_I) that has access only to the first window, we can see that the windowing model clearly prevails.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We succeed in extending recurrent Seq2Seq models to summarize arbitrarily long texts. The windowing model learns transitions on the dataset and is able to extrapolate to longer documents during inference. 
As the standard model is more flexible in that it can attend over the entire document during training, we can show a benefit measured by ROUGE only when restricting the length scope of the datasets.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Join-us\"><\/span>Join us!<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Learn more about our <a href=\"https:\/\/www.inovex.de\/en\/our-services\/machine-perception-artificial-intelligence\/\">Machine Perception portfolio at inovex.de<\/a> or consider joining us as a <a href=\"https:\/\/www.inovex.de\/de\/karriere\/stellenangebote\/\">Machine Learning Engineer<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Sources\"><\/span>Sources<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>[1] <a href=\"https:\/\/github.com\/abisee\/cnn-dailymail\">CNN\/Dailymail dataset<\/a><\/p>\n<p>[2] <a href=\"https:\/\/arxiv.org\/abs\/1810.09305\">WikiHow dataset<\/a><\/p>\n<p>[3] <a href=\"https:\/\/arxiv.org\/abs\/1508.04025\">Effective Approaches to Attention-based Neural Machine Translation<\/a> (Minh-Thang Luong, Hieu Pham, Christopher D. Manning)<\/p>\n<p>[4] <a href=\"https:\/\/arxiv.org\/abs\/1704.04368\">Get To The Point: Summarization with Pointer-Generator Networks<\/a> (Abigail See, Peter J. Liu, Christopher D. Manning)<\/p>\n<p>[5] <a href=\"https:\/\/www.aclweb.org\/anthology\/W04-1013\">ROUGE: A Package for Automatic Evaluation of Summaries<\/a> (Chin-Yew Lin)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog post describes my master thesis &#8222;Abstractive Summarization for Long Texts&#8220;. We\u2019ve extended existing state-of-the-art\u00a0 sequence-to-sequence (Seq2Seq) neural networks to process documents across content windows. 
By shifting the objective towards the learning of inter-window transitions, we circumvent the limitation of existing models which can summarize documents only up to a certain length.\u00a0 With our [&hellip;]<\/p>\n","protected":false},"author":117,"featured_media":16532,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[509,151,141],"service":[76,75],"coauthors":[{"id":117,"display_name":"Leon Sch\u00fcller","user_nicename":"lschueller"}],"class_list":["post-21109","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-ai-2","tag-deep-learning","tag-nlp","service-artificial-intelligence","service-nlp"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Summarizing Long Texts with Seq2Seq Neural Networks - inovex GmbH<\/title>\n<meta name=\"description\" content=\"We extend state-of-the-art\u00a0 sequence-to-sequence neural networks for summarization of long text across windows. By learning transitions, we are able to process arbitrarily long texts during inference.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Summarizing Long Texts with Seq2Seq Neural Networks - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"We extend state-of-the-art\u00a0 sequence-to-sequence neural networks for summarization of long text across windows. 
By learning transitions, we are able to process arbitrarily long texts during inference.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2019-07-08T06:29:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-11-24T09:37:43+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Leon Sch\u00fcller\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization-1024x576.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Leon Sch\u00fcller\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"8\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Leon Sch\u00fcller\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/\"},\"author\":{\"name\":\"Leon Sch\u00fcller\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/e977002fdcb84b9ca689888236d17d68\"},\"headline\":\"Summarizing Long Texts with Seq2Seq Neural Networks\",\"datePublished\":\"2019-07-08T06:29:16+00:00\",\"dateModified\":\"2022-11-24T09:37:43+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/\"},\"wordCount\":1586,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/07\\\/seq2seq-text-summarization.png\",\"keywords\":[\"Ai\",\"Deep Learning\",\"nlp\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/\",\"name\":\"Summarizing Long Texts with Seq2Seq Neural Networks - inovex 
GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/07\\\/seq2seq-text-summarization.png\",\"datePublished\":\"2019-07-08T06:29:16+00:00\",\"dateModified\":\"2022-11-24T09:37:43+00:00\",\"description\":\"We extend state-of-the-art\u00a0 sequence-to-sequence neural networks for summarization of long text across windows. By learning transitions, we are able to process arbitrarily long texts during inference.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/07\\\/seq2seq-text-summarization.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2019\\\/07\\\/seq2seq-text-summarization.png\",\"width\":1920,\"height\":1080,\"caption\":\"Stylized highlighted text being summarized in a bracket.\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/text-summarization-seq2seq-neural-networks\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Summarizing Long Texts with Seq2Seq Neural 
Networks\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#website\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#organization\",\"name\":\"inovex GmbH\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\\\/\\\/www.inovex.de\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/inovexde\",\"https:\\\/\\\/x.com\\\/inovexgmbh\",\"https:\\\/\\\/www.instagram.com\\\/inovexlife\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/inovex\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/#\\\/schema\\\/person\\\/e977002fdcb84b9ca689888236d17d68\",\"name\":\"Leon 
Sch\u00fcller\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/fb1162510fedc5565fa4ce3b65e978d8a6866f7b7546dfde71a2e666bb2badb5?s=96&d=retro&r=gd43b9658412369fb1f210f594f7121d0\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/fb1162510fedc5565fa4ce3b65e978d8a6866f7b7546dfde71a2e666bb2badb5?s=96&d=retro&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/fb1162510fedc5565fa4ce3b65e978d8a6866f7b7546dfde71a2e666bb2badb5?s=96&d=retro&r=g\",\"caption\":\"Leon Sch\u00fcller\"},\"url\":\"https:\\\/\\\/www.inovex.de\\\/de\\\/blog\\\/author\\\/lschueller\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Summarizing Long Texts with Seq2Seq Neural Networks - inovex GmbH","description":"We extend state-of-the-art\u00a0 sequence-to-sequence neural networks for summarization of long text across windows. By learning transitions, we are able to process arbitrarily long texts during inference.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/","og_locale":"de_DE","og_type":"article","og_title":"Summarizing Long Texts with Seq2Seq Neural Networks - inovex GmbH","og_description":"We extend state-of-the-art\u00a0 sequence-to-sequence neural networks for summarization of long text across windows. 
By learning transitions, we are able to process arbitrarily long texts during inference.","og_url":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2019-07-08T06:29:16+00:00","article_modified_time":"2022-11-24T09:37:43+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization.png","type":"image\/png"}],"author":"Leon Sch\u00fcller","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization-1024x576.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Leon Sch\u00fcller","Gesch\u00e4tzte Lesezeit":"8\u00a0Minuten","Written by":"Leon Sch\u00fcller"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/"},"author":{"name":"Leon Sch\u00fcller","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/e977002fdcb84b9ca689888236d17d68"},"headline":"Summarizing Long Texts with Seq2Seq Neural Networks","datePublished":"2019-07-08T06:29:16+00:00","dateModified":"2022-11-24T09:37:43+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/"},"wordCount":1586,"commentCount":0,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization.png","keywords":["Ai","Deep Learning","nlp"],"articleSection":["Analytics","English 
Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/","url":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/","name":"Summarizing Long Texts with Seq2Seq Neural Networks - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization.png","datePublished":"2019-07-08T06:29:16+00:00","dateModified":"2022-11-24T09:37:43+00:00","description":"We extend state-of-the-art\u00a0 sequence-to-sequence neural networks for summarization of long text across windows. 
By learning transitions, we are able to process arbitrarily long texts during inference.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2019\/07\/seq2seq-text-summarization.png","width":1920,"height":1080,"caption":"Stylized highlighted text being summarized in a bracket."},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/text-summarization-seq2seq-neural-networks\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Summarizing Long Texts with Seq2Seq Neural Networks"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex 
GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/e977002fdcb84b9ca689888236d17d68","name":"Leon Sch\u00fcller","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/secure.gravatar.com\/avatar\/fb1162510fedc5565fa4ce3b65e978d8a6866f7b7546dfde71a2e666bb2badb5?s=96&d=retro&r=gd43b9658412369fb1f210f594f7121d0","url":"https:\/\/secure.gravatar.com\/avatar\/fb1162510fedc5565fa4ce3b65e978d8a6866f7b7546dfde71a2e666bb2badb5?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/fb1162510fedc5565fa4ce3b65e978d8a6866f7b7546dfde71a2e666bb2badb5?s=96&d=retro&r=g","caption":"Leon 
Sch\u00fcller"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/lschueller\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21109","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/117"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=21109"}],"version-history":[{"count":1,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21109\/revisions"}],"predecessor-version":[{"id":39556,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/21109\/revisions\/39556"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/16532"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=21109"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=21109"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=21109"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=21109"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}