{"id":31893,"date":"2021-10-12T09:11:21","date_gmt":"2021-10-12T08:11:21","guid":{"rendered":"https:\/\/www.inovex.de\/?p=31893"},"modified":"2023-06-06T08:05:00","modified_gmt":"2023-06-06T06:05:00","slug":"pretraining-language-models","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/pretraining-language-models\/","title":{"rendered":"Pretraining Language Models: Quality Over Quantity?"},"content":{"rendered":"<p>Transformer-based language models trained on massive datasets, such as Google\u2019s BERT, have undeniably pushed the frontier of natural language processing (NLP) in recent years. Due to the heterogeneous nature of the training data, the models improve when shown supplementary knowledge during pretraining.<\/p>\n<p>Pretraining algorithms often call for large datasets \u2013 the bigger the better. But what if only a limited amount of data is available for pretraining? Is it beneficial to focus on quality instead of quantity? In this article, we investigate whether it is possible to invest in small, high-quality datasets for pretraining language models instead of relying on large, more general corpora.<!--more--><\/p>\n<p>&nbsp;<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79_2 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/pretraining-language-models\/#What-is-Domain-Adaptation\" >What is Domain Adaptation?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/pretraining-language-models\/#Who-is-BERT\" >Who is BERT?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/pretraining-language-models\/#What-is-a-Domain-How-do-we-Measure-it\" >What is a Domain? How do we Measure it?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/pretraining-language-models\/#Mission-QUANTITY\" >Mission: QUANTITY<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/pretraining-language-models\/#Mission-QUALITY\" >Mission: QUALITY<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/pretraining-language-models\/#Conclusion-and-Takeaways-Pretraining-Language-Models\" >Conclusion and Takeaways: Pretraining Language Models<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"What-is-Domain-Adaptation\"><\/span>What is Domain Adaptation?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Humans are pretty good at using knowledge from previously learned tasks when learning new tasks. This is generally referred to as positive transfer. 
For instance, learning to drive a car could facilitate learning to drive a truck.<\/p>\n<figure id=\"attachment_31869\" aria-describedby=\"caption-attachment-31869\" style=\"width: 300px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31869 size-medium\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/traditional-300x164.png\" alt=\"Traditional machine learning diagram\" width=\"300\" height=\"164\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/traditional-300x164.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/traditional-768x421.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/traditional-400x219.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/traditional-528x290.png 528w, https:\/\/www.inovex.de\/wp-content\/uploads\/traditional-360x197.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/traditional.png 1000w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-31869\" class=\"wp-caption-text\">Figure 1: Traditional machine learning<\/figcaption><\/figure>\n<p>It would be nice if machine learning models could do the same. Traditionally, if we wanted to train models for different tasks, we would train each model separately (see Figure 1). This is where transfer learning comes into play. Instead of starting from scratch in every task, the goal of transfer learning is to transfer knowledge from previous tasks to a target task (see Figure 2).<\/p>\n<p>Training a decent model requires an immense amount of data which we often source from publicly available datasets. The type of data published for anyone to use tends to be what is considered &#8222;standard&#8220; or canonical \u2013 mostly Wikipedia and news articles. Due to the homogeneity of these texts, they are often a poor match to other domains. Studies show that models perform substantially better when shown (additional) domain-specific text during (pre-) training. 
The task of adapting models from a training distribution to a different target distribution is generally referred to as domain adaptation.<\/p>\n<figure id=\"attachment_31867\" aria-describedby=\"caption-attachment-31867\" style=\"width: 300px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31867 size-medium\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/transfer-300x164.png\" alt=\"Transfer learning diagram\" width=\"300\" height=\"164\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/transfer-300x164.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/transfer-768x420.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/transfer-400x219.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/transfer-528x290.png 528w, https:\/\/www.inovex.de\/wp-content\/uploads\/transfer-360x197.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/transfer.png 1002w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><figcaption id=\"caption-attachment-31867\" class=\"wp-caption-text\">Figure 2: Transfer learning<\/figcaption><\/figure>\n<p>In domain adaptation, we deal with the scenario where we have a source domain and a target domain, which are different in some way, e.g. the source domain is reviews of computers and the target domain is movie reviews. More precisely, source and target domain have different marginal probability distributions. 
The goal of domain adaptation is to learn representations specifically for the target domain.<\/p>\n<p>Since annotating data is costly and labeled data in the target domain is therefore often scarce, most recent research focuses on studying unsupervised domain adaptation, where only unlabeled data for the target domain is assumed to be available.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Who-is-BERT\"><\/span>Who is BERT?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">BERT, or <\/span><i><span style=\"font-weight: 400;\">Bidirectional Encoder Representations from Transformers<\/span><\/i><span style=\"font-weight: 400;\">, is a language model that has been pretrained on a large corpus of text from Wikipedia and books. It has achieved state-of-the-art results on a wide variety of NLP tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">BERT makes use of the transformer architecture, which relies on an attention mechanism that learns relations between words (or sub-words) in a text. What differentiates BERT from other language models is that its transformer encoder is bidirectional, meaning that it can learn about a word based on its context from both left and right.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">There are two training strategies for BERT: masked language modeling and next sentence prediction. In the former, 15% of words in each sequence that is fed to the model are replaced by a [MASK] token. The model learns by trying to predict the original value of the masked words based on the other words in the sequence. In the latter, BERT is fed a pair of sentences and learns to predict whether the second sentence follows the first one in the original document. 
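<\/span><\/p>
<p>The masked language modeling step can be sketched in a few lines of Python. This is a toy illustration rather than BERT&#8217;s exact recipe \u2013 the original additionally swaps some selected tokens for random words or leaves them unchanged, while this sketch masks every selected token:<\/p>

```python
import random

MASK = '[MASK]'

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Replace roughly 15% of tokens with [MASK]. The labels keep the
    # original token at masked positions (the loss is computed only
    # there) and None everywhere else.
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)    # model must recover this token
        else:
            masked.append(tok)
            labels.append(None)   # position ignored by the loss
    return masked, labels

tokens = 'the quick brown fox jumps over the lazy dog'.split()
masked, labels = mask_tokens(tokens)
```

<p><span style=\"font-weight: 400;\">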
The cool thing about both of these strategies is that they enable the model to learn in a <\/span><i><span style=\"font-weight: 400;\">self-supervised<\/span><\/i><span style=\"font-weight: 400;\"> manner: they are supervised in the sense that there is a gold standard against which the model can check its predictions, but no labeled data is required for this step!<\/span><\/p>\n<figure id=\"attachment_31875\" aria-describedby=\"caption-attachment-31875\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31875 size-large\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/lm2-1-1024x163.png\" alt=\"Fine-tuning diagram\" width=\"640\" height=\"102\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-1-1024x163.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-1-300x48.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-1-768x122.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-1-400x64.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-1-360x57.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-1.png 1148w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption id=\"caption-attachment-31875\" class=\"wp-caption-text\">Figure 3: Fine-tuning<\/figcaption><\/figure>\n<figure id=\"attachment_31873\" aria-describedby=\"caption-attachment-31873\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31873 size-large\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/lm2-1024x156.png\" alt=\"BERT adaptive pretraining diagram\" width=\"640\" height=\"98\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-1024x156.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-300x46.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-768x117.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-400x61.png 400w, 
https:\/\/www.inovex.de\/wp-content\/uploads\/lm2-360x55.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/lm2.png 1148w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption id=\"caption-attachment-31873\" class=\"wp-caption-text\">Figure 4: Adaptive pretraining \/ domain adaptation<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">In order to use BERT for classification tasks, such as sentiment analysis or topic detection, a classification layer is added. This layer can be trained in traditional machine learning fashion, i.e by feeding the model labeled data in a process called <\/span><i><span style=\"font-weight: 400;\">fine-tuning<\/span><\/i><span style=\"font-weight: 400;\"> (see Figure 3).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To take things one step further, BERT models actually perform even better when shown <\/span><i><span style=\"font-weight: 400;\">additional <\/span><\/i><span style=\"font-weight: 400;\">relevant data prior to fine-tuning. Indeed, research shows that <\/span><i><span style=\"font-weight: 400;\">pretraining <\/span><\/i><span style=\"font-weight: 400;\">on data from the <\/span><i><span style=\"font-weight: 400;\">domain<\/span><\/i><span style=\"font-weight: 400;\"> of a specific task improves results significantly. That is to say, BERT is not trained from scratch on different data but rather, the pretrained BERT model is pretrained <\/span><i><span style=\"font-weight: 400;\">again <\/span><\/i><span style=\"font-weight: 400;\">on data relevant for the task (see Figure 4). This practice is referred to as <\/span><i><span style=\"font-weight: 400;\">adaptive pretraining<\/span><\/i><span style=\"font-weight: 400;\">\u00a0\u2013 pretraining followed by secondary stages of pretraining on additional data.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"What-is-a-Domain-How-do-we-Measure-it\"><\/span>What is a Domain? 
How do we Measure it?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The term <\/span><i><span style=\"font-weight: 400;\">domain <\/span><\/i><span style=\"font-weight: 400;\">is used quite liberally in NLP literature. There is no consensus on what really constitutes a domain. The most frequently cited definition (to my knowledge) distinguishes domains based on their feature spaces and their marginal probability distributions. Not only are the boundaries between these two concepts blurry \u2013 at what point are differences in vocabulary large enough to justify the distinction of feature spaces? \u2013 they also leave questions about granularity unanswered. Is <\/span><i><span style=\"font-weight: 400;\">online reviews<\/span><\/i><span style=\"font-weight: 400;\"> a domain, or <\/span><i><span style=\"font-weight: 400;\">Amazon reviews,<\/span><\/i><span style=\"font-weight: 400;\">\u00a0or <\/span><i><span style=\"font-weight: 400;\">Amazon reviews of wireless headphones<\/span><\/i><span style=\"font-weight: 400;\">? We could continue this line of argument until each document is its own domain.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Despite the muddle of conflicting notions about what constitutes a domain, there are quite a few studies attempting to measure domain similarity. Such measures are important: you need them to make claims about a model\u2019s generalization and performance across domains, and they are also used directly in domain adaptation algorithms. 
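<\/span><\/p>
<p>As a rough illustration, here is a minimal, self-contained sketch of two such similarity measures \u2013 the Jensen-Shannon divergence between unigram distributions and the overlap between each corpus&#8217; most frequent words. The toy corpora, the vocabulary size, and the add-one smoothing are made up for the example:<\/p>

```python
from collections import Counter
from math import log2

def unigram_dist(tokens, vocab):
    # Unigram probabilities over a shared vocabulary, with add-one
    # smoothing so that the divergence below is always finite.
    counts = Counter(tokens)
    total = sum(counts[w] + 1 for w in vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def js_divergence(p, q):
    # Jensen-Shannon divergence: mean KL divergence of p and q to
    # their midpoint distribution; with log base 2 it lies in [0, 1].
    kl = lambda a, b: sum(a[w] * log2(a[w] / b[w]) for w in a)
    m = {w: (p[w] + q[w]) / 2 for w in p}
    return (kl(p, m) + kl(q, m)) / 2

def vocab_overlap(tokens_a, tokens_b, top_k=10):
    # Jaccard overlap between the top-k most frequent words of two corpora.
    top_a = {w for w, _ in Counter(tokens_a).most_common(top_k)}
    top_b = {w for w, _ in Counter(tokens_b).most_common(top_k)}
    return len(top_a & top_b) / len(top_a | top_b)

reviews = 'great camera great battery the screen is sharp'.split()
news = 'the government announced a new battery recycling policy'.split()
vocab = sorted(set(reviews) | set(news))
p, q = unigram_dist(reviews, vocab), unigram_dist(news, vocab)
similarity_gap = js_divergence(p, q)   # 0 would mean identical distributions
```

<p>With log base 2 the divergence lies between 0 (identical word distributions) and 1, so values are comparable across domain pairs.<\/p>
<p><span style=\"font-weight: 400;\">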
Commonly used metrics are the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Kullback%E2%80%93Leibler_divergence\" target=\"_blank\" rel=\"noopener\">Kullback-Leibler divergence<\/a> and the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Jensen%E2%80%93Shannon_divergence\" target=\"_blank\" rel=\"noopener\">Jensen-Shannon divergence<\/a>\u00a0which measure the difference between two probability distributions. For model evaluation purposes, vocabulary overlap, where you calculate the intersections between, say, the top 10k words of each domain, seems to be a popular measure.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Mission-QUANTITY\"><\/span>Mission: QUANTITY<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>For Mission: QUANTITY, we want to find a baseline for adaptively pretraining BERT across different tasks and domains. As explained above, the notion of <em>domain<\/em> is pretty fuzzy. For now, let&#8217;s go with the intuitively defined &#8222;domains&#8220; reviews, news, social media, and scientific papers (aka domains where we can find lots of publicly available datasets). 
Here is a list of the datasets and tasks:<\/p>\n<p><strong>Reviews<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/www.kaggle.com\/nicapotato\/womens-ecommerce-clothing-reviews\" target=\"_blank\" rel=\"noopener\">E-Commerce<\/a> (sentiment analysis)<\/li>\n<li><a href=\"https:\/\/huggingface.co\/datasets\/imdb\" target=\"_blank\" rel=\"noopener\">IMDB<\/a> (sentiment analysis)<\/li>\n<li><a href=\"https:\/\/www.cs.jhu.edu\/~mdredze\/datasets\/sentiment\/\" target=\"_blank\" rel=\"noopener\">Amazon<\/a> (sentiment analysis and topic detection)<\/li>\n<\/ul>\n<p><strong>News<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/huggingface.co\/datasets\/ag_news\" target=\"_blank\" rel=\"noopener\">AGNews<\/a> (topic detection)<\/li>\n<li><a href=\"https:\/\/www.kaggle.com\/c\/learn-ai-bbc\" target=\"_blank\" rel=\"noopener\">BBC<\/a> (topic detection)<\/li>\n<li><a href=\"http:\/\/qwone.com\/~jason\/20Newsgroups\/\" target=\"_blank\" rel=\"noopener\">20Newsgroups<\/a> (topic detection)<\/li>\n<\/ul>\n<p><strong>Social media<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/www.kaggle.com\/kazanova\/sentiment140\" target=\"_blank\" rel=\"noopener\">Sentiment140 <\/a>(sentiment analysis)<\/li>\n<li><a href=\"https:\/\/huggingface.co\/datasets\/tweet_eval\" target=\"_blank\" rel=\"noopener\">Tweet Eval<\/a> (sentiment analysis)<\/li>\n<li><a href=\"https:\/\/huggingface.co\/datasets\/emotion\" target=\"_blank\" rel=\"noopener\">Emotion<\/a> (sentiment analysis)<\/li>\n<li><a href=\"https:\/\/huggingface.co\/datasets\/hatexplain\" target=\"_blank\" rel=\"noopener\">HateXplain<\/a> (hate speech detection)<\/li>\n<li><a href=\"https:\/\/huggingface.co\/datasets\/ethos\" target=\"_blank\" rel=\"noopener\">Ethos<\/a> (hate speech detection)<\/li>\n<\/ul>\n<p><strong>Scientific papers<\/strong><\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/Franck-Dernoncourt\/pubmed-rct\" target=\"_blank\" rel=\"noopener\">PubMed<\/a> (abstract analysis)<\/li>\n<li><a 
href=\"https:\/\/data.mendeley.com\/datasets\/9rw3vkcfy4\/6\" target=\"_blank\" rel=\"noopener\">Web of Science<\/a> (topic detection)<\/li>\n<\/ul>\n<p>First, we only fine-tune BERT on each dataset without any additional pretraining. To speed up training time, we use DistilBERT, which is smaller and faster than BERT but was pretrained on the same corpus (Wikipedia and BookCorpus). Table 1 shows the results.<\/p>\n<figure id=\"attachment_31879\" aria-describedby=\"caption-attachment-31879\" style=\"width: 1000px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31879\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/baseline-1024x361.png\" alt=\"BERT baseline results\" width=\"1000\" height=\"353\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/baseline-1024x361.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/baseline-300x106.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/baseline-768x271.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/baseline-1536x542.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/baseline-400x141.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/baseline-360x127.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/baseline.png 1677w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption id=\"caption-attachment-31879\" class=\"wp-caption-text\">Table 1: DistilBERT fine-tuning results (SD = standard deviation)<\/figcaption><\/figure>\n<p>As you can see, the model did pretty well on the IMDB and Amazon datasets, as well as the AGNews and PubMed ones. It struggled with many of the social media datasets. I wondered whether performance was in some way related to domain similarity between BERT&#8217;s original training domain (Wikipedia and BookCorpus) and the task&#8217;s domain. Testing different similarity measures like vocabulary overlap, I unfortunately could not find a correlation. 
As explained above, there is no real consensus on what constitutes a domain and how to quantify differences between domains. We need some more studies on the topic and maybe someone else can find a correlation in the future.<\/p>\n<p>Moving on, we compare the baseline results (Table 1) to results after domain-adaptive pretraining (Table 2). Here, we pretrain the DistilBERT model on a large domain dataset in hopes of providing it with some useful additional knowledge. For reviews, we use the <a href=\"https:\/\/huggingface.co\/datasets\/amazon_polarity\" target=\"_blank\" rel=\"noopener\">Amazon polarity<\/a> dataset; for news, we use the <a href=\"https:\/\/webis.de\/data\/pan-semeval-hyperpartisan-news-detection-19.html\" target=\"_blank\" rel=\"noopener\">SemEval 2019 news<\/a> dataset; for social media, we use a <a href=\"https:\/\/archive.org\/details\/twitter_cikm_2010\" target=\"_blank\" rel=\"noopener\">Twitter scrape<\/a> and a collection of Reddit comments; and for the science domain, we use an <a href=\"https:\/\/www.kaggle.com\/Cornell-University\/arxiv\" target=\"_blank\" rel=\"noopener\">ArXiv<\/a> dataset. 
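<\/p>
<p>To make the setup concrete, this kind of adaptive pretraining can be sketched with the Hugging Face transformers library. The snippet is only an illustrative configuration: the output path, the hyperparameters, and the domain_corpus variable (a tokenized domain dataset such as the Amazon polarity reviews) are placeholders, not the exact settings of our experiments:<\/p>

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('distilbert-base-uncased')

# The collator dynamically masks 15% of tokens in every batch, i.e. the
# same masked-language-modeling objective used in the original pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)

args = TrainingArguments(output_dir='distilbert-domain-adapted',  # placeholder
                         per_device_train_batch_size=16,          # illustrative
                         num_train_epochs=1,
                         learning_rate=5e-5)

# domain_corpus is assumed to exist: the domain dataset, already
# tokenized with the tokenizer above.
trainer = Trainer(model=model, args=args,
                  train_dataset=domain_corpus, data_collator=collator)
trainer.train()   # secondary pretraining; task fine-tuning follows
```

<p>After this secondary pretraining step, the resulting checkpoint is fine-tuned on the labeled task data as usual.<\/p>
<p>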
All datasets are the same size (200 million tokens), so we can compare results fairly.<\/p>\n<figure id=\"attachment_31881\" aria-describedby=\"caption-attachment-31881\" style=\"width: 1000px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31881\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/dapt-1024x385.png\" alt=\"BERT domain-adaptive pretraining results\" width=\"1000\" height=\"376\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/dapt-1024x385.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/dapt-300x113.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/dapt-768x289.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/dapt-1536x578.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/dapt-400x150.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/dapt-360x135.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/dapt.png 1702w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption id=\"caption-attachment-31881\" class=\"wp-caption-text\">Table 2: DistilBERT domain-adaptive pretraining results &#8211; scores show improvement from baseline (SD = standard deviation)<\/figcaption><\/figure>\n<p>Looking at the results, we can see that domain-adaptive pretraining does indeed improve performance across most datasets. There is some negative transfer for the BBC dataset, implying that it is not within the domain of the news articles we chose for pretraining.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Mission-QUALITY\"><\/span>Mission: QUALITY<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Let&#8217;s get to the juicy part: Mission: QUALITY. Can we do adaptive pretraining using small, high-quality datasets?<\/p>\n<p>In order to generate &#8220;high-quality&#8221; datasets \u2013 aka datasets that are more closely tied to the target task \u2013 we employ a data selection algorithm. 
For each target task we:<\/p>\n<ol>\n<li>Generate embeddings for each document in the task dataset, as well as for the documents in the corresponding domain dataset,<\/li>\n<li>Select candidates using Nearest Neighbors with cosine similarity as the metric, resulting in a task-specific dataset, and<\/li>\n<li>Pretrain BERT on the task-specific dataset.<\/li>\n<\/ol>\n<p>For instance, when generating a high-quality dataset for IMDB, we take our domain dataset (Amazon polarity) and select candidates based on each IMDB movie review. The resulting task-specific dataset contained lots of reviews of movies, books and music (it ended up choosing around 50k reviews). Reviews of beauty products or gardening tools are irrelevant, so they did not end up in the task-specific dataset.<\/p>\n<figure id=\"attachment_31884\" aria-describedby=\"caption-attachment-31884\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31884 size-large\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/01_tapt_a-1024x933.png\" alt=\"results summary of Comparison of domain-adaptive pretraining and task-adaptive pretraining \" width=\"640\" height=\"583\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-1024x933.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-300x273.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-768x700.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-1536x1400.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-2048x1867.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-1920x1750.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-400x365.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_a-360x328.png 360w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption id=\"caption-attachment-31884\" class=\"wp-caption-text\">Figure 5: Comparison of domain-adaptive pretraining (DAPT200 for 
pretraining on 200m tokens, DAPT100 for pretraining on 100m tokens) and task-adaptive pretraining (TAPT10NN for choosing 10 nearest neighbors, TAPT5NN for choosing 5 nearest neighbors)<\/figcaption><\/figure>\n<figure id=\"attachment_31886\" aria-describedby=\"caption-attachment-31886\" style=\"width: 640px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-31886 size-large\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/\/01_tapt_b-1024x918.png\" alt=\"results summary\" width=\"640\" height=\"574\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-1024x918.png 1024w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-300x269.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-768x689.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-1536x1378.png 1536w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-2048x1837.png 2048w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-1920x1722.png 1920w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-400x359.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/01_tapt_b-360x323.png 360w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption id=\"caption-attachment-31886\" class=\"wp-caption-text\">Figure 6: Comparison of domain-adaptive pretraining (DAPT200 for pretraining on 200m tokens, DAPT100 for pretraining on 100m tokens) and task-adaptive pretraining (TAPT10NN for choosing 10 nearest neighbors, TAPT5NN for choosing 5 nearest neighbors)<\/figcaption><\/figure>\n<p>Looking at Figure 5 and 6, which compare how much the model&#8217;s performance improved after pretraining, we can see that even though task-adaptive pretraining uses <em>much<\/em> smaller datasets, the model still gets a nice performance boost! 
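<\/p>
<p>The selection step itself boils down to a nearest-neighbor search over document embeddings. Here is a minimal sketch with made-up two-dimensional embeddings \u2013 in the experiments the embeddings would come from an encoder such as BERT:<\/p>

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def select_candidates(task_embs, domain_embs, k=2):
    # For every task document, keep its k most similar domain documents;
    # the union of all picks forms the task-specific pretraining dataset.
    selected = set()
    for t in task_embs:
        ranked = sorted(range(len(domain_embs)),
                        key=lambda i: cosine(t, domain_embs[i]), reverse=True)
        selected.update(ranked[:k])
    return selected

# Toy embeddings: two task documents, four domain documents.
task_docs = [[1.0, 0.1], [0.9, 0.2]]
domain_docs = [[1.0, 0.0], [0.8, 0.3], [0.0, 1.0], [-1.0, 0.1]]
picked = select_candidates(task_docs, domain_docs, k=2)  # indices into domain_docs
```

<p>In the IMDB example above, the task documents are the movie reviews and the domain documents are the Amazon polarity reviews; the selected indices form the task-specific pretraining set.<\/p>
<p>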
In those cases where we had negative transfer from domain-adaptive pretraining (see BBC), selecting &#8222;high-quality&#8220; relevant candidates actually remedied the issue and resulted in positive transfer.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Conclusion-and-Takeaways-Pretraining-Language-Models\"><\/span>Conclusion and Takeaways: Pretraining Language Models<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We saw that language models trained on massive, heterogeneous datasets can be adapted to a task&#8217;s domain to improve performance. Usually this type of domain adaptation asks for large datasets but we learned that it works (almost) equally well with small, high-quality datasets. You can extract task-specific datasets from large, more general ones with data selection algorithms (check out <a href=\"https:\/\/arxiv.org\/abs\/1702.02426\" target=\"_blank\" rel=\"noopener\">Data Selection Strategies for Multi-Domain Sentiment Analysis <\/a>for more information).<\/p>\n<p>A takeaway from this is that if you find yourself in a situation where you need to generate a new dataset, you might want to try focusing on quality instead of quantity! This idea goes hand in hand with a new line of thinking that is becoming more popular in machine learning: choosing a <em>data-centric view<\/em> over a <em>model-centric<\/em> one. When faced with a machine learning problem, a data-centric approach would be focusing on data quality to improve your model&#8217;s performance, while the more traditional model-centric approach is to try different state-of-the-art architectures or to optimize hyperparameters. 
If you are interested in learning more about data-centric AI, check out <a href=\"https:\/\/www.youtube.com\/watch?v=06-AZXmwHjo\" target=\"_blank\" rel=\"noopener\">Andrew Ng&#8217;s talk on Youtube<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Transformer-based language models trained on massive datasets, such as Google\u2019s BERT, have undeniably pushed the frontier of natural language processing (NLP) in recent years. Due to the heterogeneous nature of the training data, the models improve when shown supplementary knowledge during pretraining. Pretraining algorithms often call for large datasets \u2013 the bigger the better. But [&hellip;]<\/p>\n","protected":false},"author":257,"featured_media":32110,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[151,140],"service":[76],"coauthors":[{"id":257,"display_name":"Jennifer Bitschene","user_nicename":"jbitschene"}],"class_list":["post-31893","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-deep-learning","tag-machine-learning","service-artificial-intelligence"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Pretraining Language Models: Quality Over Quantity? 
Written by Jennifer Bitschene · Published 2021-10-12, last updated 2023-06-06 · Estimated reading time: 11 minutes