{"id":36812,"date":"2022-07-08T09:57:06","date_gmt":"2022-07-08T07:57:06","guid":{"rendered":"https:\/\/www.inovex.de\/?p=36812"},"modified":"2024-05-14T07:21:13","modified_gmt":"2024-05-14T05:21:13","slug":"prompt-engineering-guide","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/","title":{"rendered":"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide]"},"content":{"rendered":"<p><span style=\"font-weight: 400\">In this blog post, I will give you an overview of prompt engineering, talk about its fascinating capabilities, the definition of zero-shot and few-shot learning, and provide a practical guide on how to adopt prompt engineering for your task of interest.<\/span><!--more--><\/p>\n<p><span style=\"font-weight: 400\">Prompt engineering or prompt learning is a novel approach for leveraging pre-trained language models (LMs) to perform NLP tasks without fine-tuning. In this approach, the model is informed about the target task directly through a natural language task description which is integrated into the actual input sentence in some way. The task description is called a <\/span><i><span style=\"font-weight: 400\">prompt<\/span><\/i><span style=\"font-weight: 400\"> as it prompts the model to perform a specific task. Prompt engineering is often implemented in a zero-shot or few-shot setting, which means no or only a few labeled examples are used.<\/span><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79_2 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\"><p class=\"ez-toc-title\" style=\"cursor:inherit\"><\/p>\n<\/div><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#From-Fine-Tuning-to-Prompt-Learning\" >From Fine-Tuning to Prompt Learning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Zero-Shot-and-Few-Shot-Learning\" >Zero-Shot and Few-Shot Learning<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Zero-Shot-Learning\" >Zero-Shot Learning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Few-Shot-Learning\" >Few-Shot Learning<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#A-Guide-to-Utilizing-Prompt-Engineering\" >A Guide to Utilizing Prompt Engineering<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Choosing-the-Language-Model\" >Choosing the Language Model<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Designing-the-Prompt-Prompt-Engineering\" >Designing the Prompt (Prompt Engineering)<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-8\" 
href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Manually-designed-prompts\" >Manually designed prompts<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Text-classification-tasks\" >Text classification tasks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Text-generation-tasks\" >Text generation tasks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Automatic-prompt-search\" >Automatic prompt search<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Prompt-based-fine-tuning\" >Prompt-based fine-tuning<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Answer-Engineering\" >Answer Engineering<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"From-Fine-Tuning-to-Prompt-Learning\"><\/span><b>From Fine-Tuning to Prompt Learning<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400\">Fine-tuning is a widely used, powerful transfer learning technique that can turn a pre-trained LM into a task-specific model. It has demonstrated remarkable performance for many downstream tasks including natural language inference, question answering, and text classification (<\/span><a href=\"https:\/\/aclanthology.org\/N19-1423\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400\">Devlin et al., 2019<\/span><\/a><span style=\"font-weight: 400\">;<\/span><a href=\"https:\/\/arxiv.org\/abs\/1905.05583\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400\">Sun et al., 2019<\/span><\/a><span style=\"font-weight: 400\">;<\/span><a href=\"https:\/\/arxiv.org\/abs\/1801.06146\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400\">Howard and Ruder, 2018<\/span><\/a><span style=\"font-weight: 400\">;<\/span><a href=\"https:\/\/arxiv.org\/abs\/2006.03654\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400\">He et al., 2021<\/span><\/a><span style=\"font-weight: 400\">). However, to achieve such high performance, the model usually requires sufficiently large annotated training data which is often costly and can be hard to obtain. Due to this constraint, methods that require less annotated data are attracting much interest from industry and researchers. Prompt engineering is one of the works in this direction.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The idea is to ask a pre-trained language model to perform the target task by giving it a task description such as \u201cTranslate this French sentence into an English sentence\u201c,\u00a0 followed by the French sentence we want to translate. 
> **Prompt:** Translate this French sentence into an English sentence.
> J'aime la pizza.
>
> **Model output:** I like pizza

This may seem impossible, but large-scale LMs like GPT-3 and its predecessor GPT-2 have shown that they can perform a range of NLP tasks via prompting, for instance machine translation, question answering, cloze questions, reasoning tasks, and domain adaptation. This achievement has been regarded as a paradigm shift for NLP (Liu et al., 2021): we only need to train a single, sufficiently powerful LM and design prompts it can understand, and we can have it perform arbitrary tasks with no or only a few labeled examples. Since GPT-3, research on prompt learning has grown rapidly. Some work focuses on finding optimal prompt templates for large LMs; other work tries to enable this technique for smaller LMs in a few-shot setting or combines fine-tuning with prompt learning.
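To make this concrete, here is a minimal sketch of zero-shot prompting with the Hugging Face `transformers` library. The article links the GPT-3 playground; GPT-2 is used here only as a freely available, locally runnable stand-in, so the translation quality will be far below GPT-3.

```python
from transformers import pipeline

# Zero-shot prompting: the task description is simply part of the input text.
# GPT-2 is only a runnable stand-in; a much larger model such as GPT-3 is
# needed for reliable zero-shot translation.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Translate this French sentence into an English sentence.\n"
    "French: J'aime la pizza.\n"
    "English:"
)

result = generator(prompt, max_new_tokens=10, do_sample=False)
print(result[0]["generated_text"])
```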
## Zero-Shot and Few-Shot Learning

To begin, I would like to clarify the definitions of zero-shot and few-shot learning, as the terms are used differently in different contexts and domains.

### Zero-Shot Learning

In the prompt engineering literature, the term "zero-shot" usually refers to **a setting where no labeled data is used in model training or inference**. In a broader sense, "zero-shot" refers to **teaching a model to do something it has not been explicitly trained to do**.

Zero-shot learning originates from the field of computer vision. It refers to the problem setting where the goal is **to train a classifier on a labeled dataset so that it can classify objects of unseen classes** (Wang et al., 2019). Since samples from unseen classes are not available during training, solving the zero-shot problem requires some auxiliary information that connects the knowledge learned during training with the unseen classes.

Zero-shot learning in this sense has also been applied to NLP tasks such as text summarization and machine translation. Liu et al. (2019) proposed a denoising autoencoder for text summarization trained only on source paragraphs. The model encodes the training paragraphs and each sentence of a paragraph in a shared space and generates the output summary by decoding the encoded paragraph. The denoising objective plays an important role here: it works in a self-supervised way and serves as data augmentation. In machine translation, zero-shot learning can help create example pairs for language pairs that do not have parallel corpora (Firat et al., 2016; Johnson et al., 2016). Zero-shot learning is also used in semantic utterance classification (Dauphin et al., 2014) to classify new semantic classes.

### Few-Shot Learning

Few-shot learning is **a setting where the system is given only a very small number of supervised examples** (Wang et al., 2019). Few-shot usually means two to five examples per class, but it can also be up to 100 examples (Wang et al., 2021). When only one example is given, it is called **one-shot learning**. Typically, systems in a few-shot setting require some prior knowledge (e.g. a pre-trained language model) to compensate for the small number of training examples. In prompt engineering, examples are often added directly to the prompt.
Other applications in few-shot settings include parsing (Joshi et al., 2018), translation (Kaiser et al., 2017), question answering (Chada and Natarajan, 2021), and relation classification (Han et al., 2018).

## A Guide to Utilizing Prompt Engineering

Many factors can affect the performance of a prompt-based system, such as the choice of the language model, how the prompt is formulated, and whether the language model parameters are tuned or frozen. I will discuss them in this section.

Suppose you want to solve an NLP task using the prompt engineering approach. How can you get started? First, let's categorize NLP tasks into **text classification** and **text generation** tasks. This will help later when selecting the other components.

- **Text classification** includes, for example, topic labeling, sentiment analysis, named entity recognition, and natural language inference.
- **Text generation** includes, for example, translation, text summarization, and open-domain question answering.

### Choosing the Language Model

A number of LMs have been proposed so far. They differ in their architecture, training objective, domain, and language. Which one should you choose? Here are the three popular types of LMs, categorized by their training method and directionality.
1. **Left-to-right language models:** Left-to-right LMs are trained to predict the next token given a sequence of tokens, from left to right, one token at a time. Language models trained in this way are also known as autoregressive models. Left-to-right LMs were dominant until the introduction of masked language models.

   **Models:** GPT-3, GPT-2, GPT-Neo

   **Application:** text classification & text generation

2. **Masked language models:** A masked language model (MLM) is given a text as input in which several tokens are masked, and it is trained to correctly predict these masked positions. MLMs are a variant of autoencoding models, i.e. models trained on corrupted input sequences that attempt to reconstruct the original sequence. One of the most popular models of this type is BERT, which is based on bidirectional transformers. In general, MLMs are better suited for text classification than left-to-right LMs, because classification tasks can often be formulated as cloze tasks, which matches the MLM training objective. BERT-based models are not well suited for text generation due to their training objective, bidirectionality, and output format, which are not optimized for generating text. However, several works have shown ways to use BERT for text generation, such as Chen et al. (2019) and Wang and Cho (2019).

   **Models:** BERT, RoBERTa, ERNIE, and their variants

   **Application:** text classification
3. **Encoder-decoder language models:** Encoder-decoder models (also known as sequence-to-sequence models) are a common architecture for conditional text generation tasks such as machine translation or text summarization, where the output is not a direct mapping of the input (Jurafsky and Martin, 2009). Encoder-decoder LMs can naturally be used for text generation tasks. They also work for non-generation tasks that can be reformulated as generation problems in the form of prompts, for example information extraction and question answering.

   **Models:** UniLM 1, UniLM 2, ERNIE-M, T5, BART, MASS

   **Application:** text classification & text generation
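To make the three categories concrete, here is a minimal sketch showing how one representative model of each type can be loaded and prompted with the Hugging Face `transformers` pipelines (the specific checkpoints are just examples, not recommendations from the article):

```python
from transformers import pipeline

# 1. Left-to-right (autoregressive) LM: continues a prefix prompt.
generator = pipeline("text-generation", model="gpt2")
print(generator("The translation of 'Ich arbeite von zu Hause aus' is",
                max_new_tokens=10, do_sample=False)[0]["generated_text"])

# 2. Masked LM: fills the [MASK] slot of a cloze prompt.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("I love this product! It was [MASK].")[0]["token_str"])

# 3. Encoder-decoder LM: maps an input text to an output text.
seq2seq = pipeline("text2text-generation", model="t5-small")
print(seq2seq("translate English to German: I work from home.")[0]["generated_text"])
```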
### Designing the Prompt (Prompt Engineering)

After choosing the LM, the next step is to design the prompt. Depending on their shape, text prompts can be categorized into cloze prompts and prefix prompts.

- **Cloze prompts** are prompts in which one or more positions are hidden (masked) from the LM. The task of the LM is to fill these masked positions with text strings.

> I don't like this movie at all. It was such a [MASK] movie. I would [MASK] recommend it.

- **Prefix prompts** are prompts that do not contain masked positions. The prompt is formulated as a text that should be continued by the LM.

> The translation of "Ich arbeite von zu Hause aus" is _______________

When choosing the prompt, we consider both the target task and the LM. For text generation tasks with a left-to-right autoregressive LM, prefix prompts are a good choice because they align with the left-to-right nature of the model and how it was trained; bidirectional models often underperform in text generation (Mangal et al., 2019). For classification tasks with a masked LM, cloze prompts are a good solution as they match the pre-training objective of the LM. Encoder-decoder LMs with text reconstruction objectives are more versatile and can be used with both cloze and prefix prompts. You may want to take a look at the paper by Liu et al. (2021), which provides a very useful and comprehensive list of language models and their applicable tasks (page 46).

There are two ways to obtain prompts: create them manually, or use an algorithm to compute them automatically. Typically, a prompt consists of three components: the actual input, the task description, and optionally some demonstrations. In the example below, the first two sentences are demonstrations, "I love this product!" is the actual input, and "This review is" is the task description.

> The service was rude. This review is negative.
> The room was clean and beautifully decorated. This review is positive.
> I love this product! This review is _______
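As a minimal sketch, the three components can be assembled with a small helper function (the helper and its argument names are hypothetical, not part of any library):

```python
def build_prompt(task_description, actual_input, demonstrations=()):
    """Assemble a prompt from demonstrations, the actual input, and the task
    description. Hypothetical helper for illustration only."""
    lines = list(demonstrations)
    lines.append(f"{actual_input} {task_description}")
    return "\n".join(lines)

prompt = build_prompt(
    task_description="This review is",
    actual_input="I love this product!",
    demonstrations=[
        "The service was rude. This review is negative.",
        "The room was clean and beautifully decorated. This review is positive.",
    ],
)
print(prompt)
```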
#### Manually designed prompts

One way to obtain prompts is to design them manually. This process can require a lot of trial and error. You may use your intuition to formulate the task description, arrange the components, and check whether the prompt works well with your language model. Below are some commonly used prompts taken from Liu et al. (2021). You can try them with OpenAI's GPT-3 playground and Hugging Face's [BERT API](https://huggingface.co/bert-base-uncased).

#### Text classification tasks

Sentiment analysis

> I love this product! Is this review positive? [MASK]

> I love this product! It was [MASK].

For text classification tasks, the answer generated by the language model is later mapped to the actual class label. See the section "Answer Engineering" below for more details.
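Here is a minimal sketch of the second cloze prompt with a masked LM via the `fill-mask` pipeline (`bert-base-uncased` is just an example checkpoint):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Cloze prompt for sentiment analysis; the predicted token is later mapped
# to a class label (see "Answer Engineering").
for prediction in unmasker("I love this product! It was [MASK].", top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```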
#### Text generation tasks

Text summarization

> Text: [input text] Summary: ___________

> [input text] TL;DR: ___________

> [input text] In summary, ___________

Machine translation

> French: [French sentence] English: ___________

> A French sentence is provided: [French sentence]
> The French translator translates the sentence into English: ___________

If you have some labeled data, you may want to add it to the prompt as **demonstrations**. We usually use a couple of demonstrations per class.

> German: Der Himmel ist blau    English: The sky is blue
> German: Heute ist es sonnig    English: Today it is sunny
> German: Ich liebe meinen Hund    English: ___________
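Such a few-shot prompt is passed to an autoregressive LM as plain text. The sketch below uses GPT-2 only as a freely available stand-in; in-context learning of this kind works much better with larger models such as GPT-3:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Few-shot prompt: two demonstrations followed by the actual input.
prompt = (
    "German: Der Himmel ist blau English: The sky is blue\n"
    "German: Heute ist es sonnig English: Today it is sunny\n"
    "German: Ich liebe meinen Hund English:"
)

result = generator(prompt, max_new_tokens=8, do_sample=False)
print(result[0]["generated_text"])
```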
The order and content of each component can greatly affect the model prediction (Lu et al., 2021; Rubin et al., 2022; Gao et al., 2020). Giving more textual context does not always lead to better performance: sometimes a simple prompt yields better results than a complex one (Petroni et al., 2020; Reynolds and McDonell, 2021). Moreover, it is not always helpful to add demonstrations (Reynolds and McDonell, 2021). Check out the paper by Mishra et al. (2021) for guidelines on how to construct natural language prompts and what to avoid.

There is also an **ensemble approach** that makes use of multiple prompts. The input is applied to several different prompt templates, and all of these prompts are passed to the model one by one. The final output is typically obtained by averaging the results from all prompts (Jiang et al., 2020; Schick and Schütze, 2020, 2021a, 2021b; Liu et al., 2021).
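Here is a minimal sketch of prompt ensembling for sentiment classification with a masked LM: the same input is inserted into several cloze templates and the scores of two label words are averaged (the templates and label words are illustrative choices, not taken from a specific paper):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

review = "I love this product!"
templates = [
    "{} It was [MASK].",
    "{} All in all, a [MASK] product.",
    "{} In summary, the product is [MASK].",
]
label_words = ["good", "bad"]

# Average the score of each label word over all prompt templates.
scores = {word: 0.0 for word in label_words}
for template in templates:
    for result in unmasker(template.format(review), targets=label_words):
        scores[result["token_str"]] += result["score"] / len(templates)

print(scores)
```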
To give you some ideas, here are works that provide prompt templates for a variety of tasks that you can try: Brown et al. (2020), Petroni et al. (2019), Schick and Schütze (2020, 2021a, 2021b), and more in the further reading section at the end of this article.

#### Automatic prompt search

As you can see, manually designing prompts is not easy and can require a lot of experimentation and expertise. To address this problem, many methods have been proposed to automate the template design process. Example works on **automatic textual prompts** (also called **discrete prompts** or **hard prompts**) include prompt mining (Jiang et al., 2020), prompt paraphrasing (Jiang et al., 2020; Yuan et al., 2021; Haviv et al., 2021), and gradient-based search (Wallace et al., 2019; Shin et al., 2020). Note that most approaches require a large amount of annotated data to find the prompts, which arguably may not be considered true zero-shot or few-shot.
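One simple way to generate candidate prompts for such a search is to paraphrase a seed prompt via round-trip translation, one technique used in the prompt-paraphrasing line of work. The sketch below uses two public translation checkpoints as an example; the resulting candidates would then be evaluated on held-out data:

```python
from transformers import pipeline

# Paraphrase a seed prompt by translating it to German and back to English.
en_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
de_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

seed_prompt = "Is this review positive?"
german = en_de(seed_prompt)[0]["translation_text"]
candidate_prompt = de_en(german)[0]["translation_text"]

# The candidate prompt would then be scored against labeled examples.
print(candidate_prompt)
```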
There is another form of prompts, namely **continuous prompts (soft prompts)**, where the prompt is expressed directly in the language model's embedding space. The advantage is that the prompt has its own parameters that can be tuned on the training data of the target task, rather than simply being represented like other input tokens. An example of a continuous prompt approach is prefix tuning. **Prefix tuning** (Li and Liang, 2021) can be viewed as a lightweight alternative to fine-tuning: randomly initialized vectors (the prefix) are prepended to the input, and only these prefix vectors are trained on a small amount of training data while the rest of the LM parameters stay frozen. In general, initializing the prefix with embeddings of real words works better than initializing it with purely random vectors. Other methods include Lester et al., 2021; Zhong et al., 2021; Qin and Eisner, 2021; Hambardzumyan et al., 2021; Liu et al., 2021; Han et al., 2021.
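As an illustration of the frozen-LM-plus-trainable-prefix idea, here is a minimal sketch using the Hugging Face `peft` library (the library and the `t5-small` checkpoint are my choice for illustration; they are not mentioned in the article):

```python
from transformers import AutoModelForSeq2SeqLM
from peft import PrefixTuningConfig, TaskType, get_peft_model

# Load a base model and attach a small number of trainable prefix vectors.
base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, num_virtual_tokens=20)
model = get_peft_model(base_model, config)

# Only the prefix parameters are trainable; the LM itself stays frozen.
model.print_trainable_parameters()
```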
#### Prompt-based fine-tuning

Another interesting approach that has shown competitive performance, particularly in few-shot scenarios, is the combination of fine-tuning and prompting. It is similar to standard fine-tuning, but the input samples are transformed into prompts before they are used for fine-tuning. Examples of such methods include LM-BFF, PET for text classification, and PET for text generation. This kind of fine-tuning allows the language model to better understand the task that the prompt is asking it to perform.

### Answer Engineering

For text generation, the output of the language model can usually be used directly as the final output, e.g. in machine translation or text summarization. However, for tasks that aim to classify the input into a specific class, we need an additional step to map the LM output to the target class. For example, to fill in a masked token in a cloze prompt, a BERT model essentially computes the probability of every word in the vocabulary at that position and selects the word with the highest probability as the answer. To derive a class label from this answer, we need to define a mapping from the model answer to the class label. This process is called **label mapping**, and the mapping itself is called **the verbalizer**.

For example, sentiment analysis with three classes:

> class positive = { "great", "good", "nice" }
> class negative = { "terrible", "bad", "worse" }
> class neutral = { "OK", "fine", "acceptable" }

The class that contains the word with the highest probability is selected as the final class. For example, if "good" gets the highest probability, we assign "positive" as the final class. This also means that the user needs access to the model's output probabilities, which can be difficult if the model is not open-source. Fortunately, for GPT-3 there is a [way](https://medium.com/edge-analytics/getting-the-most-out-of-gpt-3-based-text-classifiers-part-three-77305628f472) to achieve this.

The design of the label mapping is as important as the design of the prompt. The choice of representative words for each class obviously influences the performance. Works on answer engineering include, for example, Schick et al., 2021; Schick and Schütze, 2020; Gao et al., 2020.
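Here is a minimal sketch of a verbalizer for the three-class example above, using a masked LM and mapping the mask-position probabilities to class labels (it assumes each label word is a single token in the model's vocabulary):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Verbalizer: label words per class (each should be a single vocabulary token).
verbalizer = {
    "positive": ["great", "good", "nice"],
    "negative": ["terrible", "bad", "worse"],
    "neutral": ["ok", "fine", "acceptable"],
}

prompt = "I love this product! It was [MASK]."
inputs = tokenizer(prompt, return_tensors="pt")
mask_position = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the vocabulary at the masked position.
probs = logits[0, mask_position].softmax(dim=-1).squeeze(0)

# Score each class by the highest-probability label word it contains.
scores = {
    label: probs[tokenizer.convert_tokens_to_ids(words)].max().item()
    for label, words in verbalizer.items()
}
print(max(scores, key=scores.get), scores)
```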
## Conclusion

Prompt engineering is a powerful technique that allows us to employ a pre-trained language model for a variety of NLP tasks without fine-tuning it. Instead of fine-tuning, the model is given a natural language task description directly along with the input. This technique is particularly useful for large LMs such as GPT-3, where the model is so large that fine-tuning becomes difficult or very expensive. It is also applicable to smaller language models such as BERT or RoBERTa in a few-shot setting. The biggest challenge, however, lies in designing the prompt so that the model can understand it. Choosing an appropriate language model and deriving the final class prediction from the model output are also tricky decisions. We hope that this guide to selecting these components gives you a good overview of the topic and helps you get started with your prompt engineering project.

**Further readings:**

- Try out GPT-3
- [Try out BERT](https://huggingface.co/bert-base-uncased)
- Example prompts for GPT-3 from OpenAI
- Timeline of Prompt Engineering Progress
- [GPT-3 Creative Fiction](https://www.gwern.net/GPT-3)
- [OpenPrompt](https://github.com/thunlp/OpenPrompt): open-source prompt learning toolkit
- [Must-read papers on prompt learning](https://github.com/thunlp/PromptPapers), organized by topic
- Pretrain, Prompt, Predict: a collection of prompt learning resources such as the latest research, relevant slides, etc.
- [What Makes Good In-Context Examples for GPT-3?](https://arxiv.org/abs/2101.06804) (2021)
- [How Many Data Points is a Prompt Worth?](https://arxiv.org/abs/2103.08493) (2021)
- [Surface Form Competition: Why the Highest Probability Answer Isn't Always Right](https://arxiv.org/abs/2104.08315) (2021)
Seeha","user_nicename":"sseeha"}],"class_list":["post-36812","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-data-science","tag-deep-learning","tag-machine-learning","service-artificial-intelligence","service-data-science"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide] - inovex GmbH<\/title>\n<meta name=\"description\" content=\"This blog post gives you an overview of prompt engineering, the definition of zero-shot and few-shot learning, and provides a practical guide.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide] - inovex GmbH\" \/>\n<meta property=\"og:description\" content=\"This blog post gives you an overview of prompt engineering, the definition of zero-shot and few-shot learning, and provides a practical guide.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/\" \/>\n<meta property=\"og:site_name\" content=\"inovex GmbH\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/inovexde\" \/>\n<meta property=\"article:published_time\" content=\"2022-07-08T07:57:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-14T05:21:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1391\" \/>\n\t<meta property=\"og:image:height\" content=\"800\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Suteera Seeha\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2-1024x589.png\" \/>\n<meta name=\"twitter:creator\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:site\" content=\"@inovexgmbh\" \/>\n<meta name=\"twitter:label1\" content=\"Verfasst von\" \/>\n\t<meta name=\"twitter:data1\" content=\"Suteera Seeha\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"17\u00a0Minuten\" \/>\n\t<meta name=\"twitter:label3\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data3\" content=\"Suteera Seeha\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/\"},\"author\":{\"name\":\"Suteera Seeha\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/person\/361a5df2031157225ea4a94a6cbdfeae\"},\"headline\":\"Prompt Engineering and Zero-Shot\/Few-Shot Learning 
[Guide]\",\"datePublished\":\"2022-07-08T07:57:06+00:00\",\"dateModified\":\"2024-05-14T05:21:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/\"},\"wordCount\":2856,\"commentCount\":4,\"publisher\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png\",\"keywords\":[\"Data Science\",\"Deep Learning\",\"Machine Learning\"],\"articleSection\":[\"Analytics\",\"English Content\",\"General\"],\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/\",\"url\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/\",\"name\":\"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide] - inovex GmbH\",\"isPartOf\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png\",\"datePublished\":\"2022-07-08T07:57:06+00:00\",\"dateModified\":\"2024-05-14T05:21:13+00:00\",\"description\":\"This blog post gives you an overview of prompt engineering, the definition of zero-shot and few-shot learning, and provides a practical guide.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage\",\"url\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png\",\"contentUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png\",\"width\":1391,\"height\":800,\"caption\":\"A translation prompt with crayons in the background\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.inovex.de\/de\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide]\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.inovex.de\/de\/#website\",\"url\":\"https:\/\/www.inovex.de\/de\/\",\"name\":\"inovex GmbH\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.inovex.de\/de\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.inovex.de\/de\/#organization\",\"name\":\"inovex 
GmbH\",\"url\":\"https:\/\/www.inovex.de\/de\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png\",\"contentUrl\":\"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png\",\"width\":1921,\"height\":1081,\"caption\":\"inovex GmbH\"},\"image\":{\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/inovexde\",\"https:\/\/x.com\/inovexgmbh\",\"https:\/\/www.instagram.com\/inovexlife\/\",\"https:\/\/www.linkedin.com\/company\/inovex\",\"https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/person\/361a5df2031157225ea4a94a6cbdfeae\",\"name\":\"Suteera Seeha\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"de\",\"@id\":\"https:\/\/www.inovex.de\/de\/#\/schema\/person\/image\/095283dcf5e4e92f88549871bb1fef68\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f100ab39473fc0755243b406bfab68ad324f453b5583adc016ec89ffdddb9a6a?s=96&d=retro&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f100ab39473fc0755243b406bfab68ad324f453b5583adc016ec89ffdddb9a6a?s=96&d=retro&r=g\",\"caption\":\"Suteera Seeha\"},\"url\":\"https:\/\/www.inovex.de\/de\/blog\/author\/sseeha\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide] - inovex GmbH","description":"This blog post gives you an overview of prompt engineering, the definition of zero-shot and few-shot learning, and provides a practical guide.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/","og_locale":"de_DE","og_type":"article","og_title":"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide] - inovex GmbH","og_description":"This blog post gives you an overview of prompt engineering, the definition of zero-shot and few-shot learning, and provides a practical guide.","og_url":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/","og_site_name":"inovex GmbH","article_publisher":"https:\/\/www.facebook.com\/inovexde","article_published_time":"2022-07-08T07:57:06+00:00","article_modified_time":"2024-05-14T05:21:13+00:00","og_image":[{"width":1391,"height":800,"url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png","type":"image\/png"}],"author":"Suteera Seeha","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2-1024x589.png","twitter_creator":"@inovexgmbh","twitter_site":"@inovexgmbh","twitter_misc":{"Verfasst von":"Suteera Seeha","Gesch\u00e4tzte Lesezeit":"17\u00a0Minuten","Written by":"Suteera Seeha"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#article","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/"},"author":{"name":"Suteera Seeha","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/361a5df2031157225ea4a94a6cbdfeae"},"headline":"Prompt Engineering and Zero-Shot\/Few-Shot Learning 
[Guide]","datePublished":"2022-07-08T07:57:06+00:00","dateModified":"2024-05-14T05:21:13+00:00","mainEntityOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/"},"wordCount":2856,"commentCount":4,"publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png","keywords":["Data Science","Deep Learning","Machine Learning"],"articleSection":["Analytics","English Content","General"],"inLanguage":"de","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/","url":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/","name":"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide] - inovex GmbH","isPartOf":{"@id":"https:\/\/www.inovex.de\/de\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage"},"image":{"@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage"},"thumbnailUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png","datePublished":"2022-07-08T07:57:06+00:00","dateModified":"2024-05-14T05:21:13+00:00","description":"This blog post gives you an overview of prompt engineering, the definition of zero-shot and few-shot learning, and provides a practical guide.","breadcrumb":{"@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/"]}]},{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#primaryimage","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/cover_final2.png","width":1391,"height":800,"caption":"A translation prompt with crayons in the background"},{"@type":"BreadcrumbList","@id":"https:\/\/www.inovex.de\/de\/blog\/prompt-engineering-guide\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.inovex.de\/de\/"},{"@type":"ListItem","position":2,"name":"Prompt Engineering and Zero-Shot\/Few-Shot Learning [Guide]"}]},{"@type":"WebSite","@id":"https:\/\/www.inovex.de\/de\/#website","url":"https:\/\/www.inovex.de\/de\/","name":"inovex GmbH","description":"","publisher":{"@id":"https:\/\/www.inovex.de\/de\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.inovex.de\/de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Organization","@id":"https:\/\/www.inovex.de\/de\/#organization","name":"inovex GmbH","url":"https:\/\/www.inovex.de\/de\/","logo":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/","url":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","contentUrl":"https:\/\/www.inovex.de\/wp-content\/uploads\/2021\/03\/inovex-logo-16-9-1.png","width":1921,"height":1081,"caption":"inovex 
GmbH"},"image":{"@id":"https:\/\/www.inovex.de\/de\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/inovexde","https:\/\/x.com\/inovexgmbh","https:\/\/www.instagram.com\/inovexlife\/","https:\/\/www.linkedin.com\/company\/inovex","https:\/\/www.youtube.com\/channel\/UC7r66GT14hROB_RQsQBAQUQ"]},{"@type":"Person","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/361a5df2031157225ea4a94a6cbdfeae","name":"Suteera Seeha","image":{"@type":"ImageObject","inLanguage":"de","@id":"https:\/\/www.inovex.de\/de\/#\/schema\/person\/image\/095283dcf5e4e92f88549871bb1fef68","url":"https:\/\/secure.gravatar.com\/avatar\/f100ab39473fc0755243b406bfab68ad324f453b5583adc016ec89ffdddb9a6a?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f100ab39473fc0755243b406bfab68ad324f453b5583adc016ec89ffdddb9a6a?s=96&d=retro&r=g","caption":"Suteera Seeha"},"url":"https:\/\/www.inovex.de\/de\/blog\/author\/sseeha\/"}]}},"_links":{"self":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/36812","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/users\/293"}],"replies":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/comments?post=36812"}],"version-history":[{"count":8,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/36812\/revisions"}],"predecessor-version":[{"id":53653,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/posts\/36812\/revisions\/53653"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media\/36561"}],"wp:attachment":[{"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/media?parent=36812"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/tags?post=36812"},{"taxonomy":"service","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/service?post=36812"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.inovex.de\/de\/wp-json\/wp\/v2\/coauthors?post=36812"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}