{"id":48374,"date":"2023-09-21T09:48:42","date_gmt":"2023-09-21T07:48:42","guid":{"rendered":"https:\/\/www.inovex.de\/?p=48374"},"modified":"2024-05-07T14:40:53","modified_gmt":"2024-05-07T12:40:53","slug":"code-assistant-how-to-self-host-your-own","status":"publish","type":"post","link":"https:\/\/www.inovex.de\/de\/blog\/code-assistant-how-to-self-host-your-own\/","title":{"rendered":"Code Assistant: How to Self-Host Your Own"},"content":{"rendered":"<p>The release of the Code Assistant <a href=\"https:\/\/github.com\/features\/copilot\">GitHub Copilot<\/a> to the public in June 2021 marked the beginning of a new kind of helper in the tool belt of developers \u2013 alongside existing ones such as linters and formatters.<\/p>\n<p>While basic code completion has been on the market <a href=\"https:\/\/github.com\/kiteco\/vscode-plugin\">for years<\/a> with varying degrees of complexity, a tool that <em>understands<\/em> code and completes it in a meaningful way that transcends simple parameter suggestions was a novelty.<\/p>\n<p>This blog article shows how to build a state-of-the-art Code Assistant using several open source tools created by <a href=\"https:\/\/huggingface.co\/\" rel=\"\">Hugging Face<\/a> \ud83e\udd17:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/huggingface\/text-generation-inference\">Text Generation Inference<\/a>, the model inference API<\/li>\n<li><a href=\"https:\/\/github.com\/huggingface\/huggingface-vscode\">VSCode extension for TGI<\/a>, the extension that lets you access the model from Visual Studio Code<\/li>\n<li><a href=\"https:\/\/github.com\/huggingface\/chat-ui\">Chat UI<\/a>, a ChatGPT-like UI for the model<\/li>\n<\/ul>\n<p>&#8230; all via a single <strong>docker-compose<\/strong> file \ud83d\udd25! 
This file and all the others discussed in this article are available in an <a href=\"https:\/\/github.com\/inovex\/blog-code-assistant\">accompanying repository<\/a>.<\/p>\n<h2><span 
class=\"ez-toc-section\" id=\"Wait%E2%80%A6-Have-We-Been-There-Already\"><\/span>Wait&#8230; Have We Been There Already?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Kite was one of the companies that provided a more advanced variant of code completion and gave up on the task for <a href=\"https:\/\/qz.com\/1043614\/this-startup-learned-the-hard-way-that-you-do-not-piss-off-open-source-programmers\">various reasons<\/a>. In late 2022 the company gave the following explanation:<\/p>\n<blockquote><p><em>First, we failed to deliver our vision of AI-assisted programming because we were 10+ years too early to market, i.e. the tech is not ready yet.<\/em><\/p>\n<p><em>We built the most-advanced AI for helping developers at the time, but it fell short of the 10\u00d7 improvement required to break through because the state of the art for ML on code is not good enough. You can see this in Github Copilot, which is built by Github in collaboration with Open AI. As of late 2022, Copilot shows a lot of promise but still has a long way to go.<\/em><\/p><\/blockquote>\n<p>But in \u201clate\u201d\u00a02023 you can run a publicly available model that even beats ChatGPT and old versions of GPT-4 on your personal computer! One year in AI moves blazingly fast and can cover a decade of progress&#8230;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Challenge-Accepted\"><\/span>Challenge Accepted<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Ever since Copilot was released, the open source LLM community has tried its best to replicate its functionality. ChatGPT and GPT-4 raised the bar even higher. 
The <a href=\"https:\/\/huggingface.co\/bigcode\/starcoder\">release<\/a> of StarCoder by the <a href=\"https:\/\/www.bigcode-project.org\/\">BigCode<\/a> project was a major milestone for the open LLM community: The first truly powerful large language model for code generation that was released to the public under a <a href=\"https:\/\/huggingface.co\/spaces\/bigcode\/bigcode-model-license-agreement\">responsible but nonetheless open license<\/a>: The code wars had begun and <a href=\"https:\/\/arxiv.org\/abs\/2305.06161\">the source was with StarCoder<\/a>.<\/p>\n<p>While it still performed considerably worse than the proprietary and walled GPT-4 (67 in March) and ChatGPT (48.1) models on the HumanEval benchmark with 32.9 points, it positioned itself successfully <a href=\"https:\/\/huggingface.co\/spaces\/bigcode\/bigcode-models-leaderboard\">within striking distance<\/a>.<\/p>\n<p>The releases of <a href=\"https:\/\/ai.meta.com\/llama\/\">Llama 2<\/a> and subsequently <a href=\"https:\/\/about.fb.com\/news\/2023\/08\/code-llama-ai-for-coding\/\">Code Llama<\/a> \u2013 both by Meta \u2013 are also important waypoints. Code Llama achieved an impressive HumanEval pass@1 score of 48.8, beating ChatGPT. 
A few days later, WizardCoder built on top of Code Llama, thereby achieving 73.2 pass@1, which even surpasses GPT-4&#8217;s March score!<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Why-Bother-with-Self-Hosting\"><\/span>Why Bother with Self-Hosting?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>While Code Assistant services like GitHub Copilot and <a href=\"https:\/\/www.tabnine.com\/\">Tabnine<\/a> (which allows VPC and air-gapped installs) already exist, there are many reasons to self-host one for your company or even yourself.<\/p>\n<ul>\n<li>Full control over all the moving parts, models and software<\/li>\n<li>The ability to easily fine-tune models on your own data<\/li>\n<li>No vendor lock-in<\/li>\n<li>The fact that by now many of the most capable models are public anyway<\/li>\n<li>Various compliance reasons<\/li>\n<\/ul>\n<p>On August 22, <a href=\"https:\/\/huggingface.co\/\" rel=\"\">Hugging Face<\/a> \ud83e\udd17 announced an enterprise Code Assistant called <a class=\"c-link\" href=\"https:\/\/huggingface.co\/blog\/safecoder\" target=\"_blank\" rel=\"noopener noreferrer\" data-stringify-link=\"https:\/\/huggingface.co\/blog\/safecoder\" data-sk=\"tooltip_parent\">SafeCoder<\/a>, which brings together StarCoder (and other models), as well as an inference endpoint and a VSCode extension all in a single\u00a0managed package. SafeCoder addresses many of the points above, but hides most of its moving parts behind its managed service \u2013 by design. Luckily, the main components are open source and readily available. In the following, we will set up everything that is needed to run your very own Code Assistant, serviced by you.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Prerequisites\"><\/span>Prerequisites<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The best and most performant way to run LLMs today is by leveraging GPUs or TPUs. 
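A quick way to check which GPU and how much VRAM your machine has is `nvidia-smi`. The following is a minimal sketch wrapping that CLI, assuming the NVIDIA driver is installed; the parsing helper and the 10 GB threshold are our own additions, not part of any of the tools discussed here:

```python
import subprocess


def parse_gpu_line(line: str) -> tuple[str, int]:
    """Parse one CSV line such as 'NVIDIA GeForce RTX 3090, 24576 MiB'."""
    name, mem = (part.strip() for part in line.split(","))
    return name, int(mem.split()[0])  # total VRAM in MiB


def check_gpus(min_vram_mib: int = 10_000) -> None:
    """Print every visible GPU and whether it meets the VRAM requirement."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.strip().splitlines():
        name, vram = parse_gpu_line(line)
        print(f"{name}: {vram} MiB", "OK" if vram >= min_vram_mib else "too small")


# check_gpus()  # uncomment and run on the target machine
```

The call is left commented out so the snippet can be read on machines without an NVIDIA driver.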
This article assumes that you have an NVIDIA GPU with CUDA support and at least 10 Gigabytes of VRAM at your disposal. Be sure to install an <a href=\"https:\/\/github.com\/huggingface\/text-generation-inference#get-started\">up-to-date driver and CUDA version<\/a>. You will also need <a href=\"https:\/\/docs.docker.com\/compose\/install\/\">Docker<\/a> (or another container engine like <a href=\"https:\/\/podman.io\/\" rel=\"\">Podman<\/a>) and the <a href=\"https:\/\/docs.nvidia.com\/datacenter\/cloud-native\/container-toolkit\/install-guide.html\">NVIDIA Container Toolkit<\/a>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"First-Component-The-Inference-Engine\"><\/span>First Component: The Inference Engine<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The core of the Coding Assistant is the backend that handles the user&#8217;s completion requests and generates new tokens based on them. For this we will use Hugging Face&#8217;s <a href=\"https:\/\/github.com\/huggingface\/text-generation-inference\">Text Generation Inference<\/a>, which powers <a href=\"https:\/\/huggingface.co\/inference-endpoints\">Inference Endpoints<\/a> and the <a href=\"https:\/\/huggingface.co\/inference-api\">Inference API<\/a> \u2013 a well-tested and vital part of Hugging Face&#8217;s infrastructure. Note that the license for the software was slightly <a href=\"https:\/\/github.com\/huggingface\/text-generation-inference\/issues\/726\">changed recently<\/a>: TGI (text generation inference) from 1.0 onwards uses a new license called <a href=\"https:\/\/github.com\/huggingface\/text-generation-inference\/blob\/bde25e62b33b05113519e5dbf75abda06a03328e\/LICENSE\">HFOIL 1.0<\/a>, which restricts commercial use. 
Olivier Dehaene, the maintainer of the project, <a href=\"https:\/\/github.com\/huggingface\/text-generation-inference\/issues\/726#issuecomment-1656592840\">summarises<\/a> the implications of the license as follows:<\/p>\n<blockquote><p><em>building and selling a chat app for example that uses TGI as a backend is ok whatever the version you use<\/em><br \/>\n<em>building and selling a Inference Endpoint like experience using TGI 1.0+ requires an agreement with HF<\/em><\/p><\/blockquote>\n<p>While this summary should give you a basic understanding of what is possible under the license, be sure to consult a lawyer to get a thorough understanding of whether your use case is covered or not.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"The-Model-WizardCoder\"><\/span>The Model: WizardCoder<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>We will <a href=\"https:\/\/huggingface.co\/TheBloke\/WizardCoder-Python-13B-V1.0-GPTQ\">use<\/a> a quantised and optimised version of a SOTA Code Assistant model called WizardCoder. There are several options available today for quantised models: GPTQ, GGML, GGUF&#8230; Tom Jobbins aka \u201cTheBloke\u201d gives a good introduction <a href=\"https:\/\/huggingface.co\/TheBloke\/wizardLM-7B-GGML\/discussions\/3\">here<\/a>.\u00a0Since GGUF is not yet available for Text Generation Inference, we will stick to GPTQ. For the model to run properly, you will need roughly 10 Gigabytes of available VRAM. If you happen to have more than that available, feel free to try the <a href=\"https:\/\/huggingface.co\/TheBloke\/WizardCoder-Python-34B-V1.0-GPTQ\">34B model<\/a>, or the slightly better <a href=\"https:\/\/huggingface.co\/TheBloke\/Phind-CodeLlama-34B-v2-GPTQ\">34B Phind model<\/a>, which unfortunately is not yet available in a 13B version. 
Also, check the \u201c<a href=\"https:\/\/huggingface.co\/spaces\/bigcode\/bigcode-models-leaderboard\" rel=\"\">Big Code Models Leaderboard<\/a>\u201d\u00a0on Hugging Face to regularly select the best performing model for your use case.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Setting-up-Text-Generation-Inference\"><\/span>Setting up Text Generation Inference<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Create a <strong>docker-compose.yml<\/strong> file with the following contents:<\/p>\n<pre class=\"lang:yaml decode:true \" title=\"docker-compose.yml for TGI\">version: '3.8'\r\n\r\nservices:\r\n  text-generation:\r\n    image: ghcr.io\/huggingface\/text-generation-inference:1.0.3\r\n    environment:\r\n      HUGGING_FACE_HUB_TOKEN: ${HUGGING_FACE_HUB_TOKEN}\r\n    ports:\r\n      - \"8080:80\"\r\n    volumes:\r\n      - .\/data:\/data\r\n    command:\r\n      - \"--model-id\"\r\n      - \"${MODEL_ID:-TheBloke\/WizardCoder-Python-13B-V1.0-GPTQ}\"\r\n      - \"--quantize\"\r\n      - \"${QUANTIZE:-gptq}\"\r\n      - \"--max-batch-prefill-tokens=${MAX_BATCH_PREFILL_TOKENS:-2048}\"\r\n    deploy:\r\n      resources:\r\n        reservations:\r\n          devices:\r\n          - driver: nvidia\r\n            count: all\r\n            capabilities: [gpu]\r\n    container_name: text-generation\r\n    restart: always # Ensuring service always restarts on failure<\/pre>\n<p>Optionally, create an <strong>.env<\/strong> file with:<\/p>\n<pre class=\"lang:default decode:true\" title=\".env for TGI\"># optional, only if you want to use a guarded model like StarCoder or Code Llama\r\nHUGGING_FACE_HUB_TOKEN=1234\r\n# the model we are going to use\r\nMODEL_ID=TheBloke\/WizardCoder-Python-13B-V1.0-GPTQ\r\n# how the model is quantized\r\nQUANTIZE=gptq\r\nMAX_BATCH_PREFILL_TOKENS=2048<\/pre>\n<p>Finally, use <strong>sudo docker compose up -d<\/strong> to run the text generation service. It will now be available at <strong>localhost:8080<\/strong>. 
<strong>sudo docker container ls<\/strong> gives you a list of all running container instances. Next, type <strong>sudo docker logs text-generation --follow<\/strong> to get live output of the TGI container logs. This is particularly helpful for debugging. As you can see in the logs, TGI will download the model the first time that it is run and save it to the <strong>data<\/strong> folder that is mounted as a volume inside the container.<\/p>\n<p>To test if everything was set up correctly, try to send the following POST request to your API from a new terminal window\/tab:<\/p>\n<pre class=\"lang:zsh decode:true \">curl localhost:8080\/generate -X POST -d '{\"inputs\":\"write a python function that gets me all folders in the working directory\",\"parameters\":{\"max_new_tokens\":200}}' -H 'Content-Type: application\/json'\r\n<\/pre>\n<p>Now, you should get a response back from the API and also see the request in the container logs! Note that the quality of the response may very well be lacking, since we did not configure any parameters for our request, as this is just to test the basic functionality. You should now have Text Generation Inference up and running on your machine with WizardCoder as a model. Well done!<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Second-Component-The-VSCode-Extension\"><\/span>Second Component: The VSCode Extension<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Next, we will set up a plugin for Visual Studio Code that allows us to query TGI conveniently from our IDE! For this we will use <a href=\"https:\/\/github.com\/huggingface\/huggingface-vscode\">Hugging Face&#8217;s VSCode extension<\/a> available from the <a href=\"https:\/\/marketplace.visualstudio.com\/items?itemName=HuggingFace.huggingface-vscode\">marketplace<\/a>. 
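Before moving on to the editor integration: instead of curl, the same /generate endpoint can also be scripted from Python, which makes it easy to experiment with the request parameters mentioned above. A minimal sketch using only the standard library; the helper names are our own, the sampling values are just example settings, and the URL assumes TGI is running locally as set up before:

```python
import json
import urllib.request

TGI_URL = "http://localhost:8080/generate"  # adjust to your server address


def build_payload(prompt: str, max_new_tokens: int = 200,
                  temperature: float = 0.2, top_p: float = 0.9) -> dict:
    """Assemble the request body for TGI's /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_p": top_p,
        },
    }


def generate(prompt: str, **params) -> str:
    """POST a prompt to TGI and return the generated text."""
    req = urllib.request.Request(
        TGI_URL,
        data=json.dumps(build_payload(prompt, **params)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["generated_text"]


# generate("write a python function that lists all folders in the working directory")
```

The actual call is commented out so the snippet does not require a running server; with TGI up, it returns the completion string from the JSON response.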
The plugin is actively developed and thankfully a <a href=\"https:\/\/github.com\/huggingface\/huggingface-vscode\/pull\/59\">recent update<\/a> made it possible to configure the <strong>max_new_tokens<\/strong> parameter, which controls how long the model&#8217;s response can be. A larger number allows for longer code to be generated but also results in more load.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Setting-up-the-Extension\"><\/span>Setting up the Extension<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Once you have installed the plugin, head over to the extension settings. We will need to configure a few parameters:<\/p>\n<ol>\n<li>First, change the <strong>Hugging Face Code: Config Template<\/strong><br \/>\nto\u00a0<strong>WizardLM\/WizardCoder-Python-34B-V1.0<\/strong><\/li>\n<li>Next, configure the <strong>Hugging Face Code: Model ID Or Endpoint<\/strong>\u00a0setting and change it to <strong>http:\/\/YOUR-SERVER-ADDRESS-OR-IP:8080\/generate<\/strong> or localhost if TGI runs on the same machine.<\/li>\n<\/ol>\n<p>To test if everything works as intended, create a new <strong>.py<\/strong> file and copy over the following text. Since we are using an instruction model, the model will perform best when prompted properly:<\/p>\n<pre class=\"lang:python decode:true \"># write a function that lists all text files in a given directory. use type hints and python docstrings<\/pre>\n<p>Then move your cursor to the end of the comment line and hit enter. 
You should see a spinning circle at the bottom of the window and should be greeted with some (hopefully functional) code!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-48398\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/VSCode-code-assistant.png\" alt=\"\" width=\"964\" height=\"370\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/VSCode-code-assistant.png 964w, https:\/\/www.inovex.de\/wp-content\/uploads\/VSCode-code-assistant-300x115.png 300w, https:\/\/www.inovex.de\/wp-content\/uploads\/VSCode-code-assistant-768x295.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/VSCode-code-assistant-400x154.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/VSCode-code-assistant-360x138.png 360w\" sizes=\"auto, (max-width: 964px) 100vw, 964px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Third-Component-The-Chat-UI\"><\/span>Third Component: The Chat UI<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Would it not be convenient to also be able to access the Code Assistant from your web browser without needing to open an IDE? Certainly! And this is where another great piece of open source software comes into play: Hugging Face&#8217;s <a href=\"https:\/\/github.com\/huggingface\/chat-ui\">Chat UI<\/a>. 
It is the very same code that drives the Assistant <a href=\"https:\/\/github.com\/huggingface\/chat-ui\">HuggingChat<\/a>, which is a very well-put-together variant of the familiar ChatGPT UI.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Setting-up-Chat-UI\"><\/span>Setting up Chat UI<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>First, clone the repository and create a file called <strong>.env.local<\/strong> in its root directory with the following contents:<\/p>\n<pre class=\"lang:yaml decode:true \" title=\".env.local file for Chat UI\"># url to our local mongodb\r\nMONGODB_URL=\"mongodb:\/\/mongo-chatui:27017\"\r\n# we don't need authorization for our purposes\r\nREJECT_UNAUTHORIZED=false\r\n# insert your favorite color here\r\nPUBLIC_APP_COLOR=blue\r\n\r\n# overwrite the standard model card with the model we serve via tgi\r\n# be sure to edit the 'endpoints' field!\r\n\r\nMODELS=`[{\"name\":\"TheBloke\/WizardCoder-Python-13B-V1.0-GPTQ\",\r\n          \"endpoints\":[{\"url\":\"http:\/\/text-generation\/generate_stream\"}],\r\n          \"description\":\"Programming Assistant\",\r\n          \"userMessageToken\":\"\\n\\nHuman: \",\r\n          \"assistantMessageToken\":\"\\n\\nAssistant:\",\r\n          \"preprompt\": \"You are a helpful, respectful and honest assistant. Below is an instruction that describes a task. 
Write a response that appropriately completes the request.\",\r\n          \"chatPromptTemplate\": \"{{preprompt}}\\n\\n### Instruction:\\n{{#each messages}}\\n {{#ifUser}}{{@root.userMessageToken}}{{content}}{{@root.userMessageEndToken}}{{\/ifUser}}\\n {{#ifAssistant}}{{@root.assistantMessageToken}}{{content}}{{@root.assistantMessageEndToken}}{{\/ifAssistant}}\\n{{\/each}}\\n{{assistantMessageToken}}\\n\\n### Response:\",\r\n          \"promptExamples\":[{\"title\":\"Code a snake game\",\"prompt\":\"Code a basic snake game in python, give explanations for each step.\"}],\r\n          \"parameters\":{\"temperature\":0.1,\"top_p\":0.9,\"repetition_penalty\":1.2,\"top_k\":50,\"truncate\":1000,\"max_new_tokens\":1024}}]`<\/pre>\n<p>There is still a lot of room for improvement especially in the <strong>chatPromptTemplate<\/strong> section. See <a href=\"https:\/\/github.com\/huggingface\/chat-ui#custom-prompt-templates\">here<\/a> for further information.<\/p>\n<p>Unfortunately, no prebuilt Docker image exists for Chat UI. Thus, we have to build the image ourselves. The <strong>.env<\/strong> and <strong>.env.local<\/strong> files are needed at build-time, so be sure to have them ready. Run the following command in the root directory of the Chat UI repository:<\/p>\n<pre class=\"lang:zsh decode:true \">sudo docker build . -t chat-ui:latest<\/pre>\n<p>Next, create a new folder and create a new <strong>docker-compose.yml<\/strong> file with the following contents. It is important that the <strong>.env<\/strong> file from Chat UI is not in the same folder hierarchy as the <strong>docker-compose.yml<\/strong> (hence the new folder), since Docker Compose will try to parse and use the <strong>.env<\/strong> file in this case, which will lead to parsing errors due to the JSON string formatting. 
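As a quick aside on the chatPromptTemplate shown above: the following sketch renders a conversation into the same Alpaca-style instruction format. The tokens and preprompt are copied from the .env.local, while the helper function is our own simplification (the exact whitespace produced by the Handlebars template differs slightly):

```python
# Tokens and preprompt taken from the .env.local model card above.
USER_TOKEN = "\n\nHuman: "
ASSISTANT_TOKEN = "\n\nAssistant:"
PREPROMPT = (
    "You are a helpful, respectful and honest assistant. Below is an "
    "instruction that describes a task. Write a response that "
    "appropriately completes the request."
)


def render_prompt(messages: list[dict]) -> str:
    """Approximate the Handlebars chatPromptTemplate for a list of
    {'from': 'user' | 'assistant', 'content': str} messages."""
    rendered = ""
    for msg in messages:
        token = USER_TOKEN if msg["from"] == "user" else ASSISTANT_TOKEN
        rendered += token + msg["content"]
    # The template closes with the assistant token and the Response marker,
    # cueing the model to answer next.
    return f"{PREPROMPT}\n\n### Instruction:{rendered}{ASSISTANT_TOKEN}\n\n### Response:"


prompt = render_prompt([{"from": "user", "content": "Code a basic snake game in python."}])
print(prompt)
```

This is what the model effectively sees for every chat turn; TGI itself only receives this flat string in the `inputs` field.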
And we do not need the <strong>.env<\/strong> file and its contents at runtime, anyway.<\/p>\n<pre class=\"lang:yaml decode:true \" title=\"docker-compose.yml for Chat UI\">version: '3.8'\r\n\r\nservices:\r\n  # The frontend\r\n  chat-ui:\r\n    image: chat-ui\r\n    ports:\r\n      - \"3000:3000\"\r\n    environment:\r\n       - MONGODB_URL=mongodb:\/\/mongo-chatui:27017\r\n    container_name: chatui\r\n    restart: always # Ensuring service always restarts on failure\r\n  # The database where the history and context are going to be stored\r\n  mongo-chatui:\r\n    image: mongo:latest\r\n    ports:\r\n      - \"27017:27017\"\r\n    container_name: mongo-chatui\r\n    restart: always # Ensuring service always restarts on failure<\/pre>\n<p>Now, we can test-drive Chat UI. To do so, type in <strong>sudo docker compose up -d<\/strong> in the directory of the <strong>docker-compose.yml<\/strong> (as before with TGI) and be sure to also keep an eye on the logs via <strong>sudo docker container logs chatui --follow<\/strong>. 
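While the containers come up, you can also check the streaming endpoint that Chat UI talks to (/generate_stream) directly from Python. TGI streams server-sent events whose `data:` lines carry one generated token each; the sketch below is our own helper code, the event shape shown in the parser is abridged, and the URL assumes TGI from the first section is reachable on localhost:

```python
import json
import urllib.request

STREAM_URL = "http://localhost:8080/generate_stream"  # adjust to your server


def extract_token_text(sse_line: str):
    """Pull the token text out of one SSE 'data:' line, if present."""
    if not sse_line.startswith("data:"):
        return None
    event = json.loads(sse_line[len("data:"):])
    return event.get("token", {}).get("text")


def stream(prompt: str, max_new_tokens: int = 50) -> None:
    """Print tokens as they arrive from TGI's streaming endpoint."""
    body = json.dumps({"inputs": prompt,
                       "parameters": {"max_new_tokens": max_new_tokens}}).encode()
    req = urllib.request.Request(STREAM_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            text = extract_token_text(raw.decode().strip())
            if text is not None:
                print(text, end="", flush=True)


# stream("write a hello world in python")
```

If tokens trickle in here, the streaming path that the browser UI depends on is working.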
If all works as expected, you should be able to access the UI on port 3000!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-48641\" src=\"https:\/\/www.inovex.de\/wp-content\/uploads\/code_assistant_example-971x1024.png\" alt=\"Code Assistant example using the UI\" width=\"900\" height=\"949\" srcset=\"https:\/\/www.inovex.de\/wp-content\/uploads\/code_assistant_example-971x1024.png 971w, https:\/\/www.inovex.de\/wp-content\/uploads\/code_assistant_example-284x300.png 284w, https:\/\/www.inovex.de\/wp-content\/uploads\/code_assistant_example-768x810.png 768w, https:\/\/www.inovex.de\/wp-content\/uploads\/code_assistant_example-400x422.png 400w, https:\/\/www.inovex.de\/wp-content\/uploads\/code_assistant_example-360x380.png 360w, https:\/\/www.inovex.de\/wp-content\/uploads\/code_assistant_example.png 1242w\" sizes=\"auto, (max-width: 900px) 100vw, 900px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Putting-Everything-Together\"><\/span>Putting Everything Together<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>Of course, it is also possible to use a single combined docker-compose file if you are willing to host the backend, frontend and database on the same machine. Copy the data folder from earlier so the models do not need to be re-downloaded. 
You might also have to remove the old Chat UI and database containers using <strong>sudo docker container remove chatui mongo-chatui<\/strong>.<\/p>\n<pre class=\"lang:yaml decode:true \" title=\"docker-compose for TGI and Chat UI\">version: '3.8'\r\n\r\nservices:\r\n  # Text Generation Inference backend\r\n  text-generation:\r\n    image: ghcr.io\/huggingface\/text-generation-inference:1.0.3\r\n    environment:\r\n      HUGGING_FACE_HUB_TOKEN: ${HUGGING_FACE_HUB_TOKEN}\r\n    ports:\r\n      - \"8080:80\"\r\n    volumes:\r\n      - .\/data:\/data\r\n    command:\r\n      - \"--model-id\"\r\n      - \"${MODEL_ID:-TheBloke\/WizardCoder-Python-13B-V1.0-GPTQ}\"\r\n      - \"--quantize\"\r\n      - \"${QUANTIZE:-gptq}\"\r\n      - \"--max-batch-prefill-tokens=${MAX_BATCH_PREFILL_TOKENS:-2048}\"\r\n    deploy:\r\n      resources:\r\n        reservations:\r\n          devices:\r\n          - driver: nvidia\r\n            count: all\r\n            capabilities: [gpu]\r\n    container_name: text-generation\r\n    restart: always # Ensuring service always restarts on failure\r\n  # The frontend\r\n  chat-ui:\r\n    image: chat-ui\r\n    ports:\r\n      - \"3000:3000\"\r\n    environment:\r\n       - MONGODB_URL=mongodb:\/\/mongo-chatui:27017\r\n    container_name: chatui\r\n    restart: always # Ensuring service always restarts on failure\r\n  # The database where the history and context are going to be stored\r\n  mongo-chatui:\r\n    image: mongo:latest\r\n    ports:\r\n      - \"27017:27017\"\r\n    container_name: mongo-chatui\r\n    restart: always # Ensuring service always restarts on failure<\/pre>\n<p>Do not forget to change the <strong>endpoints parameter<\/strong> in the MODELS variable of Chat UI&#8217;s <strong>.env.local<\/strong> to <strong>\"endpoints\":[{\"url\":\"http:\/\/text-generation\/generate_stream\"}]<\/strong>, since we now can conveniently use the container address of the shared Docker network. 
Remember, you have to re-build the image after adapting the <strong>.env.local<\/strong> file.<\/p>\n<p>Great! Now you can start the backend, the frontend and the database with one single <strong>sudo docker compose up -d<\/strong>.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Bonus-Adding-HTTPS\"><\/span>Bonus: Adding HTTPS<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Up to this point, the API and UI are all served only via HTTP. It is therefore advisable to secure our traffic with HTTPS and the help of a reverse proxy like <a href=\"https:\/\/nginx.org\/en\/\">nginx<\/a>. Without HTTPS, you will not be able to access the UI from destinations other than localhost.<\/p>\n<p>Create a new directory called <strong>nginx<\/strong> and inside of it a new file <strong>nginx.conf<\/strong>. The specific settings depend on what local registrar you are using \u2013 in case you only want to make the service available to your local network.<\/p>\n<p>This <strong>nginx.conf<\/strong> template can serve as a starting point:<\/p>\n<pre class=\"lang:js decode:true\" title=\"nginx.conf\">events {\r\n    worker_connections  1024;\r\n}\r\n\r\nhttp {\r\n    server_tokens off;\r\n    charset utf-8;\r\n\r\n    server {\r\n        listen 80 default_server;\r\n        listen [::]:80 default_server;\r\n\r\n        location \/nginx_status {\r\n            stub_status on;\r\n        }\r\n\r\n    }\r\n\t\r\n    # Frontend\r\n    server {\r\n        listen              443 ssl http2;\r\n        listen              [::]:443 ssl http2;\r\n        server_name         your.local.address.io;\r\n        client_max_body_size 15G;\r\n        \r\n        ...\r\n    \r\n        # reverse proxy\r\n        location \/ {\r\n            proxy_pass            http:\/\/chat-ui:3000;\r\n            \r\n\t\t        ...\r\n        }\r\n    }\r\n\r\n    # Serving backend\r\n    server {\r\n        listen              443 ssl http2;\r\n        listen              [::]:443 ssl http2;\r\n        
server_name         api.your.local.address.io;\r\n        client_max_body_size 15G;\r\n\r\n        ...\r\n        # reverse proxy\r\n        location \/ {\r\n            proxy_pass            http:\/\/text-generation:80;\r\n            \r\n            ...\r\n        }\r\n\r\n    }\r\n\r\n    # HTTP redirect\r\n    server {\r\n        listen      80;\r\n        listen      [::]:80;\r\n        server_name .your.local.address.io;\r\n        return      301 https:\/\/your.local.address.io$request_uri;\r\n    }\r\n}<\/pre>\n<p>You also need to add the nginx service to your existing <strong>docker-compose.yml<\/strong>.<\/p>\n<pre class=\"lang:yaml decode:true\" title=\"... adding nginx\">version: '3.8'\r\n\r\nservices:\r\n\t...\r\n  # The reverse proxy\r\n  nginx:\r\n    container_name: nginx\r\n    restart: unless-stopped\r\n    image: nginx\r\n    ports:\r\n      - 80:80\r\n      - 443:443\r\n    volumes:\r\n      - .\/nginx\/nginx.conf:\/etc\/nginx\/nginx.conf\r\n      - .\/certificates:\/certificates<\/pre>\n<p>Now you only need to generate the certificates, save them in the certificates folder and restart everything.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"This-is-it\"><\/span>This is it!<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Good job. You now have all the components needed to self-host your very own Code Assistant. Thanks to the awesome people at Hugging Face, it is easier than ever. And maybe you even learned a thing or two along the way. Before you put it in production though, you may want to do a final load test, e.g. via <a href=\"https:\/\/locust.io\/\">locust<\/a>. This gives you an understanding of how many users can use the service at the same time. 
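Such a load test file could look like the following sketch, assuming locust is installed (pip install locust) and TGI answers on /generate as set up earlier; the user class name, prompt and wait times are placeholders of our own choosing:

```python
# locust-file.py -- run with: locust -f locust-file.py --host http://localhost:8080
from locust import HttpUser, between, task


class CodeAssistantUser(HttpUser):
    # simulated think time between completion requests
    wait_time = between(1, 5)

    @task
    def generate(self) -> None:
        # mirror the kind of request the VSCode extension sends
        self.client.post(
            "/generate",
            json={
                "inputs": "# write a function that lists all text files in a directory",
                "parameters": {"max_new_tokens": 200},
            },
        )
```

Locust then reports request rates, latencies and failures in its web UI while ramping up simulated users.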
For this you will need to write a small <strong>locust-file.py\u00a0<\/strong>\u2013 and for that you could kindly ask WizardCoder to help you out \ud83e\uddd9\u200d\u2640\ufe0f.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The release of the Code Assistant GitHub Copilot to the public in June 2021 marked the beginning of a new kind of helper in the tool belt of developers \u2013 alongside existing ones such as for example linters and formatters. While basic code completion has been on the market for years with varying degree of [&hellip;]<\/p>\n","protected":false},"author":337,"featured_media":48697,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"ep_exclude_from_search":false,"footnotes":""},"tags":[511,225,140],"service":[75],"coauthors":[{"id":337,"display_name":"Malte B\u00fcttner","user_nicename":"mbuettner"}],"class_list":["post-48374","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-artificial-intelligence-2","tag-data-science-in-production","tag-machine-learning","service-nlp"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Code Assistant: How to Self-Host Your Own - inovex GmbH<\/title>\n<meta name=\"description\" content=\"This post gives you all the steps needed to self-host a state-of-the-art Code Assistant model with huggingface&#039;s TGI and a ChatGPT-like UI.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.inovex.de\/de\/blog\/code-assistant-how-to-self-host-your-own\/\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Code Assistant: How to Self-Host Your Own - inovex GmbH\" \/>\n<meta 