At inovex, we are exploring the opportunities that generative AI offers companies. We'll work closely with you to develop new use cases, validate possible business models, and implement them in projects.
We support you by providing:
Feasibility analysis
If you already have a specific idea, we'll provide an initial feasibility analysis.
Product development
We'll help you develop new product ideas in the emerging field of generative AI.
Proof of concept
As soon as the first concepts are available, we'll create proofs of concept as a first step towards developing your new products.
Knowledge sharing
We research and work on generative AI and are happy to share our knowledge with you.
What is generative AI?
GPT-3, DALL-E, and Stable Diffusion enable users to iterate their ideas very quickly and to nail down their designs. Computers use prompts – short text instructions – to generate suitable texts, images, and even videos.
These applications are based on generative models. These have been around for a long time, but only recently has the quality of the data generated taken a major leap forwards. The breakthrough was made using textual data with large, transformer-based language models such as BERT and GPT-3. Models for visual data followed somewhat later in the shape of diffusion models such as DALL-E, Imagen, and Stable Diffusion.
Use of generative AI for natural language
In prompting, you use text instructions to control generative language models. For example, a short instruction is enough to translate a text from German to French or to evaluate the tone of a film review. In contrast to the classic approach of fine-tuning, no specific training on the downstream task in question is required. This means that generative models can be used to solve problems for which little or no annotated text data is available.
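The contrast with fine-tuning can be made concrete: instead of training on labelled examples, the "task definition" lives entirely in the prompt text. Below is a minimal sketch of few-shot prompt construction for the film-review example; the helper name and the two in-prompt examples are our own illustrations, and the resulting string would be sent to any generative language model of your choice.

```python
# A few-shot prompt replaces task-specific training: two worked examples plus
# the new input are enough to "program" a generative language model.

def build_sentiment_prompt(review: str) -> str:
    """Assemble a few-shot prompt asking the model to rate a film review."""
    examples = [
        ("A breathtaking masterpiece from start to finish.", "positive"),
        ("Two hours of my life I will never get back.", "negative"),
    ]
    lines = ["Classify the sentiment of each film review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The model is expected to continue the text after the final "Sentiment:".
    lines.append(f"Review: {review}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = build_sentiment_prompt("The plot was thin, but the acting saved it.")
print(prompt)
```

With zero examples this becomes zero-shot prompting; adding a handful of examples (few-shot) typically steers the model more reliably without any parameter updates.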
Suteera Seeha describes how this works in her blog post on prompt engineering, as does Pascal Fecht in his talks at data2day (PDF download) and AIxIA (PDF download).
More creativity with generative AI
Generative language models have also attracted considerable attention as creativity-enhancing tools. One of the most popular examples is GitHub Copilot. Generative models such as Codex (OpenAI) and AlphaCode (DeepMind) are intended to make programmers' work easier by generating functional code fragments from user descriptions.
On Write With Transformers, users can playfully experience what it feels like to have a language model complete their sentences. Here, too, the versatility of generative models is evident: selecting a pre-trained language model and its underlying training domain allows text generation to be steered in any direction. In our experience, this contextual focus is a key success factor in the use of language models. It means, for example, that publicly available language models can be used to generate texts based on scientific works from the field of medicine, or short, concise news items from the financial sector. Domain adaptation can be used to adapt these models to the vocabulary and style of your own text documents. More information about domain adaptation on reviews, news, social media, and scientific papers can be found in this blog post.
inovex research into generative AI
In order to keep pace with the rapid development of language models, we are focusing on the topic in an internal research project. One aspect involves understanding the technology behind relevant topics such as prompting and domain adaptation. We also explore issues pertaining to the effective use of large language models with the available resources and MLOps. In one of our research projects, we used "Parrot", a demonstrator for Transformer language models. It uses the same technology as ChatGPT, just a generation older, orders of magnitude smaller, and of course without the capabilities of trillion-parameter models.
DALL-E, Stable Diffusion, et al: generative AI for visual data
Current models such as DALL-E (OpenAI), Imagen (Google), Stable Diffusion (Stability.Ai) and Midjourney are all based on the same technology: diffusion models. This method involves destroying data in many small steps, or more precisely, converting it into white noise. Diffusion models learn to reverse these steps to iteratively “reconstruct” data from noise. In order to create an image with this method, they proceed like a sculptor: they start with Gaussian noise and gradually remove everything superfluous until an image appears. The mathematical foundation of this process is explained in this presentation (download as PDF).
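The two halves of this process, destroying data with noise and iteratively reconstructing it, can be sketched in a few lines of NumPy. This is a toy illustration, not any of the models named above: a 1-D signal stands in for an image, and where a real diffusion model would use a trained network to predict the added noise, we use the true noise so that the reconstruction arithmetic is visible.

```python
# Toy sketch of a diffusion model's forward (noising) process and the
# reconstruction it must learn to invert. NumPy only; no trained network.
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)       # noise schedule: many small steps
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # fraction of signal remaining at step t

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))   # "data": a clean 1-D signal

def forward(x0, t):
    """Sample x_t ~ q(x_t | x_0): scale the data down and mix in Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

x_T, eps = forward(x0, T - 1)            # after T steps: close to white noise

# A trained denoiser eps_theta(x_t, t) learns to predict eps. Given a perfect
# prediction, the clean signal can be read back out of the noisy sample:
x0_hat = (x_T - np.sqrt(1.0 - alpha_bars[T - 1]) * eps) / np.sqrt(alpha_bars[T - 1])
```

In practice the denoiser's prediction is imperfect, which is why generation runs as many small steps from t = T down to t = 0, the "sculpting" described above, rather than one jump.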
This simple technology is already being used in a wide variety of scenarios, including to generate new data in the style of the training data; in inpainting and outpainting, i.e. painting over unwanted sections of images and expanding existing images beyond their edges; in colourisation and in interpolation between two images. Guidance is used to further influence the result. The best-known examples of this are text prompts, which can, for example, be used to create images of cats playing the piano.
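One widely used form of this steering is classifier-free guidance, the mechanism behind text prompts in models like Stable Diffusion: at each denoising step the model produces two noise predictions, one conditioned on the prompt and one unconditional, and their difference is amplified by a guidance scale. The sketch below shows only that combination step, with toy vectors standing in for real model outputs; the function name and scale value are our own illustration.

```python
# Sketch of classifier-free guidance: amplify the direction in which the
# prompt-conditioned prediction differs from the unconditional one.
import numpy as np

def guided_noise(eps_uncond, eps_cond, s=7.5):
    """Combine unconditional and prompt-conditioned noise predictions."""
    return eps_uncond + s * (eps_cond - eps_uncond)

eps_uncond = np.array([0.0, 1.0])   # stand-in for the unconditional prediction
eps_cond = np.array([0.2, 1.0])     # stand-in for the prompt-conditioned one
eps = guided_noise(eps_uncond, eps_cond, s=2.0)
# s = 1 reproduces the conditioned prediction; s > 1 pushes the sample
# further toward the prompt, at the cost of diversity.
```

Because the combination is just vector arithmetic on the model's predictions, the same pattern works for any conditioning signal, which is why guidance generalises beyond text prompts.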
Machine learning with a small data basis
In his master’s thesis, our thesis candidate Anton Wiehe studied how the same technique can be used to generate training data for machine learning if only a small quantity of real data is available. Domain adaptation also enables pre-trained models to be used if the target domain differs greatly from the source domain. The results are summarized in this paper.
In addition to text, images can also be used as guidance. It is, for example, possible to scale low-resolution images to higher resolutions (Super Resolution), to create variations of images, or to flesh out sketches – for example with Ando AI and ai-render.
The guidance process is, however, so general that any other information could be used to influence image generation, including, for example, user profiles or weather data.
Use of generative AI in videos
And while most applications currently "only" generate images, some programs have already started creating short videos (Imagen Video, Phenaki, Make-A-Video) and 3D models (DreamFusion).
Diffusion models are, however, not limited to visual data. With small adjustments, diffusion models can be used to generate audio samples for musicians (Audio Diffusion, Harmonai), spoken language (WaveGrad, DiffWave) or target information for robotic arms.