Two men working at a desk

Development of Big Data and Data Science Solutions

What measurable influence does TV advertising have on online behaviour? inovex and ProSiebenSat.1 used big data technology to answer this question objectively. They presented their findings and provided exciting insights into the project at the BITKOM Big Data Summit 2015.

Logo of Pro7Sat1 Media SE

ProSiebenSat.1 Media AG markets traditional TV advertising slots and is also involved with numerous e-commerce companies. As part of this involvement, ProSiebenSat.1 makes advertising slots available for e-commerce offerings. The company is therefore greatly interested in the specific value that TV ads add for the advertised e-commerce companies. Exactly how many visitors come to an e-commerce website because they saw its TV ad? How much revenue is generated by visitors who demonstrably came to the website because of the TV ad?

The Challenge

Standard solutions for analysing web traffic cannot explain the correlation between TV ads (events outside the online world) and the online behaviours they trigger. For this reason, ProSiebenSat.1 decided to develop a customised big data solution to answer these questions.

The Solution

Thanks to a previous project implemented by inovex GmbH for ProSiebenSat.1 Media AG, the Hadoop cluster and its associated traffic tracking were already available when solution development began. The big data approach enabled the development of a process to measure TV’s influence on website traffic. The solution went live in an initial configuration stage. Additional configuration stages are currently being developed as part of a proof-of-concept phase.

Big Data

This is an implementation scenario for a data science project based on a Hadoop cluster, which inovex GmbH implemented for ProSiebenSat.1 Media AG. At its core, the project seeks to determine the objectively measurable effects of TV ads for e-commerce offerings on web traffic to these e-commerce sites and, ultimately, the amount of revenue generated by these “ad-induced” visitors.
To achieve this, the project draws on traffic data, which may extend over an observation period of several years, and correlates it with information on the ads broadcast during the same period. Until now, no traditional web traffic analysis system has supported this correlation, because the log data and the TV ad data are in different formats (variety).

With its customised data science solution, which analyses around 60 million visits per day (volume/velocity), ProSiebenSat.1 can detect and precisely calculate the economic benefits of TV spots advertising digital business models. ProSiebenSat.1 can also make accurate statements about the efficiency of TV ad types (broadcasting organisation, spot length, time of broadcast, etc.) and predict the effectiveness of ads in future broadcasts. The development phase used data science algorithms from multivariate analysis (MVA). These are the same methods as those used at CERN during the search for the Higgs boson.

Method Details

Depiction of the ad induction
Graphic: Ad Induction

Ad induction uses a quantitative measurement to determine whether or not a visit can be attributed to a TV ad. To do this, the synchronicity of the two events (ad broadcast and visit) is examined by category (visit entry, end device, etc.). Incremental visits above the baseline traffic (measured in a signal window starting eight minutes before the ad begins and ending eight minutes after it finishes) are attributed to the ad. Where the signal regions of several ads overlap, the relative weighting of the TV ads is used to resolve them. The algorithm is implemented as a Hive UDTF (user-defined table-generating function).
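
The baseline-and-uplift idea described above can be illustrated with a minimal Python sketch. It is not the production Hive UDTF. The function name `attributed_visits`, the per-minute dictionary input, and the choice to estimate the baseline as the mean traffic in the eight minutes before the ad are all assumptions made for illustration. Only the eight-minute window comes from the description above.

```python
from statistics import mean

# From the text: the signal window opens 8 minutes before the ad starts
# and closes 8 minutes after it finishes.
WINDOW_MINUTES = 8


def attributed_visits(visits_per_minute, ad_start, ad_end, window=WINDOW_MINUTES):
    """Estimate the incremental visits attributable to a single ad.

    visits_per_minute: dict mapping a minute index to a visit count
                       (hypothetical input format).
    ad_start, ad_end:  first and last minute index in which the ad is on air.
    """
    # Assumption: the baseline is the mean traffic in the pre-ad part
    # of the signal window.
    pre_ad = [visits_per_minute.get(m, 0) for m in range(ad_start - window, ad_start)]
    baseline = mean(pre_ad)

    # Assumption: uplift is counted from the ad start until `window`
    # minutes after the ad ends; minutes below the baseline contribute 0.
    signal_minutes = range(ad_start, ad_end + window + 1)
    return sum(
        max(0.0, visits_per_minute.get(m, 0) - baseline) for m in signal_minutes
    )


# Usage: flat traffic of 100 visits per minute with a short spike after the ad.
traffic = {minute: 100 for minute in range(30)}
traffic[11] = 160
traffic[12] = 130
uplift = attributed_visits(traffic, ad_start=10, ad_end=10)
```

The sketch handles one ad and one traffic series. Splitting the traffic by category (visit entry, end device) and resolving overlapping signal regions by ad weighting are left out.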

The short-range results of the ad induction form the basis for the multivariate analysis (MVA) approach used to measure the long-range TV effects (see the “Revenue Analysis” graphic).

Depiction of the revenue analysis
Graphic: Revenue Analysis


According to current information, this project makes ProSiebenSat.1 a pioneer of big-data-supported advertising effects research. This is the first time that online and TV effects analyses have been successfully correlated. No comparable projects within the media industry have yet been published.


The benefits of the big data solution lie in the following components:

  • Concrete, measurable proof of the economic benefits of broadcast TV advertising for e-commerce offerings
  • Derivation of predictions for the efficacy of planned TV ads
  • Optimisation of media planning for TV ads
  • Optimised allocation of the marketing budget

Future Plans

The following next steps are planned for the big data solution:

  • Further refinement of the data science methods, for example for calculating profitability
  • Linking of additional data sources to better model user behaviour
  • Linking of additional websites

Technology Stack

  • Cloudera CDH
  • Hive
  • Hive-BO-Interface