Florian Wilhelm

About Florian Wilhelm

My name is Florian Wilhelm and I am a Data Scientist living in Cologne, Germany. Right now I enjoy working on innovative Data Science projects with experts every day at inovex. With more than five years of project experience in the field of Predictive & Prescriptive Analytics and Big Data, I have acquired profound knowledge in the domains of mathematical modelling, statistics, machine learning, high-performance computing and data mining. In recent years I have programmed mostly with the Python Data Science stack (NumPy, SciPy, Scikit-Learn, Pandas, Matplotlib, Jupyter, etc.), to which I have also contributed several extensions. Through my participation in many industry projects, I have also gained experience with the Hadoop stack, including Hive and Spark, as well as R.

Data-driven Services: A Workshop Specifically for Digital Value-Added Services


To support the shift toward a digital service economy as well as possible, we have developed a new workshop format that takes the specific characteristics of data-driven services into account. In this article I describe how this Data-Driven Services workshop is run.

For several decades now, the economy in Germany has been shifting from pure product manufacturing toward more services. The increasing importance

2020-09-03T15:46:38+00:00

Working efficiently with Jupyter Notebooks


Having worked in the data science domain for quite some years, I have seen good Jupyter notebooks but also a lot of ugly ones. Follow these best practices to work more efficiently with your notebooks and strike the perfect balance between text, code and visualisations.

If you have ever done something analytical or anything closely related to data science in Python, there is just no way you have not heard of IPython or Jupyter not

2018-11-20T11:31:51+00:00

Multiplicative LSTM for sequence-based Recommenders


Traditional user-item recommenders often neglect the dimension of time, finding for each user a latent representation based on the user’s historical item interactions without any notion of recency and sequence of interactions. Sequence-based recommenders such as Multiplicative LSTMs tackle this issue.

Recommender systems support the decision-making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost ever

2019-04-02T18:00:51+00:00

Managing isolated Environments with PySpark


In this article we present a simple solution for managing isolated environments with PySpark that we have been using in production for more than a year.

With the sustained success of the Spark data processing platform, even data scientists with a strong focus on the Python ecosystem can no longer ignore it. Fortunately

2018-04-10T13:30:43+00:00

Data Science in Production: Packaging, Versioning and Continuous Integration


Here's what changes when your data science project grows from a proof of concept. How do you deploy your model, how can updates be rolled out, ...?

A common pattern in most data science projects I participated in is that it’s all fun and games until someone wants to put it into production. From that point in time

2020-09-07T14:05:40+00:00

Efficient UD(A)Fs with PySpark


Nowadays, Spark surely is one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in

2017-11-27T15:30:11+00:00

Causal Inference and Propensity Score Methods


In supervised learning, correlation is crucial to predict the target variable with the help of the feature variables. But what good is causation?

In the field of machine learning and particularly in supervised learning, correlation is crucial to predict the target variable with the help of the feature variables

2017-11-27T15:30:21+00:00

Hive UDFs and UDAFs with Python


In this post we focus on how to write sophisticated User-Defined (Aggregate) Functions (UD(A)Fs) for Apache Hive in Python.

Sometimes the analytical power of built-in Hive functions is just not enough. In this case it is possible to write hand-tailored User-Defined Functions (UDFs) for tra

2019-01-23T11:31:13+00:00