Recommender systems support customers' decision-making processes with personalized suggestions. They are widely used and influence the daily life of almost everyone…
About Florian Wilhelm
My name is Florian Wilhelm and I am a Data Scientist living in Cologne, Germany. Right now I enjoy working on innovative Data Science projects with experts every day at inovex. With more than five years of project experience in the field of Predictive & Prescriptive Analytics and Big Data, I have acquired profound knowledge in the domains of mathematical modelling, statistics, machine learning, high-performance computing and data mining. In recent years I have programmed mostly with the Python Data Science stack (NumPy, SciPy, Scikit-Learn, Pandas, Matplotlib, Jupyter, etc.), to which I have also contributed several extensions. Through my participation in many industry projects, I have also gained experience with the Hadoop stack, including Hive and Spark, as well as with R.
In this article, we present a simple solution for managing isolated environments with PySpark that we have been using in production for more than a year.
With the sustained success of the Spark data processing platform, even data scientists with a strong focus on the Python ecosystem can no longer ignore it. Fortunately…
Here's what changes when your data science project grows beyond a proof of concept. How do you deploy your model? How can updates be rolled out? ...
A common pattern in most data science projects I have participated in is that it's all fun and games until someone wants to put it into production. From that point in time…
Nowadays, Spark is surely one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in…
Before we actually dive into this topic, imagine the following: you just moved to a new place and the time is ripe for a little housewarming dinner with your best fr…
In supervised learning, correlation is crucial for predicting the target variable from the feature variables. But what good is causation?
In the field of machine learning, and particularly in supervised learning, correlation is crucial for predicting the target variable from the feature variables…
In this post, we focus on how to write sophisticated User-Defined (Aggregate) Functions (UD(A)Fs) for Apache Hive in Python.
Sometimes the analytical power of built-in Hive functions is just not enough. In this case, it is possible to write hand-tailored User-Defined Functions (UDFs) for tra…
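One common way to plug Python into Hive is Hive's `TRANSFORM ... USING` clause, which streams rows to an external script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The following is a minimal sketch of a UDAF-like script under stated assumptions, not necessarily the approach of the post itself: the file name `median_udaf.py`, the two-column input `(key, value)` and the per-key median are hypothetical choices for illustration. It also assumes the query clusters rows by key so that all rows sharing a key arrive contiguously, and that NULLs arrive as Hive's default text marker `\N`.

```python
#!/usr/bin/env python3
"""Hypothetical UDAF-like Hive TRANSFORM script computing a per-key median.

Assumptions (illustrative only): Hive streams rows as tab-separated text
with two columns (key, value), NULLs arrive as Hive's default NULL marker,
and the query clusters rows by key so rows with the same key are contiguous.
"""
import sys
from itertools import groupby
from statistics import median

HIVE_NULL = "\\N"  # Hive's default text representation of NULL


def parse(lines):
    """Yield (key, value) pairs; value is None for Hive NULLs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines defensively
        key, raw_value = line.split("\t")
        value = None if raw_value == HIVE_NULL else float(raw_value)
        yield key, value


def aggregate(rows):
    """Yield (key, median) for each contiguous group of rows sharing a key."""
    for key, group in groupby(rows, key=lambda row: row[0]):
        values = [value for _, value in group if value is not None]
        # Mirror SQL aggregate semantics: a group of only NULLs yields NULL.
        yield key, median(values) if values else None


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read rows from stdin and write tab-separated results to stdout."""
    for key, result in aggregate(parse(stdin)):
        out = HIVE_NULL if result is None else str(result)
        stdout.write(f"{key}\t{out}\n")


if __name__ == "__main__":
    main()
```

Assuming a hypothetical table `my_table`, such a script would typically be registered with `ADD FILE median_udaf.py;` and invoked with something like `SELECT TRANSFORM (key, value) USING 'python3 median_udaf.py' AS (key, median_value) FROM (SELECT key, value FROM my_table CLUSTER BY key) t;`. The `CLUSTER BY` in the subquery is what lets the script use `itertools.groupby`, which only groups adjacent rows; without it, the same key could appear in several separate groups.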