For several decades now, the economy in Germany has been shifting away from pure product manufacturing towards services. The increasing importance …
About Florian Wilhelm
My name is Florian Wilhelm and I am a Data Scientist living in Cologne, Germany. Right now I enjoy working on innovative Data Science projects every day with experts at inovex. With more than five years of project experience in the field of Predictive & Prescriptive Analytics and Big Data, I have acquired profound knowledge in the domains of mathematical modelling, statistics, machine learning, high-performance computing and data mining. For the last few years I have programmed mostly with the Python Data Science stack (NumPy, SciPy, Scikit-Learn, Pandas, Matplotlib, Jupyter, etc.), to which I have also contributed several extensions. Due to my participation in many industry projects, I have also gained experience in the Hadoop stack, including Hive and Spark, as well as R.
Having been in the data science domain for quite a few years, I have seen good Jupyter notebooks but also a lot of ugly ones. Follow these best practices to work more efficiently with your notebooks and strike the perfect balance between text, code and visualisations.
If you have ever done something analytical or anything closely related to data science in Python, there is just no way you have not heard of IPython or Jupyter notebooks. …
Traditional user-item recommenders often neglect the dimension of time, finding for each user a latent representation based on the user’s historical item interactions without any notion of recency and sequence of interactions. Sequence-based recommenders such as Multiplicative LSTMs tackle this issue.
Recommender Systems support the decision-making processes of customers with personalized suggestions. They are widely used and influence the daily life of almost everyone. …
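The contrast between the two approaches can be illustrated with a deliberately simple stand-in for the sequence-based model: instead of a Multiplicative LSTM, a first-order transition table already captures the core idea that the next item depends on the order of past interactions, which a purely latent user-item factorization ignores. The item IDs below are made up for illustration.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count how often each item follows each other item across all user histories."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            transitions[prev_item][next_item] += 1
    return transitions

def predict_next(transitions, last_item):
    """Recommend the item that most often followed the user's last interaction."""
    counts = transitions.get(last_item)
    return counts.most_common(1)[0][0] if counts else None

# toy interaction histories (ordered per user)
sequences = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
transitions = fit_transitions(sequences)
print(predict_next(transitions, "b"))  # → c
```

A real sequence-based recommender replaces the transition table with a learned model (e.g. an mLSTM over the interaction history), but the interface — ordered history in, next-item prediction out — is the same.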
In this article we present a simple solution for managing Isolated Environments with PySpark that we have been using in production for more than a year.
With the sustained success of the Spark data processing platform, even data scientists with a strong focus on the Python ecosystem can no longer ignore it. Fortunately …
Here's what changes when your data science project grows beyond a proof of concept. How do you deploy your model, how can updates be rolled out, …?
A common pattern in most data science projects I participated in is that it’s all fun and games until someone wants to put it into production. From that point in time …
Nowadays, Spark surely is one of the most prevalent technologies in the fields of data science and big data. Luckily, even though it is developed in Scala and runs in …
Before we actually dive into this topic, imagine the following: you just moved to a new place and the time is ripe for a little house-warming dinner with your best friends …
In supervised learning, correlation is crucial to predict the target variable with the help of the feature variables. But what good is causation?
In the field of machine learning, and particularly in supervised learning, correlation is crucial to predict the target variable with the help of the feature variables …
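A tiny synthetic sketch makes the distinction concrete: when a hidden confounder drives two variables, they correlate strongly, and one predicts the other well, even though neither causes the other. The variables and coefficients below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A hidden confounder z drives both x and y; x has no causal effect on y.
z = rng.normal(size=n)
x = z + 0.1 * rng.normal(size=n)
y = z + 0.1 * rng.normal(size=n)

corr = np.corrcoef(x, y)[0, 1]
print(corr)  # strong positive correlation although x does not cause y
```

For prediction alone this correlation is perfectly usable; it only becomes a trap once you want to *intervene*, since changing x here would leave y untouched.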
In this post we focus on how to write sophisticated User Defined (Aggregated) Functions (UD(A)Fs) for Apache Hive in Python.
Sometimes the analytical power of built-in Hive functions is just not enough. In this case it is possible to write hand-tailored User-Defined Functions (UDFs) …
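The mechanism such Python UD(A)Fs typically build on is Hive's TRANSFORM streaming interface: Hive pipes tab-separated rows into a script's stdin and reads the transformed rows back from its stdout. The script below is a minimal sketch of that pattern; the column names (`user_id`, `duration`) and the seconds-to-minutes transformation are assumptions for illustration, not the article's exact example.

```python
import sys

def transform_line(line):
    """Parse one tab-separated Hive row and return the transformed row."""
    user_id, duration = line.rstrip("\n").split("\t")
    minutes = float(duration) / 60.0  # example transformation: seconds -> minutes
    return "{}\t{:.2f}".format(user_id, minutes)

def main():
    # Hive streams rows in via stdin and collects results from stdout.
    for line in sys.stdin:
        sys.stdout.write(transform_line(line) + "\n")

if __name__ == "__main__":
    main()
```

On the Hive side, such a script would be wired in roughly like this (table and column names again illustrative): `ADD FILE udf.py; SELECT TRANSFORM(user_id, duration) USING 'python udf.py' AS (user_id, minutes) FROM sessions;`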