As a certified partner of Cloudera and Hortonworks, which have since merged, we support our customers with the Cloudera Data Platform, a solution capable of acquiring, storing, processing and analysing very large volumes of data. Cloudera is a state-of-the-art platform for data management and analysis, machine learning and artificial intelligence.
Increasingly powerful and more networked devices such as smartphones, sensors, cameras, machines and servers produce measurement data, log data and process data, while commerce and social media platforms generate records of social interactions, as well as transactions involving goods and finances.
Collecting, analysing and evaluating this data leads to a better and more detailed understanding of existing business models and supports the establishment of new, ‘digital’ business models and products. Highly scalable data management forms the indispensable basis for many processes in data science, machine learning and artificial intelligence.
Big Data overview
‘Big Data’ is the collective term for the technologies, frameworks and tools that have emerged for this purpose in recent years. What they have in common is that they scale horizontally as distributed systems, so their runtime behaviour can be adapted comparatively easily to growing data volumes by adding further resources. Big data systems can process a broad range of data types from a wide variety of sources, both in large batches and as a continuous data stream with low latency. With these properties, big data technologies provide the foundation for complex analytical evaluations, scalable, reporting-oriented data platforms, and distributed software systems with an event-based processing paradigm.
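To illustrate what horizontal scaling means in practice, the following minimal Python sketch distributes records across a configurable number of workers by hashing their keys. It is not taken from any specific product: the function names, the `sensor-…` keys and the worker counts are assumptions chosen for illustration only. The point it shows is that adding workers reduces the share of data each one has to handle.

```python
import hashlib
from collections import Counter


def assign_partition(key: str, num_workers: int) -> int:
    """Map a record key to a worker index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def load_per_worker(keys, num_workers: int) -> list:
    """Count how many records each worker would receive."""
    counts = Counter(assign_partition(k, num_workers) for k in keys)
    return [counts.get(i, 0) for i in range(num_workers)]


# Illustrative data: 10,000 made-up sensor keys.
keys = [f"sensor-{i}" for i in range(10_000)]

if __name__ == "__main__":
    for workers in (2, 4, 8):
        load = load_per_worker(keys, workers)
        print(f"{workers} workers -> busiest worker holds {max(load)} records")
```

Real systems such as Kafka or Spark use their own partitioning schemes; this sketch only mirrors the general idea of spreading keys over more machines as data volumes grow.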
inovex has been working in depth with big data since 2009, making it one of the first IT service providers in Germany to do so, and has developed and implemented production-grade enterprise solutions in many projects:
- Data lakes, data hubs, data platforms
- Intelligent, data-powered services and applications (references: mobile.de, REWE, Arvato)
- Data analysis and machine-learning platforms (references: EM²Q, KOSMoS)
- Hybrid data warehouses, virtual data integration (references: ProSiebenSat.1, dmTech, C. H. Beck Verlag)
This means we can support you in all areas, from planning and development to the operation of big data systems, on premises, in the cloud, or both.
Big Data Tech Stack
- Event streaming platform: Kafka, Confluent
- Streaming data processing: Spark Streaming, NiFi, Flink, Storm
- Scalable data processing and analytics: Spark, Databricks
- Hadoop data platform distributions: MapR and Cloudera
- SQL mass data queries: Hive, Phoenix or Drill
- NoSQL databases: HBase, Cassandra, Elasticsearch, Druid
- Public-cloud (Big) Data Services: Microsoft Azure, Amazon AWS and Google Cloud
- Container-based set-up of infrastructure and services: Docker, Kubernetes
- Job management, orchestration: Airflow, Argo, Oozie
- Data ingestion: NiFi, Flume, Sqoop
- Data governance and cluster security: Ranger, Kerberos, Navigator, Atlas
Technology Partners: Big Data
Confluent was founded by the team that developed the Apache Kafka™ distributed streaming platform at LinkedIn, scaling it to receive, process and store over 1 trillion messages per day. Kafka is designed for very high throughput and provides connectors for data integration, as well as a framework for stream processing.
The mission of our partner Databricks is to accelerate innovation for all customers by unifying Data Science, Data Engineering and Business Intelligence in one solution.