
Summary
Construction of a data warehouse and data-driven analyses / models
Keywords
AWS, MS Azure, ETL Pipelines, TensorFlow, Elasticsearch, SQL, NoSQL, Kafka, Python
Description
The goal was to consolidate several NoSQL and SQL databases into a central data warehouse and data lake. Data was collected both through nightly batch jobs and as “live” data streamed via Kafka from a range of end devices.
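The combination of nightly batch loads and live streaming can be illustrated with a minimal sketch. It assumes that both paths are normalized into one common record type and loaded idempotently, so that an event arriving via both paths is stored only once. All names here (`WarehouseRecord`, `from_batch`, `from_stream`, `load`, the field names) are hypothetical, and plain Python iterables stand in for the database exports and the Kafka consumer; this is not the project's actual pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class WarehouseRecord:
    """Common shape for records from both ingestion paths (hypothetical)."""
    event_id: str
    source: str   # "batch" or "stream"
    device: str
    payload: dict


def from_batch(rows: Iterable[dict]) -> Iterator[WarehouseRecord]:
    """Normalize rows exported by a nightly batch job (assumed field names)."""
    for row in rows:
        yield WarehouseRecord(
            event_id=row["id"],
            source="batch",
            device=row.get("device", "unknown"),
            payload=row,
        )


def from_stream(messages: Iterable[dict]) -> Iterator[WarehouseRecord]:
    """Normalize messages as they might arrive from a streaming consumer."""
    for msg in messages:
        yield WarehouseRecord(
            event_id=msg["event_id"],
            source="stream",
            device=msg["device"],
            payload=msg["data"],
        )


def load(records: Iterable[WarehouseRecord],
         warehouse: Dict[str, WarehouseRecord]) -> int:
    """Insert records keyed by event_id, skipping ones already present.

    Returns the number of newly added records.
    """
    added = 0
    for rec in records:
        if rec.event_id not in warehouse:
            warehouse[rec.event_id] = rec
            added += 1
    return added
```

Keying the load on an event identifier keeps it repeatable: re-running a nightly job or replaying a stream does not create duplicates in this sketch.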
The project had two special requirements: large data volumes had to be managed (20+ million web events and 2 TB per month), and the security standards of Volkswagen AG had to be met. I was also heavily involved in implementing the GDPR (DSGVO) requirements and worked with Volkswagen AG's group-wide “Team Cloud” on expanding the cloud infrastructure on AWS and MS Azure.
Various recommender systems were developed for use in apps and on websites, with the aim of showing each user the most suitable vehicle. Simple cases were implemented using graphs in Elasticsearch; more complex scenarios used precomputed clusters or machine learning models, which were deployed with OpenFaaS and Kubernetes.
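The precomputed-cluster variant can be sketched as follows. The sketch assumes that each user has already been assigned to a cluster offline, and that a recommendation is simply the most viewed vehicles within that cluster that the user has not yet seen. The function name `recommend`, its parameters, and the sample identifiers are hypothetical; the source does not describe the actual ranking logic.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(user_id: str,
              user_cluster: Dict[str, str],
              cluster_views: Dict[str, List[str]],
              already_seen: Dict[str, Set[str]],
              k: int = 3) -> List[str]:
    """Return up to k vehicles popular in the user's precomputed cluster.

    user_cluster:  user id -> cluster id (computed offline)
    cluster_views: cluster id -> vehicle ids viewed by users in that cluster
    already_seen:  user id -> vehicle ids the user has already viewed
    """
    cluster = user_cluster[user_id]
    counts = Counter(cluster_views.get(cluster, []))
    seen = already_seen.get(user_id, set())
    # Rank by view count within the cluster, dropping vehicles already seen.
    ranked = [vehicle for vehicle, _ in counts.most_common()
              if vehicle not in seen]
    return ranked[:k]


# Hypothetical sample data for illustration only.
user_cluster = {"u1": "cluster_1"}
cluster_views = {
    "cluster_1": ["model_a", "model_a", "model_a",
                  "model_b", "model_b", "model_c"],
}
already_seen = {"u1": {"model_a"}}

top = recommend("u1", user_cluster, cluster_views, already_seen, k=2)
```

Because the clustering happens offline, a lookup of this kind is cheap at request time, which suits a small stateless function.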
Furthermore, machine learning models were developed to forecast unit volumes, inventory levels, and vehicle registration figures.