Data Science & Engineering

Data is becoming more and more important. Let us take a look at some examples to get a feeling for how much data will change things in the future:

1. People are generating 2.5 quintillion bytes of data each day [1].
That is 2.5 × 10^18 bytes, or a 25 followed by 17 zeros, and that happens each day!

2. 90% of all data available on the planet has been generated in the last 2 years [1].
This means that data is growing exponentially and that we have truly entered the data-driven age.

3. Of this data, 95% remains unanalyzed [1].
Many businesses know that they need to collect data, but do not know what to do with it. Very valuable information might be hiding in this 95% of unanalyzed data, like your ideal target demographic, an undervalued product of yours, or a potential unicorn idea.

4. By 2020 almost all data processing will be done by automated processes [1].
People can no longer analyze all this data themselves. What might have worked in times of SAP and Excel cannot be done manually anymore. Therefore new tools, new technologies, and new processes need to be implemented to handle this enormous mass of data.

[1] Source

How does Data Science benefit my business?

Especially in times when more and more business is moving to the digital world, it is essential to make the switch into the data-driven age. Of course, this might upset many traditional businesses, but complaining does not help amidst this transition into a new world.
Your company will have the choice to either build up its own data team and architecture or use services and outsourcing to have its data handled for it.
But to give you a glimpse of hope: using this data can boost your sales and revenue in a manner never seen before. New markets and customers become available, and new opportunities with a global reach are at your fingertips. Upgrading a process no longer increases sales by just 1%; it might multiply your results tenfold. Keep reading to find out how your business can manage this transition.

The difference between Data Science and Data Engineering

The techniques and approaches for dealing with this massive amount of data can be grouped under “Data Science” and “Data Engineering”. Data Science deals with the analytical part of the process, meaning data preparation, extraction, and the deduction of results from the given data.

Data Engineering, on the other hand, provides the technical foundation a Data Scientist needs to work. This includes, for example, optimizing or deploying databases, special Big Data frameworks, process optimization, and data preprocessing.

DATA SCIENCE

Data science is a “concept to unify statistics, data analysis, machine learning, and their related methods” in order to “understand and analyze actual phenomena” with data. [2]

DATA ENGINEERING

Data Engineering deals with the setup and usage of the tools a data scientist needs to work. This includes, for example, data storage and parallel processing in tools like Hadoop, Spark, and similar frameworks.

Increased revenue through Data

Data is becoming more and more important, yet 95% of data remains unused by companies. You might say to yourself, “Of course, there are many opportunities I could pursue with my company, so why data?” One thing that should not be underestimated is that data does not just bring a small product improvement with a fraction-of-a-percent increase in sales. It is a transition of your company into the data-driven age that can multiply your revenue.

Example 1: Ford automotive

As an example, let us have a look at the Ford automotive company. In 2006 their annual loss was 12.6 billion USD. After that, a data scientist was brought in to turn the company into a more data-driven one over the next three years. The result was 2.3 million cars sold in 2009 and a return to profit by the end of that year.

(Source: E. McNulty, 5 Ways a Data Scientist Can Add Value To Your Business (2014))

Example 2: Supply chain improvements

Timing supply chain resources can be the deciding factor between a win and a loss. Having the factory ready when a delivery arrives and avoiding idle employees are some of the factors that can be optimized with data science.

The Pitt Ohio Freight Company used data science to estimate the arrival of their drivers with a 99% accuracy rate. This won over customers to such an extent that the company gained another 50,000 USD a year through repeat orders. (Source: T. Capone, How Data Science Can Help Your Enterprise Generate More Revenue (2018))

Example 3: Programmatic Advertising

Using the information a company has about future or existing customers can open up new revenue streams from previously undiscovered target groups and allow better targeting of ad spend. Traditional advertisements are poorly targeted: all the people driving past a billboard, for example, might or might not see the ad, whether it fits them or not. On the one hand, this is quite expensive, as billboards cost a lot of money. On the other hand, the average person is bombarded with hundreds of ads a day, so their attention span is very limited nowadays. What if we could show our ad only to the people who are most likely to become our customers, and target them with personalized information that fits their needs?

This is the classic “shooting a cannon at a bird” example. Billboards are like shooting a cannon at a small sparrow to catch it. It works, but it costs a lot of money for cannons, might injure a lot of other animals, and is a huge effort. Why not use bait that targets sparrows specifically, to catch only sparrows and do it precisely?

“The Economist”, an established print and online publication, used programmatic advertising to persuade undecided potential readers to finally subscribe. Their initial goal was to gain 650,000 new customers. They combined information they already had with information derived from geolocation, IP addresses, and several other sources to show advertisements to “mirrored customers”, meaning potential new customers who are similar to existing ones. They identified the main segments in their publications, such as “Finance”, “Politics”, and “Careers”, and produced matching content, which was then shown to the segmented mirrored customers.
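The idea of “mirrored customers” can be illustrated with a small sketch. This is not the method “The Economist” used, which the source does not describe. It is only a generic lookalike match, assuming each person is described by made-up interest scores for the segments (Finance, Politics, Careers) and that similarity is measured with cosine similarity. All names, scores, and the threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two interest vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_mirrored(existing, prospects, threshold=0.9):
    """Map each prospect to its most similar existing customer.

    Prospects whose best similarity is below the threshold are left out.
    """
    mirrored = {}
    for prospect, scores in prospects.items():
        best_name, best_sim = None, 0.0
        for name, customer_scores in existing.items():
            sim = cosine_similarity(scores, customer_scores)
            if sim > best_sim:
                best_name, best_sim = name, sim
        if best_name is not None and best_sim >= threshold:
            mirrored[prospect] = best_name
    return mirrored


# Hypothetical interest scores in the order (Finance, Politics, Careers).
existing = {
    "subscriber_a": (0.9, 0.1, 0.0),
    "subscriber_b": (0.1, 0.8, 0.3),
}
prospects = {
    "visitor_1": (0.8, 0.2, 0.0),
    "visitor_2": (0.0, 0.1, 0.9),
    "visitor_3": (0.2, 0.9, 0.2),
}

mirrored = find_mirrored(existing, prospects)
print(mirrored)
```

In this toy data, visitor_1 resembles the finance-oriented subscriber and visitor_3 the politics-oriented one, so each would be shown content from the matching segment, while visitor_2 resembles neither closely enough.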

The results of this campaign were astonishing. The goal of 650,000 customers was beaten more than 5 times over, with 3.6 million new customers. Their newly gained lifetime value is estimated at £11,970,000 (14.8 million USD), which is a return on investment (ROI) of 10:1. Furthermore, awareness of “The Economist” rose by almost 65%.

[Source]

Become a data-driven company now

Contact us now for a free 15-minute consultation on how to apply Data Science to your business.

A selection of our data services:

Data Lakes
Data Analytics / BI-Tools
Data Migration
ETL (Extract, Transform, Load)
Data Strategy Consulting
Data Environment setup (Spark, Hadoop, Jupyter Notebooks, …)
Serverless Databases (SQL, NoSQL, …)
Data Visualization (2D, 3D, VR, AR)
Model Tuning
Data Warehouses, Data Marts, Data Policies (HIPAA, GDPR, …)

What is a Data Lake?

Data lakes are essential for your Big Data landscape. In order to centralize all your data sources, be it SQL databases, images, text data, e-mail, SAP, or others, an intermediate layer needs to be established between your data science team and the data itself.
This layer manages communication between these different data structures, access control (who can access what), lifetime policies (how long data is stored, data catalogs), and the security of these interactions (access logging, security measures, time-limited access, …).
The goal is basically to establish a “front end” for all your data needs. Employees can access only the data they are allowed to, every access is logged for forensics if needed, and they can “load” the data into their data analytics tools. A centralized data lake both gives an overview of existing data and avoids building up several data “silos” or unnecessary duplicates in your company.
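As a rough illustration of such a front end, the following sketch checks permissions and logs every access attempt before handing data out. It is a toy model under stated assumptions, not a real data lake product: the class name DataLakeGateway, the users, and the datasets are all hypothetical, and the data lives in memory.

```python
from datetime import datetime, timezone


class DataLakeGateway:
    """Toy front end: checks permissions and logs every access attempt."""

    def __init__(self, permissions, datasets):
        self.permissions = permissions  # user -> set of allowed dataset names
        self.datasets = datasets        # dataset name -> data
        self.access_log = []            # kept for later forensics

    def load(self, user, dataset):
        allowed = dataset in self.permissions.get(user, set())
        # Log the attempt whether or not it is allowed.
        self.access_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not access {dataset}")
        return self.datasets[dataset]


# Hypothetical users and datasets, for illustration only.
gateway = DataLakeGateway(
    permissions={"analyst": {"sales"}},
    datasets={"sales": [120, 95, 143], "payroll": [4200, 3900]},
)

sales = gateway.load("analyst", "sales")

try:
    gateway.load("analyst", "payroll")
    denied = False
except PermissionError:
    denied = True

print(sales, denied, len(gateway.access_log))
```

The point of the design is that every request goes through one place, so access rules and logging are applied consistently regardless of where the data is stored.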

What is Data Engineering?

Data scientists should focus on their data analytics processes. But according to the 80/20 data science dilemma, data scientists spend 80% of their time on data preparation instead of applying their algorithms (https://www.infoworld.com/article/3228245/the-80-20-data-science-dilemma.html). If JupyterHub and other tools are correctly set up on top of a data lake, this time can be drastically reduced. This includes using the right tools, reducing latency, and using the right storage option for each use case.

What is ETL / Data Processing?

Another time-consuming step in data science is taking the data in its raw form (e.g. SQL data) and preprocessing it into a usable form. This is called ETL, short for Extract, Transform, Load. A good big data system automates these steps as far as possible. For example, large cloud machines start up during the night, take the raw data, prepare it, and save it so that it is ready for the data scientists to analyze in the morning.
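The three ETL steps can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database in place of a real source system; the table names raw_orders and customer_totals and the sample rows are made up. In practice a scheduler would start such a job at night.

```python
import sqlite3


def extract(conn):
    """Extract: read the raw order rows from the source table."""
    return conn.execute("SELECT customer, amount FROM raw_orders").fetchall()


def transform(rows):
    """Transform: drop incomplete rows, normalize names, total per customer."""
    totals = {}
    for customer, amount in rows:
        if customer is None or amount is None:
            continue  # incomplete rows cannot be analyzed
        key = customer.strip().lower()
        totals[key] = totals.get(key, 0.0) + float(amount)
    return sorted(totals.items())


def load(conn, totals):
    """Load: store the prepared result in a table for the analysts."""
    conn.execute("DROP TABLE IF EXISTS customer_totals")
    conn.execute(
        "CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", totals)
    conn.commit()


def run_nightly_job(conn):
    """Run all three steps in order."""
    load(conn, transform(extract(conn)))


# Hypothetical raw data, including messy and incomplete rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("Alice", 10.0), (" alice ", 5.5), ("Bob", 7.0), (None, 3.0), ("Bob", None)],
)

run_nightly_job(conn)
prepared = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(prepared)
```

Keeping extract, transform, and load as separate functions makes it easy to swap the source or the target later without touching the cleaning logic.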

What are Serverless Databases?

SQL databases can be a pain to manage. Additionally, many companies do not employ effective backup solutions in case a server goes down. More often than not, unexpected traffic peaks crash a product because classic SQL servers do not scale with demand. This can easily be avoided by using cloud or hybrid-cloud (on-premise and cloud) setups.

What are Speed-First Websites?

Our focus when building websites is speed. The BBC reported losing 10% of its users for every additional second their website takes to load (https://www.creativebloq.com/features/how-the-bbc-builds-websites-that-scale).
Additionally, 53% of mobile users abandon a website if it takes more than 3 seconds to load (https://www.marketingdive.com/news/google-53-of-mobile-users-abandon-sites-that-take-over-3-seconds-to-load/426070/).
Of course, the look and design of a website are important as well, but in recent years looks have been prioritized over loading times. The problem is that if the website loads slowly, the user will not even see it. It is therefore of the utmost importance to speed up websites: apart from being favored by customers, fast websites also rank higher in Google searches. We achieve this with serverless architectures, content delivery systems, and special settings that speed up content delivery.