Data Warehouse

A data warehouse enables companies to combine and analyze a wide variety of data. You will find everything worth knowing about data warehousing and corresponding systems on this page!

What is a data warehouse?

A data warehouse (DWH for short) is a database system that is used especially for business analyses. It can obtain data from various sources, merge it, convert it, store it permanently, and make it available for downstream analysis systems.

What is Data Warehouse for?

The core task of a data warehouse is to provide data in order to be able to make sound business decisions on this data basis. The challenge lies in the fact that the required information is available today in a wide variety of (internal and external) systems. In addition, the data are structured differently - they therefore have different formats. Data warehousing not only makes it possible to store this data in a central system. It can also be transformed into a uniform structure and then combined. This creates a consistent view of the source data. This holistic approach makes it easier to identify interrelationships and to derive well-founded management decisions.

Data warehousing is also relevant for ad-hoc analyses. Here, a technique called Online Analytic Processing (OLAP) is used. OLAP enables the aggregation and evaluation of data. One example is the summation of all sales of a specific product within a sales region.

The second important task of data warehouses is to separate operational data from analysis data. This means that reports and analyses do not have to be executed in an operative system (usually ERP). Instead, they are moved to a separate area. This reduces the load on the source system and avoids performance bottlenecks.

Is Data Warehouse a software, a server or a database?

Data warehouses are not server hardware. They are also not databases in the classical sense. Rather, they are software solutions consisting of several components (layers). The data warehouse architecture is as follows:

  • Source systems (operative systems): provide source data as a database
  • Data Staging Area: extracts, structures, transforms and loads the data from the source systems; transports the data to the actual DWH database (the data warehouse)
  • Data Presentation Area: separate area for data access by downstream systems
  • Data Access Tools: Applications for data access

The Data Staging Area has a particularly important task. It is usually based on the ETL process, which we will describe briefly below.

The ETL Process in the Data Warehouse

In order to convert raw data from different sources into a uniform format, they must be converted. In data warehousing, this usually takes place within the framework of the so-called ETL process. ETL stands for Extract, Transform and Load. During extraction, the data is obtained from the source systems. The transformation cleanses and standardizes the data. In the last step (Load), the data is then transferred to the target database. To keep the data up-to-date, this process is repeated at regular intervals. In recent years, real-time loading has become increasingly popular.

Data Warehouse compared to related solutions

A data warehouse is often confused with similar applications. These include in particular conventional databases, data marts, data lakes and data mining. In order to differentiate the terms, we would like to compare them in the following.

Data Warehouse vs. Database

In contrast to classic databases, data warehouses are designed for the analysis of larger data volumes. Databases collect data for different processing purposes and optimize them for read and write access. DWHs, on the other hand, aggregate, transform and store transaction data for analysis purposes. The focus is on access to large data sets.

Data Warehouse vs. Data Mart

Sometimes data warehouses are also confused with data marts. However, data marts are only a partial view of a dataset. They are used to make a subset of the total data available to defined users for specific application purposes. Compared to making the entire dataset available, this approach is much faster.

A data warehouse is often made up of several data marts. However, there are also data marts that are created and used independently of a DWH.

Data Warehouse vs. Data Lake

The classic data warehouse is primarily tailored to the development and provision of structured data. These are mainly derived from SQL databases. In the course of Big Data, however, it is necessary to access more diverse information, which is often available in unstructured form. In addition, the data stocks are considerably larger.

For these reasons the concept of the Data Lake was developed. It can be described as a supplement to data warehouses and makes it possible to expand DWHs into a big data analysis solution. Data Lakes are designed for the collection of enormous amounts of data. At the same time, they can store a wide variety of data formats - even unstructured ones.

However, the storage in Data Lakes is exclusively in raw format. The transformation only takes place when needed. In contrast to data warehousing, special knowledge of data scientists is sometimes required for this.

Data Warehouse vs. Data Mining

With data mining, correlations and patterns can be extracted from existing data. In principle, this process can be performed with any conventional database. However, if you want to ensure that only quality-checked data is used, data mining based on data warehouse data is recommended. If the data volumes for data mining are so large that storage resources reach their limits, companies can alternatively use data marts as data sources.

What aspects should a data warehouse concept cover?

When creating a data warehouse concept, both business and technical aspects must be taken into account. From a business perspective, the following questions should be addressed:

  • Which information should be transferred to the central data storage?
  • Which summarization level is necessary?
  • Which employees should have access to the data?
  • What is the definition of the key figures and what are their values?

From a technical point of view, the concept must be developed with the aim of systematically combining the data from the various sources into a common database. For this purpose, connections to the data sources must be established periodically or in real time. These connectors must be described exactly. The same applies to the processing mechanisms and the desired data quality. In the best case, the concept achieves that a "single point of truth" for data is created with the introduction of the data warehouse. This means that in all reports and evaluations, regardless of the department, no deviating figures can occur.

The following technical components of a data warehouse architecture must also be planned:

  • Components (computing resources) for the ETL process
  • Storage space for raw data
  • Subdivision into data marts
  • Storage space for long-term archiving of data (data backup, recovery)
  • Database system for metadata

Which data warehouse tools are available?

With regard to tools for data warehousing, a distinction must be made between commercial systems and freely available tools (open source). However, open source solutions often do not cover the entire 5-tier architecture (data source, data acquisition, data storage, data analysis, data presentation). They must therefore be combined with each other to represent a complete data warehouse concept.

Furthermore, a distinction can be made between on-premise and cloud solutions. More and more companies are now moving their data management and data warehouse tools to the cloud. This is done in particular for reasons of flexibility, scalability, location-independent access options and cross-team collaboration. Examples of popular cloud offerings are Google BigQuery, Amazon Redshift, Snowflake and Microsoft Azure SQL Data Warehouse. ERP providers such as SAP now also have cloud based DWH software in their portfolio.

Which data warehouse solutions does SAP offer?

In the area of data warehousing, SAP offers several solutions with which companies can implement a binding, central data source. In the latest product generation, these are the following systems:

  • SAP HANA
  • SAP BW/4HANA
  • SAP Data Warehouse Cloud

All three solutions enable the integration of data from SAP and non-SAP systems. Apart from this, however, the systems differ in their target group and approach. Let us therefore take a closer look at them.

SAP HANA (as Data Warehouse)

The high-performance SAP HANA in-memory database is used for numerous applications - including as an "in-memory data mart". In contrast to traditional data warehouses, reports can be accelerated by a factor of 100 to 1,000 with HANA. The technology is also suitable for use in conjunction with the SAP Business Warehouse business intelligence solution.

As a DWH, SAP HANA is particularly suitable for companies that are looking for a flexible, customizable solution, require high performance and want to perform advanced evaluations. The system can be provided either locally or in the cloud.

SAP BW/4HANA

SAP BW/4HANA is also completely based on the in-memory database HANA. However, the focus is particularly on processing very large volumes of data in real time. Among other things, BW/4HANA can be used to analyze data from SAP systems and third-party systems as well as geodata, sensor data (IoT), Hadoop data and unstructured data live. This makes the DWH solution suitable for the following scenarios, for example:

  • Manual and automated forecasts in the financial sector
  • Analysis of mass data from IoT sensors
  • Predictive analytics
  • Analyses, forecasts and process automation based on artificial intelligence (data mining, cognitive computing)

SAP BW/4HANA is the successor of the classic SAP Business Warehouse. Unlike the predecessor system, BW/4HANA is independent of NetWeaver. Data modeling has been greatly simplified with the "Data Flow Modeler" tool. BW/4HANA can also be operated either on-premise or in the cloud.

SAP Data Warehouse Cloud

SAP Data Warehouse Cloud is aimed at companies that want to enter data warehousing quickly and without high costs. It is a fully managed service, so there is no need to invest in your own server hardware.

SAP Data Warehouse Cloud also provides the advanced functionality of HANA. Its scalability allows companies to start with a small environment and expand as demand grows. Due to the wide range of connectors, all common data sources (SAP and non-SAP) can be integrated. The templates that are already included in the delivery make data integration into the cloud data warehouse even easier.