I keep seeing organizations that try to use their data warehouse platforms as data hubs. While on the surface this might seem like a good idea, it really might not be depending on your utilization of each. The data warehouse serves primarily as a business intelligence and analytics tool. If implemented correctly, it contains vast amounts of data from many disparate HR sources and therefore, it’s a good assumption that perhaps this might be used as the integration engine for HR as well. After all, HR usually has way too many interfaces sitting around requiring constant maintenance. If you could perform all integration out of a single platform rather than maintaining one of those octopus-like diagrams, all the better.
The truth of the matter is that data warehouses are inherently bad at being data hubs. An essential part of the data warehouse is the ETL engine. ETL stands for Extract, Translate, Load, and it’s the translate part that gives us problems. When the translations occur, what is actually happening (simplistically) is the data goes through a transformation so that when you report in the end stat of a data warehouse the data definitions are the same. The transformation is really an alteration of the data so that analytics are made simpler and possible at the same time. However, transformed data does not really work well for integration.
Let’s take salary grades as an example. When you do a global analysis of salaries, you would probably load in global employee salaries, currency conversion rates, and then perform a calculation to convert the data into the home currency. The report then might look at the distribution of salaries across several bands. These bands however, have been transformed into global bands as opposed to using the home country’s salary grades. In many cases, the original salary grades may not have gotten into the data warehouse intact. (just an example I thought of off the top of my head).
I would suggest that to be a true data hub, you would want to have the original data from the original system. Let’s say that you need global employee data from many sources to load into a single global succession system. Not only will you be taking employee indicative data, but you’ll want performance scores, education, goals, etc… Once again, the problem with a data warehouse here is that performance scores in each country may utilize different rating scales, etc. The normalized data out of a data warehouse may not be suitable for the succession application depending on the decisions you make when you implement.
The alternative is to use what is usually called the ODS, or the Operational Data Store. The ODS often exists within the DW application, but is a loading area (after the E in ETL, but before the TL). The ODS is not ideal because it’s really just a storage tank before translation, but does hold untransformed data from all the systems. While not intended as an interface engine, its tables could be used for it (I suppose), but the actual transformed data tables are simply not appropriate for use as an interface engine in my opinion.