Jul 27, 2010
Fact: A turnover rate based on a calculation
Dimension: Historical time trend of turnover
Fact: The number of heads in your HR database
Dimension: The breakdown of headcount by organization
Fact: The number of FTE’s in any given job code
Dimension: The demographic analysis of FTE’s in job code
I honestly have no idea how many of you have ever heard of a “star schema.” I’m not sure I ever wanted to, and I’m not sure knowing what it is called has ever really helped me out. But somehow, I picked it up along the way (not surprising for a data warehouse and analytics guy) and every once in a while I am reminded how difficult it is to explain analytical reporting to HR people who don’t already have it. And the simple fact is that most of us don’t really have it yet, or only have small bits in a couple of our applications.
Facts are all those things that we can quantify. They are… well, facts. They are the things we run ad hoc or operational reports against, and are usually the things that we have reported against for years. They are those turnover reports, or the headcount reports.
Dimensions on the other hand, are the attributes we want to dynamically apply to the facts. Lets say I get a turnover report. I’d love to right click on one of the bars in the bar chart and see a historical trend for turnover. So in this case, the dimension is time. Or that same right click might have given me the option to see the turnover by business unit, so now the dimension is organization. Perhaps we want turnover by age or gender instead – so now it’s a demographic parameter.
Sometimes, a data element can be a fact in one report and a dimension in another. So age, a dimension in our turnover report, can also be a fact. We run a report on the headcount by age groupings. However, when we see that report, we decide that we want to know the race allocation within our 30-39 band, so now age is the fact and race is the dimension.
I have no idea if all that makes sense in text (and without actually seeing it), and I’m really avoiding trying to explain a ‘cube” which I’m fairly sure I’ve tried to do before. But there’s a reason we talk about this stuff. When you set up analytics, you set up these star schema first. After you drop the data into your data warehouse, you get to create these “schema” which form the basis of your analytics. If you don’t know beforehand how you might want to see your data, you’re going to wind up with a very limited set of dimensions and 2 months down the road, you’ll be wondering why your implementer didn’t ask you if you wanted turnover by age, race and gender. Implementers implement what you tell them to. It’s up to you to understand what your own requirements are. The problem with data warehouse, is that understanding what the requirements are only happens when you understand the capabilities of the technology.