This simple design, popped up in my inbox today asking for advice…
What you need to understand about this diagram is that there is a concatenated key (keys are in BOLD) on each of the orange dimension tables, with one part of that key being CompanyID. The exception is dCompany which is just keyed on CompanyID. You can imagine that this is a ledger fact table for multiple sub companies in a large organisation (BTW there is a time dimension as well but I want to keep the post simple).
I can understand that each product, location and so on belongs to particular company and that there might be occasions when you might need a complete list of the locations, cost centres etc for that one company. This design allows each of the dimensions to yield that list in a simple 2 table query (by joiing directly to dCompany on CompanyID).
The fact table has no redundant columns because even CompanyID is needed to link to the company dimension.
So all I would do is add in surrogate keys, i.e. a key with no relevance to the data in the dimension which would be the primary and clustered key of each of the dimensions..
For example in dProductGroup the ProductSK of 1 is just a made up number – it might refer to CompanyID4, ProductID20, rather than choosing ProductSK on some transformation of the two business keys. This will make queries faster and speed up cube processing if you plan to put analysis services on top of this. In my projects I nearly always use surrogate keys and I simply use the identity property on the column that will be the surrogate key on the dimension to do this, reserving 0 for a an unknown row e.g. unknown Location.
Whether or not it’s worth doing this is going to depend on 2 factors:
A final thought is that the design in the second diagram will also reduce the size of the fact table in SQL Server 2005 and standard edition of SQL Server 2008, but data compression in SQL Server 2008 enterprise edition will remove this advantage.
Another good reason to introduce surrogate keys is if you need to track historical changes to the dimensions (as a Type 2 Slowly Changing Dimension).
While you might not immediately identify this as a requirement, by using surrogate keys you make life so much easier for yourself when the requirements change and suddenly your users base say something like "of course we want to know that the Company name change six months ago"!
Your revised design also allows for the STAR JOIN or Optimized Bitmap Filter optimisation to be used when querying this model. In 2008 Enterprise Edition only single column keys can use optimized bitmap filters so your revision is very valuable indeed. Assuming that the surrogates are integer based this will also open up the full optimization by ensuring that the Bitmap filter is processed IN ROW.
Andrew, this is absolutely fine, and is most certainly best practice. What would be really nice is if you could extend this simple example into SSAS . . .
What I'm thinking about is screen shots of how the Dimension Structure and Attribute Relationships tabs would look in SSAS 2008 Dimension Designer. I would suggest just the dCompany dimension. It would be smashing if you could show how to 'hide' the SK so that from the user viewpoint, it would look like the original dCompany (without the SK) when browsing the cube. (Perhaps also show Property settings?)
I feel that this would better show the practical implementation of this undoubtedly best practice in DW design.
Cheers, Donna Kelly
In 2008 Enterprise Edition only single column keys can use optimized bitmap filters so your revision is very valuable indeed.