Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

December, 2008

  • Insufficient data from Andrew Fryer

    SQL Server Advent Calendar 6 – Hierarchy Data Type

    • 2 Comments

    Day 6 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Using a relational database to store structures like the hierarchy of a company or the product catalogue usually involves one of two techniques:

    1. Have a table for each level in the hierarchy with a foreign primary key relationship to represent the parent child relationship.  This works fine where the depth (number of levels) in the hierarchy is known.
    2. Have a self join in the table e.g. a member of staff would have the id of their manager in a manager id column which refer to the staff id of the manager in the same table.  This means that the number of levels in the hierarchy is not predetermined and so the schema doesn’t have to be changed if the business introduces new levels.

    The downside of this self join approach is that is slow to navigate and awkward to report on.  So In SQL Server 2008 there is a new HierarchyID data type. This stores the structure of the hierarchy in a system defined user defined type (UDT) like there is in SQL Server for spatial data.  and like spatial data there are three things you get:

    1. the data type itself
    2. functions to make it work e.g. tostring(), reparent(), GetParent(), GetDescendant() etc.
    3. indexing to make it fast.

    Of course this is one new feature which will require development time, but it is a lot faster and intuitive then self joins while having the same advantage of not requiring schema changes to reflect changes in the number of levels you need.

    If you want to try it then I have put a simple example in this separate article on my blog.

  • Insufficient data from Andrew Fryer

    SQL Server Advent Calendar 5 – Analysis Services Performance

    • 0 Comments

    Day 5 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Over 80% of the budget on improvements to Analysis Services in SQL Server 2008 went on improving performance. The answer was called block space computation and to understand this you should be aware that Cubes are often quite sparse so for example every customer doesn’t buy every product everyday (nice if they did!).  However in SQL server 2005 calculated members were applied to the whole cube wasting a lot of time and effort in the process.  More sophisticated users worked around this by using not empty predicates in queries with limited success.

    Block space computation fixes this by filtering the incoming query before calculated members are applied (the meat grinder in the diagram below) to it to remove any cells which have a default value in them. 

    image

    Thus calculations are only applied where they are needed and this just works behind the scenes without any user intervention. 

    So I would recommend testing your performance using an evaluation copy of SQL Server 2008 by simply backing up / restoring to the new server.

  • Insufficient data from Andrew Fryer

    SQL Server Advent Calendar 4 – Filtered Indexes

    • 1 Comments

    Day 4 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Filtered indexes are indexes with a where clause, allowing you to index the part of the  table you are interested in. For example if a column can have nulls in you could create an index on it that only indexes the non-null values making the index still relevant but potentially a lot smaller at the same time. better yet if the remaining values are unique you can then make that a unique index.

    create unique index production.nullidx

    on production.product(code) 

    where production.product(code) is not null

    I have posted about this before, but I wanted to mention it again because there is an occasional problem with filtered indexes returning incorrect answers if there are more than three tables in a join.  It’s been fixed by a cumulative update which is here.

  • Insufficient data from Andrew Fryer

    SQL Server Advent Calendar 3 – Analysis Services Backup

    • 1 Comments

    Day 3 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Backing up of analysis services in SQL Server 2005 ran out for steam for databases of about 20Gb, and so you have to follow a different route to back them up (see this technet article).  The time taken to back up a database does not grows faster than the growth in size of that database. e.g. doubling the size of the data might mean the backup takes 3 times longer.

    This has been fixed in SQL Server 2008 as the backup process has been completely rewritten and now closely follows the time it would take to copy the physical files.  This means that the simple backup in analysis services can now cope with data sizes of 100Gb plus which is pretty good when you remember that the analysis services database is typically a quarter or a sixth of the size of the relational database the data came from.

    Finally,it’s a trivial exercise to migrate an analysis services databases from SQL Server 2005 to 2008 either through a backup restore or by opening and deploying the source project to analysis services 2008.

  • Insufficient data from Andrew Fryer

    SQL Server Advent Calendar 2 – Time

    • 1 Comments

    Day 2 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Following on from my previous post, in some data warehouses there is a separate dimension for time of day, so that demand through a day can be modelled. Storing time in SQL server 2005 was a bit of a cludge typically involving picking an arbitrary date (like 1/1/1900) and then tacking the time on to the end of that.  Now there’s a separate time data type so it’s easy to store the right data and create the time dimension using a script like this:

    declare @time time = '00:00'
    declare @timekey int = 0
    declare @timegrain int =15

    if not exists
        (select  * from sys.tables where name = 'dimTimeofday')
    create table dimTimeofday( timekey int, TimeofDay time)
    while @timekey < 1440 begin   
        insert into dimTimeofday(timekey,Timeofday) values (@timekey, @time)
        set @time = dateadd(minute,@timegrain,@time)
        set @timekey += @timegrain
    end

    For more on the new time data type check books on line here.

  • Insufficient data from Andrew Fryer

    SQL Server Advent Calendar 1 – Dates

    • 3 Comments

    There is no possible connection between databases and Christmas, so I simply wanted to share 24 my favourite bits and bobs that are in SQL Server 2008….

    The new date data types in SQL server crack a problem I have had since I started BI as there is (nearly) always a time dimension in every data warehouse. In many retail data warehouses the grain (level of detail) was reduced by grouping all transactions in a day.  The trouble is in SQL Server 2005, you had to create a date like 25/12/2008 00:00 as per dimTime in AdventureWorks

    image

    Now we a date datatype which also allows ius to store just the date and has implicit conversion form datetime to rip out the time.

    For more on the new date data types check books on line here.

Page 2 of 2 (31 items) 12