Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

December, 2008

  • SQL Server 2008 - Hierarchy ID Data Type

    To show this we need to create a table with the new data type..

    create table [dbo].Organisation (
      DepartmentID int primary key nonclustered,
      DepartmentName varchar(100) not null,   
      DepartmentHierarchyNode hierarchyid not null,
      DepartmentHierarchyLevel as DepartmentHierarchyNode.GetLevel() persisted
      -- plus any other columns you need
    )

    Note the level is a derived function of getLevel() which returns how deep a node is in the structure in this case it’s the only node and so will return 0.

    Now we need to put the some data in starting at the top of the tree i.e the whole company – which in this example is Contoso..

    insert Organisation(DepartmentID, DepartmentHierarchyNode, DepartmentName)
    values (1, hierarchyid::GetRoot(), 'Contoso')

    note the getroot() function which does what is says and gets the root node of the hierarchy and since this is the first node in the hierarchy this is simply itself.

    Adding in new rows to this table is best done with a stored procedure like this one which needs to know the id for the new department the name of it and the id of the parent department it belongs to ..

    create

    procedure AddDepartment

    @DepartmentID int,

    @DepartmentName varchar(100),

    @ParentID int

    as

    begin

    declare @ParentNode hierarchyid,

    @MaxChildNode hierarchyid

    begin transaction

    -- get the node for the parent of the new

    -- department

    select @ParentNode = DepartmentHierarchyNode

    from Organisation

    where DepartmentID = @ParentID

    -- get the last department for the parent

    select @MaxChildNode = max(DepartmentHierarchyNode)

    from Organisation

    where @ParentNode = DepartmentHierarchyNode.GetAncestor(1)

    -- add the new department

    -- Parent.GetDescendant(MaxChild, null) returns the

    -- next descendent

    insert Organisation (DepartmentID, DepartmentHierarchyNode, DepartmentName)

    values (@DepartmentID, @ParentNode.GetDescendant(@MaxChildNode, null), @DepartmentName)

     

    commit

    end

    Here you can se more obviously named functions of a HeirarchyID like GetAncestor, GetDescendant and MaxChild.  I have to admit I have no problem with these as they are quite similar to what you use in MDX in Analysis Services.

    Here’s some data to go into the new table..

    exec AddDepartment 2, 'Operations', 1
    exec AddDepartment 3, 'Development', 1
    exec AddDepartment 4, 'Parking', 1
    exec AddDepartment 5, 'Home Operations', 2
    exec AddDepartment 6, 'Field Operations', 2
    exec AddDepartment 7, 'North Territory', 6
    exec AddDepartment 8, 'South Territory', 6
    exec AddDepartment 9,   'Database', 3
    exec AddDepartment 10,  'Services', 3
    exec AddDepartment 11,  'Applications', 3
    exec AddDepartment 12, 'Windows', 11
    exec AddDepartment 13, 'Internet', 11
    exec AddDepartment 14, 'Self', 4
    exec AddDepartment 15, 'Valet', 4

    selecting from this will give you the levels and so on:

    image 

    This procedure will show a tree of the structure using a single select statement

    create procedure ShowDepartmentChart
    @DepartmentName varchar(50)
    as
    begin

      declare @TopNode hierarchyid,
              @TopLevel int

      select @TopNode = DepartmentHierarchyNode,
             @TopLevel = DepartmentHierarchyLevel
        from Organisation
       where DepartmentName = @DepartmentName

      -- note that parent is descendent of itself
      select DepartmentHierarchyNode.ToString() NodeText, space((DepartmentHierarchyLevel - @TopLevel) * 5) + DepartmentName Department
        from Organisation
       where DepartmentHierarchyNode.IsDescendantOf(@TopNode) = 1
       order by DepartmentHierarchyNode

    end

    go

     

    and then use this like this..

    exec ShowDepartmentChart 'Contoso'

    image

    The common toSting() function returns the human readable form of the hierarchyID node.

    You can (and should!) index Hierarchy ID’s in two ways..

    1. Create a "breadth first" index,  to speed up selecting of nodes at the same level, all the sub departments that belong to a parent department


    create clustered index Organisation_Breadth_First
    on [dbo].Organisation( DepartmentHierarchyLevel, DepartmentHierarchyNode )

    2. Create a "depth first" index, to speed up selecting a sub-tree
    of nodes, e.g. all the departments beneath a particular parent department


    create unique index Orgisation_Depth_First
    on [dbo].Organisation(DepartmentHierarchyNode )

    Hierarchy ID’s are faster than self joins, not least because no recursive SQL is needed and in my opinion more intuitive.  However one important thing to remember is that don’t enforce any kind of integrity so like with self joins you can end up with orphans. One area like this where you can stuff things up is if you use the reparent function. this moves a node in the structure to a differnet parent ( for example if a mmeber of staff has a new manager), but it doesn't move the nodes underneath the moved node as well, you must do this yourself. 

  • SQL Server Advent Calendar 11 – Report Builder v2

    Day 11 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    …and I mentioned redesigning charts in my last post and the best answer might be to get your users to do it for themselves in Report Builder 2.

    image

    Leaving you (the DBA BI expert) to concentrate on getting them the data they need.  Report Builder2  is built of the same code as the Report designer in BI development studio, but has a different query builder.  If you want to use a report model then the model needs to be deployed to report manager and then report builder can see it from the data source dialog..

    image

    Report Builder v1 is still available as a ClickOnce deployment from Report Manager in SQL Server 2008, but if you want Report Builder 2 it needs to be downloaded from the Feature pack here.

  • SQL Server Advent Calendar 23 - Filestream

    Day 23 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Filestream isn’t one of the new data types in SQL Server 2008 it a variant of varbinary(MAX), and it is Microsoft’s solution to the problem of storing large unstructured data as part of a database. 

    In SQL Server 2005/8  varbinary(max) can store a file of up to 2Gb (a limitation inherited by SharePoint)  and as this limit is approached the performance of streaming this data back to a client becomes worse and worse.  On the other hand referential integrity is easy to maintain and backups disaster recovery is straightforward as all that’s needed is to restore the database.

    The other common approach to this problem is to have a table with pointers to the file locations of the documents on disk.  This is quite hard to write and implement and referential integrity and backup are also difficult as it’s important to have the right file at the end of each pointer and not to loose any on the way. However it does have the advantage of performance over internal storage of the data in the database. 

    Filestream is the best of both worlds.  Documents are stored as files on disk, but they are in a special SQL Server filegroup.  When a request for a file is made SQL resolves this into a filehandle that can be use by ordinary win32api’s so performance is as good as it can be.  However because all of this is in a filegroup backup is also straightforward. This mechanism also respects transactions so if you rollback you get back to the previous version of the file. A word of warning here is that this can involve some serious disk usage if the files are large as internally SQL Server keeps the older version of the file =until the transaction is committed.

    Using Filestream is a developer task , so any example I put here will be littered with very dodgy C#. Rather than do that I will instead point you to this excellent  example developed by one of my French (it is in English) colleagues Patrick Guimonet, and it also shows Powershell integration in action!

    Finally Filestream is included in all editions of SQL Server while not being part of size limitations in these editions.

     

  • SQL Server Advent Calendar 18 – Audit

    Day 18 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Rounding out ways of keeping an eye on what’s happening to your data in SQL Server 2008 is the new auditing feature in Enterprise edition.  As the name suggest it’s there for a specific purpose, to make your compliance work as easy as possible.

    In SQL Server 2005 you could monitor changes to to the data in a database using triggers and you could also track changes to permissions and schemas.  However tracking the results of queries was not easy or you had to resort to profiler to trap the SQL.  Audit not only handles all of this it, also allows you to direct the audit information to the application or security log as well as to file.

    You also get a nice UI to help you set it up and consume the information.

    The first step is to create an Audit in the Security node of SSMS..

    image 

    This will consume i.e. be the target of the database audit specification.  I am going to use a file here.

    Now I can setup a database audit specification again using the UI which I can get to by expanding security under the database I am interested in and right clicking on Database Audit Specifications..

    image

    I am going for an action type of select on an object and then I can select which object(s) to track. In this case I am worried about people looking up employee addresses so I have just selected the HumanResources.EmployeeAddress table. I can also filter by the security principal  (in my case dbo) as I may only be worried about certain groups of users.  Having created it I can right click on it to enable it and I am done.

    To test it I can run some queries against it..

    select
        E.LoginID,
        A.AddressLine1,
        A.City
    from HumanResources.EmployeeAddress EA
    inner join Person.Address A on EA.AddressID = A.AddressID
    inner join HumanResources.Employee E on E.EmployeeID = EA.EmployeeID

    Select
        VE.FirstName,
        VE.AddressLine1,
        VE.City
    from HumanResources.vEmployee VE

    Note the second query is a view based on the table we have the audit on.

    Now I can see the output by going back to the audit and right clicking to view the audit log..

    image

    scanning along to the right I can see the SQL I used…

    image

    so we got both selects back which is good to know, and I set this up in the the time it took my wife to grill tonight’s Salmon! 

    For further reading check books on-line here and this whitepaper on SQL Server 2008 security.

  • SQL Server Advent Calendar 12 - Tablix

    Day 12 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    One of the frustrations I used to have with Reporting Services was to decide whether to use a table or a matrix data control, and if I got too far in to a design and it turned out to be wrong I had to pretty much start again.

    Although in looks like nothing has changed in this regard in SQL Server 2008 there is in fact only one kind of data grid which is called Tablix (TABL e + matr IX).  Here’s one I made earlier in Report Builder v2

    image

    No only does it replace table and matrix reports it is also a lot more powerful – as a tablix control can have multiple independent row and column groups as you can see in this simple example..

    I can add an independent group by right clicking on the column group area (bottom right)..

    image

    then add  Group –> Adjacent after, and then select group by Year.  That will create a new column into which I can drag the sales amount from the data set on the left into the detail row and then again into the total.  I need to have the year as the column header so I need to enter !Fields.[Order Year].value. in there (the teal coloured box).

    If I run it now I get this.

    image 

    This is simply not doable in SQL Server 2005 reporting services.  Be aware however that I had all of this data in the one data set, as any data region can only depend one dataset.

    If you want to have a go, you’ll need adventureworks as the database and then the original report is here and the version I changed it to is here

  • SQL Server Advent Calendar 3 – Analysis Services Backup

    Day 3 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Backing up of analysis services in SQL Server 2005 ran out for steam for databases of about 20Gb, and so you have to follow a different route to back them up (see this technet article).  The time taken to back up a database does not grows faster than the growth in size of that database. e.g. doubling the size of the data might mean the backup takes 3 times longer.

    This has been fixed in SQL Server 2008 as the backup process has been completely rewritten and now closely follows the time it would take to copy the physical files.  This means that the simple backup in analysis services can now cope with data sizes of 100Gb plus which is pretty good when you remember that the analysis services database is typically a quarter or a sixth of the size of the relational database the data came from.

    Finally,it’s a trivial exercise to migrate an analysis services databases from SQL Server 2005 to 2008 either through a backup restore or by opening and deploying the source project to analysis services 2008.

  • 7th Day of Christmas Tagging

    Properly indexing databases is generally seen as a good thing, and is one of the top causes of poor performance.  However when we get home from a hard days SQL we seem to forget all of this and our home lives are less well organised, especially our photographs.

    In my latest Christmas video, my wife Juliet shows you how to tag your photographs using Live Photo Gallery ..

    Juliet also wanted to mention that a lot of telemetry from your photos gets stored in the image (date, camera settings), automatically by a digital camera, so you just need to add the who and what, if you have one of those.  

    Technorati Tags: ,

  • 1st Day of Christmas – Looking for Vulcans

    clip_image001

    I hope you like the card (a result of a miss spent youth at Art College) and are enjoying your well earned break.

    My 12 days of Christmas this year will take a look at what sort of free stuff you can download from Microsoft, some of which you may already know but maybe not all of them.  The series are all short videos which are intended as a bit of fun.  So without further ado here’s James O’Neill talking to me about Vulcans…

    Technorati Tags: ,
  • 2nd Day of Christmas – Brighton 360

    Although Photosynth has been widely publicised in the general and IT press I still come across load of people who don’t know what it is.  In this short video James O’Neill in fine Blue Peter tradition shows a synth (the buzz word for what you make in Photosynth)  he made earlier of Brighton Pavilion..

    Hopefully this will inspire you to have a go with that new camera you got for Christmas

  • 3rd day of Christmas – Ducks in a Row

    This blog is written using Windows Live Writer as are many of the others on TechNet and MSDN.  In this video Giorgio Sardo shows how he writes his and how to control your PC with a Samsung Omnia..

  • 4th Day of Christmas

    I love PC Gaming, so I haven’t got an XBox 360 yet.  However there’s no denying its popularity and unlike other games consoles Microsoft have released a simple toolkit , the XNA Game Studio to enable anyone to write a game for the XBox/PC/Zune.

    In today’s video Kevin Pfister takes us through this free toolset..

  • 5th Day of Christmas – Deep Zoom

    Deep Zoom actually does what it says on the tin.  However while many people have seen demos of it I am not sure many have actually tried it or know how it works and that it’s free.

    To rectify that Viral Tarpara (he’s the good looking one in the shot below) explains all in this short tour..

  • 6th Day of Christmas – Armchair Astronomy

    I still love to watch the Royal Institute Lectures at Christmas, and I am please to see Microsoft supporting them this year.  In this spirit I wanted to show the Microsoft World Wide Telescope so if you haven’t tried it yet hopefully this will convince you…

    Technorati Tags: ,

    On a serious note,  this is a very demonstrable example of Software + Services that Microsoft keeps banging on about, in this case the client does all of the rendering and caching while the service provides the rest including the up to date imagery, search, and repository of the guided tours.

  • SQL Server Advent Calendar 4 – Filtered Indexes

    Day 4 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Filtered indexes are indexes with a where clause, allowing you to index the part of the  table you are interested in. For example if a column can have nulls in you could create an index on it that only indexes the non-null values making the index still relevant but potentially a lot smaller at the same time. better yet if the remaining values are unique you can then make that a unique index.

    create unique index production.nullidx

    on production.product(code) 

    where production.product(code) is not null

    I have posted about this before, but I wanted to mention it again because there is an occasional problem with filtered indexes returning incorrect answers if there are more than three tables in a join.  It’s been fixed by a cumulative update which is here.

  • SQL Server Advent Calendar 5 – Analysis Services Performance

    Day 5 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Over 80% of the budget on improvements to Analysis Services in SQL Server 2008 went on improving performance. The answer was called block space computation and to understand this you should be aware that Cubes are often quite sparse so for example every customer doesn’t buy every product everyday (nice if they did!).  However in SQL server 2005 calculated members were applied to the whole cube wasting a lot of time and effort in the process.  More sophisticated users worked around this by using not empty predicates in queries with limited success.

    Block space computation fixes this by filtering the incoming query before calculated members are applied (the meat grinder in the diagram below) to it to remove any cells which have a default value in them. 

    image

    Thus calculations are only applied where they are needed and this just works behind the scenes without any user intervention. 

    So I would recommend testing your performance using an evaluation copy of SQL Server 2008 by simply backing up / restoring to the new server.

  • SQL Server Advent Calendar 1 – Dates

    There is no possible connection between databases and Christmas, so I simply wanted to share 24 my favourite bits and bobs that are in SQL Server 2008….

    The new date data types in SQL server crack a problem I have had since I started BI as there is (nearly) always a time dimension in every data warehouse. In many retail data warehouses the grain (level of detail) was reduced by grouping all transactions in a day.  The trouble is in SQL Server 2005, you had to create a date like 25/12/2008 00:00 as per dimTime in AdventureWorks

    image

    Now we a date datatype which also allows ius to store just the date and has implicit conversion form datetime to rip out the time.

    For more on the new date data types check books on line here.

  • SQL Server Advent Calendar 2 – Time

    Day 2 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Following on from my previous post, in some data warehouses there is a separate dimension for time of day, so that demand through a day can be modelled. Storing time in SQL server 2005 was a bit of a cludge typically involving picking an arbitrary date (like 1/1/1900) and then tacking the time on to the end of that.  Now there’s a separate time data type so it’s easy to store the right data and create the time dimension using a script like this:

    declare @time time = '00:00'
    declare @timekey int = 0
    declare @timegrain int =15

    if not exists
        (select  * from sys.tables where name = 'dimTimeofday')
    create table dimTimeofday( timekey int, TimeofDay time)
    while @timekey < 1440 begin   
        insert into dimTimeofday(timekey,Timeofday) values (@timekey, @time)
        set @time = dateadd(minute,@timegrain,@time)
        set @timekey += @timegrain
    end

    For more on the new time data type check books on line here.

  • SQL Server Advent Calendar 6 – Hierarchy Data Type

    Day 6 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Using a relational database to store structures like the hierarchy of a company or the product catalogue usually involves one of two techniques:

    1. Have a table for each level in the hierarchy with a foreign primary key relationship to represent the parent child relationship.  This works fine where the depth (number of levels) in the hierarchy is known.
    2. Have a self join in the table e.g. a member of staff would have the id of their manager in a manager id column which refer to the staff id of the manager in the same table.  This means that the number of levels in the hierarchy is not predetermined and so the schema doesn’t have to be changed if the business introduces new levels.

    The downside of this self join approach is that is slow to navigate and awkward to report on.  So In SQL Server 2008 there is a new HierarchyID data type. This stores the structure of the hierarchy in a system defined user defined type (UDT) like there is in SQL Server for spatial data.  and like spatial data there are three things you get:

    1. the data type itself
    2. functions to make it work e.g. tostring(), reparent(), GetParent(), GetDescendant() etc.
    3. indexing to make it fast.

    Of course this is one new feature which will require development time, but it is a lot faster and intuitive then self joins while having the same advantage of not requiring schema changes to reflect changes in the number of levels you need.

    If you want to try it then I have put a simple example in this separate article on my blog.

  • SQL Server Advent Calendar 24 – Upgrade

    It’s Christmas Eve and the last day of my virtual advent calendar on SQL Server 2008..

    Hopefully the last 23 posts have given you some idea of what’s in SQL Server 2008 so I thought it might be useful to conclude with a word on upgrade.  Firstly SQL Server 2008 has a lot in common with SQL server 2005 so if that’s where you are then upgrade is pretty painless and straightforward.

    Don’t panic if your still on SQL Server 2000, you can upgrade straight to SQL Server 2008, and there is a massive amount of support to help you. Probably the best place to start is the SQL Server 2008 Technical Reference Guide.  This covers all aspects of the SQL Server (DBEngine, Reporting Services, Analysis Services and Data Transformation Services/Integration Services) and provides a very detailed (490 pages) of help to ensure upgrade is painless as possible.

    There’s also more help on TechNet and a tool to discover where your servers are:

    So take the next few days to read over the guide so you can get started on your upgrade next week, unless that is, you have other plans tomorrow!

  • SQL Server Advent Calendar 7 – Partitions &amp; Indexed Views

    Day 6 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Indexed Views allow result sets to be persistent as they are materialized on physical storage which avoids performing costly joins and/or aggregations at execution time.  However in SQL Server 2005 they would have to be dropped and recreated if one of the underlying tables was partitioned and a new partition was added.

    This is fixed with the cunningly named indexed aligned partition views (that’s a Microsoft mouthful so I will abbreviate that to PAIV). This allows you to swap partitions in and out of your large table to add new data and archive older data.

    Partition-Aligned Index Views Switching

    The key to this is the ALTER TABLE  ..SWITCH statement.  There’s a really simple example to follow in books on-line on the TechNet SQL Server TechCenter.

  • SQL Server Advent Calendar 8 – Analysis Services Named Sets

    Day 8 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Analysis Services allows you to create a set of things you are interested in and persist this in the cube.  MDX is used to define the set e.g.

    CREATE SET CURRENTCUBE.[Top 10 Resellers]
    AS TopCount([Reseller].[Reseller].[Reseller].Members, 10, [Measures].[Sales Amount]);

    creates a set Top 10 resellers of the resellers having the 10 highest sales amounts

    If you do this in SQL Server 2005 then that statement would be evaluated when the set was defined and members in the set would be fixed for the life of the named set regardless of any changes to the data in the cube.  This can be a good thing but as I business user I would expect this to be dynamic e.g. if in our example a new reseller meets the rules at the expense of one of the others then I expect to see that reflected in the set in the same way as if I used a SQL view on a table with a top  X count clause.

    SQL Server 2008 has the option to create dynamic sets that behave like this and this is simply achieved by altering the create statement to

    CREATE DYNAMIC SET CURRENTCUBE.[Dynamic Top 10 Resellers]
    AS TopCount([Reseller].[Reseller].[Reseller].Members, 10, [Measures].[Sales Amount]);
     

    I like the upgrade aspect of this as the set you originally created in will behave as before in Analysis Services 2005 and you simply need to flip the dynamic switch in Analysis Services 2008 to make it behave in what I believe is the more logical fashion.

  • SQL Server Advent Calendar 9 – Management with Powershell

    Day 9 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    Many SQL Server databases are managed by part time or accidental DBA’s, who have a whole bunch of other duties possibly including networking, active directory, SharePoint and Exchange.   As well as learning about what these all do part time DBA’s also have to get to grips with the different tools used to manage them. 

    This is one of the many areas addressed by Microsoft’s new scripting language, Powershell.  Bizarrely from the company that bought you wizards and tries to get you to do everything in  a slick UI, this initially looks like a throw back to the days of DOS or the Unix command shells. This is not as odd as it might seem, because although UIs are good for ‘do once’ operations,  they are not nearly so good for repeating tasks particularly if these tasks take arguments like server name, database name, today’s date etc.

    Powershell allows you to create scripts where everything you use is an object using all the power of WMI and .Net, to directly control Windows 2008, Vista, SQL Server 2008 and Exchange (SharePoint will come in Office 14). What does that mean for the DBA?

    • Manageability and repeatability
    • Unified scripting to encompass database management tasks with file operations, security permissions.
    • Using other new tools in SQL Server 2008, like scripting evaluation of policies to a table across multiple servers
    • No more reliance on VB Script and odd looking batch jobs
    • A huge amount of resources like codeplex, real books, TechNet, and the Powershell User Group run by Richard Siddaway 

    A good place to start is this http://www.microsoft.com/emea/spotlight/sessionh.aspx?videoid=984blogcast from Dan Jones on the SQL Server Management team followed by this introductory Powershell guide on Codeplex

    Finally while Powershell is built in to SQL Server 2008, you can retrofit Powershell onto older versions of SQL Server (back to 2000) using this kit on Codeplex.

    Technorati Tags: ,
  • SQL Server Advent Calendar 10 – Reporting Services Charts

    Day 10 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    I have been using Reporting Services since beta 2 in SQL Server 2000, and for me the weak link was the charts which even  then looked pretty basic…

    image

    Many people resorted to using add-ons including the ubiquitous Dundas charts.  Microsoft noticed this trend and simply bought those tools and put them into Reporting services 2008. The chart below shows different axes and chart types for different data series, and the improved control there is over formatting the chart.

    image

    There are also gauges available now and at some point there will hopefully be the maps to leverage the spatial spatial data types in the SQL Server 2008 database engine.

    If you are using Dundas charts already then they'll be updated automatically to pick up the versions embedded in reporting services.  If you weren't then your charts will look the same, but will have been migrated to the new chart type as well and you can then change them if you want and usethe new features like I have show above, or….

  • SQL Server Advent Calendar 13 – Grouping Sets

    Day 13 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    The SQL Group by clause has been around since I can remember, but in SQL Server 2008 we know have grouping sets.  Imagine you wanted to sum sales amount from the adventure works reseller sales table by year, by Product Category and by Year AND Product Category perhaps as the first stage in writing a report.  You also need to have all the results in one query so you end up doing something like this in SQL Server 2005:

    SELECT
        NULL AS [year],
        dpc.EnglishProductCategoryName AS Category,
        SUM(frs.salesamount)
    FROM FactResellerSales frs
    INNER JOIN DimProduct dp ON dp.ProductKey = frs.ProductKey
    INNER JOIN DimProductSubcategory dps ON dps.ProductSubcategoryKey = dp.ProductSubcategoryKey
    INNER JOIN DimProductCategory dpc ON dpc.ProductCategoryKey = dps.ProductCategoryKey
    GROUP BY dpc.EnglishProductCategoryName

    UNION ALL

    SELECT
        dt.CalendarYear AS [year],
        NULL AS Category,
        SUM(frs.salesamount)
    FROM FactResellerSales frs
    INNER JOIN DimTime dt ON dt.TimeKey = frs.OrderDateKey
    GROUP BY dt.CalendarYear

    UNION ALL

    SELECT
        dt.CalendarYear AS [year],
        dpc.EnglishProductCategoryName AS Category,
        SUM(frs.salesamount)
    FROM FactResellerSales frs
    INNER JOIN DimProduct dp ON dp.ProductKey = frs.ProductKey
    INNER JOIN DimProductSubcategory dps ON dps.ProductSubcategoryKey = dp.ProductSubcategoryKey
    INNER JOIN DimProductCategory dpc ON dpc.ProductCategoryKey = dps.ProductCategoryKey
    INNER JOIN DimTime dt ON dt.TimeKey = frs.OrderDateKey
    GROUP BY dpc.EnglishProductCategoryName,
            dt.CalendarYear

    Grouping sets makes this a far simpler exercise which also executes only once against the fact table..

    SELECT
        dt.CalendarYear AS [year],
        dpc.EnglishProductCategoryName AS Category,
        SUM(frs.salesamount)
    FROM FactResellerSales frs
    INNER JOIN DimProduct dp ON dp.ProductKey = frs.ProductKey
    INNER JOIN DimProductSubcategory dps ON dps.ProductSubcategoryKey = dp.ProductSubcategoryKey
    INNER JOIN DimProductCategory dpc ON dpc.ProductCategoryKey = dps.ProductCategoryKey
    INNER JOIN DimTime dt ON dt.TimeKey = frs.OrderDateKey
    GROUP BY
        GROUPING SETS
        (    (dpc.EnglishProductCategoryName),
            (dt.CalendarYear),
            (dpc.EnglishProductCategoryName,dt.CalendarYear),
            ()
        )

    Each row under grouping sets groups the above query in a different way  equivalent to the previous query except that I have added an extra set, the last row, () which gives the grand total,  (which shows up as  NULL,NULL, 804505969823) in the results below..

    image

    I think this plays very well with Reporting Services and for generating aggregated fact tables in large relational data warehouses, but will obviously need development work to implement.

  • SQL Server Advent Calendar 14 – Integration Services Lookups

    Day 14 of my virtual advent calendar, about stuff I like in SQL Server 2008..

    If it aint broke don’t fix it, is probably best aimed at Integration Services in SQL Server 2008. So what was ‘fixed’?  The most important thing was the performance of lookups, both in terms of speed and memory usage.  To address this there is a new lookup cache feature to allow frequent lookups from static data to be persisted to a special file (*.caw) .

    This cache need to be populated from the new cache transform task in the data flow designer:

    image

    To use this you need to tell the inevitable wizard what connection, table and columns you want to cache…

    image

    Note the 1 in the index position for the SalesTerritoryAlternateKey row to indicate this is indexed (at least one column must be indexed).

    So that’s how you populate the cache, now it can be used in the standard lookup transformation by selecting cache connection manager on the general tab for the lookup

    image

    and then putting in the cache location on the connection tab (You will be warned to do this). You can then use the columns form the cache in the same way as for a normal lookup. As ever I’ve done this in AdventureWorks so you should be able to follow this.

    This is going to be useful in applying the same (conformed) dimension to multiple fact tables, particularly if the dimension doesn’t change that often.  I would also point out that there has been a lot of work behind the scenes to improve the parallelism in integration services, which will kick in when you upgrade.