Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

February, 2008

  • SQL Server 2008 Sparse Columns

    Sparse Columns are another new feature of SQL Server 2008 and are included in the February CTP (CTP6). They pretty much do what they say on the tin; offering a trade-off between taking more space to hold data, but none at all when they are empty.  They don't get you over the 1024 column limit, but could mean you can squeeze more columns into the 8,060 byte row limit for SQL Server.

    Like everything in SQL Server you need to know when they add value and when to avoid them like the plague. Fortunately one of the non-sparse areas of Books On-Line is the section covering sparse columns here

    So the good news first:

    • Storing a null in a sparse column takes up no space at all.
    • To any external application the column will behave the same
    • Sparse columns work really well with filtered indexes as you will only want to create an index to deal with the non-empty attributes in the column.
    • You can create a column set over the sparse columns that returns an xml clip of all of the non-null data from columns covered by the set.  The column set behaves like a column itself. Note: you can only have one column set per table.
    • Change Data Capture and Transactional replication both work, but not the column sets feature.

    And the downsides.

    • If a sparse column has data in it it will take 4 more bytes than a normal column e.g. even a bit (0.125 bytes normally) is 4.125 bytes and unique identifier rises form 16 bytes to 20 bytes.
    • Not all data type can be sparse: text, ntext, image, timestamp, user-defined data type, geometry, or geography or varbinray (max) with the FILESTREAM attribute cannot be sparse. (Changed17/5/2009 thanks Alex for spotting the typo)
    • computed columns can't be sparse (although sparse columns can take part in a calculation in another computed column)
    • You can't apply rules or have default values.
    • Sparse columns cannot form part of a clustered index.  If you need to do that use a computed column based on the sparse column and create the clustered index on that (which sort of defeats the object).
    • Merge replication doesn't work.
    • Data compression doesn't work.
    • Access (read and write) to sparse columns is more expensive, but I haven't been able to find any exact figures on this.

    As you can see from Books On-Line there is a really useful guide to when to use them for a particular data type e.g. if more than 64% of your values are null in an int column then use sparse columns, and basically the longer the data type the lower the threshold for using sparse columns.

    So how does it work? Just put the keyword SPARSE into a create table statement:

    CREATE TABLE CustomerDemographics

    (CusomterID int PRIMARY KEY,

    Gender varchar(7) NOT NULL,

    EducationLevel varchar(20) SPARSE NULL,

    SalaryBand varchar(10) SPARSE NULL)

    Selects against this table will work exactly as for normal columns whether the sparse column is included as a column in the select column or a filter in a where clause.

    Optionally to create a column set for this table append this to the end of the create table statement:

    DemographicSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS

    The column set DemographicSet is then treated like any xml column i.e. it can be selected and also be used for updates and inserts, Note if you do use a column set for updating data sparse columns not specifically declared in the update well be set to null.

    Finally if you are wondering why this feature was developed, the simple answer is to support future versions SharePoint which was also one of the drivers behind FileStream.  I can see it being applied to any content management system over SQL Server and also as I have mentioned before for reducing the overhead of storing customer demographics or product catalogs where not every column applies to every product or customer.

  • Management Reporter - the new FRx

    image 

    One of the worst reporting headaches is producing output for the finance department.  The reports need to meet strict standards, be penny perfect, and the end user always wants the report to be in excel.  With reporting services in SQL Server 2005 this could be done by the IT department but the real solution is to let the finance types do it for themselves.  Performance Point went some way to addressing this and for those customers with Dynamics there was FRx.  So three ways of doing things - a typical Microsoft approach.

    Then very quietly, in fact so quietly that you would only know about it by searching for it by name, Management Reporter quietly launched down the Microsoft slipway on 29th January.  Management Reporter is a free add-on to Performance Point users with software assurance. It's specifically designed for finance users to do reporting in Excel.  It has a lot of financial intelligence built in and support for regulatory compliance e.g. Sarbannes Oxley (SOX), International Financial Reporting Standards (IFRS), and Generally Accepted Accounting Practices (GAAP) to name a few.

    Unlike FRx, your accounts don't have to be in Dynamics to use it and you can save the design as a reporting services report so information can be disseminated to other parts of the business who don't have Performance Point.

  • SQL Server 2008 Resource Governor in action

    One of the biggest headaches in SQL Server 2005 is trying to manage resources where there are competing workloads on a shared server. A typical approach is to use instances, but this completely hides memory and CPU from other instances which are then not available to an instance under stress whether the other instances are using them or not.

    The resource governor in SQL Server 2008 fixes that and I have a short blogcast here on TechNet Edge to demo how this works...

     image

  • SQL Server 2008 & Virtual Earth

    There's a lot of buzz around Software plus Services (S+S) from Microsoft and others.  It's all very simple as Eileen will tell you -  it's when you use a service like GPS that's remote and public and combine it with some software like CoPilot or TomTom on your GPS to find directions. 

    In a similar vein SQL Server 2008 can store geo-spatial data but that's of no value to a business unless they can see it and the best way to do that is on a map.  If you don't have mapping software of your own then you can use a web service like Virtual Earth.

    Johannes Kebeck a Microsoft Virtual Earth expert spent a morning in Visual Studio 2008 to produce an application for an insurance company which I have demonstrated in this blogcast:

     image

  • SQL Server 2008 Geo-Spatial functions and indexes

    SQL Server 2008 enables the use of geo-spatial data or location intelligence in four ways:

    • a new geography data type (there's another one for 2D geometry), which can store points, lines, line segments and polygons
    • A rich set of functions to make use of them, e.g. whether they intersect, how far are they away etc.
    • a special indexing mechanism to enable the function to perform quickly
    • Virtual earth integration so your users can see the results on a map without resorting to any extra software investment, while keeping their data in a secure database.

    Below is a short blogcast to show how these come together following on from my last two blogcast's on this interesting aspect of SQL Server 2008...

    image

    Technorati Tags: ,
  • SQL Server 2008 Filtered Indexes

    I have seen the filtered indexing in SQL Server 2008 mentioned a couple of times over the last couple of weeks but I only got it properly when I got to try it myself in ctp6.

    I have often needed to constrain a column to a only allow unique values but also allow the column to allow nulls and with filtered indexes you can put in a where clause like this ...

    create unique index production.nullidx

    on production.product(code) 

    where production.product(code) is not null 

    So the filter takes out the nulls.  Apparently this is also really useful if you are migrating your database from another provider to SQL Server.

    Technorati Tags: ,

  • SQL Server Schema Design

    I have seen some strange schemas in my time which look like a good idea on paper but not on disk. A common scenario is the schema that is created by a tool controlled by a user and so we end up with columns like user21 in usertabel7 and so on. Then I got an e-mail about the poor performance of user defined functions (UDFs) following Simon Sabin's talk on the CLR and how good they could be for replacing T-SQL UDF's at the SQL Community event last Thursday .

    The database involved in the problem had several tables where generic columns where used for different purposes. Taking customer as an example there was a separate table CustomerStrings that looked like this:

    Column Type
    CustomerID int not null PK
    CustomerStringTypeID int not null
    StringValue varchar(50)

    CustomerID is foreign key to the main customer table and CustomerStringTypeID is another foreign key, this time to CustomerStringType which has an ID and a description, with values in the description such as  'Fax', 'Mobile', 'e-mail' etc.

    So the CustomerString table would have multiple rows for the same customer for example a row for Fax, mobile, email and so on.  The good thing about this is that rows would only be created where a given attribute for a customer was used and so would be efficient on space.

    Getting back all the e-mail addresses for a customer wouldn't be too bad either depending on how this was implemented and this is where the developer tried to implement a generic solution with a UDF:

    CREATE FUNCTION dbo.GetCustomerString(@customerID int, @customerStringTypeID int) RETURNS varchar(50) AS BEGIN

    DECLARE @returnValue varchar(50);

    SELECT @returnValue = CustomerStrings.StringValue

    FROM CustomerStrings

    WHERE CustomerStrings.CustomerID = @customerID

    AND CustomerStrings.CustomerStringTypeID = @customerStringTypeID

    RETURN @returnValue

    END;

    The problem arises when this function is used in-line on multiple occasions to get three attributes for every customer:

    SELECT

    cst.LastName,

    dbo.GetCustomerString(Customers.CustomerID, 1) AS Fax,

    dbo.GetCustomerString(Customers.CustomerID, 2) AS mobile,

    dbo.GetCustomerString(Customers.CustomerID, 3) AS email FROM dbo.Customers cst;

    So the UDF gets called three times for every row in the customer table which is always going to be much slower than using a set based operation for example:

    SELECT

    cst.LastName

    fax.StringValue as Fax

    mob.StringValue as Mobile

    eml.StringValue as email

    FROM dbo.Customers cst

    LEFT JOIN dbo.CustomerStrings fax ON cst.CustomerID = fax.CustomerID AND fax.CustomerStringTypeID =1

    LEFT JOIN dbo.CustomerStrings mob ON cst.CustomerID = mob.CustomerID AND fax.CustomerStringTypeID =2

    LEFT JOIN dbo.CustomerStrings fax ON cst.CustomerID = eml.CustomerID AND eml.CustomerStringTypeID =3;,

    However maintaining all the code for the possible combinations of attributes required is a big headache for this company so how can they get the speed of the second query without writing mountains of code.  In my opinion the schema is the problem here.  I would either create sufficient columns in the customer table for all the attributes required or using another group of tables with one row per customer to store contact details etc. Of course if there loads of columns required (as in SharePoint) many of which are usually empty then it's time to have a look at Sparse columns in SQL Server 2008 and that is what I will be posting about next..

  • SQL Server 2008 CTP6 now available

    CTP6 is now available for download here. This has all the functionality of the final product and amongst the new stuff for this CTP are:

    • Data compression.
    • Integrated full text search - indexes are now part of the database making for easier backup and faster mixed mode queries.
    • Filtered indexing -  indexes with a where clause.
    • Partition aligned indexed views
    • Policy based management (the feature formerly known as declarative management framework)
    • Sparse columns so you can many more columns in a table
    • Gauges in reporting services

    One thing to note is that the link above still looks like the November CTP but does actually link to the February one, and the TechNet and msdn pages will change over the next day or so to reflect this.

    Technorati Tags: ,
  • Analysis Services and Excel Pivot Tables

    Excel is the number one business intelligence tool;  It fronts Hyperion, Cognos Financial Planning, and of course SQL Server Analysis Services and Performance Point Server Planning. In Office 2007 there is now excel services so that information workers can park their work on SharePoint where it can be properly secured.   

    One issue I have seen is where you decide to publish an excel worksheet to excel services with a connection to an SSAS cube in it.  By default the worksheet will cache the data the user sees when they publish it as well as the layout and the connection to the relevant cube.  That might not be a good idea if the creating user has complete access to the underlying cube, while the intended users of the cube, have very restricted permissions. In this scenario when a restricted user opens the worksheet from excel services they will see everything that the creating user saw when they saved it.

    It's simple to correct this behaviour, but very difficult to find where to do this, and in my opinion the default behaviour should be the other way round! Anyway all you need to do is to force a refresh of the data when opening the file in the connection properties dialogue:

       image

    This situation also applies to surfacing an excel worksheet in Performance Point Server.

    Another issue with Excel services is that it doesn't support the publication of worksheets with the new data mining add-ins installed, and I am assured that this will be fixed as part of future changes to excel services rather than by the new version of the data mining add-in that will ship with SQL Server 2008.

  • SQL Server 2008 BI, Perception & Reality

    image

    Following all the acquisition mania in the BI marketplace last year, it's interesting to see that the Microsoft approach of growing your own BI platform seems to be paying off according to this article by Gartner in Information Week.

    It has taken a while though - I well remember my first foray into olap services in SQL Server 7.0, we were using a market leading olap tool to build a cube from a fact table with 75 million rows in for an electricity company. It was taking 26 hours to build and then failed and we used the same hardware and SQL Server 7.0 olap services to build the same cube and it successfully completed processing after three hours, and because it was an open solution we were able to use the same front-end for the users.

    That was in 1999 and since then there has been loads of work on the Microsoft BI platform so by last year I worked on a project that could load 15 millions rows of new fact data a night into a cube with two years of history.

    In the latest version there aren't the big changes that we saw moving from 2000 to 2005 as the hard work was done then.  However as I have posted before the key things are analysis services 2008 are:

    • Help and advice to get the optimal design, namely highlighting this using blue squigglies under the offending items.
    • Better wizards and visualisations such as the  attribute designer and aggregation designer to help you be more productive
    • Improvements to the processing for sparse data sets, i.e. where the majority of the set is a default value (not necessarily null) using what is termed block space computation 
    • Dynamic sets, i.e. your top ten customers will be your top ten customers every time the set is evaluated.  This is better than SQL Server 2005 where the rule to derive the set is only evaluated when the set is created.

    The key white paper on these new SQL Server 2008 analysis services features is here.

  • Compliance in SQL Server

    I often get asked about whether SQL server is compliant with some standard or other e.g. Sarbannes-Oxley (SOX) , Data Protection etc.  The answer is yes, but there is no flip switch in SQL to turn any of these on , and as anyone knows who's tried it turning on c2 security is a great way of slowing your data base down, so caution is advised here.

    For SQL Server 2005 there are configuration packs for SOX here  and for the European Union Data Protection Directive (EUDPD) here.  However these packs are applied to System Center Configuration Manager 2007 (SCCM) as these they are a set of  configurations which can be used to evaluate compliance and then sent into SCCM for reporting, so you are going to need to invest in this technology or take the more expensive option of going around every server in your organisation and checking each one in turn manually. 

    For SQL Server 2008 this approach will continue, but is much easier as there are a whole bunch of new features to make compliance a lot easier and my top three would be:

    • Transparent Data Encryption (TDE) to encrypt the data at rest.
    • Change Tracking and Auditing to find out who did what and when (Why is proving much more difficult until we can use TSQL to read minds!).  These are much lighter weight than profiler and trace and so can be left on without significant performance degradation.
    • Policy based management (which was called the declarative management framework) allows the DBA to set conditions on any configurable facet of SQL Server and then decide what to do if they are broken or someone attempts to break them.

    Of course the rules of the game will also change as new legislation comes in, usually following hot on the heels of the latest compliance fiasco, so I wouldn't be surprised to see something bubbling up from the UK government's scant regard for the protection of personal data or the fallout from Societe Generale.

  • TechNet Launch Road Shows

    image

    The next round of road shows are going to focus on Windows Server 2008 and Vista sp1 so no SQL Server 2008 this time.  But no peace for the wicked like, many dba's, when not doing my day job I have to get my head around all of the other stuff that sits around the database, such as SharePoint, the operating system and networking.  So I will be on stage with the rest of my compadres talking about the new ways we can support the branch offices  that exist in many organisations.

    In order for you able to see all the demos including James' 8pt PowerShell scripts Georgina has booked up cinemas up and down the country. Also we're adding Cardiff to the list this time, at the expense of Reading, so remember to get on the right platform if you are coming form Swindon or Bristol! 

    The day pans out like this:

    09:00 – 09:30 Registration
    09:30 – 09:45 Keynote

    Consider the way you work today. Think about what’s changed in the last few years. You can be sure that there are plenty of changes along the way. How can you best position yourself to help your business make the most of change and improve your career at the same time?

    During this session we well take a look at the opportunities offered by key technologies including software plus services and innovative approaches such as virtualisation and Infrastructure Optimisation. Change seems to be the only constant in our industry. Keeping ahead of the curve brings benefits to everyone. This session will help you plan for the next generation of infrastructure and your own personal development.
    09:45 – 11:00 Windows Top Features

    So many features. So little time. Where should you focus? This session will help you to decide where to invest your time and effort.

    People’s expectations are changing. They expect to be able to work “their way”, free of the limitations of being “tied to a desk” or a particular device. Minimising the risk of security breaches while enabling flexible working can create interesting challenges. Windows Server 2008 can help – enter Network Access Protection.

    Thankfully the days of one line of business application per server are numbered thanks to virtualisation. Lazy servers waste power, data centre space and maintenance charges. Windows Server 2008 can help – enter Hyper-V.

    Traditionally ensuring consistent configuration of your web servers can be time consuming. Windows Server 2008 can help – enter IIS 7.
    11:00 - 11:20 Break
    11:20 – 12:35 How to Manage Windows Server 2008

    Where should you use Windows Server Core instead of the full installation of the operating system? If you use “core” then how on Earth are you supposed to manage it? Regardless of whether you use “core” or “full” where should you use Powershell and where does Windows Server Manager make more sense?

    We’ll show you!
    12:35 – 13:25 Lunch
    13:25 – 14:40 How to Deploy and Manage the desktop? What’s in Vista SP1?

    One desktop no cry? As anyone who’s ever managed a large number of desktops knows it’s well worth making the most of automation to ensure consistent user experiences and reduce support costs. We’ll dive into the range of operating system features available to you including WDS and WSUS plus we’ll investigate MDOP and SoftGrid’s application virtualisation.

    Waiting for SP1 before deploying Vista? Wait no more! We’ll open the lid and explain exactly what’s in there.
    14:40 - 15:00 Break
    15:00 – 16:15 How to overcome the challenges of Small Office AND branch offices

    It’s often less likely anyone would notice a server going “missing” in a small or remote branch office. These servers hold replica copies of user account credentials and your production data. Windows Server 2008’s Read Only Domain Controller features and BitLocker on the server can help!

    Moving to a World where small/branch offices connect via the Internet to your main server resources can reduce both your attack surface AND expense/complexity. Windows Server 2008’s much improved Terminal Services can help!
    16:15 – 17:15 Q&A with the full team of presenters throughout the day
    17.15 Close

    Like Jimi Hendrix we love feedback so step 1: come along and step 2: tell us what you think.  As usual you need to register here, but it is filling up following on from the launch event at the ICC in Birmingham on 19th March.   

  • TechNet Mile High Club

    British Airways not only run Vista for the in flight entertainment system, They also serve up TechNet on it...

    image

    I am not sure if this is an exclusive service on the Heathrow - Seattle flights or just part of the normal service.  I also noticed that the keynote guest speaker at Techready 06, Michio Kaku, (one of the co-founders of string theory) also had a slot in this IT Pro orientated system as well.

    image

    Of course it might be cheaper to subscribe to TechNet than get it for free this way!

    Technorati Tags:
  • SQL Server 2008, 9 or 10?

    I am in soggy Seattle this week with a bunch of like minded SQL squirrels, so our super Tuesday was watching the product teams including, Ted Kummert, Donald Farmer and  Christian Kleinerman putting the latest refresh of SQL Server 2008 through its paces.

    One thing that made me smile as Ted was going through the history of SQL Server was that everyone complained that there was such a big gap between SQL Server 2000 and SQL Server 2005 and now everyone is complaining that SQL Server 2008 is coming too soon after SQL Server 2005!  So if you are in the latter camp hopefully the fact that the product is now scheduled for Q3 2008 rather than Q2 will make you feel a little happier.

    Technorati Tags:
  • SQL Server 2008 Geometry Data

    One of the problems with our TechNet road shows is getting at the content after the event.  Of course we do provide the slide decks, but these can be abut thin as many IT Pro's (including us!) are power point averse, preferring see stuff in action.  We can record it all but I am not sure anyone  is going to find an hour long demo much use so one of my new year's resolution is to do a few blog casts to show particular bits of SQL Server 2008 work.

    So following on from my last blogcast on Friday I thought it would be good to do another one to show what's happening behind the scenes using a simple geometry based example..

    image 

    Technorati Tags: ,
  • Performance Point Evolution

    I thought you might find this article in the Data Warehousing Institute (TDWI) interesting.  Basically Stephen Swoyer has discovered that users like Performance Point because of the way it acts as a piece of glue to bring all of the Microsoft BI stuff together rather than prescribing an approach.  Integration with Excel is of course top of this list, but every BI vendor, even open source olap tools like palo have that. What makes Performance Point a little bit different is its use of Excel services and SharePoint to deploy and manage Excel, as well as report design in the new Management Reporter I mentioned in my previous post

    Another odd thing is that early adopters don't seem to mind that the product has a few rough edges, they trust Microsoft to evolve Performance Point in the same way they did with SQL Server Analysis and Reporting services, listening to them in the process.

    How does that listening process work? Well the most common routes are:

    1. Post your request/bug on Microsoft Connect, and then get the world and his wife to vote for it
    2. Work for a customer who gets to participate in customer clinics, but these tend to be US companies for the most part.
    3. Get a job at Microsoft.
    4. Get elected as a European Union official and bring out some legislation.
  • TechNet Spotlight

    image

    I am like c# - I am event driven. I like to meet people and I like to present or be presented to in person.  Last night I watched one of our MVPs, Simon Sabin, give us (the UK SQL Community) the DBA's guide to the CLR - when to use it and when not to. Great stuff and you get free beer and pizza as well.

    However I do understand that evenings are precious and we all have busy lives, so sometimes you just have to watch stuff off line.  I have just discovered TechNet spotlight; which is packed with recordings from most of the big IT Pro events over the last coupe of years.

    However there isn't anything on SQL Server 2008, although 2005 returns about 30 hits and even SQL Server 2000 has 2! So I have mainly been using it to mug up on stuff that isn't SQL Server such as System Center, and virtualisation in all its forms.

    We also have TechNet Edge which is content made by people like me as well as MVP's and tends to focus on what people are doing with our stuff although I could only find a couple of SQL Server posts on here.

    So if you are doing anything cool with SQL server and you want to share it, either in person or as a blogcast please get in touch.  The SQL community would welcome your ideas for sessiosn at their meetings even if it's only a short slot, and I would be happy to help you get your blog casting career launched.

    Technorati Tags: ,,
  • TechReady - Vista sp1 Testing

    The geeks in Microsoft have taken over down town Seattle for the week as we all get our six monthly fix of the latest stuff that's coming out.  This is one of the ways that Microsoft has to keep it's technical people happy and it certainly works for me.  We get to meet the product teams and network with all the different sorts of communities we have , such as Eileen's women in IT, the bloggers, SQL experts etc. 

    There also lots of social events and on Tuesday night all the evangelists took over a game-works an amusement arcade, which I have to say I found great fun but not as engaging as PC Gaming.  So my favourite evening of the week is when the entire Hands-on-Labs area (some 1500 PCs in one football field size room) is setup for a bit of a LAN Party.

    Obviously what we actually do in Redmond stays in Redmond, but what's this all got to do with Vista sp1?

    The whole party and all of the Hands on Labs is run on PCs with sp1 installed on without incident. 

    Technorati Tags:
  • SQL Server 2008 ctp6

    I have been to lots sessions on SQL Server 2008 at Microsoft's internal technical training fest (TechReady) over the last couple of weeks, and I wanted to post about but the new for CTP 6 stuff , but only when you can try it for yourselves.  I wondered if there was any buzz in the public domain around CTP6 while I have been off the radar and this looked good..

    image

    ..until you read the date of October 16 2008!

    So the good news is that Live Search does hit MSDN, and TechNet OK, the bad news is that there must be tons of stuff on these sites that might not be relevant any more, and the really bad news is that I still can't talk about how good full text search in SQL Server 2008 yet doh!

    Technorati Tags: ,,