Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

May, 2008

  • Using TinyInt as keys in Analysis Services

    Fact tables should be narrow and deep.  If you are recording every debit card transaction for every customer across the fifteen retail banks you own, you end up with 15 million rows a day, so to save space they need to be as thin as possible.  If one of your dimensions can only ever have a small number of values then use tinyint as the key to this dimension.  Actually I always use tinyint instead of boolean for yes/no male/female stuff because you often need to have three or four values i.e. to include ‘not applicable’, ‘unknown’ etc. and I also break with Kimball and use special keys for these such as 0.

    OK so tinyint and smallint can save space in data warehouses, however you can run into trouble when you try and use these keys to build cubes in analysis services (in SQL Server 2005 and 2008).  This is because analysis service doesn’t support tinyint and converts the type to something else.  Unfortunately the something else it gets converted to is different for the foreign key to the dimension on the fact table (System.Byte) than for the primary key on the dimension (System.int32).  So now analysis services can’t join the two tables properly and this is quite rightly confused and annoyed a few people.

    I can’t fix analysis services to correct this and I understand the issue still exists in Analysis Services 2008, but I can tell you two approaches to get around it.

    • Change the data types of all of the tinyint and smallint in the data source view (DSV) used to build the unified dimensional model (UDM) for your cube.
    • Build the UDM up from a set of SQL Server views and not the base tables, and do the conversion there.  I used this last approach all the time in Analysis Services 2000, because there was no UDM, back then.

    The longer term approach is to voice concern on connect, where you have a much better chance of influencing the shape of SQL Server than I do as an evangelist as the product team do take customer feedback seriously. Incidentally I notice that connect is starting to look like a proper portal, so do have a look around anyway if your interested in participating in any betas.

  • Data Mining Epiphany

     

    Data Mining is like Web 2.0, everyone has their own idea of what it is, and there are loads of people out there who don’t get either of these technologies, despite them having a big impact on our lives.

    image

    Books have been written on data mining and whole conferences devoted to it. However if you are new to the term or have been confused about it in the past, I would simply refer you to these videos on TechNet Spotlight from one of the best speakers on the subject, Rafal Lukawiecki , from Project Botticelli:

    Technorati Tags:
  • Server Quest

    clip_image002

    The word is out that I am the PC Gamer in Eileen’s team. I started on a Pr1me computer that had a Star Trek game where the enterprise was 0-<= and the klingons were +—=. Later came the PC with colour and graphics, with such classics from Sierra Games as Space Quest and Police Quest and the one everyone knows and no one ever admits to playing Leisure Suit Larry .

    Microsoft have a new title of their own in this vein, Server Quest at  www.server-quest.com.  I don’t think it needs DirectX 10 or the ATi or NVidia graphics card, so see if your knowledge of IT goes back as far as mine and try it out.  It is nearly as cheesy as those old games but in this case the fromage is all TechNet related.

    Technorati Tags: ,,
  • SQL Server 2008 Community Launch

    The London SQL Server Community have a SQL Soiree on 19th June at Cardinal Place from 6-9pm.

    Map image

    This is to mark the launch of SQL Server (not RTM!) in their own inimitable style. The organisers have been smuggling swag out of Microsoft and will be giving away all sorts of useful stuff in addition to the essential beer and Pizza.

    All you need to bring is your questions and opinions (on SQL Server!) to ensure the evening is a success.

  • Deprecated Features in SQL Server 2008

    My first ever post was about the features in SQL Server 2000 that were deprecated in SQL Server 2005 i.e. they wouldn’t be available in the next release (SQL Server 2008). In the same way there are SQL Server 2005 features that will be deprecated in SQL Server 2008, they will still work and be supported, but won’t be available in the next release of SQL Server.

    The links for all of deprecated features for the different parts of SQL Server are below:

    • Database Engine

    http://msdn.microsoft.com/en-us/library/ms143729(SQL.100).aspx

    • Replication

    http://msdn.microsoft.com/en-us/library/ms143550(SQL.100).aspx

    • Integration Services

    http://technet.microsoft.com/en-us/library/ms403408(SQL.100).aspx

    • Analysis Services

    http://technet.microsoft.com/en-us/library/ms143346(SQL.100).aspx

    • Reporting Services (No features deprecated)

    http://technet.microsoft.com/en-us/library/ms143509(SQL.100).aspx

    So please consider these articles when doing development on SQL Server to avoid pianting yourself into a corner in a few years time.

  • E-Mail Signature Fun

    One of the many things about the Microsoft culture I love is e-mail signature anarchy. What do I mean well the boring bit is that there is no clue or standard to the format of our aliases so I am afryer@microsoft.com, but steve lamb is stephlam@micrsofot.com and if Bruce Banner joined we might get bruce.banner@micrsoft.com.  Of course this can end in tears with unfortunate humour when an alias crosses international frontiers for example there are a lot of guys called Randy in the US which isn’t such a good story in the UK.

    The fun bit is the signature block some people use, as there are no rules apart from  the  laws of the countries in which you are working.  To show that, here’s the sort of thing you will see in Microsoft signature blocks, that make me smile:

    One guy puts this old chestnut after every e-mail:

    THINGS TO DO IF I BECOME AN EVIL OVERLORD #150

    I will provide funding and research to develop tactical and strategic weapons covering a full range of needs so my choices are not limited to "hand to hand combat with swords" and "blow up the planet".

    How about philosophical:

    "We must learn our limits. We are all something, but none of us are everything."

    ~Blaise Pascal~

    Corporate, but I am sure the egg throwers out there will argue Microsoft is making it worse.

    clip_image001

    For the Tolkein fans I saw this from a guy in the SQL product team:

    "Nobody tosses a dwarf!" Gimli, son of Glóin, as Aragorn was about to throw him across a gap in the stairs in the Mines of Moria, The Fellowship of the Ring
    "Ok, throw me... but don't tell the Elf!" Gimli, as he asks Aragorn to throw him on to an entrance way loaded with Orcs, The Two Towers
    "That still only counts as one! " Gimli, Return of the King
    "A day may come when the courage of men fails . . . but this day, we fight! " Aragorn, Return of the King

    Frodo Lives !

    My personal favourite is this guy (Dubi Lebel) having fun with his name:

    dubil@microsoft.com

    --------------------------------

    Dubi - Or Not To Be

    To do is to be - Descartes

    To be is to do - Voltaire

    Do be do be do - Frank Sinatra

    Yaba Daba Do - Fred Flintstone

    I have tried to mix corporate with a bit of fun:

    clip_image002

    BTW the image is tiny and hopefully won’t fill up the world’s in boxes. 

    Hopefully your organisation cuts you a bit of slack so you can create an individual impression with your contacts.

    Technorati Tags: ,
  • SQL Server 2008 Filtered Indexes vs Index Views

    SQL Server 2008 has a new feature called filtered indexes which are like normal table indexes only they have a simple where clause,  meaning that the index will only cover rows specified in the where clause. As I have mentioned before an obvious example is where you can filter out nulls and still create a unique index.

    So how do filter indexes compare with index views, well index views are more complex , you can index across a view containing multiple tables for a start and the where and case clauses can be very sophisticated.  This sophistication comes at a price as the index can be larger (and the B+ Trees will be deeper)  and is will be updated during an insert whereas a filtered index will only be updated during insert if the new row meets the where clause in the filter.

    So when to use what?

    • The where clause in a filtered indexes has be pretty simple – i.e. it has to conform to these this format:

    <filter_predicate> ::=

        <conjunct> [ AND <conjunct> ]

    <conjunct> ::=

        <disjunct> | <comparison>

    <disjunct> ::=

            column_name IN (constant ,…)

    <comparison> ::=

            column_name <comparison_op> constant

    <comparison_op> ::=

        { IS | IS NOT | = | <> | != | > | >= | !> | < | <= | !< }

    For more information on the rules check SQL Server 2008 books online here 

    • The rule eliminates a good portion of the table from being covered by the index, I would caveat this by reiterating the benefits of a unique index on a column with nulls in.
    • The  rest of the time use a regular index or in indexed view.
    Technorati Tags: ,
  • Open Business Intelligence, spoilt for choice

    Let me start with a little history of Microsoft BI. Microsoft acquired an OLAP tool from a company called Panorama, project name Plato, which was bundled in with SQL Server 7.0 and from this grew the present incarnation of Analysis Services which IMHO is the crown jewel in the company's BI armory, it has allowed the creation of Performance Point, data mining in Excel, and the Unified Dimensional Model to capture business definitions and model security. 

    A common misconception is that you don’t have to use Microsoft tools to get at Analysis Services, because, like many OLAP engines it uses MDX as its query language in the same way as SQL is the query language for relational databases.  However, just like SQL, MDX comes in lots of flavours, and has been developed by each vendor so functionality will vary from Oracle Hyperion to SAP/BW etc.

    So there is a small but thriving economy of independent tools that connect to all of these engines and fulfill some niche functionality that is available elsewhere.  An obvious example is that same Panorama company with it’s product Novaview which has a very intuitive web UI. Another good visualisation tool from a Microsoft Gold Partner is Tableau:

    image

    The important thing here is that the data for these tools comes from a central source and while it may make sense to give everyone access to a cube in Excel, one of these specialised tools might be the best thing for your analysts to use, if it does what they need.

    Using an analogy  with SQL Server (the database engine) you might want to run a third party niche accounting system or ERP system for your business, but this could well run on SQL Server, which means you can expose the data in it to Excel or Reporting Services.

    My other top top  would be to get a third party tool to document analysis services such as BI Documenter, or xSQL Documenter, or the major players such as Idera, Quest, or Redgate

  • PASS Meeting 26th June

    The Professional Association for SQL Server (PASS), is holding it's next meeting on 26th June from 6-9pm.  The theme is virtualising SQL Server, in all its guises both using hardware and software. HP and Quest are going to be there and I will be flying the Microsoft flag.  So if you have an opinion, experience or just want to know more about this frequently debated topic then get along to the Bank of America offices in 5 Canada Square, (E14 5AQ):

    Map image

    The full agenda is:

    6.00 p.m.    Introduction, current news, events etc.

    6.15 p.m.    Virtualisation hardware and software including HyperV and VMWare and how SQl Server runs on virtual servers

    7.30 p.m.    Refreshment and networking break

    8.00 p.m.    Open Q&A on Virtualisation and topics for the next meeting

    9.00 p.m.    Close

    To register please e-mail Mark Howden, and please turn up early as you have to clear Bank of America security to get to get to the meeting. 

    Finally thanks to Mark Howden for organising this and to Mark Eastwood for letting us use his offices.

  • Art for Arts Sake

    andrew2

    A little while ago James discovered a fun little app (SP-Studio)  for SouthParking people.  He didn't do a great job on me last time and now I have been recast as a SQL Ninja, I am not sure if that’s a tribute to my DBA skills or because he thinks I look better covered head to foot in black.

    Eileen has also been dabbling with art and has got a massive amount of hits on her blog as a result of this post with a rotating GIF.

    However I am old school and my art is done with a pencil and paint brush like this example from my Facebook Art gallery:

    The downsides are that it took a month to do and I am not sure I will get the same number of hits as Eileen as it doesn't go round and round.

    Technorati Tags:
  • SQL Server 2008 – Annoyingly Helpful

    One of the features that has often been asked for is intellisense for T-SQL and this is now implemented in SQL Server 2008.  But one of the reasons it has taken so long is that SQL isn’t a language that readily lends itself to this sort of assistance. For example in a select statement we state what we want before we say where we get it from.  So to get the most from the feature you need to start with the from clause and word backwards to fill in the select and forwards to add where and so on:

    image

    My top tips here would be to get into the habit fully declaring what you are selecting from  i.e. myschema.mytable and never to use select * from  as both of these are slower to run and with intellisense no slower to enter.  I prefer to alias my tables, but you may wish to fully declare each column as again you won’t need to type all this any more and it could help with ease of maintainability i.e.

    Select
        Production.Product.ProductID,
        Production.Product.Name  as ProductName,
        Production.ProductSubcategory.Name as ProductSubcategoryName,
        Production.ProductCategory.Name as ProductCategoryName
    From
        Production.Product
    inner join
        Production.ProductSubCategory
        on Production.Product.ProductSubcategoryID = Production.ProductSubcategory.ProductSubcategoryID
    inner join
        Production.ProductCategory
        on Production.ProductCategory.ProductCategoryID = Production.ProductSubcategory.ProductCategoryID

    rather than the more traditional

    Select
        Prod.ProductID,
        Prod.Name as ProductName  ,
        Sub.Name as ProductSubcategoryName,
        Cat.Name as ProductCategoryName
    From
        Production.Product Prod
    inner join
        Production.ProductSubCategory Sub
        on Prod.ProductSubcategoryID = Sub.ProductSubcategoryID
    inner join
        Production.ProductCategory Cat
        on Cat.ProductCategoryID = Sub.ProductCategoryID

    Another slightly confusing thing I cam across was if you try and modify the data type of a column, you get an error message in CTP6:

    image

    This is caused by the default options in Management Studio:

    image

    So do watch out for these subtle changes in the latest CTP, they might seem like a pain but they are trying to help.

    Technorati Tags:
  • Are we spending too much time with Excel?

    Despite my love of SQL Server and my respect for Oracle, and MySQL the worlds number one database is excel.  I know that many purists and DBA’s will shudder at me grouping a spreadsheet with a database, but look at this form the perspective of number of hours of users using excel and the volume of data and I am sure Excel would be in pole position.

    The problem is that while our users love it IT Professionals constantly struggle to cope with the volume and complexity of data stored in it.  IMHO Excel services, Office Live groove etc. isn’t necessarily the answer, as all of these technologies simply put the spreadsheet in safer and more discoverable places, such as SharePoint, a server or the cloud.

    The problem are about audit, accuracy and integration of the data:

    Audit.who changed what and when?  in a database proper that can be tracked, in excel compliance is far harder problem to crack.

    Accuracy. Do the numbers add up.  My wife is a forensic accountant and in her role she spends days checking and rechecking spreadsheets for accuracy for use in court.  This is a nightmare for her as her office don’t use any standards and each of her colleagues and external members of her team from the major accountancy firms all work differently.

    Integration.  Getting data in and out of excel is very easy, so why is this a problem?  Well what part of a spreadsheet do you want to get out or put in?

    So what’s a smarter way of doing things? Keep your data in a database, be it an accounts system a data warehouse , ERP etc. and query the data from excel, either natively or via the numerous add ins from Microsoft and other BI vendors.  Excel is then used for add hoc analysis but not for long term data storage.

    So why isn’t this happening?  Users, notably accountants don’t want to give up what they see as their freedom to be flexible.  They haven’t been open to the arguments above until quite recently, but they are beginning to see the light.  If that’s not the case where you work, try waggling this article from Accountancy Age under their nose.   The site is also quite a good resource to see what your users are reading about.

  • Vista sp1 vs XP sp3

    When I am not playing with SQL Server, I play PC games, such as Gears of War and Flight Simulator X, although this happens on my home gaming rig and not the office laptop. My personal PC runs 64 bit Vista, but have I shot my PC in the foot by running my games on Vista instead of XP?

    Apparently it makes no difference according to this article from Extreme Tech.  They tested both OS's against industry benchmarks and specific games like Crysis:

    To quote the article:

    "If you were expecting a huge drop in performance as your eyes scanned from the XP to the Vista results, well, surprise! As many a tech analyst predicted, Windows Vista's gaming performance conundrum has largely been solved, and it was mainly due to early graphics drivers."

    Of course the bigger question is how much longer PC gaming can exist compared to XBox and PS3 which have high definition graphics and cost less than a high end graphics card needed to play modern high end games, but if you're like me you can use Vista for business and home and not lose productivity or performance.

    Technorati Tags: ,,
  • Primary keys and Clustered Indexes

    One of the many great SQL Server resources out there is SQL Server Central, and this article on there, Worst Practices - Not Using Primary Keys and Clustered Indexes, by Andy Warren caught my eye.  For the most part he’s absolutely right you’d be mad not to have a primary key on a table and to leverage the power of clustered index. BTW don’t assume that Andy is saying that the primary key is the clustered index, You can only have one of each on a table but they often shouldn’t be on the same column.

    However I can think of one scenario where this is not a good idea, and that is in a large fact table in a data warehouse.  If you feel you need a primary key, the natural temptation is to make on by concatenating the foreign keys to the dimension tables.  But this primary key has little real value because it won’t take part in any but a few queries, and will slow down the loading of this fact table, indeed I have often removed the all fact table indexes while populating it so any primary key wouldn’t be enforcing referential integrity.  For data warehousing, Referential integrity is done by the ETL process. 

    If a primary key is needed for a a business reason, for example because you need one to create a Reporting Services Report model, then I would suggest you create a new identity column for this purpose and use that.

    Clustered indexes on fact tables can also be of little value as they can only work on one of the foreign keys to one of the dimensions, while they can seriously slow down the loading of the fact table.

    So data warehouses are a special case, and for the most part do encourage force your developers and designers to use these two features.

    The other thing I would encourage you to do is to check out the various community sites and forums out there, someone will have done it before or have an answer to your problem.

  • SQL Server 2008 Back to Basics

    With all of the BI stuff in SQL Server , the new data types like filestream and geography, it is is good to get back to basics like performance, OLTP and administration. So here’s a list of white papers for the classic DBA:

    So find a shady tree a nice cup of Earl Grey and read away:

    IMG_2672

    Technorati Tags: ,,
  • SQL Server 2008 Performance

    It is already possible to get an indication of how well SQL Server 2008 will perform before it is released as there already some TPC benchmarks published for it.  The performance of Integration Services has been separately tested, as have a number of applications such as Microsoft Dynamics and SAP.  The links to all of the tests can be found here

    These figures should improve slightly as all of the code is cleaned up closer to release, but this work has been possible because of the very different way that SQL Server 2008 was developed.  Each feature was only included in a given CTP when it was pretty much working. This has also made early adoption a lot less painful as by and large a feature has remained unchanged once it was included in a CTP.

    Of course your applications are all different and although benchmarks try and emulate the real world, they are no more than indicators like 0-60mph times for cars

    Technorati Tags: ,,

    .  So try the latest SQL Server 2008 CTP on your applications and see what it does for you.

  • Virtualized SQL Server

     

    [Note the information in this post is now way out of date, so please refer to my post here instead..

    http://blogs.technet.com/b/andrew/archive/2008/11/08/sql-server-2008-at-teched.aspx

    Thanks Andrew]

    I get more and more questions about SQL Server running in a Virtual machine so here’s 3 FAQs to start with…

    1. Is there Support for SQL Server on non Microsoft Virtualization platforms (e.g.VMWare)? The definitive document on the is here.  In summary Microsoft doesn’t test this or support it, unless you have premier support in which case “Microsoft will use commercially reasonable efforts to investigate potential issues with Microsoft software running together with non-Microsoft hardware virtualization software”.

    Of course the supported versions SQL Server (i.e.not 2000) and other Microsoft applications (Exchange, SharePoint etc.) will be fully supported on Hyper-V when Hyper-V is released. 

    2. How does licensing work?  The exact terms are in this white paper, and in summary

    • For processor licensing each physical or active virtual machine running SQL Server must have a processor license (the number of chips not the number of cores) for each processor the virtual machine uses.  However if you have licensed enterprise edition for all of the processors on a physical machine you can run any number of SQL Server virtual machines on that box.
    • For CAL licensing each physical or active virtual machine requires the server license.  The exception to this is enterprise edition which just needs to be licensed per physical machine

    Note that the licensing applies to SQL Server 2008 as well and to running SQL server on Microsoft’s Hyper-V platform in Windows Server 2008.

    3. Performance. This is very easy to answer but not very helpful… Microsoft can’t test and publish how well SQL server performs on VMWare for example so I simply don’t know.  On Hyper-V this can be done as soon as Hyper-V is released.  There is always going to be some loss of performance and this will be the price you pay for improved manageability.

    Finally don’t be put off by Virtual PC or VMWare demo’s you may have seen on laptops, you should get a better virtual experience using any virtualisation platform in a production environment, but the performance loss is going to vary depending on the physical infrastructure (SAN, number of Cores RAM etc.) and the workload you are virtualizing.

     
    Update:
    Since writing this post Microsoft has introduced the Windows Server Virtualization Validation Program (SVVP).  Essentially this means that the currently supported versions of SQL Server are now fully supported on Hyper-V, VMWare and other leading virtualisation platforms as part of this program.
  • Better Data with Metadata

    Metadata (data about data) is becoming increasingly popular not just to meet ever more stringent auditing requirements for business intelligence projects, but also to enable accelerated development and maintenance. 

    The challenge is to deliver rich metadata as part of a  BI project with minimal overhead and to keep it up to date.  So I was pleased to find this white paper on how to use SQL Server Integration Services (SSIS), to make this as painless as possible.

    The idea is to have a dedicated metadata repository which drives programmatically driven SSIS packages like this:

    Click here for larger image

    I think this paper will have an even bigger impact than the project Real resources that came out with SQL Server 2005 and as definitely worth half an hour of your time to understand the principles and approach.

     
    Note (20/04/2009): this post has been superceded by a later one, http://blogs.technet.com/andrew/archive/2009/04/20/metadata-in-the-microsoft-bi-stack.aspx as there is a new metadata toolkit available at
  • Windows Server 2008 and SQL Server 2005

    One way to speed up SQL Server 2005 is to upgrade the underlying operating system to Windows 2008. Why? My top three reasons would be:

    1. The network stack is substantially faster so you can get the data to the users faster.
    2. Windows server 2008 like SQL Server 2008 has a resource management capability so you can ensure SQL gets and keeps the memory it needs  without something else pinching it.
    3. Even though SQL Server won’t run in core because of it’s dependency on the .Net framework, you can install a really thin operating system with just what you need on it.  This reduces attack surface, the need for patching and makes more resources available for SQL Server.

    However be aware you might need some hot fixes so check these articles before you try it as SQL Server 2005 may not install cleanly on the new OS:

    Update for Windows Server 2008 for Itanium-based Systems (KB950636)

    http://www.microsoft.com/downloads/details.aspx?FamilyID=ad44cb1c-2e36-45b4-bf4c-e0b01b12b233&DisplayLang=en

    Update for Windows Server 2008 x64 Edition (KB950636)

    http://www.microsoft.com/downloads/details.aspx?FamilyID=a0d43131-bb2c-4f46-b2f0-b90b14865948&DisplayLang=en

    Update for Windows Server 2008 (KB950636)

    http://www.microsoft.com/downloads/details.aspx?FamilyID=fb0ee17e-96eb-4cbb-ac09-95a4dcf73077&DisplayLang=en