Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

January, 2008

  • Many to Many Dimensions

    One of the many cool things that's already in Analysis Services in SQL Server 2005 (SSAS) is the concept of many to many dimensions.

    Suppose I have 3 bank accounts and one of those I share with my wife, and on another I am co-signatory for a local charity. This means there is a many to many relationship between me and those accounts.  In the relational world this is typically dealt with using a bridging table - which in this case would have columns with AccountID (to reference the account table), AccountHolderID (to reference the signatories) and a few other fields such as SignatureImage, etc.

    In analysis services many to many dimensions work in a similar fashion,  but where in the relational world they would be linked directly to each other, in the multi-dimensional world, the two dimensions in our scenario would be linked by the use of an intermediate fact table, joined to the main fact table.  There's an example (as always!) in the Adventure Works sample SSAS solution, where dimSalesReason links to the main fact table (FactInternetSales) via an intermediate fact table:

    image

    Many to many dimensions can be useful to improve the performance of distinct counts, and to carry out basic basket analysis (customers who bought X also bought Y). Mark Russo has an excellent white paper on this whole area running to 84 pages of step by step instructions to get you started.

    However this technique comes at a price and can cause more performance problems than it solves.  A new whitepaper came out late last year and discusses the pros and cons of this approach and how to get the best out of it, careful use of keys and aggregation design.

  • SQL Server for Data Mining

    SQL Server 2005/2008 has a number of data mining algorithms built in to discover hidden relationships in your data or to use a set of attributes to predict the value of other attributes:

    • Microsoft Association Algorithm. Used to work out which items are most likely to appear together in a transaction, so this is the classic basket analysis you might see on Amazon (customers who bought this all bought..).
    • Microsoft Clustering Algorithm.  Groups similar data items together. For example in banking there are lemons (poor customers) and cherries (good ones) and this algorithm will do that segmentation.
    • Microsoft Decision Trees Algorithm. used to predict the value of a discrete (as opposed to a continuous) item e.g. whether a customer will buy a new car might or not.
    • Microsoft Naive Bayes Algorithm Also predicts the value of an item but this works for continuous values such as the probability of a customer making a purchase. The naive indicates that there is no interdependency between the attributes used to predict he outcome e.g. gender is not related to age.
    • Microsoft Neural Network Algorithm (SSAS) . This is supposed to mimic the way the brain works and is the scariest algorithm because there is no way to see how the answer was arrived at, and is used complex situations such as stock market prediction.
    • Microsoft Sequence Clustering Algorithm This finds similar instance of events where they have similar sequences of steps. A good example is the analysis of navigation through a web site (click-stream analysis)
    • Microsoft Time Series Algorithm. Used to predict the future e.g. future sales, profit etc.
    • Microsoft Linear Regression Algorithm Plots an ideal line to best represent a series of points to establish the relationship between two continuous variables e.g. age and salary
    • Microsoft Logistic Regression Algorithm Works in a similar fashion to linear regression but uses a curved line where the predicted state is one of a discrete set of possibilities.

    There is a great section on MSDN on when to use which one here.

    One of the best ways to get your head around this is to come to an event like this joint Microsoft / Hitachi TechNet session by Rafal Lukawiecki of Project Botichelli.

     

    Technorati Tags: ,
  • On the Twelfth Day of Christmas an IT Pro sent to me...

    Twelve New Servers Coming

    12 new servers coming 

    The product guys at Microsoft have been busy so there are going to be lots (but not 12) of new servers coming out this year:

    • Windows Server 2008 (in 8 exciting flavours depending on whether you need virtualization or not)
    • SQL Server 2008 editions: developer, express, compact, workgroup, standard and enterprise
    • Search Server 2008
    • Small Business Server 2008
    • Centro (Small Business Server's big brother)
    • ISA Server 2008

    So that's 6, 13, 11 or 18 servers, and I will have to defer to my wife (and my accountant) to do the the creative accounting to make 12!

     swag to the first person who can work out the four project code-names in the drawing and tell me what products they relate to!

    Technorati Tags: ,
  • Monitoring Reporting Services

    Reporting services, like many services, logs what is going on to to a log file, in this case the RS Report Execution Log.  Scanning that log for activity is a pain, so it would make sense to be able to query that and properly report on what's going on. It would be rude not to use Reporting Services to do this even if does seem to be case of going round in circles.  The trick is to load the data into SQL first and then run the reports over that.  Since the layout of the logs is defined, this is a standard bit of work and this can be found on codeplex, along with a load of other useful Reporting Services samples. 

    The sample project for looking at the execution log is available for SQL Server 2005 here  and SQL Server 2008 here and comprises:

    Object 

    Description

    Createtables.sql

    Creates the table structure for storing report execution log data.

    Cleanup.sql

    Removes data from the tables so that you can refresh the tables with new data.

    RSReportExecutionLog_Update.dtsx

    An Integration Services package to extract internal data from the report server database and copies it to a user-defined database that contains report execution log data.

    RSReportExecutionLog_Update.dtsConfig

    Specifies settings for the above package.

    Execution Status Codes.rdl

    Shows the success and failure rates for all report executions occurring within a specified date range. It also shows which failure codes occurred, and which reports failed to process.

    Execution Summary.rdl

    Shows overall report execution statistics for a specified date range. It shows the number of reports processed each day, the top 10 most requested reports, and the top 10 longest running reports.

    Report Summary.rdl

    Shows summary report execution information for a specific report within a specified date range.

    One tip is to change the queries in these management reports to exclude the names management reports themselves from the reports otherwise you will see an entry every time you run one of these reports and that is really going round in circles!

    Thanks to Dave Wickert from the SQL BI Product Unit for reminding me about this.

  • On the Ninth Day of Christmas an IT Pro sent to me...

    Nine Laptops Surfing

     9 Laptops Surfing

    Why do large hotels hammer us for internet in our rooms, and sometimes charge for wireless access in the lobby areas while the independent guys invariably provide it for free?

    Because they can I suppose.

    Technorati Tags:
  • SQL Server BI Everywhere

    Nearly everything Microsoft brings out these days has a SQL Server back-end, except Exchange as Eileen will tell you.  I got asked about cubes in Project Enterprise server recently and managed to resolve an issue with the data not being updated due to a connection problem.

    In a similar vein, I have been helping out one of our Dynamics Evangelists, Michael McClary  at Automaster an ISV specialising software for car dealer management.  Both Automaster and Dynamics AX rely on SQL Server for reports and analytics as well as for the database engine.

    What impressed me was how reports and cubes can be designed from within Dynamics  i.e. without a lot of effort from the IT department and without leaving the application. Of course this does mean that the business need to train up on how to get the most from the system.

    However there is some thought needed with all of this, and also what's the point of Performance Point Server if all of these tools have a BI capability built in?

    The business intelligence provided by the built in capabilities of the various Dynamics solutions supports the function that that tool provides and can provide real insight into the day to day running of that part of the business. Dynamics can be extended by using FRx and/or Forecaster. However in larger organisations you may need a more holistic/ strategic overview from multiple systems and some sort of mechanism to plan across the whole enterprise.

    This is where Performance Point Server comes in, You can pull in and report on data from these other solutions, (whether Microsoft based or not) in one place and also use that source data as a basis for planning.  Performance Point is also dynamics aware, for example this download kit allows you to pull in Dynamic AX data to start the planning process.

     
    Update 9 march 2009
    Since writing this the Microsoft has announced that the planning element of Performance Point Server will not be developed beyond sp3 (coming out later this year).  The monitroing and analytics will be included in the next release of SharePoint.  So in future the only planning tool will be Forecaster.
  • A Funny Thing Happened on the Way to the TechNet Forum

    Farnkie Howerd Forums (fora for Latin geeks) date back to well before Microsoft and even Frankie Howerd.

    TechNet has a raft of forums  on almost every Microsoft technology as it affects IT professionals where you can get and give help.  Both MVP's and Microsoft product guys keep watch on these so you can usually get a good answer within a day or so of posting your question. 

    New forum topics are popping up all the time and I have just spotted two new Performance Point forums here

     

     

    If you need a technical answer in a hurry  then I would recommend a TechNet subscription which gives you 2 support calls a year per subscriber on top of what else you might have in place in your organisation.

    Technorati Tags: ,
  • SQL Server 2008 RTM Date

    There's no easy way to write this but SQL Server 2008 won't be released (RTM) until Q3 of 2008, which is a couple of months later than the provisional date shown on many slides and blogs (including mine) to date.  This isn't really that surprising, many of you have noticed the CTP releases have been later than expected; for example CTP 6 was already supposed to be out according to the timetables being talked about last summer. 

    The product team would rather get it right than stick to a schedule to maintain the secure and reliable reputation that SQL Server has built up over the last 4-5 years, and I have to agree with that. 

    The only ray of sunshine I can offer is that the next CTP(6) will have all of the functionality of the final product and will be available for the worldwide launch event on February 27, leaving a 5-6 month period to ensure it all fits together.

    Technorati Tags:
  • On the Eighth Day of Christmas an IT Pro sent to me...

    Eight Mashups Linking

     8 Mashups Linking

    Happy  New Year.

    You've eaten and drunk too much over the last week or so and I expect your brain has been taking a break as well.  So to get you gently back into technology how about creating a mashup?  This is simply a web-site that uses dynamic components of other web-sties, sort of plagiarism web 2.0 style.  In my cartoon above You would store your picture in Flickr for example suitably tagged with location data and then plot them on to Virtual Earth. 

    But how do you do that without being a developer? the answer is Popfly a free fun easy to use bit of kit specifically designed for this.  there are some more resources  and examples here.

    Technorati Tags: ,,
  • Peas and Carrots

    I hate vegetables, and I have many happy memories of implementing business intelligence solutions in Business Objects XI so this You-tube video bought a smile to my face. 

    I personally don't think there is any mileage in competition bashing, and one of the many things I like about working for the Microsoft machine is that it's pretty much policy here as well.  So watch the video and draw your own conclusions.

  • Peas and Carrots are off

    Unfortunately, the You Tube link in my previous post has been pulled.  For those that didn't see it it was a rap video extolling the virtues of Business Objects and SAP (being together like peas and carrots) while flaming the competition.

    Somebody else out there doesn't like Peas and Carrots either!

  • Evangelism Evangelism Evangelism

    The IT Pro Evangelist team are all at a huge education event , BETT, this week.

    BETT 2008 008

    Viral, Steve, Eileen, James and me.

    We are there to give help and advice to the IT Pro's working in education, so we'll be taking a little longer to answer your comments and e-mails. 

    I have only had one question on SQL server, so that either means no one knows they have it or it's so easy to use that it causes no problems! The enquiries so far have been on Maths 3.0,  all of our web 2.0 stuff like popfly and quite a few on Microsoft Project.  Viral has been inundated with SharePoint enquiries as this is at the core of Microsoft's education platform such as the SharePoint Learning Kit that we have on the codeplex site and the Learning Gateway.

    Technorati Tags: ,
  • Data Mining - Good for Your Health?

    Using Data Mining techniques to establish and predict trends is a black art with as many myths and legends as any book by Terry Pratchett.  Like any magic it's only credible if it works and it clearly does, because certain communities are becoming increasingly worried about the unconstrained use of data mining in the marketing world. A good example is in the pharmaceutical industry where  there is now legislation in New Hampshire to ban the sale or use of of prescriber or patient data for use in marketing, because of the success of these tools in targeting doctors. 

    Of course like any magic or piece of technology it is the use these techniques the decides whether they are malign or benign.  In healthcare these techniques can be used to spot trends and relationships in the causes of disease.  This could be used in several ways , for example:

    • To inform us of the consequences of lifestyle choices such as over eating smoking etc. and few of us would have any problems with that research e.g. the recent work on the link between obesity and cancer 
    • If that data set, by some 'accident', was made available to health insurance companies, and was used to bombard the healthy amongst us with offers of lower health insurance, or was used as a basis to set premiums on life assurance policies, we would be much more concerned.

    So how does all this work in SQL Server and the answer is ...

    check my next post.

    Technorati Tags: ,,
  • Data Mining in SQL Server 2005, 2008

    Data Mining like every other term in business intelligence seams to mean different things to different people. For me the key things that data mining does are:

    • Assist in discovering relationships in data sets that were previously unknown.  An example would be to get an insight on how age, and gender relate to which mobile phone is bought.
    • Allow the creation of models from sample data which can then be applied to real data to predict what is going to happen.  So having worked out that i-phones are bought by males in their late twenties, does the live data over the next six months support that and, by targeting this demographic with appropriate advertising did sales meet our expectations?
    • Has some in built in processes (algorithms) to make this happen with minimal guidance from the user, i.e. the user doesn't do all of the work.  The litmus test here is that you didn't spend all night writing MDX or SQL to get the answers, you simply point the tool at the data, labelled up the relevant attributes, and the answer pops out the other end.

    SQL Server Analysis Services (SSAS) has had some data mining  capabilities since 2000, but they really took off in 2005:

    • The data mining tools can be used as part of the loading of a data warehouse to add new attributes to dimensions
    • Can be used by end users through an intuitive interface in Excel (the work round trips to SSAS)
    • A core set of industry standard algorithms to mine the data:
      • decision trees,
      • regression trees,
      • clustering,
      • logistic and linear regression,
      • neural networks,
      • naive bayes,
      • association,
      • sequence clustering,
      • time series.

    I will post on what each of these does over the next few days, but I have noticed a coule of upcoming TechNet Webcasts on data mining:

    • The most important thing in business intelligence is to have good data to work with and this is doubly true for data mining. Bizarrely though this is also an area where data mining  can actually help solve the problem .  To see how this can be done have a look at this TechNet Webcast on 17 January which applies equally to SQL Server 2005 and 2008 enterprise edition.
    • Predicting the future is the holy grail of business intelligence and SQL Server has a time series algorithm to help with this which gets a boost in SQL Server 2008 and there is a TechNet Webcast on 24 January dedicated to this, which is given by Donald Farmer one of the top BI guys in Microsoft.
  • On the Eleventh Day of Christmas an IT Pro sent to me...

    Eleven Pipelines Piping

    11 Pipelines Piping_thumb[2]

    Screensavers I have known and loved!

    Technorati Tags:

  • SQL Bits the SQL

    SQLBits Logo

    Following the huge success of SQL Bits last year Tony Rogerson and the UK SQL Server Community have planned the SQL for 1st March at the

    Lakeside Conference Centre
    Aston University
    Birmingham
    B4 7ET

    Like last time it's on a Saturday and there will be four tracks as well as sessions by the sponsors :

    • Development
    • DBA
    • BI
    • SQL Server 2008

    To ensure you get maximum value from the event please register as soon as possible and vote on the sessions the SQL Community team have put together.

  • On the Tenth Day of Christmas an IT Pro sent to me...

    Ten Boards a Bleeping

    10 Boards a Bleeping

    Well past Christmas now but I couldn't miss this opportunity to ask why does my motherboard need to bleep when I turn it on? why do I need a sound when windows starts, but most of all how do we get rid of the pervasive hum of modern day life.  A think passive cooling like in the Tranquil PC home server is the way to go as fans inevitably make noise.

    Perhaps the way ahead is to rig our consumer products into the heating system of our homes so they can be cooled by heating the central heating system?

    Technorati Tags: ,
  • Champion IT

    image

    Georgina (aka Curious George) and Faith who look after all things TechNet have this Champion IT competition going on which I really ought to have mentioned before.  Basically you can nominate yourself or a colleague for doing something a little bit special with the Microsoft platform in one of three areas:

    • Latest and Greatest - for work with the latest version of a product
    • Legacy - for work with previous releases and beyond
    • Interoperability - for a solution in a mixed vendor environment.

    The entries for the first run of the competition have closed (and open for the next round again on 4 Feb), so now what we want you to do is vote on the short-listed entries here.

    So all the fun of the Oscars or "I'm a celebrity..." , but without the tears or allegations of vote rigging!

  • 2008 Launch Wave

    The official launch of Windows Server 2008, SQL Server 2008 and Visual Studio 2008 will happen on the 19th March at the Birmingham ICC. 

    The format will be a sort of mini TechEd IT Forum with separate technology tracks, Hands-on Labs, Ask the Experts etc. In addition, you'll be able to hear from IT Pro's who have already adopted SQL Server 2008 in their organisations .

    It's going to be one of the biggest events we have ever run but even so places are going to run out fast so it might be wise to register.   The developer track is already full so I am NOT joking

    If you are interested but can't make it because:

    • Only your manager is allowed to go
    • The boss won't sign off your expenses
    • You live in the outer Hebrides and you can't swim
    • You've only just read this and the live event is full

    You can try out our cool new virtual event -  a sort of second life, by registering for that here

  • SQL Bits Conference Calling

    This is a short post to say thanks to all the MVP's working hard to organise the next SQL Bits conference on 1 March (register here). Tony, Chris, Simon Darren, Allan and Martin and the rest of the SQL Server Community know exactly what's like to plan this when they're scattered across the UK and trying to do their day job.  Fortunately we have regular conference calls to collaborate on this, and for those that haven't experience the delights of a Microsoft conference call this one's for you...

     

    and hopefully this YouTube video will still be there by the time you click on this!

    Technorati Tags: