With the launch of SQL Server 2008 R2 and SQL Server Denali ctp3 there has been ever more BI functionality built in but nothing has been mentioned about the strange world of data mining. Data mining has been in SQL Server since analysis services was launched but I have rarely seen it being used - it doesn’t seem to have entered mainstream consciousness in the way that some of the cool reporting front end tools have possibly because it smacks a little of black magic, or is seen as unnecessary or hard.
So here’s a quick post to get you thinking and hopefully encourage you to take it for a spin.
What exactly is Data Mining?
The simple answer is the ability to derive insights from your data that you didn’t know were there. Classic examples of use are customers who bought this also bought that (what is called basket analysis in the retail world) and trying to understand what factors influence a purchase.
How does it work?
There are a number of approaches, possibly the most famous being neural networking. In fact there are six different techniques built into analysis services each designed to do a particular job:
In all of these techniques you identify a sample set of data which has the values of the object you are trying to predict, from which you create a model. You then apply this model to live data to do your prediction and analysis.
How do I do this?
There are three ways of creating models:
1. You can create mining models in BI development studio as part of an analysis services project in SQL Server 2008 R2 or in the beta of the new version Project Denali and there are examples mining models in the sample adventure works analysis service project from Codeplex:
2. You can code a model directly in Data Mining Expressions (DMX) which might be appropriate for embedding models into an application such as an ecommerce site.
3. There’s an Excel add-in for data mining :
This works by taking the data in a spreadsheet and then posting it to an instance of analysis services for processing and then returning the results back to Excel. Originally designed for Excel 2007 it does work in Excel 2010 but only for 32bit installations, and while it can’t use data in the PowerPivot window in PowerPivot for Excel it can consume data in Excel pivot tables derived from PowerPivot data. Note that when I mention analysis services here, I mean classic analysis services in olap mode not the new shiny tabular based analysis services that also exists in SQL Server Denali.
First of all you are probably licensed for all of this now, you’ll have SQL Server standard or higher around somewhere you can use this as the mining engine and a copy of Excel for the front end (the excel add-in for data mining isn’t a separate license). However this might all seem a bit abstract and you might have a hard time convincing yourself never mind your users that this is worth the effort. One idea I have is what if you took the data out of some of the data you have access to see how this stuff works, for example data from System Center, or whatever data centre management tools you have (dare I mention VSphere?) your helpdesk system or asset management tools to get insight on what is really going on. This might be a useful project in it’s own right but as the data is not sensitive you can share it as an example of what’s possible and possibly impress your manager as well as the business owners.
The data mining add ins for office do not function last time I checked in Office 2010. Hoping something comes out for PowerPivot
That screenshot in the post above is Data mning and PowerPivot in Excel 2010, albeit 32bit Excel, so not sure what porblems you are having. However data mining built in to PowerPivot or BISM would be nice so post your thoughts on Connect.
I would love to be able to use this feature.
I've got no luck with *64* bit Excel 2010. Is there a connect item for this? The lack of support me that Microsoft might be forgetting about this feature too.
The other options open to me are learning DMX (Is it as hard as MDX? Because that's daunting) or creating a model in BIDS. How hard is that to learn do you think?
I guess you would log this under Denali. I completely agree that the lack of and x64 version needs to be addressed.
Although PowerPivot and Data Mining sit side by side in Excel they don't really work together.
My other wish is for a decoupled data mining tool that doesn't sit in BI dev studio, much as we have with Report Builder 3.
However your voice carries more weigth than mine as you are a customer so Connect away please and get your colleagues to vote as well
Hopefully I'm not hijacking someone's thread... I tried getting the existing Data Mining add-ins to work in Excel 2010 (worked fine in 2007), and it causes Excel to crash every time. I uninstalled 2010 and went back to 2007, and still the add-ins crash. Do I need to clean the registry or something... ??
Not at all sure what the problem is but it will work (as you can see from my screenshot), and we do support it. However I am not a support guy so you'll need either check technet forums or rais a call with us to resolve your issue.
I recall there being a registry hack that let me use the addin in excel 2010. Someone from MS advised me to go into the registry where the excel add ins are registred and set the key to enabled for the data mining addin. This worked but I had to do it every time I wanted to use the addin.
Not only is DMX dead (no changes since SQL 2005), but it looks like MDX is being abandoned too - the core devs have left MS and the focus has shifted to DAX - no new MDX features in SQL 2012.