If, like me, you are trying out the new stuff in SQL Server 2008, you might need some data to work with. The standard Adventure Works sample databases for OLTP and data warehousing are not in the CTP, nor is the Analysis Services project. So you'll need to get them from CodePlex, which not only has these but also some useful discussions. You will also find a lightweight version here.
I would also add that if you pull down the virtual hard disk with the latest CTP4 installed on it, you'll still need to add the samples afterwards. I have asked the team who runs Connect for them to be included.
I have always used these samples for trying out anything new or for cracking a particular problem, as the work can then be shared with anyone else, including Microsoft support. For those of you who are struggling with MDX, the industry-standard language used to retrieve data and add calculations to a cube, this is essential. I have yet to see an example that can't be written against the Adventure Works cube, and I have often found that recoding a problem from the project I am trying to implement shows that the fault can often be in the cube design. The samples are also good for testing security scenarios, which can often simply be "I can't connect to the live database, so can I connect to Adventure Works?".
So resist the temptation to delete the databases from your dev and test machines, and use them for R&D on Friday afternoons like I do.
16 October will be the UK launch for Performance Point 2007, which is going to elicit three reactions:
A coaching company called Performance Point has identified a number of symptoms of ailing businesses:
1. You have a vision for your business, but you don't know how to get there.
2. Sales are flat or declining. Sales are below budget.
3. Long-term customers have left. They are doing business with your competitors.
4. Profits are declining.
5. Companies in your industry are expanding or changing. Your business is not.
6. You are working longer and longer hours.
7. There is never enough time in the day to accomplish all of the things that must get done.
8. The business no longer meshes with your values and interests.
9. The business has become too complex for you to handle by yourself.
10. The business is no longer any fun.
11. There are frequent problems with your staff.
12. The best employees are leaving.
13. Problems are occurring in your personal life.
14. You feel as though you are in a rut.
15. You dread going to work each day.
I am not going to pretend that a new piece of software is going to crack all of these issues, but it will help with the first dozen, so you will have more time to work on the last three yourself. But how is a piece of software going to help? Well, it does three things:
Whether you decide to get some coaching or buy some business intelligence software you do have to actually apply the findings and lessons learned to your situation before anything will change.
Viral Tarpara, our latest IT Pro Evangelist and ranking new guy, has now blogged his way onto TechNet. His chosen specialist subject is the many Office components such as SharePoint, Groove etc. For some bizarre reason Viral has left warm and sunny LA for Cool Britannia, and as he was telling us at our last monthly meeting, he has his blog going, but little else: no home, no money and no car! So please lend your support by linking to his blog. But not if you're the guys who keep posting trackbacks about dodgy car deals in California, as we're both in London, England, 6,000+ miles east of the sunshine state!
Is there actually anybody out there with enough money to buy a new car who is daft enough to search for business intelligence or Microsoft when what they meant to type was Nissan, Mazda or Toyota? I won't dignify the culprits by naming them, but maybe they need to read a few posts on business intelligence and how organisations like theirs identify potential customers with appropriate permission-based marketing. On the other hand, maybe our team could all do car reviews to get our hits up:
I have come across many occasions where IT Professionals have pushed really hard to use all of the BI stuff in SQL Server and have got nowhere fast. Not because they don't know what they are doing nor because they didn't give it their best. So before you have a go it might be worth exploring why business intelligence projects go wrong and what the warning signs are.
BI is just like the BI tools that come with SQL Server. You probably bought SQL Server, or it came as part of another solution to run your business, so the database engine is essential to what you do and you will get it working however hard that is. However, you may not even have noticed all the other tools that came with SQL Server, or you feel you don't need them. BI is the same; EPoS and ERP systems are seen as essential, but many businesses think they can survive without BI. So the most important parts of the readiness test for BI are:
You need both before you can even start. Without business ownership, the need might be recognised but not properly presented to the board, so the business will put up with the pain or not even know the extent of the problem. The other way around, a visionary manager will not be able to convince the finance director to put up the cash for the project or get the rest of the business to use the system. If a BI system is to be successful, the business will have to change in response to what it tells them, and changing business culture is far harder than MDX, the CLR or understanding software licensing!
The next two are directly in your area:
Finally there needs to be a culture of analysis in the business. This means users relying on the data they use to make decisions rather than 'using the force'.
Does it matter? You could simply argue everyone is doing it so it must be useful - Microsoft have spent a huge amount of time on developing BI for the masses, Oracle acquired Hyperion to have a stronger position in this space, and the niche players are still growing. You could be fed up of writing reports time and time again for different users who keep asking for changes to them or your line of business system is simply groaning under the weight of badly written end user queries.
The harder question is: what do you do? I would quote Douglas Adams here and say you should make it Somebody Else's Problem. Find a business sponsor, identify the information issues arising from the data - it's your data, so you can see who's running the big queries, where the big spreadsheets are, etc. Try to understand the business priorities and how BI can address them.
thanks again to Hugh McLeod
Business Intelligence is just like digital photography: the more detail you want, the more space you are going to use, and in both cases we talk about grain and granularity. If you have ever worked with RAW files on a 'prosumer' (an IT professional with a background in Japanese wrestling?) camera, you will know that they are 3-4 times the size of the equivalent JPEG file, and so most of us (except James O'Neill on our team) just use JPEG to save some of our hard disk for work.
That's great for all the paparazzi, but what about business intelligence professionals? We are required to capture every transaction in the business in great detail and hold all of the historical information back to the dawn of computing. In the data warehouse this data explosion can test the database server in two key ways, it takes ever longer to add more data in and critically, query performance gets worse and worse. So what's the answer?
Reduce the amount of data held
Don't be tempted to aggregate the data as it is loaded into the data warehouse, i.e. keep the grain of the information as detailed as possible from the source system. This is especially true if the data warehouse is going to hold history that isn't being held in the source system.
This is stating the obvious, but push back on some of the requirements, e.g. is all that history really necessary to the success of the project? In retail, a rolling 106 weeks of information will give this week this year, this week last year and this week two years ago, as well as this year to date and last year to date, which is usually enough. If not, think about reducing the grain of the historical information that is available online. For example, instead of holding the exact date of a sale, aggregate sales to the week level, and think about whether it is necessary to hold historical data at the product level, where products come and go and the key trend is how a group of products performs. Also in this category is carefully typing the data in the data warehouse - fact tables should have bigint, int, smallint or even tinyint foreign surrogate keys, plus measures/facts only.
Large dimensions such as customer need careful planning and data typing as well. All columns should be 'not null' for query performance, and char and nchar should only be used where the length of the data never varies.
A good tip for the time dimension is to use a primary key in the form YYYYMMDD, but stored as an integer. You can apply logic to this without going back to the date dimension all the time.
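As a sketch of that tip (all table and column names here are my own illustrations, not from any particular system), an integer YYYYMMDD key lets simple arithmetic answer date questions without a join back to the date dimension:

```sql
-- Illustrative date dimension keyed on an integer in YYYYMMDD form.
CREATE TABLE DimDate (
    DateKey      int      NOT NULL PRIMARY KEY,  -- e.g. 20070916
    CalendarDate datetime NOT NULL,
    [Year]       smallint NOT NULL,
    [Month]      tinyint  NOT NULL
);

-- All sales in 2007, using only the integer key on the fact table:
SELECT SUM(SalesAmount)
FROM   FactSales
WHERE  DateKey BETWEEN 20070101 AND 20071231;

-- Pull the year and month straight out of the key with integer arithmetic:
SELECT DateKey / 10000        AS [Year],
       (DateKey / 100) % 100  AS [Month],
       SUM(SalesAmount)       AS Sales
FROM   FactSales
GROUP BY DateKey / 10000, (DateKey / 100) % 100;
```

The small int types on the dimension also follow the careful-typing advice above.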
As I mentioned yesterday, OLAP is designed to be fast, but you would think that with all of the pointers, data and calculations it would be orders of magnitude larger than the warehouse it came from, further aggravating the storage issue. However, the space overhead is not as bad as it appears, and there are two reasons for this:
Of course, if you can compress the data in an OLAP database, surely the same data could be compressed in the source database. This useful feature will only be available from SQL Server 2008 onwards, and it's designed very much with large fact tables in mind. The keen-eyed among you will have noticed it's not in the latest CTP4 release of the beta, and I will be posting more on compression when it's available for download.
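To give a flavour of what's coming, this is roughly what the data compression syntax looks like in SQL Server 2008 (treat it as a sketch against an illustrative FactSales table, and remember the feature isn't in CTP4 yet):

```sql
-- Rebuild a large fact table with page-level compression
-- (ROW is the lighter-weight alternative).
ALTER TABLE FactSales
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Estimate the space saving before committing to the rebuild:
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSales',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';
```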
The Enterprise edition of SQL Server supports partitions for both relational and OLAP data. The idea is to create a new partition from the latest load of data and then add this partition to the rest of the partitions that make up a table or cube, to speed up loading of new information. Partitions also help improve query performance, for example by limiting a query to just one partition, such as the most recent month's data.
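A minimal sketch of monthly partitioning on the relational side (function, scheme and boundary values are all illustrative):

```sql
-- Boundary values mark the start of each month, keyed on the integer date key.
CREATE PARTITION FUNCTION pfMonthly (int)
AS RANGE RIGHT FOR VALUES (20070801, 20070901, 20071001);

-- Map every partition to a filegroup (all to PRIMARY here for simplicity).
CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE FactSales (
    DateKey     int   NOT NULL,
    ProductKey  int   NOT NULL,
    SalesAmount money NOT NULL
) ON psMonthly (DateKey);

-- Load the new month into a staging table with the same structure and
-- filegroup, then switch it in as a near-instant metadata-only operation:
-- ALTER TABLE FactSales_Staging SWITCH TO FactSales PARTITION 4;
```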
Hugh McLeod is quite correct, On-Line Analytical Processing (OLAP) is actually about business, it just sounds like a science project.
Basically OLAP is an awful name; Nigel Pendse, author of The OLAP Report, calls the same thing FASMI, which I think is a far better term:
In OLAP the cube is the database structure that is queried, and to get a handle on how this works, below is a simple three-dimensional cube.
The coordinate system in a cube not only has a reference to a point in multidimensional space, it also has an understanding of hierarchies. So the cube 'knows' that January 2007 has a parent called 2007 in the example above. This forms a key part of the OLAP concept - the results of calculations can be stored at the parent level rather than using on-the-fly aggregation of all the children, e.g. the sales total for 2007 is stored in the cube for bikes, components etc., as is the cost of sales. The profit margin % has to be worked out on the fly for bikes for 2007, but this is quick because the cost of sales and the sales that contribute to this calculation are pre-calculated. This gives OLAP its speed while allowing rich calculations to be stored. As always in IT there is a catch, and in my opinion that is the complexity of the language used to query a cube: MDX, or multi-dimensional expressions. Like SQL, this is an open standard that also runs over cubes in such products as Oracle/Hyperion Essbase, SAS, Cognos PowerPlay and Microsoft's SQL Server Analysis Services (SSAS). More on MDX in future posts, but if you can't wait, the go-to MVP on this in the UK is Chris Webb and the book he has co-authored.
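To give a flavour of MDX, here is the margin calculation above sketched against the Adventure Works cube (the measure and hierarchy names are the standard sample ones, but check them against your own cube before relying on this):

```mdx
// Calculated member worked out on the fly from pre-aggregated measures.
WITH MEMBER [Measures].[Profit Margin %] AS
    ([Measures].[Sales Amount] - [Measures].[Total Product Cost])
    / [Measures].[Sales Amount],
    FORMAT_STRING = 'Percent'
SELECT { [Measures].[Sales Amount], [Measures].[Profit Margin %] } ON COLUMNS,
       [Product].[Category].Members ON ROWS
FROM   [Adventure Works]
WHERE  ([Date].[Calendar Year].&[2007])
```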
You would expect there to be a record, and hopefully only one record, for each sales person in a typical sales system, and another record in another table for each of the customers. But what happens when one of these sales people gets assigned to a different territory or is promoted to be a regional sales manager? The usual answer is that the record is edited to reflect the change.
That's fine until you decide to start looking at trends over time and analyse how each person in the sales team is performing against target. In the example above, all of the sales in the system will appear to have been made by a person in their current role, so when a report is run the answer will be distorted. This is where a separate data warehouse can help, as choices can be made about how to deal with these kinds of changes independently of the line-of-business system. There are three industry-standard approaches to the problem of slowly changing dimensions:
The type 2 scenario requires the generation of a new unique key on the dimension, as there will be multiple versions of the same source record, each with the same source or business key. This new surrogate key is how the dimension is joined to the sales facts made for that version of the sales person, as you can see below.
For those of you in black and white, Steve's sales in E region are in orange and those when he moved to SW region are in blue. We can sum and group a join of these 2 tables by [Name] to see all Steve's sales or we can do this for [Territory] to get a true picture of sales in either scenario.
Note that the business key doesn't appear in the fact table and that the surrogate key is just an arbitrary number - usually a sequence number incremented as new rows are added to the dimension. It is also good practice to add extra columns, such as [Is Current] above, to mark which record is the current one. Alternatively, [effective from date] and [effective to date] columns can be used to show when the row was in force.
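Putting those pieces together, here is a hedged sketch of a type 2 sales-person dimension (the table and column names are illustrative, not from the screenshot above):

```sql
CREATE TABLE DimSalesPerson (
    SalesPersonKey int IDENTITY (1, 1) PRIMARY KEY,  -- surrogate key
    EmployeeId     int          NOT NULL,            -- business key from source
    [Name]         nvarchar(50) NOT NULL,
    Territory      nvarchar(20) NOT NULL,
    IsCurrent      bit          NOT NULL,            -- marks the live version
    EffectiveFrom  datetime     NOT NULL,
    EffectiveTo    datetime     NULL                 -- NULL = still in force
);

-- The fact table joins on the surrogate key, so Steve's sales stay attached
-- to the territory he was in when he made them:
SELECT d.Territory, SUM(f.SalesAmount) AS Sales
FROM   FactSales f
JOIN   DimSalesPerson d ON d.SalesPersonKey = f.SalesPersonKey
GROUP BY d.Territory;
```

Grouping the same join by [Name] instead gives all of Steve's sales across both territories.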
So using type 2 slowly changing dimensions is the most work but the most versatile approach, and the good news is that there is a wizard for this in SQL Server 2005 Integration Services (SSIS).
What's different about a data warehouse is that all of the information from the line of business or source systems gets rearranged into two types of information:
A dimension contains a group of descriptive attributes. An example is a product, where we have its colour, size, description, product group, product category etc. It should have a single-part primary key, and then everything else is typically made up of flags and strings.
Facts, on the other hand, are all numbers. Often these are costs, prices and quantities, and all are measures of activity.
The rearrangement process, dimensional modelling, is a disciplined, structured exercise very similar to the normalisation exercise that a database designer goes through as part of the wider systems analysis piece of a project. However, dimensional modelling results in a structure optimised for reporting, whereas normalisation is focused on performance for transactional throughput.
Surprisingly, both techniques are nearly as old as each other, and both owe a lot to the work of Edgar F "Ted" Codd. He also coined the term OLAP (On-Line Analytical Processing), so he helped to get business intelligence off the ground too.
So what is it about this technique that makes it faster and easier for reporting, and at the same time annoys database purists?
For a start, there are far fewer tables involved. If we go back to the product dimension above, all of the product information in a data warehouse will reside in one table, where before it would have been in six or seven related tables in the source system. When the product dimension is built, these relationships are flattened, so queries have fewer joins in them.
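As a sketch (names are illustrative), the flattened product dimension means a report query needs just one join to the fact table, where the normalised source would need six or seven:

```sql
-- Product group and category are flattened into the dimension itself,
-- instead of living in their own related lookup tables.
CREATE TABLE DimProduct (
    ProductKey      int          NOT NULL PRIMARY KEY,  -- surrogate key
    ProductName     nvarchar(50) NOT NULL,
    Colour          nvarchar(15) NOT NULL,
    Size            nvarchar(10) NOT NULL,
    ProductGroup    nvarchar(30) NOT NULL,
    ProductCategory nvarchar(30) NOT NULL
);

-- Sales by category: one join, no chain of lookups.
SELECT p.ProductCategory, SUM(f.SalesAmount) AS Sales
FROM   FactSales f
JOIN   DimProduct p ON p.ProductKey = f.ProductKey
GROUP BY p.ProductCategory;
```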
Of course there's a catch and that is that this new structure appears to be a nightmare to update and looks like it wastes a lot of disk space. Yes it is hard to update and special techniques are used to do this, but the data warehouse is kept constantly up to date and disk space is cheap.
A popular quick alternative to the data warehouse is the operational data store (ODS). You mirror or snapshot your line-of-business system and use the offline copy for reporting. It's quick, and somehow free because you were doing it already. Well, not really: that schema is still not designed for reporting, although I suppose you could run a script to index everything that moves, but then it wouldn't be so great as your hot standby.
Just in case you don't get the TechNet Flash from Georgina, SQL Server 2008 CTP4 is out there for you to enjoy. Community Technology Previews (CTPs) for SQL Server 2008 are different from the previews released for 2005, in that if a feature is in a 2008 CTP it is well formed and tested, but not necessarily optimised for performance. As you can see from the download site, each CTP will expose more of the features that will be in the final release, in four broad themes:
So you do need to refresh the CTP you have, keep watching the site, and follow the various SQL Server blogs like mine. For those of you in London there is a good briefing session on SQL Server 2008 on 26th Sep at IMG's offices in Holborn. To sign up, go to the events on the UK SQL Server community site at http://www.sqlserverfaq.com/
I know that there are still a lot of organisations who are on SQL Server 2000 or who have not leveraged all of the functionality in SQL Server 2005. So if you want to know more, the SQL Server community is running a SQLBits day at our offices in Thames Valley Park on Saturday 6th October. It's free to attend, and our MVPs and I will be among a star-studded cast of presenters.
You could of course watch your team lose or find out they haven't got your size in that little black number, but where's the fun in that?
When not in business intelligence mode, I am a PC gamer, and being an old PC gamer I never got the console thing. Like many of you, I was hesitant about upgrading to Vista, as my black box has been continually modded over the last three years to keep up with the latest games. On the upside there are a few Vista-only (DirectX 10) games appearing, and I also wanted to see if Vista 64-bit would work with my dual-core AMD Opteron CPU.
I am rarely impressed, but everything just worked. Then I realised that only one of my graphics cards showed up, and I have two running in parallel, what NVIDIA call SLI. I wondered if NVIDIA had fixed the SLI/Vista issue, and sure enough, a quick download from their site (http://www.nvidia.co.uk/page/drivers.html) and the problem was sorted. So I have 5.0 on the Vista performance scale, which is at the high end of the Master Chief's recommendations for Halo 2.
Of course Halo 3 is coming out in September, so perhaps I will need that Xbox console thing, or as I explained to my wife "a fully featured DVD player with time-slip recording, and it comes in black to go with the TV".
The terms Business Intelligence, Data Warehousing, and Data Marts are used interchangeably by many people so I thought it would be good to explain the differences.
The data warehouse is the repository for all the data to be analysed and reported on, without the means to do so. A good analogy would be SQL Server 2000 before Reporting Services was added: it could store all the information you needed, but there was no mechanism for getting the information out.
Data warehouses can be really big and can provide analysis and reporting for the whole enterprise. These big projects got a bad press for taking too long to implement and for being too difficult to change. The response to this was the data mart, which is not simply a renaming of the same thing to deflect criticism.
A data mart is a subset of information in an organisation that meets a specific business need or is just for a part of the business. Individual data marts are implemented on the basis of need and return on investment. The idea is that the individual data marts are joined together to form the whole data warehouse.
This sounds great but has been beset with problems for two key reasons:
1. Each department does its own thing to implement tactical solutions, which leads to 'mart madness' where there are several implementations using different technologies.
2. Each data mart is designed differently and so they cannot be joined together for consistent enterprise reporting.
This really is just two sides of the same problem, but it is so important I thought it would be good to mention it twice!
So just like every other area of IT, architecture and design must be properly addressed or all the later stages will fail.
If you're staying with this Business Intelligence (BI) 101 discussion, you are probably wondering: why bother? No, that doesn't mean why bother with this blog, but why bother with BI at all. Businesses need reliable information in order to survive and thrive, and business users need it when they want it and how they want it. The sort of information they're after is shown below:
How does this information help a business? Here are a few examples below from projects I have worked on before I got swallowed by the blue monster:
So it isn't just about retail, the traditional home of business intelligence; the techniques can be applied to just about any business to gain better insight into problems and opportunities.
In the world of management speak it is common practice to re-brand an essentially old idea with a new name. Many years ago we had EIS (executive information systems), decision support systems and MI (management information) systems. I am sure there are experts out there who will post me precise definitions of these and be able to define the differences, but today this kind of solution is generally referred to as Business Intelligence (BI). What BI does is to make sense of the various types of structured data in a company and present this to the business user (aka Information Workers) in such a way that this can be analysed and reported on with little further support from the IT professional.
Business Intelligence is a collection of technologies:
So that's a view of BI from about five miles away, and your question is: why do I need to do this at all? My next post will address this, but if it wasn't a really valuable part of a modern business, why would so many and varied companies implement a BI solution? If there isn't any obvious BI in your company, then there will be at least one person using the most complex set of Excel spreadsheets you will have seen, and that will be what the business is relying on to set its strategy - and that's your BI solution.
As I have mentioned, I am the new kid on the blog in Eileen Brown's team. This means that for most of the time I am the trainee, and my peers are all light years ahead of me - except, that is, when it comes to Business Intelligence. Then the conversation goes a bit like this (with apologies to the Cat in Red Dwarf):
Evangelist 1 - "So What is it?"
Me - "It's a suite of tools to enable a business to better manage its performance"
Me - "It's the combination of a Data Warehouse coupled with analytical and reporting tools"
Me - "It's a system that combines all the disparate data in the enterprise into a cohesive whole"
Me - "It empowers professionals with information"
Me - "OK I give up - read my blog"
So over the next few days and weeks I will be sharing my thoughts on the world of business intelligence for the IT professionals out there.
After the bad news about what's missing in SQL Server 2008, I thought it would be good to talk about what the most exciting bit is in the new release. For me this is the next version of Reporting Services.
Reporting Services in SQL Server 2005 can do amazing things and has already shaken up the reporting marketplace, but for me it has one major drawback: to make it fly you have to be using Visual Studio. That's OK for us IT professionals, but what about lawyers (no jokes please), accountants, sales staff, and all of the other non-IT professionals? They get Report Builder, which is very easy to use but severely limited in what it can do. This will change in the next release of Reporting Services, as you can see from the screen grab below:
This is going to make our lives a little easier - we build the report models for the information workers with all the content they need and let them get on with designing their reports their way. Which will leave us a bit more time to work on those documentation and maintenance tasks at the bottom of our in-trays.
I have just arrived on Eileen Brown's team as the Data IT Pro evangelist. During my extensive interview process I was asked by Keith Burns about the common language runtime (CLR) in SQL Server 2005, and I replied with a quotation from his blog: "if it can't be done in SQL it isn't worth doing". That would seem to rule out blogging, but with the arrival of SQL Server 2008 and its ability to handle the vast amounts of unstructured data we encounter both at home and at work, this could well turn out to be absolutely true!
The other point about this remark is our digital immortality. At our internal technical conference last week, one of the speakers talked about the first computer he had to use, a secondhand missile guidance system which he had to hook up to a mass spectrometer. After spending many sleepless nights getting the system working, he prepared a comprehensive demo for the chemistry lab staff. They weren't impressed at all by his technical wizardry and just wanted to get on and use the mass spectrometer. So he slightly 'modified' the code to make the machine crash randomly and added a lot of unnecessary extra steps to the startup instruction manual. This all came back to haunt him when the machine kept hanging, as his random number generator turned out not to be so random and he had to use his old, time-consuming instructions to continually restart the system in order to debug his code.
So not only are our finest moments preserved forever, but also our worst mistakes. Which leads me neatly onto one of the darker corners of SQL Server: deprecation of features, as occasionally we need to tidy up some of the throwbacks to earlier versions of the product.
There are two parts to this - features that won't be supported in SQL Server 2008, as shown below:
And features that will be deprecated in future releases of SQL Server after the 2008 version.
So careful planning is needed if you intend to migrate to SQL Server 2008.