Updated 19th September 2005 - included link (below) to aforementioned Nature article
Business Intelligence (BI):The way it is without the blah blah.
I love BI and I want you to love it too. Why? Because I think it can benefit mankind; not just because it can contribute to wealth creation by contributing to more efficient business but also because the supporting technology could make a massive contribution to science and a host of other human pursuits, if only it were more widely understood, appreciated and consequently adopted. I guess we’ll need to find a new term for the technology before we can reach that plateau.
Business Intelligence, as a term, was coined by Howard Dresner (of Gartner) in 1996; well that’s what I read somewhere sometime. He may have coined the term but I don’t believe he defined it; for two reasons.
Firstly, I haven’t found any definition that looks remotely like a definition. Howard works for Gartner and Gartner doesn’t appear to do anything with such few words as might make for the kind of definition you expect from a dictionary. Having been to a Gartner BI conference, I’m quite convinced they would find it a challenge to describe something as simple as flicking a light switch in anything less than a 500k pdf.
Secondly, a word or expression is defined by its common usage; it is this usage that earns it its place in a dictionary or accepted parlance. An obvious example is the word ‘gay’, a word the definition of which has been hijacked and redefined by common usage.
I’m now going to give you two definitions, one for the business person and one for the ITPro. Why two? Because these two groups speak a different language. Starting with the business persons’ definition, I give you this; Business Intelligence is:
“Knowing what you need to know, when you need to know it; then taking action based upon this knowledge.”
Now I run the risk of sounding like Donald Rumsfeld as I try to explain that sometimes, we don’t know actually what it is we need to know about. This of course does suggest as Mr Rumsfeld said, “….there are things we don’t know, we don’t know”, which when he said it caused half the world to collapse in a heap of laughter – but I think that was the context as much as anything else.
For the things ‘we don’t know, we don’t know’ we can use Data Mining to discover. Data Mining employs the huge amounts of computing power we have at our disposal to scan potentially vast quantities of data looking for patterns and trends we literally can’t see with the naked eye. Once Data Mining has uncovered these patterns and trends we can use them to make predictions as though they were rules. More on Data Mining later.
For the things ‘we know, we don’t know’ (I can’t help hearing Donald say these words), we traditionally turn to the relational database for the answer. Just for a moment lets assume our relational database contains the information we want; well that’s fair enough as it often does. However, when you go to your DBA and ask him/her, “can I have a report showing our top ten most profitable customers in each country we trade in”, they just hold on to their sides, fall off their stools, open their mouths and scream with laughter.
When, finally, the DBA wipes away the tears and stops repeatedly relapsing in to fits of hysteria every time they try and repeat your request; they finally repeat your question. Then, they manage to squeeze out probably the funniest question they’ve ever heard, “When do you want it by?” You just know that this question isn’t worth answering, anything you had in mind, even when multiplied by ten, is merely going to cause this DBA to explode like a life size jack-in-the-box. You walk off feeling stupid.
So why is this such a laughable request? The answer is that traditional relational databases really can’t answer questions like this in anything like a reasonable time scale. What’s worse is the data required to answer the question is often held in some particular source system or line of business application, like SAP, SAGE, bespoke or in house application; and possibly more than one such system. This kind of report can therefore impact the performance of those often critical systems to make the whole request doubly hysterical implausible. This assumes that actually combing the data, the units of measurement etc, from these systems is a formality – which of course it isn’t.
So why is such a question so arduous that a traditional relational database finds it so difficult to answer? This is because the answer involves scanning vast amounts of data and performing lots of maths. Scanning the data can also involve joining numerous tables because the source systems are usually On Line Transactional Processing (OLTP) systems that have their data tables and indexes optimised to reduce data redundancy and to speed up short transactions involving few records. OLTP optimisations are in many ways the reverse of the optimisations required by reporting applications.
The maths involved in this question are also potentially crippling to a database server. In this case we would have to work out the profit for each and every sale in all the sales regions where the profit might be expressed as the revenue – cost – commission – discount and then converted to another currency. Then we would need to aggregate the profit for each customer and then rank the lot in each region.
It all gets worse when 30 other managers all want the similar data. It all gets much worse when different managers want to view aggregate sales data across different analytical dimensions such as product, employee and date. Each of these dimensions may well be hierarchical and require aggregate information at each different level e.g. a Geography dimension may have a hierarchy of trading region – country – district - county – town.
The type of question we’re trying to answer is typical of the type of question Business Intelligence is here to answer. Sometimes referred to as complex questions, these questions require numeric answers. A list of your customers living in London who have a surname beginning with ‘s’ is not a complex question; the answer doesn’t really constitute Business Intelligence as it doesn’t really offer any real insight into your business. Nor does knowing your business is on fire – valuable information yes – but we can leave that question to highly specialised atmosphere analysing devices that employ audibly invasive feedback mechanisms to offer insight into the state of combustion ones business might be in. These devices are commonly known as fire alarms.
In the technical community, the systems that support BI are usually referred to Decision Support systems because they support decision making – I guess that’s because technical people tend to say it the way it is. So there's the definition for IT Pros the definition of BI is:
"Decision support" or as we'll go on to see, it could be deifined as "Delivering OLAP and Data Mining".
So how do good BI systems overcome these limitations of traditional databases? The answer is - they cheat; well by relational database standards they do. They pre-calculate many of the answers, or constituents thereof. By pre-calculating these answers, which are by their nature aggregates and found at the intersection of fact and dimension tables, a useful BI system can summon up answers to complex questions with the same effortlessness a baby can summon up projectile vomit. These aggregates are stored in OLAP (On-line Analytical Processing (silly name)) cubes
Fact and dimension tables – what are these? A fact table is comprised of records that are each known as facts. A fact, as a record, is comprised of fields where each field (or column), not employed by the key, is known as a measure. This is because the field holds a value that a business wants to ‘measure’ in order to assess the state of that business.
The dimension tables fan out from the fact table, in either a star or snowflake arrangement, connected to the fact table by its key value. All the dimension table keys form a complex key for the fact table. The dimensions contain the data that describe the context of the reported measures, a context desired by analysts and not OLTP architects. For example geography, products, time and employees are natural examples and are typified by their hierarchical nature. E.g. A time dimension can have a hierarchy of year, quarter, month, week, day etc. A product dimension may have a hierarchy of brand, product category, product sub category and (at the bottom) product.
So, in attempt to bring this ramble to some sort of an illuminating pinnacle, I need to summarize that which I have already written. Business Intelligence is predicated, at least in part, on the notion of finding answers to mathematically challenging questions, storing those answers, and then being able to utilise them to answer further ad-hoc complex questions, quickly, and on demand from anyone, at anytime. We store these aggregates in the famous ‘OLAP cube’.
The OLAP cube, as a concept, is merely a structure for storing pre-calculated aggregates. However, its name could not have been better chosen to confuse and to deter the inquisitive. A cube, as we all know, has three axis x, y and z. Most people don’t have a problem with this, until that is, we start talking about cubes with more than three axis; a four of five dimensional cube just fries people brains – understandably.
Now is the time to assassinate the terms OLAP and cube. OLAP should be replaced by FASMI – Fast Analysis of Shared Multi-dimensional information. FASMI may sound even worse than OLAP – but at least it has a pertinent meaning. Its ‘fast’ – this is what it’s all about, remember the DBA laughing his head off earlier on? Being fast means we get the answers we want, when we want them and when they are still useful to us. ‘Shared Multi-dimensional’ means there are a number of dimensions we can use for our analysis which are shared by more than one set of facts; this means we can analyse our business by a consistent set of dimensions across our measures, across our entire enterprise, regardless of language, currency or measuring unit (metric).
I say, ‘in part’, because there is another ‘part’ to Business Intelligence which seeks to provide competitive advantage by providing predictions derived from the data mining I mentioned earlier. These predictions enable a business to get ahead of competition that relies on guess work and intuition to help form its strategy. These predictions can be used for many purposes including: spotting fraud, target marketing to the most receptive, generate maintenance plans, provide suggestions for up-sell (basket analysis), automated equities trading and even data cleansing by predicting anomalies. There are all sorts of applications for the predictive capabilities of data mining; I wager many are yet to be thought of.
So, in second desperate attempt to bring this rambling to a conclusion, I will, reluctantly resort to summarising Business Intelligence as the ability to gain knowledge from a panorama of visible data and in a timeframe that gives value to that knowledge. The term BI, however, says nothing about the true capability of the underlying technology, a technology that should be employed to gain understanding in realms far beyond those of just ‘Business’, but also in science, engineering, politics, economics, anthropology and who knows, maybe even art.
I have seen OLAP cubes used in sport; Cricket to be precise. A company has taken all the Cricket data from the Wisden Cricket Almanac going back into the 1800s and put it in a cube, which commentators then slice and dice in a never ending quest to generate bizarre statistics that wow Cricket enthusiasts. Did you know for each ball bowled they record 60 facts! I also know of a research team that's about to publish, in Nature magazine, some interesting findings in the field of evolution. Their research was aided by SQL Analysis Services OLAP - I'll link to it here when it goes live. The article is now live @ http://www.nature.com/nature/journal/v436/n7051/index.html#Feature and talked about in my blog post @ http://blogs.technet.com/mat_stephen/archive/2005/09/20/Darwin.aspx
If you can find any other examples, outside business, where OLAP and Data Mining have been, or could be used, please let me know.
For those interested in Data Warehousing - I hope to do a 'Data Warehousing without the blah blah' soon