• The business case for Exchange 2007

    (this is a follow-on to the previous post on measuring business impact, and these are my own thoughts on the case for moving to Exchange 2007)

    There are plenty of resources already published which talk about the top reasons to deploy, upgrade or migrate to Exchange 2007 - the top 10 reasons page would be a good place to start. I'd like to draw out some tangible benefits which are maybe less obvious than the headline-grabbing "reduce costs"/"make everyone's life better" type reasons. I'll approach these reasons over a number of posts, otherwise this blog will end up reading like a whitepaper (and nobody will read it...)

    GOAL: Be more available at a realistic price

    High availability is one of those aims which is often harder to achieve than it first appears. If you want a really highly-available system, you need to think hard not only about which bits need to be procured and deployed (eg clustered hardware and the appropriate software that works with it), but also about whether the systems management and operations teams are structured in such a way that they can actually deliver the promised availability. Also, a bit like disaster recovery, high availability is always easier to justify following an event where it was sorely missed... eg if a failure happens and knocks the systems out of production for a while, it'll be easier to go cap-in-hand to the budget holder and ask for more money to stop it happening again.

    Example: Mrs Dalton runs her own business, and like many SMBs, money was tight when the company was starting up - to the extent that they used hand-me-down PC hardware to run their main file/print/mail server. I had always said that this needed to be temporary only, and how they really should buy something better, and it was always something that was going to happen in the future.

    Since I do all the IT in the business (and I don't claim to do it well - only well enough that it stops being a burden for me... another characteristic of small businesses, I think), and Mrs D is the 1st line support for anyone in the office if/when things go wrong, it can be a house of cards if we're both away. A year or two after they started, the (temporary) server blew its power supply whilst we were abroad on holiday, meaning there were no IT services at all - no internal or internet access (since the DHCP server was now offline) which ultimately meant no networked printers, no file shares with all the client docs, no mail (obviously) - basically everything stopped.

    A local PC repair company was called in and managed to replace the PSU and restore working order (at a predictably high degree of expense), restoring normal service after 2 days of almost complete downtime.

    Guess what? When we got back, the order went in for a nice shiny server with redundant PSU, redundant disks etc etc. No more questions asked...

    Now a historical approach to making Exchange highly available would be to cluster the servers - something I've talked about previously in a Clustering & High Availability post.

    The principal downside to the traditional Exchange 2003-style cluster (now known as a Single Copy Cluster) was that it required a Storage Area Network (at least if you wanted more than 2 nodes), which could be expensive compared to the kind of high-capacity local disk drives that might be the choice for a stand-alone server. Managing a SAN can be a costly and complex activity, especially if all you want to do with it is to use it with Exchange.

    Also, with the Single-Copy model, there's still a single point of failure - if the data on the SAN gets corrupted (or worst case, the SAN itself goes boom), then everything is lost and you have to go back to the last backup, which could be hours or even days old.

    NET: Clustering Exchange, in the traditional sense, can help you deliver a better quality of service. Downtime through routine maintenance is reduced and fault tolerance of servers is automatically provided (to a point).

    Now accepting that a single copy cluster (SCC) solution might be fine for reducing downtime due to more minor hardware failure or for managing the service uptime during routine maintenance, it doesn't provide a true disaster-tolerant solution. Tragic events like the Sept 11th attacks, or the public transport bombs in cities such as London and Madrid, made a lot of organisations take the threat of total loss of their service more seriously ... meaning more started looking at meaningful ways of providing a lights-out disaster recovery datacenter. In some industries, this is even a regulatory requirement.

    Replication, Replication, Replication

    Thinking about true site-tolerant DR just makes everything more complex by multiples - in the SCC environment, the only supported way to replicate data to the DR site is to do it synchronously - ie the Exchange servers in site A write data to their SAN, which replicates that write to the SAN in site B, which acknowledges that it has received that data, all before the SAN in site A can acknowledge to the servers that the data has successfully been written. All this adds huge latency to the process, and can consume large amounts of high-speed bandwidth, not to mention duplication of hardware and typically expensive software (to manage the replication) at both sites.
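To make the latency point a bit more concrete, here is a rough back-of-envelope sketch in Python. It is only a simplified model of the write path described above, not a description of how any particular SAN works: it assumes the local write, the inter-site round trip and the remote write happen one after another, and all the millisecond figures are made-up illustrative values.

```python
def sync_write_latency_ms(local_write_ms: float,
                          remote_write_ms: float,
                          round_trip_ms: float) -> float:
    """Latency the Exchange server sees for one write when the SAN in
    site A must wait for the SAN in site B to acknowledge first.

    Simplifying assumption: the three steps happen strictly in sequence.
    """
    return local_write_ms + round_trip_ms + remote_write_ms


def local_only_write_latency_ms(local_write_ms: float) -> float:
    """Latency for the same write when nothing has to be acknowledged
    by a second site before the server is told the write succeeded."""
    return local_write_ms


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not measurements):
    # 5 ms per disk write at each site, 20 ms round trip between sites.
    replicated = sync_write_latency_ms(5.0, 5.0, 20.0)
    local_only = local_only_write_latency_ms(5.0)
    print(f"synchronous replication: {replicated} ms per write")
    print(f"local write only:        {local_only} ms per write")
```

With those assumed figures, every write takes 30 ms instead of 5 ms, and the inter-site round trip is the biggest single contributor, which is why low-latency links matter so much for this design.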

    If you plan to shortcut this approach and use some other piece of replication software (which is installed on the Exchange servers at both ends) to manage the process, be careful - there are some clear supportability boundaries which you need to be aware of. Ask yourself - is taking a short cut to save money in a high availability solution, just a false economy? Check out the Deployment Guidelines for multi-site replication in Exchange 2003.

    There are other approaches which could be relevant to you for site-loss resilience. In most cases, were you to completely lose a site (and for a period of time measured at least in days and possibly indefinitely), there will be other applications which need to be brought online more quickly than perhaps your email system - critical business systems on which your organisation depends. Also, if you lost a site entirely, there's the logistics of managing where all the people are going to go? Work from home? Sit in temporary offices?

    One practical solution here is to use something in Exchange 2003 or 2007 called Dial-tone recovery. In essence, it's a way of bringing up Exchange service at a remote location without having immediate access to all the Exchange data. So your users can at least log in and receive mail, and be able to use email to communicate during the time of adjustment, with the premise that at some point in the near future (once all the other important systems are up & running), their previous Exchange mailbox data will be brought back online and they can access it again. Maybe that data isn't going to be complete, though - it could be simply a copy of the last night's backup which can be restored onto the servers at the secondary site.

    Using Dial-tone (and an associated model called Standby clustering, where manual activation of standby servers in a secondary datacenter can bring service - and maybe data - online) can provide you with a way of keeping service availability high (albeit with temporary lowering of the quality, since all the historic data isn't there) at a time when you might really need that service (ie in a true disaster). Both of these approaches can be achieved without the complexity and expense of sharing disk storage, and without having to replicate the data in real-time to a secondary location.

    Exchange 2007 can help you solve this problem, out of the box

    Exchange 2007 introduced a new model called Cluster Continuous Replication (CCR) which provides a near-real-time replication process. This is modelled in such a way that you have a pair of Exchange mailbox servers (and they can only be doing the mailbox role, meaning you're going to need other servers to take care of servicing web clients, performing mail delivery etc), and one of the servers is "active" at any time, with CCR taking care of the process of making sure that the copy of the data is also kept up to date, and providing the mechanism to automatically (or manually) fail over between the two nodes, and the two copies of the data.

    [Figure: Cluster Continuous Replication Architecture]

    What's perhaps most significant about CCR (apart from the fact that it's in the box and therefore fully supported by Microsoft) is that there is no longer a requirement for the cluster nodes to access shared disk resources... meaning you don't need a SAN (now, you may still have reasons for wanting a SAN, but it's just not a requirement any more).

    NET: Cluster Continuous Replication in Exchange 2007 can deliver a 2-node shared-nothing cluster architecture, where total failure of all components on one side can be automatically dealt with. Since there's no requirement to share disk resources between the nodes, it may be possible to use high-speed, dedicated disks for each node, reducing the cost of procurement and the cost & complexity of managing the storage.

    Exchange 2007 also offers Local Continuous Replication (LCR), designed for stand-alone servers to keep 2 copies of their databases on different sets of disks. LCR could be used to provide a low-cost way of keeping a copy of the data in a different place, ready to be brought online through a manual process. It is only applicable in a disaster recovery scenario, since it will not offer any form of failover in the event of a server failure or planned downtime.

    Standby Continuous Replication (SCR) is the name given to another component of Exchange 2007, due to be part of the next service pack. This will provide a means to have standby, manually-activated, servers at a remote location, which receive a replica of data from a primary site, but without requiring the servers to be clustered. SCR could be used in conjunction with CCR, so a cluster which provides high availability at one location could also send a 3rd replica of its data to a remote site, to be used in case of total failure of the primary site.

     The key point is "reasonable price"

    In summary, then: reducing downtime in your Exchange environment through clustering presents some challenges.

    • If you only have one site, you can cluster servers to share disk storage and get a higher level of service availability (assuming you have the skills to manage the cluster properly). To do this, you'll need some form of storage area network or iSCSI NAS appliance.
    • If you need to provide site-loss resilience (either temporary but major, such as a complete power loss, or catastrophic, such as total loss of the site), there are 3rd-party software-based replication approaches which may be effective, but are not supported by Microsoft. Although these solutions may work well, you will need to factor in the possible additional risk of a more complex support arrangement. The time you least want to be struggling to find out who can and should be helping you get through a problem, is when you've had a site loss and are desperately trying to restore service.
    • Fully supported site-loss resilience with Exchange 2003 can only be achieved by replicating data at a storage subsystem level - in essence, you have servers and SANs at both sites, and the SANs take care of real-time, synchronous, replication of the data between the sites. This can be expensive to procure (with proprietary replication technology not to mention high speed, low latency network to connect the sites - typically dark fibre), and complex to manage.
    • There are manual approaches which can be used to provide a level of service at a secondary site, without requiring 3rd party software or hardware solutions - but these approaches are designed to be used for true disaster recovery, not necessarily appropriate for short-term outages such as temporary power failure or server hardware failure.
    • The Cluster Continuous Replication approach in Exchange 2007 can be used to deliver a highly-available cluster in one site, or can be spanned across sites (subject to network capacity etc) to provide high-availability for server maintenance, and a degree of protection against total site failure of either location.

    NET: The 3 different replication models which are integral to Exchange 2007 (LCR, CCR and SCR) can help satisfy an organisation's requirements to provide a highly-available, and disaster-tolerant, enterprise messaging system. This can be achieved without requiring proprietary and expensive 3rd party software and/or hardware solutions, compared with what would be required to deliver the same service using Exchange 2003.

     

    Topics to come in the next installments of the business case for Exchange 2007 include:

    • Lower the risk of being non-compliant
    • Reduce the backup burden
    • Make flexible working easier
  • Technology changes during the Blair era

    So Tony Blair stepped down as the UK's Prime Minister this week, just over 10 years since his ascendance to the position. Funnily enough, I got my "10 year" service award at Microsoft recently (a fetching crystal sculpture and a note from Bill 'n' Steve thanking me for the last decade's commitment), which got me all misty-eyed and thinking about just how far the whole technology landscape has evolved in that time. I also did a presentation the other day to a customer's gathering of IT people from across the world, who wanted to hear about future directions in Microsoft products. I figured it would be worth taking a retrospective before talking about how things were envisaged to change in the next few years.

    When I joined Microsoft in June 1997, my first laptop was a Toshiba T4900CT - resplendent with 24MB of RAM and a Pentium 75 processor. My current phone now has 3 times as much internal storage (forgetting about the 1GB MicroSD card), a CPU that's probably 5 times as powerful and a brighter LCD display which may be only a quarter the resolution, but displays 16 times as many colours.

    In 1997, there was no such thing as broadband (unless you fancied paying for a Kilo- or even Mega-stream fixed line) and mobile data was something that could be sent over the RAM Mobile Data Network at speeds of maybe 9kbps. I do remember playing with an Ericsson wireless adapter which allowed a PC to get onto the RAM network - it was a type III PCMCIA card (meaning it took up 2 slots), it had a long retractable antenna, and if you used it anywhere near the CRT monitor that would be on the average desk, you'd see massive picture distortion (and I mean, pulses & spikes that would drag everything on the screen over to one side) that would make anyone think twice about sitting too close to the adapter...

    The standard issue mobile phone was the Nokia 2110, a brick by modern standards which was twice as thick & twice as heavy as my Orange SPV E600. The Nokia's battery was only half as powerful, though it was said to last almost as long as the SPV's. Don't even think about wireless data, a colour screen, downloadable content or even synchronisation with other data sources like email.

    People didn't buy stuff on the internet in 1997 - in fact, a pioneering initiative called "e-Christmas" was set up at the end of that year, to encourage electronic commerce - I recall being able to order goods from as many as a handful of retailers, across as many as a few dozen product lines!

    One could go on and on - at the risk of sounding like an old buffer. If Ray Kurzweil is right, and the pace of change is far from constant but is in fact accelerating and has been since the industrial revolution, then we'll see the same order of magnitude change in technology as we had in the last ten years, within the next three.

    In tech terms, there was no such thing as the good old days: it's never been better than it is now, and it's going to keep getting better at a faster rate, for as long as I think anyone can guess.

  • The Campaign for Real Pedantry, erm, I mean numbers

    Hats off to James O'Neill for a display of true, world-class pedantry to which I could only aspire. It drives me nuts to get emails with badly formatted phone numbers which can't be dialled on Smartphones without first editing them, and now that I've started using Office Communications Server 2007 (more later) as the backbone for my real office phone, it impedes the usability of that too.

    James' beef is that a lot of people incorrectly write a UK phone number which would be defined as 0118 909 nnnn (where 0118 is the area dialing code, and 909nnnn is the local number, the last 4 digits of which form an extension number in this specific example, available through DDI).

    Here are some examples of number crime:

    • (0) 118 909 nnnn - Incorrect and useless. Why put the first zero in brackets at all? Nobody is ever going to dial starting '118'
    • +44 (0) 118 909 nnnn - Incorrect, though perhaps useful to people who don't understand country codes. There may well be lots of people out there who don't ever call international and don't understand the "+44" model of dialing from a mobile phone. So maybe the (0) will indicate to them that they should add it in... but it could be confusing to overseas dialers who're calling this number - how do they know if they should dial +44 118 or +44 0 118?
    • +44 (0) (118) 909 nnnn - someone likes the brackets just a little too much
    • +44 (0118) 909 nnnn - even worse than number 2. Either drop the brackets and the 0, or drop the +44 altogether.

    The only correct way to write this number is +44 118 909 nnnn, or for the truly pedantic, +44118909nnnn. Maybe you wouldn't publish an E.164 formatted number (as the scheme is called) as your primary customer services number, and it doesn't make sense to use it for numbers that won't be dial-able from abroad (eg some 0870 numbers or 0800 numbers). But for everything else, I'd encourage everyone to please make sure your email signature has a properly formatted number (either simplifying it by dropping the +44 or losing the brackets and leading zero). If your company publishes your number in its online address book, then make sure that's formatted correctly too so that people using telephone-aware systems (like Windows Mobile or Outlook Voice Access) can correctly call you.
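For anyone who has to clean up an address book full of these, the tidy-up can be sketched in a few lines of Python. This is only an illustration, not a full phone number parser: the function name `to_e164_uk` is made up for this example, it assumes the input is a UK number that already includes its area code, and the digits used are placeholders rather than a real extension.

```python
import re


def to_e164_uk(raw: str) -> str:
    """Turn a UK number written in one of the styles above into the
    compact +44... form.

    Assumptions: the number is a UK number and includes its area code
    (a bare local number like "909 nnnn" cannot be fixed up this way).
    """
    # Throw away a bracketed trunk zero, as in "+44 (0) 118 ...".
    text = raw.replace("(0)", "").strip()
    international = text.startswith("+")
    # Keep only the digits; brackets, spaces and dashes all go.
    digits = re.sub(r"\D", "", text)
    if international:
        if not digits.startswith("44"):
            raise ValueError(f"not a +44 number: {raw!r}")
        national = digits[2:]
    else:
        national = digits
    # Drop the single leading trunk zero if it is still there,
    # as in "0118 ..." or "+44 (0118) ...".
    if national.startswith("0"):
        national = national[1:]
    return "+44" + national


if __name__ == "__main__":
    samples = [
        "0118 496 0123",
        "(0) 118 496 0123",
        "+44 (0) 118 496 0123",
        "+44 (0) (118) 496 0123",
        "+44 (0118) 496 0123",
        "+44 118 496 0123",
    ]
    for sample in samples:
        print(f"{sample:26} -> {to_e164_uk(sample)}")
```

Every one of the sample spellings comes out as +441184960123. Putting the spaces back in for display is left out on purpose, since where they go depends on the length of the area code.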

    In my profession, if someone doesn't figure that +44 118 909 nnnn is my phone number and that if they're in the UK and not in the Reading area, they need to drop +44 and add "0" if they're dialing from a plain old phone system, then I'm quite happy to have them not phoning me up...

  • Outlook 2007 signatures location

    Following my post about .sig files, I had cause to dig around looking for where Outlook actually puts the Signature files. I came across a post which Allister wrote a little while ago, but it's such a useful little tip that it's worth repeating...

    Basically, Outlook 2007 offers a nice simple editor UI for building your own signatures, however it's complicated by the need to present the signature in HTML (the default format for mail now), Rich Text Format (aka RTF, the original Exchange/Outlook format dating back to the mid 90s) and plain old Text (with total loss of colour, formatting etc).

    image

    Outlook actually generates 3 versions of the sig, to be used for the different formats. In some cases - notably with plain text - the end result following the conversion isn't quite what you'd want... my nicely formatted .sig above comes out a bit mangled, as

    Ewan Dalton |     | Microsoft UK | ewand@microsoft.com |+44 118 909 3318 Microsoft Limited | Registered in England | No 1624297 | Thames Valley Park, Reading RG6 1WG

    so it may be necessary to do a bit of tweaking to the formats. Do bear in mind, that if you do edit the sig files directly, then go back into the menu in Outlook and make some other change, your original tweaks will be over-written.

    Anyway, you can find the signature files in something akin to:

    [root]\Users\[username]\AppData\Roaming\Microsoft\Signatures

    (there may not be a \Roaming folder, depending on how your environment is set up, or it may be in \Documents and Settings\ and under an Application Data folder depending on your version of Windows).
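If you'd rather not guess at the path, a small Python sketch like the one below can work it out. It leans on the APPDATA environment variable, which Windows points at the per-user application data folder (the \Roaming one on newer versions, Application Data on older ones). The function names are made up for this example, and the .htm/.rtf/.txt extensions are my assumption about how the three versions of each signature are stored.

```python
import os
from pathlib import Path, PureWindowsPath

# Assumed extensions for the HTML, Rich Text and plain text versions.
SIGNATURE_SUFFIXES = {".htm", ".rtf", ".txt"}


def signature_dir(appdata: str) -> PureWindowsPath:
    """Build the Signatures folder path from the value of %APPDATA%."""
    return PureWindowsPath(appdata) / "Microsoft" / "Signatures"


def list_signature_files(folder: Path) -> list[str]:
    """Return the names of the signature files found in the folder."""
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.suffix.lower() in SIGNATURE_SUFFIXES
    )


if __name__ == "__main__":
    appdata = os.environ.get("APPDATA")
    if appdata is None:
        print("APPDATA is not set - this probably isn't a Windows machine")
    else:
        folder = Path(str(signature_dir(appdata)))
        if folder.is_dir():
            for name in list_signature_files(folder):
                print(name)
        else:
            print(f"No signature folder found at {folder}")
```

Remember that anything you edit by hand in that folder will be over-written the next time you change the signature through Outlook's own editor.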

  • "Success kills" - Marc Andreessen on Facebook

    Like so many other people in the last few weeks, I started using Facebook. They're growing at such a ridiculous rate, adding 100,000 new users every day, and it's reckoned that 50% of the millions of active users return to the site every day.

    Following a link from a post by Steve, I read Marc Andreessen's opinions on why Facebook is so successful (what it's done spectacularly right, and what in his opinion its shortcomings are). One particular section shocked me the most. It comes after he discusses the viral spread of Facebook applications, focusing on iLike as probably the most successful. Facebook app developers need to host their own servers (and bandwidth) to provide the services that Facebook will provide the gateway to. When iLike launched, they had near-exponential take up of their application, which completely hammered the servers they had access to. Here's what Andreessen says subsequently:

    Yesterday, about two weeks later, ILike announced that they have passed 3 million users on Facebook and are still growing -- at a rate of 300,000 users per day.

    They didn't say how many servers they're running, but if you do the math, it has to be in the hundreds and heading into the thousands.

    Translation: unless you already have, or are prepared to quickly procure, a 100-500+ server infrastructure and everything associated with it -- networking gear, storage gear, ISP interconnections, monitoring systems, firewalls, load balancers, provisioning systems, etc. -- and a killer operations team, launching a successful Facebook application may well be a self-defeating proposition.

    This is a "success kills" scenario -- the good news is you're successful, the bad news is you're flat on your back from what amounts to a self-inflicted denial of service attack, unless you have the money and time and knowledge to tackle the resulting scale challenges.

    I love that analogy - self-inflicted DOS :) But what a scary situation to be in - suddenly having to provide real-time, world-class infrastructure or else risk losing the goodwill of everyone who's accessing the service if it fails or is too slow.

    All of which makes me think - where on earth does the revenue to pay for all this stuff come from?