• Batch-convert Office files to 2007's Open XML format

    A customer asked me the other day whether Microsoft was ever going to build a batch-conversion facility to take old-format files living on a network file share and convert them to the newer XML-based formats. His reasoning was the sometimes considerable reduction in size when saving as .DOCX or .XLSX compared with the binary .DOC and .XLS formats.

    <wistful memories>

    I remember writing a tool to do exactly this with Word docs, going from Word 2.0 to Word 6.0: it used Visual Basic's SendKeys() function to pump the necessary keystrokes into a running copy of Word 6.0... crude and somewhat clumsy, but for a one-off process, it worked fairly well :)

    </wistful memories>

    Coming back to the present, I was pleasantly surprised to discover last month's release of the Office Migration Planning Manager: a collection of tools that scans networked files to report on any potential conversion issues, and batch-converts those documents (either by creating an Open XML format document alongside the old binary one, or by replacing the original with the OOXML format file).
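    OMPM does the real scanning and conversion. As a rough illustration of the inventory step only, the Python sketch below walks a folder tree and lists each legacy binary Office file alongside the Open XML file name it would map to. The extension mapping is an assumption of the sketch (it ignores macro-enabled formats such as .docm), the function names are hypothetical, and nothing here calls OMPM or converts any files.

```python
import os
from pathlib import Path

# Assumed mapping from legacy binary extensions to Open XML equivalents.
# Macro-enabled files would really need .docm/.xlsm/.pptm; this sketch ignores that.
OPEN_XML_EQUIVALENTS = {
    ".doc": ".docx",
    ".xls": ".xlsx",
    ".ppt": ".pptx",
}


def open_xml_name(path):
    """Return the proposed Open XML path for a legacy file, or None if it isn't one."""
    path = Path(path)
    target_ext = OPEN_XML_EQUIVALENTS.get(path.suffix.lower())
    return path.with_suffix(target_ext) if target_ext else None


def find_legacy_files(root):
    """Yield (legacy_path, proposed_open_xml_path) for every binary Office file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            legacy = Path(dirpath) / name
            proposed = open_xml_name(legacy)
            if proposed is not None:
                yield legacy, proposed
```

    Pointing find_legacy_files() at a share path (\\fileserver\docs, say, which is a made-up example) and printing each pair gives a quick feel for how many files are in scope before handing the real work to OMPM.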

    Remember, of course, that you can consume these formats in older versions of Office using the Compatibility Pack. There's a growing movement to adopt Open XML as an industry standard: ECMA has already given the format its blessing, and ISO is reportedly amenable to ratifying it as a standard too. Momentum is also growing among third parties building support for Open XML; even OpenOffice now has a way of consuming and creating Open XML documents.

    UPDATE: Coincidentally, perhaps, Steve Clayton (Geek in Disguise) posted last night about an online petition to ISO in support of the Open XML proposals. If you really value openness, even if Microsoft is the instigator of the efforts, go ahead and sign the petition...

  • Exchange 2003/2007 clustering & high availability

    The Exchange development team have done a nice job of expanding the high availability options with the 2007 release. With Exchange 2003, the only real HA design was what is now known as a Single Copy Cluster (SCC), i.e. there's one copy of the databases and log files, held on a Storage Area Network, and multiple physical cluster nodes connect to that SAN. Exchange 2007 introduced Local Continuous Replication and Cluster Continuous Replication, and is due to add Standby Continuous Replication later this year.

    In the 2003 model, the "Exchange Virtual Server" (EVS) was a collection of resources that ran on one of the physical cluster nodes at any given time, with each EVS having its own name and IP address to which clients connected.

    This model works well in providing a high level of service for clients - Microsoft's own IT department ran an SLA of 99.99%, a maximum of 53 minutes of downtime a year. Routine maintenance tasks (like patching the OS, upgrading the firmware etc) could be performed one node at a time, by having the workload fail over to the passive node during the maintenance period. The downside with this single-copy approach is that there's a single point of failure: the disks. Even though the SAN technology is highly fault tolerant, it's still possible to knock out the entire SAN, or to have some catastrophe make the SAN corrupt the data on the disks.
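    The 53-minute figure follows directly from the percentage: 0.01% of a 525,600-minute year is about 52.6 minutes. A small Python helper (assuming a 365-day year) makes the relationship explicit for other SLA levels:

```python
# Minutes in a non-leap year; leap years are ignored for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def max_downtime_minutes(sla_percent):
    """Maximum yearly downtime, in minutes, allowed by an availability SLA."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% availability -> {max_downtime_minutes(sla):.1f} minutes of downtime a year")
```

    Each extra "nine" cuts the allowance by a factor of ten, which is why maintenance that can be done one node at a time matters so much at this level.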

    Exchange 2007 added a couple of new options to the high availability arena: Local Continuous Replication (LCR), which doesn't use clustering at all, and Cluster Continuous Replication (CCR), which does. The name "Exchange Virtual Server" used in clustering has also changed to "Clustered Mailbox Server", to avoid confusion with the Virtual Server software virtualisation technology.

    Local Continuous Replication

    In an LCR environment, the server keeps a second copy of its databases and log files on a separate physical set of disks, which could be in the same place (maybe even a USB disk hanging off the back of the server, if it's a branch office or small business one).

    Figure: Basic Architecture of Local Continuous Replication

    LCR could also replicate data to another datacenter using iSCSI storage, accessed over the WAN (assuming the bandwidth and network latency are OK). Downsides to LCR are that the server in question is doing more work (by keeping two separate sets of disks updated) and that there's no automatic failover - an administrator would need to manually bring the LCR copy of data back online, in the event of a hardware failure of the server.

    Cluster Continuous Replication

    CCR provides a more complex but more robust (in terms of recovery) solution. There are two nodes in a cluster (and there can only be two, unlike the SCC approach, which can have up to 8 nodes), with each node holding its own copy of the databases and of the log files used by the active node. When a log file is closed on the active node, the passive node copies it over the LAN/WAN and applies the changes to its own copy of the database. The plus side of CCR is that there's little overhead on the active node (since it isn't taking care of the second copy), and because we're using clustering, the nodes can fail over between each other automatically: they maintain a network heartbeat, so the passive node can tell whether it needs to come fully online.
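    To make the copy-and-replay idea concrete, here's a toy Python model. It's a sketch under heavy simplifying assumptions, not how Exchange actually stores data: a "database" is a dict, a "log file" is a list of (key, value) changes, and copying a log across the network is just appending it to a list. The class and function names are hypothetical. What it does show is the passive copy converging on the active one as each closed log is copied and replayed in order.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCopy:
    """One node's copy of a (toy) mailbox database plus the closed log files it holds."""
    name: str
    data: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)  # closed log files, in generation order

    def replay(self, log):
        # A "log file" here is just a list of (key, value) changes.
        for key, value in log:
            self.data[key] = value


def close_log(active, changes):
    """Active node applies changes to its own copy, then closes the log file."""
    active.replay(changes)
    active.logs.append(list(changes))


def copy_and_replay(active, passive):
    """Passive node copies any closed logs it is missing and replays them in order."""
    missing = active.logs[len(passive.logs):]
    for log in missing:
        passive.logs.append(log)  # stands in for the copy over the LAN/WAN
        passive.replay(log)
    return len(missing)


active = DatabaseCopy("NODE1")
passive = DatabaseCopy("NODE2")
close_log(active, [("inbox/1", "Hello")])
close_log(active, [("inbox/2", "Meeting"), ("inbox/1", "Hello (read)")])
print(copy_and_replay(active, passive))  # 2 logs copied and replayed
```

    Note that all the replay work happens on the passive side, which matches the point about CCR putting little extra load on the active node.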

    When either a planned or an unplanned failover occurs, the passive node takes over the role of servicing users, so clients continue connecting to the same name and IP address they were using previously, and the formerly active node takes up the passive role and starts pulling any changes back from the newly activated one.

    To prevent both nodes coming online at the same time (a situation referred to as a "split brain" cluster), there's also a new "witness" role. It guards against the scenario where the passive node thinks the sky has fallen in and everything's gone dead when, in fact, it's the passive node that's fallen off the network. The witness is just a file share, which uses locking semantics to indicate whether the active node is still alive (since both nodes connect to the file share witness). If the passive node can read the witness and deduce that the active node is still running, it won't bring itself online, even if it can't currently see the heartbeat from the active node.
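    The witness logic described above can be modelled roughly as follows. This is a simplified Python sketch of the post's description, treating the witness as a lock that only one node can hold at a time; the real Windows cluster quorum mechanism (Majority Node Set with a file share witness) is more involved, and the class and function names here are hypothetical.

```python
class FileShareWitness:
    """Toy witness: a lock that at most one node can hold at any time."""

    def __init__(self):
        self.owner = None

    def try_claim(self, node):
        """Claim the witness for node; fails if another node already holds it."""
        if self.owner is None or self.owner == node:
            self.owner = node
            return True
        return False

    def release(self, node):
        """Drop node's claim, e.g. because that node has genuinely failed."""
        if self.owner == node:
            self.owner = None


def passive_should_come_online(node, sees_heartbeat, witness, can_reach_witness):
    """Decide whether the passive node should bring itself online."""
    if sees_heartbeat:
        return False  # the active node is visibly alive: stay passive
    if not can_reach_witness:
        return False  # we're probably the one that's fallen off the network
    # Heartbeat lost but witness reachable: take over only if the active
    # node no longer holds the witness.
    return witness.try_claim(node)


witness = FileShareWitness()
witness.try_claim("NODE1")  # the active node holds the witness
print(passive_should_come_online("NODE2", False, witness, True))  # False: NODE1 still holds it
```

    The key design point is that losing the heartbeat alone is never enough: the passive node needs a second, independent signal before it risks creating a split brain.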

    CCR provides a solution to the single point of failure in the SCC model, but there are some limitations: there can only be two cluster nodes, and they need to be on the same IP subnet. This makes it tricky to have a node in a Disaster Recovery datacenter, since both a subnet and an AD site would need to span the WAN. What many people feel would be the ideal scenario is to have both CCR nodes and their copies sited in one datacenter, with a third node in the DR datacenter on a different subnet.

    Standby Continuous Replication

    Service Pack 1 for Exchange 2007 (due in the second half of 2007) is planned to introduce a new replication paradigm called Standby Continuous Replication (SCR). This could be used in conjunction with a CCR model, where the active and passive nodes are in one place and automatically fail over between each other, but a third (standby) node is somewhere else. The third node will only be activated when both of the primary nodes are offline, such as if the primary datacenter failed completely. In that case, a manual process will be followed to mount the databases on the standby node, similar to how an administrator would bring the backup copy on an LCR server online. The third node is not a member of the cluster and will not need to be on the same IP subnet.

    SCR will also offer the option of having a standalone Exchange server sending a copy of its data to another standalone server, meaning that cross-datacenter fault tolerance could be achieved without clustering at all, albeit at the expense of a manual failover regime.
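    For the CCR-plus-SCR combination, the division of labour boils down to a simple decision, sketched below in Python. The function and its inputs are hypothetical simplifications based on the plans described above, not an Exchange API, and they ignore partial failures such as a single failed database.

```python
from enum import Enum


class RecoveryAction(Enum):
    NONE = "no action needed"
    AUTOMATIC_CCR_FAILOVER = "cluster fails over to the passive CCR node automatically"
    MANUAL_SCR_ACTIVATION = "administrator manually mounts the databases on the SCR standby"


def recovery_action(active_up, passive_up):
    """Pick the recovery path for a CCR pair with an SCR standby in another datacenter."""
    if active_up:
        return RecoveryAction.NONE
    if passive_up:
        # Same cluster, same subnet: the passive node takes over on its own.
        return RecoveryAction.AUTOMATIC_CCR_FAILOVER
    # Both primary nodes are down (e.g. the whole primary datacenter):
    # the standby is outside the cluster, so activation is manual.
    return RecoveryAction.MANUAL_SCR_ACTIVATION


print(recovery_action(active_up=False, passive_up=False).value)
```

    The standalone-to-standalone SCR option would simply skip the middle branch: with no cluster, every failover goes straight to the manual path.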

    More information on High Availability in Exchange 2003 can be found online here, and for Exchange 2007, here. Further details of what's going to be in SP1 will be posted in the coming weeks to the Exchange team blog.

  • Windows Live Mail Desktop - replaces Outlook Express for Hotmail use

    I've been running the "dogfood" version of Windows Live Mail Desktop (WLMD) for a while now, and found it to be really stable and usable. It's basically a superset of the built-in Windows Mail application from Windows Vista, which supersedes Outlook Express.

    WLMD is now available for beta testing (on Windows XP as well as Vista) from http://ideas.live.com. It works against MSN/Hotmail (including the mail from Office Live, so if you sign up for your own free domain name you can pick up the mail without being in a browser), POP/IMAP accounts and other providers' mail services, such as Yahoo!, AOL and Gmail. It seems it's been available for some time, in fact :)

    I was reminded of this when Steve Clayton was being interviewed today on TalkSport Radio, and a caller asked why Vista no longer gave him access to Hotmail. I guess he meant that since Outlook Express isn't in the box any more, he was trying to use the supplied Windows Mail program, which can't connect to Hotmail. The solution is either to stay with browser-based mail or to use WLMD.

  • Calibri: a font like no other...

    Someone asked me a semi-bizarre question today about the new fonts in Office 2007 and Windows Vista, especially Calibri (which, I must say, I think looks great):

    Can they be installed on older versions of Windows or Office?

    I had never really appreciated all the work that goes into producing a decent font, including getting cross-industry support for things like building it into printer ROMs. It turns out there's a whole Typography research group within Microsoft; if you're interested in finding out more about fonts, I'm sure you'll find it there...

    Anyway, the answer to the question is two-fold...

  • Is your email compliant with the (UK) Companies Act?

    A little-known fact: as of 1st January 2007, the rules for UK companies regarding business stationery changed. Just as every registered company is required to include certain information on all its official letters and order forms (the registered office address, the place of registration, e.g. England & Wales, and the company registration number), electronic communications now fall under this rule too.

    As Companies House says:

    Whenever an email is used where its paper equivalent would be caught by the stationery requirements then that email is also subject to the requirements.

    I can honestly only think of one case where a company includes all this stuff in their email, along with a long-winded disclaimer. I suppose the rules are now in place and people are waiting to see how they're interpreted... might be worth thinking about including your details on your own e-mail .sig...
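    If you do add the details to your signature, a small helper keeps them consistent across a team. The Python sketch below just assembles the items listed above (plus the company name) into a plain-text footer, using the conventional "-- " signature separator; the company details are made up, the names are hypothetical, and whether a given layout satisfies the rules is a question for the Companies House guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyDetails:
    name: str
    registration_number: str
    place_of_registration: str  # e.g. "England & Wales"
    registered_office: str


def signature_footer(details: CompanyDetails) -> str:
    """Build a plain-text footer carrying the stationery details for an email .sig."""
    return "\n".join([
        "-- ",  # conventional signature separator: dash, dash, space
        details.name,
        f"Registered in {details.place_of_registration}, company number {details.registration_number}",
        f"Registered office: {details.registered_office}",
    ])


example = CompanyDetails(
    name="Example Widgets Ltd",                              # hypothetical company
    registration_number="01234567",                          # hypothetical number
    place_of_registration="England & Wales",
    registered_office="1 Example Street, Anytown, AB1 2CD",  # hypothetical address
)
print(signature_footer(example))
```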

    There's quite a good discussion of the whole area on legal eagles Pinsent Masons' site, here.

    Oh, and did you know that Exchange 2007 now has the ability to include standard disclaimers on all mail that passes through it? For a step-by-step illustration, have a look over on msexchange.org.