Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

September, 2011

  • Beyond Relational– Semantic Search with SQL Server FileTable

    Semantic Search new for SQL Server Denali adds extra intelligence to the FullText search capability that has existed in SQL Server for several releases, by looking at the relationships between words and in and across each document.  For example you can find which documents are similar to a specified document, and this could help with matching candidates’ CVs to job adverts, or you can find the key words and phrases in a given set of documents. 

    This statistical analysis information  is stored in a special database (which is supplied but not installed in SQL Server Denali) and is refreshed as part of the process that update the FullText indexes it depends on.

    Once it is setup you have access to three tabular functions:

    Note that in CTP3 only words not phrases work in these functions and that currently doesn’t understand word stems e.g. that cloudy and clouds are derivations of the term cloud and treats each as separate words.  This could be fixed as Denali gets nearer to release but there is no guarantee of this.

    There are examples of this in the adventure works database which you can try, but I wanted to see what it would do with my FileTable of PowerPoint presentations I created in my last three blog posts, and have a short screen cast on this ….

    and the 4 FileTable screencasts in this series are now linked (hyperlinks at the end of each screencast) so you can hop through the series.

    Here are some notes if you want to install semantic search:

    • You’ll need to have installed SQL Server Denali ctp3 with the FullText option. 
    • Having done that you’ll need to install the SemanticLanguageDatabase.msi in the SQL install media (there are x86 and x64 versions). 
    • This database then needs to be attached to SQL Server 
    • It is then registered with this statement:
    EXEC sp_fulltext_semantic_language_statistics_db @dbname = N'Semanticsdb'
    •   You’ll need to add FullText indexing on the FileTables (or any other table)  the extra semantic_statistics clause:
    CREATE FULLTEXT INDEX ON dbo.MyDecks
        (name 
            LANGUAGE 2057 
            STATISTICAL_SEMANTICS,
        file_stream 
            TYPE COLUMN file_type 
            LANGUAGE 2057
            STATISTICAL_SEMANTICS)
        KEY INDEX PK_MyDecks_5A5B77D56E0F5D8F
        ON DeepFat_Catalog
        WITH
            CHANGE_TRACKING AUTO,
            STOPLIST = SYSTEM;
  • Beyond Relational– FullText indexing with SQL Server FileTable

    In my part 3 of my series on FileTable I wanted to search the documents I had saved in parts 1 & 2. This is pretty straightforward, as Full text indexing has been in several versions of SQL Server and is SQL Server 2008 R2 the FullText indexes are part of the databases they refer to. 

    What caught me out was that I sort of assumed that SQL Server Denali would be aware of the Office 2010 document types and it isn’t without doing some extra steps.  To explain:  If you store documents in SQL Server be that in the varbinary(max) column of a table or using the new FileStream or really new FileTable, SQL Server needs to understand what the document is so it can open it to index it.  This is done using ifilters and this works by having another column in the table that contains your varbinary(max)  data which refers to the relevant ifilter for each document, and note: you also need to have a single column, non computed, unique, clustered index on the table in order to create a FullText index.

    There is a standard management view to see what ifilters are in operation:

    select * from sys.fulltext_document_types

    You can add more ifilters to the database server from Microsoft and other vendors e.g. from Firefox or Adobe for pdf.  I wanted the Office 2010 ifilters as I am using PowerPoint 2010 in my demos, but when I tried to install them they were already there, however they weren’t referenced by SQL Server.  So after installing an ifilter you need to go into SQL Server and run these commands:

    EXEC sp_fulltext_service @action='load_os_resources', @value=1;
    EXEC sp_fulltext_service 'restart_all_fdhosts';

    to reference the new ifilters and restart the FulllText service. you’ll also then need to restart SQL Server itself.

    Anyway back to my FileTable.  FileTables have a fixed schema but do meet the requirements for FullText indexing:

    • file_stream is the varbinary(max) column that references the document itself.
    • file_type contains the file extension of the file which will correspond to the ifilter.
    • there is a unique index that meets the FullText criteria for a unique clustered  index.

    So you can right click on a FileTable and run the FullText wizard as I have done in this short screen cast…

    ..or you can use T-SQL:

    CREATE FULLTEXT CATALOG DeepFat_CATALOG AS DEFAULT;
    
    CREATE FULLTEXT INDEX ON dbo.MyDecks
        (name LANGUAGE 2057,
        file_stream TYPE COLUMN file_type LANGUAGE 2057)
        KEY INDEX PK_MyDecks_5A5B77D56E0F5D8F
        ON DeepFat_Catalog
        WITH
            CHANGE_TRACKING AUTO,
            STOPLIST = SYSTEM;

    Note: the language code 2057 is British English,  and  the index name will be different as it’s a random guid created by SQL Server for each FileTable

    The FullText index will be populated automatically unless you specify otherwise, and I didn’t so I can use it immediately like this to test that it worked:

    SELECT name
    FROM dbo.MyDecks
    WHERE
    FREETEXT (file_stream, 'nist')
    and creation_time > '2011/01/01';

    and this returns the decks I created this year that have the term nist in (the National Institute for Science & Technology) . 

    Next time I am going to hook up the new Semantic Search, capability in SQL Server Denali to my FileTable. In the meantime if you want to try this yourself then you’ll need SQL Server Denali ctp 3

  • A quick introduction to Hyper-V

    Further to my last post, I also see a lot of confusion around Microsoft’s server virtualisation engine, Hyper-V, in part because it is so easy to setup. The easiest way to start using Hyper-V is by adding the Hyper-V role in Windows Server (2008 & later). Roles are the services that a server provides and features are what a server does and by default all roles and features are disabled when  you  do a clean install of Windows Server.

    Having done that (which requires a reboot) you get the hyper-V manager when you expand the role and then connect to you server and you are ready to create or import virtual machines (VMs).  The problem with this approach is the appearance of a whole copy of windows on which you then put your virtual machines and this is not strictly true, what you have is a copy of of Windows from which you are managing Hyper-V as per this old diagram:

    image

    so Hyper_V sits under all of the operating systems including what is referred to as the parent partition on this diagram (which is nothing to do with parent partitions on disks)

    This is not the only way to run Hyper-V, there is free version of Hyper-v called Hyper-V server which you can download here.  This is based on server core, an install time option in Windows Server (2008 & later) that is the bare minimum operating system with no real graphical interface, and is either managed from the command line (e.g. with PowerShell, DOS, netsh, diskpart etc.)or remotely.  Hyper-V server is even more cut down, it has all of the roles and features removed except hyper-V and so you need be pretty good at command line stuff or know who to connect and setup the remote administration tools for Windows.

    So having got a hyper-V environment setup you can then use the create new virtual machine wizard in the virtual machine manager to create new virtual machines.  This process is similar to the way you would specify your requirements for a physical server except that you are telling the wizard which resources you are using on the physical server the VM will run on.   You can also import a VM from an export created on a another Hyper-V environment or complete the wizard but use a VHD that you have got from somewhere e.g. there are some on  Microsoft to save you having to install and configure a Microsoft application in order to evaluate it.

    If you are creating a new virtual machine, then the virtual hard disk will be empty and will need an operating system.  Theoretically This can be anything that runs on x86 x64 hardware form DOS 3.3 to Windows 8, or even other OS’s like Unix, and Linux.  However Microsoft will only support it’s operating systems and applications that are supported to run on physical hardware e.g. Windows 7 and not Windows 95 which is out of support.  They word support here means you can get support from Microsoft and not a vague statement along the lines of  “we got it to work but you’re on your own if you get stuck”. When it comes to Linux the latest versions of Red Hat SUSE and Centos are also supported because those distros have been made the subject of support arrangements between those Linux vendors and Microsoft so you can get support from Microsoft for them.

    I have posted a step by step PowerPoint deck to help you do all of this on SkyDrive (if you view it in the browser expand the notes at the bottom of the screen). If you want to try this all you’ll need is an old machine* and an evaluation copy of Windows Server  2008 R2 sp1

    *The old machine will need to have either an Intel-V or AMD-VT cpu and you may need to enable virtualisation in the bios to get hyper-V to install cleanly

  • SQL Server sprawl

    Normally the term sprawl carries an implied criticism, in the case of SQL server that DBAs have lost control of their infrastructure and users are install it willy nilly without regard for data protection or disaster recovery.  However when it comes to SQL Server user group meeting then sprawl is a good thing, as it means there’s probably regular user group meetings somewhere near you on a regular basis.  In October for example there is SQL Relay a UK wide series of events , I am doing three of these and Itzak ben Gan, SQL rockstar form SQL Skills has got a gig in London on the 6th.

     

    Monday 3rd October

    Manchester SQL Server User Group

    SQL Relay: An Evening With Allan Mitchell(MVP) on Data Quality Services and Master Data Services in SQL Server Denali

    Manchester

    http://sqlserverfaq.com/?eid=319

     

    SQL Midlands User Group

    SQL Relay: High Availability with Tony Rogerson(MVP) and SQL Denali with Andrew Fryer from Microsoft

    Birmingham - Midlands

    http://sqlserverfaq.com/?eid=296

     

    Surrey SQL Server User Group

    SQL Relay: A packed evening with amongst others Christian Bolton MCA/MCM/MVP

    Farnham - Surrey

    http://sqlserverfaq.com/?eid=321

     

    Kent SQL Server and .NET User Group

    SQL Relay: This wisdom of James Rowland-Jones(MVP) & Replication with Neil Hambly

    Ashford - Kent

    http://sqlserverfaq.com/?eid=322

    Tuesday 4th October

    Leeds Area SQL Server User Group

    SQL Relay: Mobile Business Intelligence in Action with Jen Stirrup (MVP) and Andrew Fryer (Microsoft Evangelist) on BISM and Crescent

    Leeds - West Yorkshire

    http://sqlserverfaq.com/?eid=299

     

    London SQL Server User Group

    SQL Relay: On Guitar Mr Chris Testa-O'Neill (MVP) and Neil Hambly on XQuery

    London

    http://sqlserverfaq.com/?eid=320

     

    SQL Server Club

    SQL Relay: An evening with Tony Rogerson (MVP)

    Bristol

    http://sqlserverfaq.com/?eid=325

     

    SQL Herts User Group

    SQL Relay: An evening with Chris Webb (MVP)

    Hatfield - Hertfordshire

    http://sqlserverfaq.com/?eid=281

     

    Maidenhead SQL Server User Group

    SQL Relay: Inaugural Meeting of the Maidenhead SQL Server User Group

    Maidenhead - Berkshire

    http://sqlserverfaq.com/?eid=328

    Wednesday 5th October

    Scottish Area SQL Server User Group

    SQL Relay: Andrew Fryer (Microsoft Evangelist) on BISM and Crescent and Martin Bell(MVP) talking about What's new for developers in SQL Server Denali

    Edinburgh - Scotland

    http://sqlserverfaq.com/?eid=311

     

    Cardiff SQL Server User Group

    SQL Relay: SQL Server upgrade 2000/2005 to 2008/R2: best practices with Satya SK Jayanty (MVP)

    Cardiff - Wales

    http://sqlserverfaq.com/?eid=295

     

    SQL South West User Group

    SQL Relay: High Availability with Tony Rogerson (MVP)

    Exeter - Devon

    http://sqlserverfaq.com/?eid=323

     

    Southampton SQL Server User Group

    SQL Relay: Justin Langford on SQL Server Denali and Nicholas Colebourn talking about designing tables for scale.

    Southampton - Hampshire

    http://sqlserverfaq.com/?eid=309

     

    Cambridgeshire SQL Server User Group

    SQL Relay: Inaugural Meeting of the Cambridge SQL Server User Group

    Cambridge

    http://sqlserverfaq.com/?eid=326

     

    Essex SQL Server User Group

    SQL Relay: Inaugural Meeting of the Essex SQL Server User Group with Chris Testa-O’Neill and Dave Ballantyne

    Basildon - Essex

    http://sqlserverfaq.com/?eid=327

    Thursday 6th October

    UK SQL Server User Group

    SQL Relay: SQL Relay Finale with Itzik Ben Gan

    London

    http://sqlserverfaq.com/?eid=324

    We are all looking forward to this with great excitement.

    and I’ll be in Newcastle on the 6th of October presenting on SQL server to NEBytes.

    If this isn’t enough then SQL Bits “Query cross the Mersey” 29th Sep – 1st October is also happening if you can spare a day or more out of the office and/or away from the family.

  • Beyond Relational - Working with SQL Server FileTable

    In my last post I setup FileTable in SQL Server Denali, and now I ma ready to start using it. Just to recap before I do that FileTable is a way of storing files in what appears to be a normal set of folders under a UNC share but in such a way that  these files are under the control of SQL Server.  The instance is the share name, and then the database has a folder associated with it underneath which there is a folder associated with each FileTable as you can see see from this short screencast..

    This all works because under the covers you are actually using Filestream - when you create a database that you want to use for FileTable the database must have a special Filestream file group, and this has a directory associated with it which is actually where the files reside on disk.  However as with previous version of SQL Server which have the Filestream feature, the files themselves have random GUID filenames and live in folders with GUID names (as you can see in the screencast).

    Another interesting thing about FileTable is the interlay between relational command on the FileTable and what happens to the files – in my screencast I deleted all of the files that weren’t 2007 or later PowerPoint decks ( file type <> pptx) and while the rows in the table were instantly deleted, it took a couple of minutes for them to disappear from the FileTable folder.  However this doesn’t affect referential integrity because if I issued a backup command after the delete only the files that correspond to rows that remained in the table would be backed up.  This is exactly the same as happens with Filestream it’s just that in Filestream you weren’t exposed to the files (with the GUID filenames in the directory specified for the FileStream File Group).

    So hopefully this has helped get your head around what FileTable is all about but how useful will it be? Although you can control the files in FileTable from SQL Server you can’t really interact with the contents, you can’t look inside those files. Actually you can and in my next screencast I’ll hook up full text search which has also been getting better and better in each release of SQL Server.

    In the meantime if you want to try this yourself you’ll need to pull down SQL Server Denali ctp 3

  • A quick introduction to server virtualisation

    This might seem an odd post to write, given that server virtualisation has been around for years, however at a couple of recent events I was at it became obvious that there still loads of intelligent, reasonable IT professionals who don’t understand what it is.  If that’s you please read on if not I won’t be offended if you click away now.

    So what is server virtualisation and why should you care?

    It started off as a simple recognition that most servers and PCs are largely idle , running at between 10-15% of capacity.  This is waste the capital asset – the hardware itself and it also consumes a lot of unnecessary power. 

    The concept is a simple one: run multiple virtual machines (VMs) on one physical server, each with its own operating system and applications. Each disk in a given VM shows up as a single  file on the physical server and other resources such as networking memory and cpus are then assigned to each VM through some sort of console (depending on the vendor). 

    So implementing this approach means that one server is now doing the work of 6 –10 servers, and this gives all the benefits of less power less cooling and less space in needed in a data centre.  There is also now the possibility to move VMs from server to server to balance workloads, to copy them for testing e.g. applying patches etc.  and development.

    Desktop operating systems like Windows 7 can also be run as VMs on servers and end users can then connect to one of these from a thin client device or any old desktop to create a Virtual Desktop Infrastructure (VDI)

    Of course there are downsides to server virtualisation:

    • The failure of any of few physical servers left will have a bigger business impact so steps should be taken to make them highly available.
    • Each VM still needs to be maintained patched etc.
    • VM sprawl can be an issue now that it is so easy create new VMs to management and control are essential.
    • It’s not obvious which VMs are mission critical and who owns them so management and control critical.

    However these have not prevented many organisations adopting this on a wholesale basis for their production environments and there is a healthy job market for  IT Professionals who are experts in that field. Many of the skills required are simply an extension of what most of us do today in configuring physical servers, however the setting are now done in management tools or via scripts rather than, fiddling with actual hardware.

    To find out more about virtualisation Microsoft has the Microsoft Virtual Academy , a collection of online resources to help  people new to virtualisation get up to speed complete with simple tests to make sure it’s all sinking in.