Here are some notes on “SQL Server 2008 Integration Services (SSIS)” I took while attending an advanced class on SQL Server taught by Greg Low (from http://sqlblog.com/blogs/greg_low/ and http://www.sqldownunder.com/).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


SQL Server Integration Services (SSIS)

Demo

  • Open Visual Studio
  • Create  new project
  • In Control, drag a two items from the toolbox (execute process task)
  • Write notes to the control surface, text for documentation
  • Noting the red (X) indicating something is missing
  • Adding properties for each task (run calc.exe)
  • Two calculators running at the same time, boxes turn yellow (running)
  • Execution Results – your best friend, good place to look at what happened
  • Package Explorer – Looking at the two tasks created
  • Creating a sequence – green line means execute on success
  • Check properties of the the green line: success/failure/completion
  • Adding a third task. One runs on completion, one on success.
  • Adding a fourth task. Depends on another. Two starting points now.
  • Options to select two lines, configure properties, change from AND to OR…

ForEach

  • Unlike DTS, looping containers. Processing file by file in a batch, for instance.
  • Looking at the Foreach.
  • Set the collection property. For each file, For each time, For each SMO (database).
  • Drop a couple task inside.
  • Much simpler than DTS, no need for specific variables, etc.

Data Flow

  • Two tasks, data flow in the middle, links
  • Some data flows from task to task.
  • Data flow has its own tab, since the data flowing can be harder to describe.
  • In the Toolbox: Data Source (like ADO.NET), Data Transformation.
  • Drop an ADO.NET source
  • Connection – look at objects, switch to SQL command, write a query
  • Look at preview – very helpful
  • Look at columns – can change the name in the “Output Column”
  • Error output – What do you want to do when error occurs (you can fail, redirect, send to error output)
  • There’s also a connection object
  • Drop a Flat File Destination, connect the two.
  • Flat File – Has a connection also (?!). File name, format, columns, preview
  • Mapping – What input column goes to what output file columns
  • Run the package. Data flow is red… Not good.
  • Look at Progress… Can’t find column at the data source.
  • Look at data source and mappings again.
  • Run again, it’s happy…

Derived column

  • Look at transformations: Derived column.
  • Drop a derived column transformation.
  • Properties, add as new columns, create an expression.
  • Type an expression to add two columns, typecast for integer, return as Unicode.
  • Expressions, typecasts (WSTR=Unicode)
  • Add new flat file destination, now shows the derived column
  • Run again. Note – run in data flow tab also runs everything, not just data flow.
  • Looking at the resulting text file

Unicode

  • Go to SSMS, create a new database
  • Create TableA (ID int identity, description varchar(30))
  • Create TableB  (ID int , description varchar(30))
  • Insert a few rows in table A.
  • Now back Visual Studio, to move data from TableA to TableB
  • Set up the connection to the database
  • Add ADO.NET data source, use a SQL command to
  • Different between ADO.NET destination and SQL database?
  • Let’s try ADO.NET destination… Not really what we want.
  • Trying SQL Server destination… Create new connection. Select database, TableB.
  • Mapping are OK. Validation error: cannot convert between Unicode and non-Unicode (?!)
  • DTS didn’t have any type safety … Now we do!
  • Note – Everything it does internally is Unicode… Description is coming up as Unicode.
  • How to fix? Add a data conversion.  Input, Output, Data Type… Fix the mapping. Run. OK!

Script

  • You can create your down data source with a script
  • Source, Destinatio or Transformation.
  • Looking at the properties of the script (script, input columns, inputs and ouput, connection)
  • Script language options: Visual Basic 2008, Visual C# 2008
  • You can then drop to code and write whatever you want (you get a starting template).
  • You can also have a script task.
  • Note – This ends up as compile code, so it has good performance
  • Note – You can also write your own components in .NET

Also in the toolbox

  • Sources: XML, Excel
  • Destination: Data Mining model, SQL Server CE
  • Transformations: Most are straightforward

Fuzzy lookups

  • Using the fuzzy look transformation
  • Only enterprise edition
  • Looking for similar columns.
  • You set the “similarity threshold” under “advanced”
  • Useful for data de-duplication.

Lookup

  • Lookup transformation
  • Important for looking up data in data warehouse
  • Looking for the key that’s different in the DW (surrogate keys in DW environment)
  • Inferred members – I have a fact about the member, but I don’t know the member
  • Inferred members in the error output in the lookup
  • Slowly changing dimensions – Customer region as it is now or as it was then?
  • Type two slowly changing dimension – Surrogate keys for column, associated with a start/end date
  • Lookup performance is important, since you might have several lookups as you bring data in
  • Optimizations - Cache mode: full cache (connection type), partial cache, no cache

Other transformations in data flow

  • Merge
  • Merge Join – sort order
  • Union
  • FTP – Basic, considering writing your own with .NET
  • Web Service task – Also basic, remember you have full .NET support here…

Variables

  • Whole set of system variables
  • Add your own variables
  • Can be scoped to package or component
  • Scripts: ReadOnly or ReadWrite variables

Package

  • Project files, can add different types of files.
  • Looking at project properties. Can create a deployment utility.
  • Use the build option. Create a package.
  • Looking at the files that are part of it (.sln, .dtsx).
  • In bin\deployment – bits to deploy
  • DTS package is just an XML file. You can open it with notepad.
  • Config and SSISDeploymentManifest are also XML.
  • Copy the files, right-click the manifest and click deploy, create desired agent jobs.

Package Configurations

  • You can hard code connections in the package. Could be trouble down the line.
  • Consider abstracting those.
  • Check enable package configuration.
  • Where do you want to store the configuration? Default is an XML file, good choice
  • Other options: SQL Server, parent package variable, Registry, Environment variable
  • Configuration properties: Long list of items, including database connections, working directory
  • In connection, for instance, you can check the box for “Server Name”
  • Look at XML configuration file. It still says <DTSConfiguration> :-)
  • You can change it and run again with a custom value as you run in a different environment
  • Design for that upfront. Don’t put that on later…

Development

SSIS 2008

  • Improved Package Upgrade
  • New ADO.NET Component
  • New Data Profiling Task and Data Profile View – understand the data
  • Connections Project Wizard – for connection-only projects
  • Script Environment
  • Improved Data type handing in Import/Export wizard
  • Faster bulk loading
  • Support for date/time data types, higher precision options
  • Support for MERGE statement – option working row by row
  • Support for Composable SQL – OUTPUT clause to track inserted, deleted rows, $action

Upgrades

  • Upgrade advisor: Run before any upgrade (from DTS or SSIS 2005), reports on issues
  • Package Migration Wizard: Creates new SSIS package, leaves the old one alone
  • Package Migration Wizard – Careful – Can’t really deal with certains things, like transactions
  • From SSIS 2005: Careful – Package upgrade is one-way only
  • From DTS 2000: Consider using the DTS 2000 package task, dtsrun.exe still available
  • From DTS 2000: Careful – ActiveX script tasks
  • From DTS 2000: Careful – Connection related issues
  • From DTS 2000: You really should consider re-writing … There are much simpler ways now!
  • Consider works on SSIS 2008 ahead of time, before you install the SQL Server 2008 database 

Manageability, Troubleshooting

  • Design-time validation – Tries to connect to data sources to validated, can work offline
  • Set breakpoints, locals, call stack, immediate window, row count transform
  • For the data flow, consider using only a subset of data
  • Data Viewer – Helps you see the data as it is flowing through
  • Data Viewers - Commonly used with a Grid, also available as data charts
  • Logging, log providers (SQL, text, etc..)
  • Consider adding your own logging using the logging infrastructure
  • Consider using package-level event-handler for OnError event (send e-mail, perform task)
  • In SSMS – Connect to Integration Services (not associated with a specific SQL instance)

Change Data Capture

  • Ideal for populating data warehousing
  • Allows incremental loads
  • Debug dump files (.mdmp and .tmp) provide info about what happened
  • Consider using LSN to track instead not time (there are functions to convert)
  • Return as BINARY(10), need to deal with conversions
  • Books online example
  • See http://msdn.microsoft.com/en-us/library/bb895315.aspx

Security

  • Layered security. Both SQL and File System-based scenarios.
  • Packages can be encrypted (levels), digitally signed, stored in SQL, secure with file system ACLs
  • Can use SQL Server Agent to run SSIS Packages
  • See http://support.microsoft.com/kb/918760/ 
  • Where to store credentials? Consider storing the sensitive credential info in a database

Guidelines

  • Keep packages simple, readable. Use comments, naming conventions.
  • Consider dividing things into multiple packages, multiple containers.
  • Understand the business: Kimball’s ETL and SSIS books
  • Start with some blueprint design patters or ETL frameworks
  • See Erik veerman, rushabh Mehta, pass summit 2008
  • Careful – Have enough threads but understand asynchronous tasks, parallelism
  • Consider using more T-SQL statements, MERGE statement comes to mind
  • Careful – Avoid SELECT * and defining just “table”, which does a SELECT *
  • Careful – Don’t bring data you don’t need: filter data in data source, use T-SQL there as required
  • Careful – Consider optimizing the queries you use, covering indexes with INCLUDE
  • Careful – Use IsSorted if input is ordered. SSIS won’t know otherwise…
  • Consider using FastParse for text data that are clean
  • Understand the data types returned by the databases, queuing systems
  • Design your deployment strategy from the beginning, use configurations.
  • Buffers: SSIS uses large buffers to optimize the process
  • Careful - Width of the row used to define the size of buffer
  • Consider removing output components to optimize buffer use
  • Partially blocking components: Merge, Merge Join, Union All
  • Blocking components: Sort, Aggregate
  • CPU: Pipeline with multiple threads at once. Control with EngineThreads, MaxConcurrentExecutables
  • Transactions: Default isolation is Serializable. Do you really need that?
  • Discussion – Where to run SSIS – On same database server or another server? It depends :-)

Data Warehouse Loading

  • ETL – Extract, Transform, Load
  • ELT sometimes makes more sense…
  • Can do some work at extract time, use advanced queries, typecasts
  • Dimensions – Loaded first typically. Consider using multiple packages, run in parallel.
  • Slow changing dimensions.  Some built-in support. Might need to customize…
  • Discussion – What to do with inferred members?
  • Careful – Lookup transform is case sensitive
  • Snowflakes – Dimensions that are subsets of other dimensions. Loading order is important
  • Fact tables – Load dimensions first. Consider some advanced Lookup strategies, like MRU caching
  • Data mining – SSIS can help populate a data mining model. Data mining query component.

Related links