All things Data Related....

On this site I will be posting entries related to Data platform and analytics that I learn and believe could be valuable learning to others who want insight from their data

Improve ETL performance in SSIS by introducing parallelism in the data flow of an SSIS package.

Improve ETL performance in SSIS by introducing parallelism in the data flow of an SSIS package.

  • Comments 2
  • Likes

There is a new transform component available for SQL Server Integration Services. It’s called the Balanced Data Distributor (BDD) and the download is available here.

Note

Downloading the BDD transform can be a bit tricky as when you go to the above site it will only have a link to a word document. Please download this document and read the instructions, at the bottom of it is a button with “I Agree”, hover your mouse over it and use Ctrl + your left mouse click to open the link to download the actual msi package that contains the BDD component. (its bit of an IQ test Smile)

image

The BDD provides an easy way to amp up your usage of multi-processor and multi-core servers by introducing parallelism in the data flow of an SSIS package.This diagram gives a trivial example:

If you run this data flow on a laptop, there probably won’t be any speed advantage, and there may even be a speed cost.  But suppose you run this on a server with multiple cores and many disk spindles supporting the destination database.  Then there might be a substantial speed advantage to using this data flow.

For more information read the SQL Server Performance Teams blog article on this component -  http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

Enjoy speeding up the data refreshes to your data warehouse.

Comments
  • <p>The performance benefits do become very clear on large servers, but on smaller machines there is a perfomance loss a lot of the time because of contention issues. Nice article!</p>

  • <p>Parallelization for queries (SELECT) is covered quite well by the SQL engine itself, but when it comes to large volume data modifications (UPDATE, INSERT, DELETE), the standard engine does parallelize towards best use of all available resources (disk, multiple cpu-cores, etc.).</p> <p>There exists also a lightweight more generic and engine focused approach ;-)</p> <p>The concept of SQL Parallel Boost is a native T-SQL based solution an allows a performance gain up to factor 10 (!) in a 8-16 CPU core environment.</p> <p>The purely SQL engine related parallelisation solution takes advantage of minimized complexity and has no &#39;external&#39; components like SSIS involved, Furthermore it&#39;s the best performing solution regarding task splitting and synchronization, as it hasn&#39;t potential connection and communication overhead. The solutioncan also be used to execute multiple SQL statements in parallel initiated thru the same connection.</p> <p>The free &#39;Community Edition&#39; can be downloaded at <a rel="nofollow" target="_new" href="http://sqlparallelboost.codeplex.com">sqlparallelboost.codeplex.com</a></p>

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment