SQL Server: Using SSIS to retrieve data from AS400

Using SSIS, getting data from one SQL Server to another SQL Server is straightforward enough. However, what about getting data from other non-Microsoft data sources? Here, we will discuss the integration of data from an AS400 system with SQL Server 2005, and insertion into a SQL Server 2005 data warehouse. There isn’t much documentation on the subject of getting AS400 data into SQL Server 2005. The current blog will provide a summary regarding data retrieval from an AS400 to a SQL Server 2005 database, which will hopefully help to start other people on the right track.

The key benefit of is that report consumers can see a unified, integrated view of disparate data sources (AS400/SQL Server/Excel), often for the first time. Further, the integrated data can be presented effectively and easily within a browser by Sharepoint. Since the data is presented in an integrated manner, this can reduce the load on IT to produce reports. This releases the power to create reports into the hands of the end users. This important feature allows user to manage, produce and edit their own reports in accordance with their requirements, and within their own timescales.

One issue in extracting data from disparate data sources is that people tend to know SQL Server 2005 or AS400 very well – but not both systems. One issue can focus in getting the two groups of experts together and finding a common language to ensure the successful retrieval of the AS400 data using SSIS. On one hand, AS400 experts may not recognise that SSIS is a key component of ETL in SQL Server. On the other hand, as a SQL Server fan that was familiar with Microsoft SQL Server service packs, it wasn’t particularly obvious to me that IBM produce hotfixes for the AS400 as well. In particular, there is an IBM hotfix specifically for SSIS. In terms of creating a SSIS package which retrieves data from the AS400 and places it into SQL Server, using this hotfix was an essential step to ensuring success.

Another issue in retrieving data from distinct data sources is the troubleshooting process. Due to the integration of two or more systems, it can sometimes be difficult to identify where failures may be occurring. From the IBM AS400 perspective, it can be difficult to identify issues since, without the hotfix, the situation might exist that no errors are being logged in the trace or log files for AS400. From the SQL Server perspective, it was found that the SSIS package was not able to access customised error files. It was found that this was due to the fact that the designated package login didn’t have appropriate permissions to run. Specifically, the user was not part of the IIS_WPG (IIS Worker Process) group. Once this was resolved, the package could write to the log files. More information about this issue can be found on an excellent forum post.

Advice on creating SSIS packages

To get started, it is worthwhile looking at the Kimball Data Warehouse methodology. Some of the specific SSIS main points are distilled here:

1. Use an ‘Audit’ table to store information about whole ETL execution process. This is very useful for troubleshooting.

2. Use a ‘Parent-Child’ package structure. Here, one main package is the ‘parent’, which executes many smaller packages. Every small package does a small piece of work and loads data in one table. Again, this is useful for identifying any issues in loading the data into the data warehouse.

3. Naming Conventions are very useful. It is useful to have package names set to be the same as table name. This simplifies the corresponding package search and clarifies exactly what package does.

It is also useful to look at SSIS package tuning. A good document can be found here on this subject.

The steps involved in using SSIS to access AS400 data can be found here:

1. AS400 Data Provider - In order to extract data from the AS400 to the SQL Server, it will be necessary to install some special providers. There are a number of these providers available, for example, IBM Client Access Data Provider. In order to ensure consistency, it is worthwhile checking whether your client has other data going from AS400 to SQL Server, since they may already have this provider in-house.

2. Hotfixes – it may be worthwhile looking at adding hotfixes from the AS400 side, as prescribed above. Further, it is also suggested that it is worthwhile implementing Server 2005 Service Packs where applicable. In my experience, implementing Service Pack 2 for SQL Server 2005 was useful since it specifically resolved a memory issue.

3. SSIS Package configuration – An AS400 trigger was set up to export data to specific Change Data Capture (CDC) tables on the AS400. The SSIS package was only required to extract the data, and then clear out the tables when completed successfully. To do this:

a. An OLE DB source component was deployed to retrieve data from dedicated CDC tables on the AS400.

b. The data source was set to the ‘IBMDASQL’ connection.

c. The Data Access mode was set to ‘SQL Command’

d. Due to the CDC architecture, a simple ‘select *’ SQL Statement was used in order to retrieve data from CDC tables.

e. It is possible to press the ‘Preview’ button in order to verify the data.

Since the trigger work was all done at the AS400 source, this removed the complexity away from the SSIS package.

To summarise, it is possible to use SSIS to obtain data from non-Microsoft data sources. Here, the actual retrieval of the data from AS400 using SSIS was straightforward once the additional IBM elements were in place. It is important to note that the SSIS packages that retrieve data from the AS400 have been running robustly and successfully now for some time, and this is a mechanism I would consider for data retrieval in the future.