In my last post I covered the deployment process that brings management packs (MPs) over from Service Manager and how those MPs drive the structure, data, and reports for the Data Warehouse and Reporting. Once the schema and reports are deployed, we need to populate the database with actual data for reporting. This is done via the Extract, Transform, and Load (ETL) process.
ETL works like this (see the figure below as well): data is extracted from the source system, transformed into a shape suited to reporting, and loaded into the datamart that reports run against.
Many folks wonder why three separate processes are necessary. Each serves its own specific purpose: Extract pulls raw data out of the Service Manager database into staging, Transform cleanses and reshapes that staged data into a reporting-friendly form, and Load moves the transformed data into the datamart for reports to consume.
There are a few different reasons for having three separate databases, but one of the main ones is that it lets folks optimize their hardware environment more easily. In high-volume environments, the DWStagingandConfig and DWRepository databases need hardware optimized for read/write IO, whereas the DWDatamart needs to be optimized for read IO. With that in mind, in Service Manager v1 customers can separate the DWDatamart onto a different server/drive from DWStagingandConfig and DWRepository (these two must remain on the same server).
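To make the three stages concrete, here is a minimal toy pipeline. This is only an illustrative sketch: it uses in-memory SQLite databases in place of the real SQL Server databases, and the table names (`incident`, `incident_raw`, `incident_fact`) are hypothetical, not the actual Service Manager schema.

```python
import sqlite3

# Hypothetical stand-ins for the three DW stores:
# staging (DWStagingandConfig), and datamart (DWDatamart).
source = sqlite3.connect(":memory:")    # the operational Service Manager DB
staging = sqlite3.connect(":memory:")
datamart = sqlite3.connect(":memory:")

# Seed a hypothetical source table of incidents.
source.execute("CREATE TABLE incident (id INTEGER, status TEXT, created TEXT)")
source.executemany("INSERT INTO incident VALUES (?, ?, ?)", [
    (1, "Active", "2010-01-05"),
    (2, "Resolved", "2010-01-06"),
])

# Extract: copy raw rows from the source system into staging as-is.
staging.execute("CREATE TABLE incident_raw (id INTEGER, status TEXT, created TEXT)")
rows = source.execute("SELECT id, status, created FROM incident").fetchall()
staging.executemany("INSERT INTO incident_raw VALUES (?, ?, ?)", rows)

# Transform: reshape the staged rows into a reporting-friendly form.
staging.execute(
    "CREATE TABLE incident_fact AS "
    "SELECT id, UPPER(status) AS status_key, created AS created_date "
    "FROM incident_raw"
)

# Load: move the transformed rows into the datamart that reports query.
datamart.execute(
    "CREATE TABLE incident_fact (id INTEGER, status_key TEXT, created_date TEXT)"
)
facts = staging.execute(
    "SELECT id, status_key, created_date FROM incident_fact"
).fetchall()
datamart.executemany("INSERT INTO incident_fact VALUES (?, ?, ?)", facts)

print(datamart.execute("SELECT COUNT(*) FROM incident_fact").fetchone()[0])  # → 2
```

The point of the sketch is the separation of concerns: the extract step touches the operational database as briefly as possible, the transform step does the heavy read/write work in staging, and the datamart only ever receives finished rows, which is why it can live on read-optimized hardware.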
We’ve pretty much covered getting the DW configured, deployed, and populated with data. In the next post, I’ll talk about the reporting side of the house and how it works within Service Manager. Please feel free to leave comments/questions/suggestions for future topics.
How would I go about writing my own ETL process to get data from the appDB to the dwDB?
How do I find out what the load schedule is?