Management pack synchronization, MP sync for short, is a key component of the Data Warehouse in Service Manager. The synchronization of management packs from the Service Manager management server to the Data Warehouse (DW) is responsible for defining the content of the data in the DW.
The figure below shows an overview of how MP sync works. The MP Sync job synchronizes all the management packs from the Service Manager source. This job starts to run as soon as you register the Service Manager management group. Its initial run can take several hours to complete, with subsequent runs taking much less time. Additionally, the MP sync job is currently hardcoded to run on an hourly basis.
You can read more on MP synchronization here and here.
The issue here is that MP Sync has been a hotbed for errors related to the deployment and cleanup of management packs in the data warehouse for a while now. The primary complaint around MP sync is that when it “fails” it usually gets stuck in a running state and does not recover or fail safe. Consequently, the fragility of the DW has earned it a reputation as a component that is rather easy to break but incredibly difficult to recover.
To make matters worse, when our support team gets hit with MP sync issues, the difficulty of diagnosis and the amount of time and care required for a reliable recovery can in many cases be extremely high. In fact, one of our senior escalation engineers, Richard Usher, likened solving DW issues to “brain surgery”.
Based on the information collected from our support team, we’ve learned that most MP sync issues in fact stem from custom MPs being imported into the management server. For example, most customers, when “trying out” custom MPs, are unaware of an hourly sync job in the background trying to import those custom MPs into the DWStagingAndConfig database. Nor, for that matter, are they aware of the clean-up required on the DW side when those same MPs are later deleted. This, as you can imagine, causes considerable churn in the system, substantially increasing the odds of MP sync getting stuck in a failed state.
The plan is to make MP sync happen on demand, allowing users to choose the time, and possibly the MPs, they want synchronized, instead of the automatic, inflexible schedule and scope it currently follows. The idea was originally proposed by Manoj Parvathaneni, a longtime expert on SM and one of our best escalation engineers. A typical user scenario would look something like this:
1. The DW is registered with the Management Server, triggering a one-time forced synchronization of all MPs.
2. Content described by the imported MPs starts funneling from the CMDB to the DW for archival.
3. Admin imports new MPs to the management server to enable additional functionality and customizations, but since MP sync is not a scheduled job, no new changes get migrated over.
4. Once the admin is happy with the new MPs, they navigate over to the Data Warehouse Jobs view in the console, and hit resume under tasks.
Option 1: At this time we synchronize all MPs over to the DW, just like a regular MP sync job would have done.
Option 2: We show a new form, allowing administrators to choose the MPs they would like to synchronize (see mockup below)
Option 1 is clearly much easier to implement and can be delivered fairly quickly, possibly in the coming update rollup in October. While it does allow for some measure of control over when the job runs, it doesn’t provide control over what gets synchronized.
Option 2 allows explicit specification of the MPs targeted for synchronization and so provides greater control. However, it incurs heavier engineering costs, making it a more expensive solution which will probably be delivered in vNext.
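To make the difference between the two options concrete, here is a minimal Python sketch. It is only a toy model, not how the MP sync job is implemented: the `DataWarehouse` class, the function names, and the idea of tracking MPs purely by name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataWarehouse:
    """Hypothetical stand-in for the DW; it only tracks synced MP names."""
    synced: set[str] = field(default_factory=set)


def pending_mps(server_mps: set[str], dw: DataWarehouse) -> set[str]:
    """MPs imported on the management server but not yet in the DW."""
    return server_mps - dw.synced


def sync_all(server_mps: set[str], dw: DataWarehouse) -> set[str]:
    """Option 1: on demand, bring over every MP that is not yet synced."""
    newly_synced = pending_mps(server_mps, dw)
    dw.synced |= newly_synced
    return newly_synced


def sync_selected(server_mps: set[str], dw: DataWarehouse,
                  selected: set[str]) -> set[str]:
    """Option 2: on demand, bring over only the MPs the admin picked."""
    unknown = selected - server_mps
    if unknown:
        raise ValueError(f"not imported on the management server: {sorted(unknown)}")
    newly_synced = selected - dw.synced
    dw.synced |= newly_synced
    return newly_synced
```

In this model, Option 1 is a single call with no arguments to get wrong, while Option 2 needs input validation and leaves some MPs pending until the admin selects them.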
Here’s how you can help
We would love to hear your thoughts on this. Would you consider either of these approaches feasible? Which one is more preferable to you, and why? Are there any scenarios we might be overlooking?
Please use the comments section below to let us know what you think :)
I think just having the job be disabled by default and then having to manually enable it and run it is probably not a good solution. It will prevent "accidental" MP syncs but the reality is that it won't really stop people from putting bad stuff in an MP and then wanting to revert it etc. It will also create another problem. If you do that you are going to start having customers call you saying 'Why aren't my DW extensions showing up?'
In my experience the only reliable way to manage this is to develop DW extensions in a dev/test environment, test them thoroughly, and *then* import them into production. Creating a VM checkpoint during development is the only way to go. You can easily revert back if something goes awry.
Also, customers should thoroughly test 3rd party DW extensions in a test environment first to make sure there won't be any complications that result from importing the 3rd party solution.
Option 2 is going to be tricky because of MP dependencies too.
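A rough Python sketch of why dependencies complicate Option 2: picking one MP in the form implicitly pulls in everything it references. The function, the `depends_on` mapping, and the MP names below are all hypothetical.

```python
def with_dependencies(selected: set[str],
                      depends_on: dict[str, set[str]]) -> set[str]:
    """Return the selected MPs plus everything they transitively reference."""
    result: set[str] = set()
    stack = list(selected)
    while stack:
        mp = stack.pop()
        if mp in result:
            continue  # already visited; also guards against cycles
        result.add(mp)
        stack.extend(depends_on.get(mp, ()))
    return result


# Hypothetical MP names: the admin ticks one MP, but three would have to sync.
deps = {
    "Custom.Reports": {"Custom.Base"},
    "Custom.Base": {"System.Library"},
}
to_sync = with_dependencies({"Custom.Reports"}, deps)
```

Here `to_sync` contains `Custom.Reports`, `Custom.Base`, and `System.Library`, so the selection form would either have to add the extra MPs automatically or explain why the chosen MP cannot sync on its own.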
Personally, I think the time is better spent in making the DW MP Sync job more robust and easier to revert changes. If there is bad DW stuff in an MP it doesn't matter if it is sync'd automatically or manually. It's still going to cause a problem.
Even though you asked us to select between two options, I have to agree with Travis here. Neither option will reliably prevent problems with the data warehouse, and might even introduce new problems or added confusion...
I would rather have a reliable data warehouse. I would hope there are some smart people at Microsoft who have been given the time and task to fix the issues instead of coming up with workarounds.
Learning tricks from various blog posts to fix strange DW issues seems to be the norm now. I wish it wasn't.
While I agree with Travis - that custom MPs should be run through DEV/TST first, bad stuff is still bad stuff and that the process of clean up would be a very worthy investment of time - I still believe that there is a very good case for improving the control we have over the MPSync job, especially in a development environment where things are changing constantly as devs build customisations. I think it would be great to be able to get things right first (for non DW extensions of course) without having to worry about the DW and then run the MPSync. Also, not all Devs have the permissions to use VM snapshots or restore from backups, so this would give them a bit more control and help them avoid some DW issues.
The other thing to me is, does the MPSync job really need to be hourly in production, or in any environment?
It would be nice if the MPSync (and other DW Jobs?) had admin options (like a connector for example) and then the job could have scheduling options:
1. Run on Schedule (defined)
2. Manual (Started from task)
3. Disabled (cannot be started from task)
Scheduled would be good for production, bit more hands off as things are already tested. Manual or Disabled for DEV.
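As a rough illustration, the three modes could be modelled like this in Python. The names are hypothetical, and letting an admin also start a scheduled job by hand is an assumption, since the list above doesn't say either way.

```python
from enum import Enum


class SyncMode(Enum):
    SCHEDULED = "scheduled"  # runs on a defined schedule
    MANUAL = "manual"        # runs only when started from the task
    DISABLED = "disabled"    # cannot be started at all


def may_run(mode: SyncMode, trigger: str) -> bool:
    """Decide whether a run may start; trigger is 'timer' or 'task'."""
    if trigger not in ("timer", "task"):
        raise ValueError(f"unknown trigger: {trigger!r}")
    if mode is SyncMode.DISABLED:
        return False
    if mode is SyncMode.MANUAL:
        return trigger == "task"
    # SCHEDULED: the timer fires, and (by assumption) the task works too.
    return True
```

A DEV environment would then sit on `SyncMode.MANUAL` or `SyncMode.DISABLED`, and production on `SyncMode.SCHEDULED`.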
Also, I would use "Run", "Start" or "Sync Now" for task names as "Resume" implies it was manually paused.
Option 2 with granular control over individual MP is probably a little unnecessary and effort is better spent elsewhere (like reliability and clean-up of DW Extensions).
So my thoughts would be Option 1 + configurable scheduling option. This would be very nice :-)
I completely agree with Travis & Kenneth - Please make the MPSync more robust and make it easy to remove MPs and their DW changes.
As it is now I feel SCSM is becoming more and more of a product that needs advanced level consulting even for things that would appear simple. That evolution is good for neither the product nor the users.
You should not need to think about this at all.
However, if there are no other alternatives than a workaround I'd vote for a mix between alternative 1 & 2 - an interface with a list of just the MPs that have not yet synced, where you could select which ones should sync.
I'd also want the interface to make it very visible for an admin that there are MPs waiting to be synced, an alert flag on the wunderbar or equivalent that would make it hard to miss. That would manage the associated risk with the workaround whilst making it easy enough for most.
I agree with Travis that the synchronization job should be more robust, and if there is something wrong with an MP it should skip that MP and its dependent MPs. Of course, an error should be logged that shows why the MP cannot be synced with the DW.
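A minimal Python sketch of that skip-and-log behaviour, under stated assumptions: `deploy` is a hypothetical callable that raises when an MP cannot be deployed, and `order` already lists dependencies before the MPs that reference them. None of this reflects how the real job is built.

```python
import logging
from typing import Callable

log = logging.getLogger("mpsync.sketch")  # hypothetical logger name


def sync_skipping_failures(order: list[str],
                           depends_on: dict[str, set[str]],
                           deploy: Callable[[str], None]):
    """Deploy MPs in order; skip any MP that fails, plus its dependents."""
    synced: list[str] = []
    skipped: dict[str, str] = {}  # MP name -> reason it was skipped
    for mp in order:
        blocked_by = sorted(d for d in depends_on.get(mp, ()) if d in skipped)
        if blocked_by:
            skipped[mp] = "depends on skipped MP(s): " + ", ".join(blocked_by)
            log.error("Skipping %s: %s", mp, skipped[mp])
            continue
        try:
            deploy(mp)
        except Exception as exc:  # one bad MP must not stall the whole job
            skipped[mp] = str(exc)
            log.error("Skipping %s: %s", mp, exc)
            continue
        synced.append(mp)
    return synced, skipped
```

Because a blocked MP is itself recorded in `skipped`, anything that depends on it further down the chain is skipped as well, while unrelated MPs still sync.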
I'd highly suggest and request that the SM team rather spend time on exporting data from a corrupted DW database so that it can easily be imported into a newly registered DW store.
That should manage to take all the horror out of the overall solution, preventing the otherwise unavoidable 'brain surgery' on the DW altogether.
The direction of two options still won't guarantee the proper use of manual sync, but would sure add either to administrative overhead or to the complexity of the solution.
The import and export option would give a great escape route to register the fresh DW and import the legacy data from corrupted DW machine without adding further complexity to the solution.
Option 1 does nothing in my opinion except confuse folks. I like the granularity of Option 2 but I do think the current method is a good method in the fact that I know if I seal an MP it will get synced over.. there isn't a question of why something isn't over there.. Also Option 2 reminds me of trying to sync VMM / OM / SM and the confusion on the MPs that need to come over dependencies etc .. and why isn't something showing up ..
To me Management Packs are great things that I should be able to import and change the way the tool works, whether it's OM or SM .. but it should also be very modular, so that if I remove an MP and it has no dependencies I know I go back to the state prior to the MP. That is where I would like to see the work go .. ensure that MPs are modular and can be imported and removed and not leave artifacts or jobs in bad states..
This is really good feedback folks! Thanks for taking the time to share it with us. It seems the general consensus is towards fixing the root cause of the issue, and not just reducing the likelihood of running into it.
Keep the feedback coming :) We will make a call on allocation of resources to DW tasks over the next couple of weeks.
As irritating as the data warehouse falling over is, I feel that there are more important things to focus on (work items background updating while form is open anyone?). I don't feel either option adds any great benefit to the system and the focus should be on making it work properly, full stop. A consistent, easy method of getting extensions to work items into the data warehouse would be a better investment of time, and some built-in recovery methods/validation would significantly improve it and reduce the chances of these errors happening in the first place. Heck, even some decent error logs would be a start!
Don't get me wrong, the idea behind using analysis services is great as it provides massive flexibility for reporting over most other systems with only Excel pivot table knowledge required, but improving the overall reliability and ability to add extensions would be more welcomed IMHO
Folks, Thanks for the comments. This is great. Alex, WRT (work items background updating while form is open any one?), it is already in the list of items to be fixed.
I'm just amazed and thrilled to finally get to see a post about this!
That DW has haunted me so many times over many installations.
Great news Srikanth. Very much looking forward to the next major release. There's some really good traction behind the product currently and it will be great to further its reach by getting some of these little bits sorted. Roll on vNext!
Why don't you stage it? Do option one now, which solves the immediate problem for developers and administrators. Then look to enhance that to allow packs to be explicitly chosen for syncing. If a bundle has been loaded or a solid dev session has taken place, option 2 can help isolate it. A thought would also be if it could list the packs installed in the warehouse and give you the option to remove them, or perhaps with the selection idea an option to remove selected packs from the DW so suspected offenders can
I think the issue that people have is based on the way the MP sync job handles issues. To have a reliable fail-safe would be good enough to get things done. I always recommend a 2 or 3 way test cycle for development of things. If our customers do that, the chance to break something is very small. However, to manually sync the MPs would bring up what Travis mentioned.
So I think that brings me in line with what Travis wrote already, and I side with Stuart completely as well. In a dev environment it makes sense to have more control over what is synchronized and when. This will reduce operational costs for a development environment.
In a production environment the schedule option should be recommended with a note to an article that explains the concept of dev test and run.