Welcome to TechNet Blogs

Content Deployment – The complete Guide – Part 6 – Logging

Part 1 - The Basics
Part 2 - The Basics continued
Part 3 - Configuration
Part 4 - Communication
Part 5 - Quick Deployment
Part 6 - Logging
Part 7 - coming soon...
Part 8 - coming soon...
Part 9 - coming soon...

ULS Log

Both Content Deployment and the Content Deployment and Migration API use the ULS log to report problems and progress information using the following categories:

For the WSS Content Deployment and Migration API:

  • Category: Upgrade

For the MOSS Content Deployment feature:

  • Category: Content Deployment

To troubleshoot content deployment issues, set these two categories to Verbose.

This will ensure that the whole progress of content deployment is logged into the ULS logs.

In some situations it may be necessary to add further categories (such as "General"), because the Content Deployment and Migration API uses the WSS object model during export and import. If a problem occurs inside the WSS object model, the error shows up in the related category first.
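To isolate the entries of a single category (for example "Upgrade" or "Content Deployment") in a large ULS log file, a small filter is often easier than scrolling through the file. The following is a minimal sketch, not a Microsoft tool: it assumes the usual tab-separated ULS layout whose first line is a header row containing a Category column, and the class name UlsLogFilter is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

namespace ContentDeploymentSketches
{
    // Hypothetical helper: returns all lines of a ULS log whose Category column
    // matches the given category. Assumes a tab-separated log with a header row
    // that contains a "Category" column; values may be padded with spaces.
    public static class UlsLogFilter
    {
        public static List<string> FilterByCategory(TextReader log, string category)
        {
            var result = new List<string>();
            string header = log.ReadLine();
            if (header == null)
                return result; // empty file

            // Locate the Category column by name instead of hard-coding its position.
            int categoryIndex = Array.FindIndex(header.Split('\t'), c => c.Trim() == "Category");
            if (categoryIndex < 0)
                throw new InvalidDataException("No 'Category' column found in the header row.");

            string line;
            while ((line = log.ReadLine()) != null)
            {
                string[] columns = line.Split('\t');
                if (columns.Length > categoryIndex &&
                    string.Equals(columns[categoryIndex].Trim(), category, StringComparison.OrdinalIgnoreCase))
                {
                    result.Add(line);
                }
            }
            return result;
        }
    }
}
```

The ULS service keeps the current log file open, so open it with sharing enabled, for example `new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))`, and pass that reader to FilterByCategory.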

Export/Import log

A frequently requested feature is to get the same log files that STSADM -o export and STSADM -o import create automatically.

There are two possible ways to get this log file for content deployment:

  1. Install the October 2009 Cumulative Update for WSS and MOSS

    If you are running this patch level or later, these log files are always generated automatically and can be found on the export server of the source farm. The location of these log files is listed in the content deployment job report.

    The name in front of the path is the name of the machine where the log files can be found. The log files are stored on this server in cab file format.

    The log files are stored in the ImportExportLogs directory, which is created below the directory configured as the temporary path in the content deployment general settings.

    Because these log files are always created starting with the October 2009 CU, you have to clean them up manually to free disk space.

  2. The second option is also available with earlier builds. If you enable verbose logging for the "Upgrade" category, the entries that are usually written to the export/import logs also appear in the ULS log. You only have to filter the log file for the "Upgrade" category.

    The drawback of this method is that you only see the entries written AFTER you changed the logging level to Verbose. So this information is usually not available for a deployment that failed for the first time.
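Because the cab files in the ImportExportLogs directory accumulate from October 2009 CU on, a scheduled cleanup can help. The sketch below deletes cab files older than a given age. The class name, the retention period, and the assumption that the files may also sit in subdirectories of ImportExportLogs are illustrative only, so check the actual layout on your export server first.

```csharp
using System;
using System.IO;

namespace ContentDeploymentSketches
{
    // Hypothetical cleanup helper: deletes *.cab files below
    // <temporaryPath>\ImportExportLogs whose last write time is older than maxAge.
    // temporaryPath must be the temporary path configured in the content
    // deployment general settings (assumption: passed in by the caller).
    public static class ImportExportLogCleanup
    {
        public static int DeleteOldLogs(string temporaryPath, TimeSpan maxAge, DateTime nowUtc)
        {
            string logDirectory = Path.Combine(temporaryPath, "ImportExportLogs");
            if (!Directory.Exists(logDirectory))
                return 0; // nothing has been logged yet

            int deleted = 0;
            // Subdirectories are searched too, in case the logs are grouped in folders.
            foreach (string file in Directory.GetFiles(logDirectory, "*.cab", SearchOption.AllDirectories))
            {
                if (nowUtc - File.GetLastWriteTimeUtc(file) > maxAge)
                {
                    File.Delete(file);
                    deleted++;
                }
            }
            return deleted;
        }
    }
}
```

A call such as `ImportExportLogCleanup.DeleteOldLogs(@"D:\ContentDeploymentTemp", TimeSpan.FromDays(14), DateTime.UtcNow)` (path and age are placeholders) could be run from a scheduled task on the export server.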

Content Deployment – The complete Guide – Part 5 – Quick Deployment


Quick Deployment Concepts

Quick deployment enables authors to select specific items to be deployed as soon as possible. The problem is that authors are usually not allowed to start timer jobs. To overcome this limitation, the information about which items have to be quick deployed must be stored in a place where a normal author of the site collection has write permission: a list inside the source site collection. When the quick deploy job that performs the actual deployment runs, it reads the information about the items to be quick deployed from this list.

As multiple quick deploy jobs, which can run on different schedules, can originate from the same source site collection, the items have to be flagged separately for each quick deploy job. This requires that the author knows about all enabled quick deploy jobs. As the author does not have rights to enumerate the quick deploy jobs, the information about all enabled quick deploy jobs has to be stored in a place where the author can read it: the property bag of the root site of the source site collection.

Enable Quick Deployment

To enable quick deployment you need to modify the settings of the Quick Deploy Job listed on the Manage Content Deployment Paths and Jobs page.

To enable Quick Deployment you need to check the "Allow Quick Deploy jobs along this path" option.

As soon as a quick deployment job for a path is enabled, two things happen:

  1. The job is marked as enabled by adjusting a field in the list item related to this quick deploy job in the Content Deployment Jobs list of the root site of the central admin Web site.
  2. The unique Id of the quick deploy job is added to the quickdeployjobs property in the property bag of the root web of the source site collection. This property stores the Ids of all enabled quick deploy jobs in XML format.

Allow users to perform Quick Deployment

To be allowed to perform quick deployment, a user needs to have rights to create new items in the hidden Quick Deploy Items list in the root site of the site collection. This list does not inherit permissions by default and only site collection administrators and members of the Quick Deploy Users group have the required permissions.

Be aware that the title of this list is localized. For example, in a German system the title is "Elemente für schnelle Inhaltbereitstellung". The URL, however, is always http://url-to-site-collection/Quick%20Deploy%20Items.

The recommended way to give users permissions to perform a quick deployment is to add them to the Quick Deploy Users group.

Perform a quick deployment

The following prerequisites need to be fulfilled to allow a user to perform a quick deploy:

  1. The user has contributor rights on the Quick Deploy Items list, which usually means that the user is a member of the Quick Deploy Users group.
  2. At least one quick deploy job is enabled for the site collection, which means that the unique id of at least one quick deploy job has been added to the quickdeployjobs key.
  3. The page is in a valid state to be quick deployed, which means that the latest revision of the page is a major version or an approved minor version that is scheduled to go live in the future.
  4. No save conflict has occurred.

A user can perform a quick deployment of a publishing page using the console provided by the publishing feature. This action creates an entry in the Quick Deploy Items list (which resides in the root Web of the source site collection) for each of the enabled quick deployment jobs. Each entry contains the server-relative URL of the item that has been selected for quick deployment.

That means that if the site collection is the source for three content deployment paths, each with an enabled quick deployment job, you will get three list items per quick deployed publishing page.

When a quick deploy job is executed, it retrieves all items flagged for this specific quick deploy job and deploys them. After a successful quick deployment, the deployed items are removed from the Quick Deploy Items list.
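The interaction between the Quick Deploy Items list and the quick deploy jobs can be sketched with a small in-memory model. All types and members below are hypothetical and only mirror the behavior described above (one entry per enabled job, entries removed after a successful run); they are not part of the SharePoint object model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ContentDeploymentSketches
{
    // Hypothetical stand-in for one entry in the Quick Deploy Items list.
    public sealed class QuickDeployItem
    {
        public Guid JobId { get; set; }
        public string ServerRelativeUrl { get; set; }
    }

    public static class QuickDeployModel
    {
        // Selecting "Quick Deploy" for a page: one entry per enabled quick deploy job.
        public static void FlagPage(List<QuickDeployItem> list, IEnumerable<Guid> enabledJobIds, string serverRelativeUrl)
        {
            foreach (Guid jobId in enabledJobIds)
                list.Add(new QuickDeployItem { JobId = jobId, ServerRelativeUrl = serverRelativeUrl });
        }

        // A quick deploy job run: pick up only the entries for this job, deploy them,
        // then remove them from the list (assuming the deployment succeeded).
        public static List<string> RunJob(List<QuickDeployItem> list, Guid jobId)
        {
            List<string> deployed = list.Where(i => i.JobId == jobId)
                                        .Select(i => i.ServerRelativeUrl)
                                        .ToList();
            // The actual deployment of the pages would happen here.
            list.RemoveAll(i => i.JobId == jobId);
            return deployed;
        }
    }
}
```

The model also makes the backup/restore problem discussed below easy to see: an entry flagged for a job id that no job ever runs for is never removed.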

Potential problems related to Quick Deployment

Possible Problems when enabling a quick deploy job

As outlined earlier in this section, the quickdeployjobs property in the property bag of the source site collection has to be adjusted to enable a quick deploy job. Problems occur if the user trying to enable the quick deploy job does not have rights to adjust the value of this property.

In this situation the user sees an access denied page with an option to enter the credentials of a different user. Using the "Back" button of the browser, the user can return to the Manage Content Deployment Paths and Jobs page, where the quick deployment job is now shown as enabled.

The problem is that the job has only been partially enabled: the job itself has been flagged as enabled by adjusting the field in the list item in the Content Deployment Jobs list in the central admin Web site, but the unique Id of the job has not been added to the property bag of the source site collection.

As a result of this inconsistency, items that are quick deployed will not be added to the Quick Deploy Items list for this specific quick deploy job.

If this is the only quick deploy job for this site collection, the Quick Deploy option in the authoring toolbar will also remain disabled.

The following steps resolve this issue effectively:

  • Disable the quick deploy job
  • Ensure that the user has rights to update the property (for example, by making the user a site collection administrator)
  • Enable the quick deploy job again

Possible Problems with backup/restore

If quick deploy was enabled for a site collection before a backup was taken, the property bag of the root site of the site collection contains the unique id of a quick deploy job of the original farm.

After the backup has been restored in a different farm, the quickdeployjobs property contains the unique id of a job that does not exist in the current farm.

Problems with this scenario occur as soon as quick deployment is enabled for this site collection again in the new farm. From then on, whenever an author selects a page for quick deployment, two entries are placed in the Quick Deploy Items list: one for the job in the new farm and one for the job that only existed in the original farm.

The items for the job in the new farm will get deployed and afterwards these items will be removed from the quick deploy items list.

But the items that have been added for the job that does not exist in the current farm will remain in the list forever (or at least until they are cleaned up manually).

In the past I have had a couple of support cases where tens of thousands of these orphaned entries existed in the Quick Deploy Items list.

Cleaning up such a huge list is itself challenging and time-consuming.

To correct this, the invalid entries have to be removed from the quickdeployjobs property in the property bag of the source site collection, and afterwards all entries that do not belong to valid quick deploy jobs have to be deleted from the Quick Deploy Items list.
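Identifying what is invalid boils down to two set differences between the ids stored in the quickdeployjobs property and the ids of the quick deploy jobs that are enabled for this site collection in the current farm. The helper below is a minimal sketch with hypothetical names; it assumes you have already collected both sets of ids, for example the property value parsed into Guids and the ids of the enabled quick deploy jobs returned by ContentDeploymentJob.GetAllJobs() whose path points to this site collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ContentDeploymentSketches
{
    // Hypothetical consistency check between the quickdeployjobs property and the
    // quick deploy jobs that are enabled for the site collection in this farm.
    public static class QuickDeployConsistency
    {
        // Ids in the property that belong to no enabled job in this farm
        // (for example after restoring a backup taken in another farm).
        public static List<Guid> FindOrphanedIds(IEnumerable<Guid> idsInProperty, IEnumerable<Guid> enabledJobIds)
        {
            return idsInProperty.Except(enabledJobIds).ToList();
        }

        // Enabled jobs whose id is missing from the property
        // (for example after enabling failed with an access denied error).
        public static List<Guid> FindMissingIds(IEnumerable<Guid> idsInProperty, IEnumerable<Guid> enabledJobIds)
        {
            return enabledJobIds.Except(idsInProperty).ToList();
        }
    }
}
```

FindOrphanedIds covers the backup/restore case described here; FindMissingIds covers the partially enabled job described in the previous section. Entries in the Quick Deploy Items list whose job id is in the orphaned set are the ones that will never be cleaned up automatically.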

The following steps resolve this issue effectively:

  • Manually start all quick deploy jobs for the site collection to ensure that all valid entries are removed from the Quick Deploy Items list
  • Disable all quick deploy jobs to ensure that the ids of the valid quick deploy jobs are removed from the quickdeployjobs property in the property bag of the source site collection
  • Delete the quickdeployjobs property using the object model (it will be recreated when the quick deploy jobs are enabled later; see code sample below)
  • Delete all entries in the Quick Deploy Items list (only invalid items should be in the list by now), either manually or using the object model. Important: do not delete the list itself! Only delete the items in this list!
  • Re-enable all quick deploy jobs for the source site collection, which recreates the quickdeployjobs property and adds the unique ids of the valid quick deploy jobs back to the property.

Code sample

The following code sample demonstrates how to enumerate all enabled quick deploy jobs in the system and dump their unique IDs:

using System;
using Microsoft.SharePoint.Publishing.Administration;

namespace StefanG.Tools
{
    class DumpQuickDeployJobs
    {
        static void Main(string[] args)
        {
            foreach (ContentDeploymentJob job in ContentDeploymentJob.GetAllJobs())
            {
                if (job.IsEnabled && job.IsQuickDeployJob)
                {
                    Console.WriteLine("---------------------------");
                    Console.WriteLine("Job ID {0}, Name {1}", job.Id, job.Name);
                    Console.WriteLine("Path: " + job.Path.Name);
                    Console.WriteLine("Source Site Collection: " + job.Path.SourceServerUri.ToString() + job.Path.SourceSiteCollection);
                    Console.WriteLine("Target Site Collection: " + job.Path.DestinationServerUri.ToString() + job.Path.DestinationSiteCollection);
                }
            }
        }
    }
}

The following code sample demonstrates how to dump the content of the quickdeployjobs key for a given site collection:

using System;
using Microsoft.SharePoint.Publishing.Administration;
using Microsoft.SharePoint;

namespace StefanG.Tools
{
    class DumpQuickDeployJobsProperty
    {
        static void Main(string[] args)
        {
            using (SPSite site = new SPSite("http://localhost:2000"))
            {
                // RootWeb belongs to the SPSite object and is disposed together with it,
                // so it is not wrapped in its own using block.
                SPWeb rootweb = site.RootWeb;
                if (rootweb.Properties.ContainsKey("quickdeployjobs"))
                    Console.WriteLine(rootweb.Properties["quickdeployjobs"]);
                else
                    Console.WriteLine("Sorry, the quickdeployjobs property does not exist.");
            }
        }
    }
}

The output of this code looks similar to the following:

<?xml version="1.0" encoding="utf-16"?>
<ArrayOfGuid xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<guid>97e45b62-523a-45af-93c6-0fc3ebc44ed3</guid>
<guid>3fc1a783-a25b-7889-3cca-545523549cbb</guid>
</ArrayOfGuid>

Each guid in the property represents a different quick deploy job. 
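To work with these ids in code rather than read the raw XML, the value can be turned into a Guid array. The sketch below assumes, based on the shape of the output (an ArrayOfGuid root with one guid element per job and a utf-16 declaration), that the value is the XmlSerializer representation of a Guid array; this is an inference, and the class name is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace ContentDeploymentSketches
{
    // Hypothetical helper converting between the quickdeployjobs property value
    // and a Guid array, assuming the value is XmlSerializer output for Guid[].
    public static class QuickDeployJobIds
    {
        private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(Guid[]));

        public static Guid[] Parse(string propertyValue)
        {
            if (string.IsNullOrEmpty(propertyValue))
                return new Guid[0]; // property missing or empty: no enabled jobs
            using (var reader = new StringReader(propertyValue))
            {
                return (Guid[])Serializer.Deserialize(reader);
            }
        }

        public static string Format(Guid[] jobIds)
        {
            // StringWriter is UTF-16 based, which yields the encoding="utf-16" declaration.
            using (var writer = new StringWriter())
            {
                Serializer.Serialize(writer, jobIds);
                return writer.ToString();
            }
        }
    }
}
```

For example, `QuickDeployJobIds.Parse(rootweb.Properties["quickdeployjobs"])` in the dump sample above would return one Guid per enabled quick deploy job.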

To delete the quickdeployjobs key (see the backup/restore problem above), you can use the following code:

using System;
using Microsoft.SharePoint.Publishing.Administration;
using Microsoft.SharePoint;

namespace StefanG.Tools
{
    class DeleteQuickDeployJobsProperty
    {
        static void Main(string[] args)
        {
            using (SPSite site = new SPSite("http://localhost:2000"))
            {
                // RootWeb is disposed together with the SPSite object.
                SPWeb rootweb = site.RootWeb;
                if (rootweb.Properties.ContainsKey("quickdeployjobs"))
                {
                    rootweb.Properties["quickdeployjobs"] = null; // setting the value to null deletes the property
                    rootweb.Properties.Update();
                }
            }
        }
    }
}


 

Content Deployment – The complete Guide – Part 4 – Communication


Content Deployment uses the http (or https) protocol for communication between the source and target farm. Most of the communication is performed through Web service calls. The only exception is the transfer of the exported files to the target farm, which is done using a regular http POST request against an ASPX page that performs the upload operation.

Communication inside the same farm (between worker process and timer service) is implemented by writing to and reading from the configuration database (starting and stopping of timer jobs) and central administration content database (job status information) of the farm.

Communication through a Web Service

This Web service is hosted within the central administration Web application at the following URL:

http://target-central-admin-url/_vti_adm/ContentDeploymentRemoteImport.asmx

It provides the following operations:

GetVirtualServersInformation

This operation is used during the configuration of a content deployment path. It enumerates all SharePoint Web applications on the target farm, which are then presented in the UI so that a user can select one of them as the target for the content deployment path.

GetSiteCollectionNames

This operation is also used during the configuration of a content deployment path. It enumerates all site collections of a given virtual server on the target farm, which are then presented in the UI so that a user can select one of them as the target for the content deployment path.

GetRemoteAdminServerUrl

When configuring content deployment in a farm you have to define one server that has to act as the importing server for content deployment operations. The GetRemoteAdminServerUrl operation returns the information about this server to the caller.

This operation is also called during the execution of a content deployment job to identify which server in the target farm will perform the import.

CreateJob

This operation is called by an executing content deployment job on the source farm. It allows the creation of the shadow job on the target farm that will perform the import operation.

RunJob

This operation is called by an executing content deployment job on the source farm to start an earlier created shadow content deployment job on the target farm. This job will perform the import operation.

GetJobStatus

This operation is called by an executing content deployment job on the source farm in a configurable interval (default: 10 seconds) to get the status of the import operation from the target farm.

It should be noted that refreshing the content deployment status page does not trigger a status request to the target farm.

It should also be noted that if the returned status does not change for a configurable timeframe (default: 10 minutes), the content deployment job on the source farm reports a timeout, regardless of whether the import operation later finishes successfully.

This mainly affects full deployment jobs that deploy a large amount of data. In this situation the decompression phase on the target server can take longer than 10 minutes, which results in such a timeout.
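The interaction between the polling interval and the timeout can be modeled in a few lines. This is a simplified sketch of the behavior described above, not the product code: the getJobStatus delegate stands in for the GetJobStatus web method, the status strings "Succeeded" and "Failed" are assumptions, and elapsed time is counted in polling intervals.

```csharp
using System;

namespace ContentDeploymentSketches
{
    // Hypothetical model of how the source farm waits for the remote import:
    // poll every pollingInterval and give up once the status has not changed
    // for remoteTimeout.
    public static class RemoteImportWatcher
    {
        public static string WaitForImport(Func<string> getJobStatus, TimeSpan pollingInterval,
                                           TimeSpan remoteTimeout, Action<TimeSpan> sleep)
        {
            string lastStatus = null;
            TimeSpan unchangedFor = TimeSpan.Zero;
            while (true)
            {
                string status = getJobStatus();
                if (status == "Succeeded" || status == "Failed")
                    return status; // import finished on the target farm

                if (status == lastStatus)
                {
                    unchangedFor += pollingInterval;
                }
                else
                {
                    lastStatus = status; // any change resets the timeout
                    unchangedFor = TimeSpan.Zero;
                }

                // Reported even if the import would finish successfully later.
                if (unchangedFor >= remoteTimeout)
                    return "Timeout";

                sleep(pollingInterval);
            }
        }
    }
}
```

In a real program you would pass Thread.Sleep as the sleep delegate; in a test a no-op is enough. With the defaults (10 seconds, 10 minutes), a status that stays at the same value ends the wait after 60 unchanged polls, even if the import on the target farm would have finished later.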

CancelJob

This operation is called by an executing content deployment job on the source farm to cancel its shadow content deployment job on the target farm when a user requests the cancellation of the content deployment through the UI. Be aware that such a cancel operation can only be performed if the decompression has not yet started.

DeleteJob

This operation is called by an executing content deployment job on the source farm to delete its shadow content deployment job on the target farm after the deployment has been completed.

File transfer through http POST

File transfer of the export packages to the target farm is performed through an http POST request to the following page which is hosted within the central administration Web application:

http://target-central-admin-url/_admin/Content%20Deployment/DeploymentUpload.aspx

The uploaded file is sent as payload of the POST request. The information about the name of the file and the shadow content deployment job the uploaded file belongs to is provided through query string parameters. The actual URL sent with the http request would look similar to this:

http://target-central-admin-url/_admin/Content%20Deployment/DeploymentUpload.aspx?filename="ExportedFiles.cab"&remoteJobId="49bebd7d-62d0-4d68-a1f0-9118b3ac4416"

Content Deployment Communication Flow

The following diagram describes the process used during content deployment. Be aware that communication within the same farm (red arrows) is implemented by writing to and reading from the SharePoint configuration database and the central administration content database. Communication between farms happens through the Web service (dark blue arrows) and the http POST request used for the file upload (light blue arrow).

When a content deployment job is started through the central admin UI, a new one-time timer job is created and executed (1). Alternatively, for a scheduled deployment job, the preconfigured timer job is started by the timer service.

The timer job exports the configured content into a temp directory and compresses it into one or multiple cab files.

When the export has finished, the timer job calls the GetRemoteAdminServerUrl operation of the ContentDeploymentRemoteImport Web service to get the information about the import server to use (2).

The next step is that the timer job calls the CreateJob operation of the ContentDeploymentRemoteImport Web service to create the shadow content deployment job and the associated one time import timer job on the target farm. This job will remain in a stopped state (3/4).

Afterwards the timer job on the source farm uploads the exported cab files to the temp directory on the import server in the target farm through an http upload to DeploymentUpload.aspx (5).

After the upload is completed the timer job on the source farm calls the RunJob operation of the ContentDeploymentRemoteImport Web service to start the import timer job on the target farm (3/4).

While the import is running the import timer job will update the status of the deployment in the associated content deployment job list item.

Every 10 seconds (configurable through the object model by setting the RemotePollingInterval setting, see part 3) the timer job on the source farm calls the GetJobStatus operation of the ContentDeploymentRemoteImport Web service to retrieve the status of the import operation (3/6). The timer job on the source farm updates the status in the content deployment job list item with the status received from the target farm. If the status has not changed for 600 seconds (configurable through the object model by setting the RemoteTimeout setting, see part 3), the timer job on the source farm assumes that something went wrong and reports a timeout.

When the timer job on the source server receives the information that the import operation has completed (either succeeded or failed), it deletes the shadow deployment job and the associated timer job through a call to the DeleteJob operation of the ContentDeploymentRemoteImport Web service (3/4).

The central administration retrieves the status of the timer job by looking at the content deployment job and job report list items, which are updated by the timer service on the source farm (7). Every refresh of the page only reads the status from the local content deployment job list item; refreshing the page does not trigger a call to the target farm.

Content Deployment – The complete Guide – Part 3 – Configuration


Content Deployment General Settings

The content deployment configuration in MOSS 2007 differentiates between settings that are unique for each content deployment path or job and general settings that affect the configuration of the whole farm and apply to all paths and jobs.

This section will cover the general settings.

Access to the content deployment configuration information is provided through the following class:

Microsoft.SharePoint.Publishing.Administration.ContentDeploymentConfiguration

The content deployment general settings are stored in the configuration database of the farm.

It should be noted that only some of the settings can be configured through the Central Administration UI. Some settings have to be configured in the web.config of the central administration Web site, and others can only be adjusted using custom code.

Settings configurable through the UI

After installing MOSS 2007 you will find the content deployment general settings on the operations tab in the central administration:

Content Deployment Settings
          

The following settings can be configured using the Content Deployment Settings page in the UI:

Accept Content Deployment Jobs

This option allows farm administrators to decide whether or not this farm can be used as the target of content deployment operations. If this option is set to "Reject incoming content deployment jobs" then the current farm can act only as the source for content deployment but other farms cannot deploy content to this farm.

The default setting is "Reject incoming content deployment jobs".

Import Server

Each farm has to define one server that acts as the import server for incoming content deployment jobs.

Import operations can impact the performance of a server significantly (for example, memory consumption can be very high), so it is good practice to use a dedicated server for the import operation.

It is required that the specified server has the central administration Web site provisioned. It should be noted that with older hotfix levels the UI does not verify if the central administration Web site has really been provisioned on the selected server. If a server has been selected that does not have the central administration Web site provisioned, content deployment will fail.

The default setting for this option is the server that hosts the instance of the Central Administration Web site used to render the page.

Export Server

Each farm has to define one server that acts as the export server for outgoing content deployment jobs.

Export operations can impact the performance of a server significantly (for example, memory consumption can be very high), so it is good practice to use a dedicated server for the export operation.

It is required that the specified server has the central administration Web site provisioned. It should be noted that with older hotfix levels the UI does not verify if the central administration Web site has really been provisioned on the selected server. In the case where a server has been selected that does not have the central administration Web site provisioned, content deployment jobs will not start and will remain in the "Preparing" phase.

The default setting for this option will be the server that hosts the Central Administration Web site.

Note:

 

If the Content Deployment Settings have never been configured, the Content Deployment Settings page shows the default values, which are not committed to the configuration database until you press OK. This means that although you see a selected export server and a temporary directory, these values are not saved.

Connection Security

This setting affects incoming content deployment jobs. If it is set to require encryption, SSL encryption is required for incoming traffic, and any incoming deployment request without SSL is denied.

This option is usually required when the upload goes over the internet or another insecure line.

The default is that SSL encryption is required as this is the more secure setting.

Temporary Files

This setting specifies the location of the compressed cabinet files before they get uploaded to the target farm or after they have been received from the source farm. Please note that this setting does not specify the location of the uncompressed content!

Warning:

Be aware that the UI tries to create the temp directory when the settings are saved, to verify that the path is valid. This happens on the server running the Central Administration Web site used to configure the settings, not on the configured export or import server!

If the export and import servers are not identical to the central admin server you use to configure the settings, the export / import process tries to create the directory during export / import if it does not exist.

So be sure to enter a path that is valid on the export server, the import server, and the central admin server, as no further checks are done before content deployment is executed for the first time.

The uncompressed data is stored in a directory below the directory returned by the Windows function GetTempPath. GetTempPath uses the following logic to determine the location (the first hit wins):

  • The path specified by the TMP environment variable.
  • The path specified by the TEMP environment variable.
  • The path specified by the USERPROFILE environment variable.
  • The Windows directory.

These locations are usually found on the system drive.
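The lookup order can be expressed as a small function. This sketch only mirrors the documented order for illustration: the class name is hypothetical, the real GetTempPath also appends a trailing backslash (omitted here), and in a running .NET process Path.GetTempPath() returns the actual result for the current account. Passing the variables of a different account (for example the account running the OWSTIMER service) as the lookup function shows where that account's temporary data will end up.

```csharp
using System;

namespace ContentDeploymentSketches
{
    // Hypothetical re-statement of the GetTempPath lookup order:
    // TMP, then TEMP, then USERPROFILE, then the Windows directory.
    public static class TempPathResolver
    {
        public static string Resolve(Func<string, string> getVariable, string windowsDirectory)
        {
            foreach (string name in new[] { "TMP", "TEMP", "USERPROFILE" })
            {
                string value = getVariable(name);
                if (!string.IsNullOrEmpty(value))
                    return value; // first hit wins
            }
            return windowsDirectory;
        }
    }
}
```

For the current process, `TempPathResolver.Resolve(Environment.GetEnvironmentVariable, Environment.GetEnvironmentVariable("SystemRoot"))` should point to the same directory as Path.GetTempPath() (apart from the trailing backslash).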

Due to the above, it is a good practice to configure the TMP environment variable of the user assigned to the OWSTIMER service and the application pool of the central admin Web site to a location that has sufficient disk space.

Important: 

Ensure that the TMP environment variable of the farm account points to a location with sufficient disk space on both the source and the target farm.

Reporting

This setting specifies how many reports are kept for each content deployment job. The default setting is 20.

Settings configurable through the web.config

Some settings which affect content deployment have to be configured in the web.config of the central administration Web application.

Currently the following options are available:

RetryOption

This is a new option that has been added with build 12.0000.6315.5000.

It has been implemented to mitigate the problems that can occur when content deployment jobs are executed in parallel or when authoring activities interact badly with the export phase.

To overcome this limitation, the RetryOption allows an export to be retried if the previous export failed. This option can either perform a simple retry, which means the same export is performed again, or it can be configured to exclude the sites (SPWeb objects) that caused problems from the next attempt. This allows content deployment to keep working, in a limited manner, in situations where persistent problems in specific sites would otherwise cause content deployment jobs to fail.

This setting defines the default value which applies to new jobs. It is possible to specify different values for individual jobs using additional settings which will be covered later.

Possible values for this option are None (=0), SkipFailedWebs (=1), and SimpleRetry (=2).

The default value is None which means that no retry attempt is performed.

See the Content Deployment Job configuration settings later in this article for an example of how to configure this option in the web.config.

ExportRetries

This option is tied to the previous one. If the RetryOption is enabled, this setting defines how many times the export is retried before the job finally ends with a failure. To ensure that a deployment job does not run forever when a persistent problem occurs, you should avoid configuring a number higher than 5.

This setting defines the default value which applies to new jobs. It is possible to specify different values for individual jobs using additional settings which will be covered later.

Possible values: 1-999. A small number (3-5) is recommended.

(See the Content Deployment Job configuration settings later in this article for an example of how to configure this option in the web.config.)
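The combination of RetryOption and ExportRetries can be illustrated with a small model. Everything below is hypothetical: the enum mirrors the values listed above but is declared only for this sketch, the runExport delegate stands in for one export attempt and returns the URLs of the webs that failed, and treating ExportRetries as the number of retries after the first attempt follows the wording above rather than the product code.

```csharp
using System;
using System.Collections.Generic;

namespace ContentDeploymentSketches
{
    // Values as listed for the RetryOption setting (declared here for illustration only).
    public enum ExportRetryOption { None = 0, SkipFailedWebs = 1, SimpleRetry = 2 }

    // Hypothetical model of the retry behavior, not the product implementation.
    public static class ExportRetryModel
    {
        public static bool Export(ExportRetryOption option, int exportRetries,
                                  Func<HashSet<string>, HashSet<string>> runExport)
        {
            var excludedWebs = new HashSet<string>();
            // First attempt plus up to exportRetries retries (None: no retry at all).
            int maxAttempts = option == ExportRetryOption.None ? 1 : 1 + exportRetries;
            for (int attempt = 1; attempt <= maxAttempts; attempt++)
            {
                HashSet<string> failedWebs = runExport(excludedWebs);
                if (failedWebs.Count == 0)
                    return true; // export succeeded

                if (option == ExportRetryOption.SkipFailedWebs)
                    excludedWebs.UnionWith(failedWebs); // leave problem webs out next time
                // SimpleRetry: repeat the same export unchanged.
            }
            return false; // the job finally ends with a failure
        }
    }
}
```

With a web that fails persistently, SkipFailedWebs succeeds on the second attempt (without that web), while SimpleRetry keeps failing until the retries are used up, which is why a small ExportRetries value is recommended.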

Settings configurable through custom code

Some settings are only exposed through the object model because the product group assumed that there is no need to change the default values.

Access to the content deployment configuration information is provided through the following class:
Microsoft.SharePoint.Publishing.Administration.ContentDeploymentConfiguration

In addition to the settings listed in the last two sections, this class exposes the following additional properties:

FileMaxSize

This setting allows the maximum size of each generated cab file to be defined.

The default value for this setting is 10 MB.

Please note that this does not mean that cab files will never exceed 10 MB. Content deployment does not split files over multiple cab files. As a result cab files can exceed this limit. For example, if a file in a document library that needs to be deployed cannot be compressed to a size smaller than 30 MB then the cab file will end up at least 30 MB in size.
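The effect of not splitting files can be illustrated with a simple greedy packing model. This is an assumption for illustration only and not the algorithm content deployment actually uses: files are appended to the current cab until the next file would push it over FileMaxSize, and a file is never split. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

namespace ContentDeploymentSketches
{
    // Hypothetical greedy packing model: shows why a cab can exceed FileMaxSize
    // when a single (compressed) file is larger than the limit.
    public static class CabPackingModel
    {
        public static List<long> Pack(IEnumerable<long> compressedFileSizes, long fileMaxSize)
        {
            var cabSizes = new List<long>();
            long current = 0;
            foreach (long size in compressedFileSizes)
            {
                if (current > 0 && current + size > fileMaxSize)
                {
                    cabSizes.Add(current); // close the current cab and start a new one
                    current = 0;
                }
                current += size;           // files are never split across cabs
            }
            if (current > 0)
                cabSizes.Add(current);
            return cabSizes;
        }
    }
}
```

With FileMaxSize = 10 and compressed file sizes of 3, 4, 30 and 2 (MB), the model produces cabs of 7, 30 and 2 MB: the 30 MB file ends up alone in a cab that is three times the limit.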

RemoteTimeout

After the import job on the target server has been started, the content deployment job on the export server of the source farm polls the target farm for status updates at regular intervals. If the status does not change within the timeframe configured through this value, the exporting server reports a timeout.

This mainly affects full deployment jobs that deploy a large amount of data. If in such a situation the decompression phase on the target server takes longer than the configured RemoteTimeout value, you will see such a timeout.

The default for this setting is 10 minutes.

RemotePollingInterval

This setting is tightly bound to the previous one. It defines the interval at which the exporting server checks for status changes on the target server.

Usually it should not be required to change this setting.

The default for this setting is 10 seconds.
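The interplay of RemoteTimeout and the polling interval can be sketched as a small simulation. This C# code is a hypothetical model of the behavior described above (poll at a fixed interval, report a timeout when the remote status has not changed for the configured timeout); the status strings and the RemotePollingModel name are assumptions and do not reflect the internal MOSS implementation.

```csharp
using System;

// Hypothetical model of the exporting server's polling loop.
public static class RemotePollingModel
{
    // statusAt returns the status reported by the target farm at a given
    // elapsed time. Returns when the final status is seen or when the status
    // has not changed for remoteTimeout. Like the behavior described above,
    // it keeps polling for as long as the status keeps changing.
    public static (TimeSpan StoppedAt, bool TimedOut) Poll(
        Func<TimeSpan, string> statusAt,
        TimeSpan pollingInterval,
        TimeSpan remoteTimeout,
        string finalStatus = "Completed")
    {
        TimeSpan now = TimeSpan.Zero;
        string lastStatus = statusAt(now);
        TimeSpan lastChange = now;

        while (true)
        {
            if (lastStatus == finalStatus)
            {
                return (now, false);
            }

            // Wait one polling interval, then ask the target farm again.
            now += pollingInterval;
            string status = statusAt(now);
            if (status != lastStatus)
            {
                lastStatus = status;
                lastChange = now;
            }
            else if (now - lastChange >= remoteTimeout)
            {
                // No status change for the whole timeout window.
                return (now, true);
            }
        }
    }
}
```

In this model, a decompressing phase that reports no progress for 15 minutes hits the default 10 minute timeout, while a one hour timeout lets the same job reach completion.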

The following code sample demonstrates how to adjust the RemoteTimeout value. The other settings can be adjusted in a similar manner:

using System;
using Microsoft.SharePoint.Publishing.Administration;

namespace StefanG.Tools
{
   class AdjustContentDeploymentDeploymentSettings
   {
      static void Main(string[] args)
      {
         // Load the farm-wide content deployment configuration.
         ContentDeploymentConfiguration config =
            ContentDeploymentConfiguration.GetInstance();
         config.RemoteTimeout = 3600; // 3600 seconds = 1 hour.
         // Persist the modified configuration.
         config.Update();
      }
   }
}

Content Deployment Path Settings

As discussed earlier a content deployment path defines the content flow from a single source site collection to a single target site collection.

Access to the content deployment path configuration information is provided through the following class:
Microsoft.SharePoint.Publishing.Administration.ContentDeploymentPath

This class stores its data in the Content Deployment Paths list which resides in the root web of the central admin website:
http://url-to-central-admin/lists/content%20deployment%20paths

Be aware that only some of the settings can be configured through Central Administration. Some additional settings can only be configured through STSADM commands.

Settings configurable through Central Administration

The following settings can be configured using the Content Deployment Path Settings page in Central Administration:

Name

The Name for a content deployment path has to be unique.

Description

The description for the content deployment path.

Source Web Application

This setting defines the URL to the source Web application. The UI provides a drop-down with the names of all Web applications on the source server. It should be noted that although the UI presents the Web application name in the drop-down, only its URL is stored in the path settings.

Source Site Collection

This setting defines the server relative URL for the source site collection which resides inside the selected source web application. As soon as a source Web application has been chosen by a user the UI populates the drop down box with all server relative URLs for the selected Web application.

Destination Central Administration Web Application

This setting defines the URL to a server in the target farm that hosts an instance of the central administration Web site. It does not matter if this is the Import server in the target farm or a different server as long as the central administration Web application has been provisioned on this server.

It should be noted that the protocol used in this setting defines whether the future communication to the target farm will use http or SSL.

The UI presents a warning when HTTP is used.

Authentication Information

This setting defines the credentials being used to communicate with the target farm. The account being used has to have farm administrator rights on the target farm. When the same farm account is used in both farms or when the path links two site collections in the same farm the option to connect using the application pool account can be utilized.

Alternatively you can enter the credentials to the target farm and decide whether to use integrated authentication or basic.

Destination Web Application

This setting defines the URL to the target Web application.

The UI provides a drop down with the names of all Web applications on the target server which is populated as soon as the URL to the target central administration Web application has been entered and the connect button has been pressed.

It should be noted that although the UI presents the Web application name in the drop-down, only its URL is stored in the path settings.

Destination Site Collection

This setting defines the server relative URL for the destination site collection which resides inside the selected destination web application. As soon as a destination Web application has been chosen by a user the UI populates the drop down box with all server relative URLs for the selected Web application.

User Names

This setting is the equivalent of the SPImportUserInfoDateTimeOption setting of the underlying content deployment and migration API. It defines whether user information from the source system should be associated with the content after it has been deployed to the target.

Security Information

This setting defines which security information from the source system should be deployed to the target system.

Three different levels are possible:

  • All: all security information should be deployed
  • Role definitions only: this option deploys everything except the actual user assignments. It is useful if you want the same SharePoint groups with identical permissions in the source and target environments, but with different users assigned to those groups because of a different AD infrastructure.
  • None: no security information should be deployed to the target server. That means all security related settings will be inherited from existing content on the target farm.

Settings configurable through STSADM

Some settings are only exposed through STSADM commands.

KeepTemporaryFiles

When performing a content deployment operation the exporting server first exports the content into a temporary directory on the disk and then compresses the data. After the deployment is done, the temporary files are cleaned up - more or less (we will cover this later).

This setting allows you to control in which situations the cleanup will happen. This can be interesting when it comes to troubleshooting content deployment problems. Enabling this option allows you to keep and inspect the generated cab files to identify what has been deployed.

We at Microsoft Support often use this option when working on support cases.

To modify this option please use the following STSADM command:

STSADM -o editcontentdeploymentpath -pathname <pathname> -keeptemporaryfiles Never|Always|Failure
(In case your content deployment path name contains blanks please specify it with double quotes.)

Possible values: Never (=0), Always (=1), Failure (=2)

Be aware that Failure refers only to fatal errors. If warnings or non-fatal errors are reported in the deployment log, the job still reports "Success" and the temporary files are not preserved, even if you specified "Failure" for this option.

The default is Never.
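The rule behind the three values, including the caveat that warnings and non-fatal errors still count as success, can be expressed as a small decision function. This is a C# sketch; the enum names KeepTemporaryFilesOption and JobOutcome are hypothetical and only mirror the values listed above.

```csharp
using System;

// Hypothetical enum mirroring the -keeptemporaryfiles values listed above.
public enum KeepTemporaryFilesOption
{
    Never = 0,
    Always = 1,
    Failure = 2
}

// Hypothetical job outcome; a job with warnings or non-fatal errors
// still reports "Success".
public enum JobOutcome
{
    Success,
    SuccessWithWarnings,
    Failure
}

public static class TemporaryFilesPolicy
{
    // Returns true if the temporary export files would be kept after the job.
    public static bool KeepFiles(KeepTemporaryFilesOption option, JobOutcome outcome)
    {
        switch (option)
        {
            case KeepTemporaryFilesOption.Always:
                return true;
            case KeepTemporaryFilesOption.Failure:
                // Only a fatal error counts as a failure here.
                return outcome == JobOutcome.Failure;
            default:
                return false;
        }
    }
}
```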

EnableEventReceiver

This property controls whether asynchronous event receivers are allowed to fire during the import. By default, asynchronous event receivers are disabled during import. This configures the SPImportSettings.SuppressAfterEvents setting of the content deployment and migration API when performing the import.

Any modification to the target database that happens in parallel with an import can lead to a deadlock in the SQL database, which can cause the content deployment import to fail. Because asynchronous event receivers run in parallel with the import operation, they can cause such deadlocks. That is why this option is disabled by default. In general, there is no need to enable these event receivers if they modify SharePoint content: the same events already fired on the source system and performed the modification there, so their changes reach the target system through content deployment anyway. Only enable after events if you are sure that they do not modify anything inside the SharePoint database the import runs against, for example if the event receivers only send emails or update content in other databases.

This option can be modified using the following STSADM command:

STSADM -o editcontentdeploymentpath -pathname <pathname> -enableeventreceivers yes | no

Possible values: No (=0), Yes (=1)

Default is No.

EnableCompression

This property controls whether the deployed data is compressed or sent to the destination server uncompressed. Compression is enabled by default to provide acceptable performance when uploading content to a server over a slow connection. In an intranet scenario, or when deploying content between two site collections on the same server, the time needed for compression and decompression can exceed the time saved by transferring less data, adding significant overhead to the content deployment jobs.

In such a situation it might make sense to disable the compression.

This setting configures the SPDeploymentSettings.FileCompression property of the content deployment and migration API during export and import.

To modify this option use the following STSADM command:

STSADM -o editcontentdeploymentpath -pathname <pathname> -enablecompression yes | no

Warning:

Before disabling compression, check the version of the Microsoft.SharePoint.Publishing.dll on your disk. A known problem with disabling this option was fixed in build 12.0000.6315.5000. If your Microsoft.SharePoint.Publishing.dll has a lower build number you cannot disable this option as it would cause your content deployment job to fail.

Possible values: No (=0), Yes (=1)

The default value is Yes.
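Before disabling compression, the build check from the warning above can be scripted. This C# sketch compares a file version against 12.0.6315.5000 (the same build as 12.0000.6315.5000) using System.Version and reads the DLL version through System.Diagnostics.FileVersionInfo. The class and method names are hypothetical, and you have to supply the path to Microsoft.SharePoint.Publishing.dll yourself, as its location depends on your installation.

```csharp
using System;
using System.Diagnostics;

public static class CompressionSafetyCheck
{
    // Build in which the problem with disabling compression was fixed.
    private static readonly Version FixedIn = new Version(12, 0, 6315, 5000);

    public static bool CanDisableCompression(Version fileVersion)
    {
        return fileVersion >= FixedIn;
    }

    // Accepts version strings such as "12.0000.6315.5000" or "12.0.6219.1000".
    public static bool CanDisableCompression(string fileVersion)
    {
        return CanDisableCompression(Version.Parse(fileVersion));
    }

    // Reads the numeric file version of the DLL at the given path.
    public static bool CanDisableCompressionForFile(string dllPath)
    {
        FileVersionInfo info = FileVersionInfo.GetVersionInfo(dllPath);
        var version = new Version(info.FileMajorPart, info.FileMinorPart,
                                  info.FileBuildPart, info.FilePrivatePart);
        return CanDisableCompression(version);
    }
}
```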

Content Deployment Job Settings

As discussed earlier a content deployment job defines for which part of a site collection and in which schedule content is transferred to the target site collection.

Access to the content deployment job configuration information is provided through the following class:
Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJob

This class stores its data in the Content Deployment Jobs list which resides in the root web of the central admin website:
http://url-to-central-admin/lists/content%20deployment%20jobs

Be aware that only some of the settings can be configured through the UI. Some additional settings can only be configured through web.config or STSADM commands.

Settings configurable through the UI

The following settings can be configured using the Content Deployment Job Settings page in the UI:

Name

The Name for a content deployment job has to be unique. Be aware that it has to be unique across all jobs, not just among the jobs of a specific path. For example, you cannot create a job named "Deploy All" for Path 1 and another job named "Deploy All" for Path 2.

Description

The description for the content deployment job.

Path

This setting identifies the content deployment path that is associated with the current content deployment job. This setting is represented by a drop down in the UI which is populated with all content deployment path definitions in the current farm.

Scope

This setting defines whether the whole site collection or only specific sites should be deployed. The UI provides an additional option to show a treeview with all sites in the site collection where a user can pick the sites that should be deployed.

The user can also decide whether to deploy the whole subtree starting at the selected site or only the selected site.

It is not possible to explicitly exclude specific sites from the deployment. To achieve this, you have to use a combination of site and subtree inclusions that omits the part of the site collection that should not be deployed.
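Computing such a combination is mechanical: select a site on its own whenever the excluded site lies somewhere below it, and select whole subtrees everywhere else. The following C# sketch assumes a hypothetical SiteNode tree type and excludes an entire subtree; the "site:" and "subtree:" strings are placeholders for the two selection types offered by the UI.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical site tree node; only the URL and the children matter here.
public class SiteNode
{
    public string Url;
    public List<SiteNode> Children = new List<SiteNode>();

    public SiteNode(string url, params SiteNode[] children)
    {
        Url = url;
        Children.AddRange(children);
    }

    // True if url is this site or any site below it.
    public bool Contains(string url) =>
        Url == url || Children.Any(c => c.Contains(url));
}

public static class ScopeSelection
{
    // Builds selections that cover the whole tree except the subtree
    // rooted at excludedUrl.
    public static List<string> CoverExcept(SiteNode node, string excludedUrl)
    {
        var result = new List<string>();
        if (node.Url == excludedUrl)
        {
            return result; // skip the excluded branch entirely
        }
        if (!node.Contains(excludedUrl))
        {
            result.Add("subtree:" + node.Url); // nothing to exclude below here
            return result;
        }
        // The excluded site is below this one: select this site alone and recurse.
        result.Add("site:" + node.Url);
        foreach (SiteNode child in node.Children)
        {
            result.AddRange(CoverExcept(child, excludedUrl));
        }
        return result;
    }
}
```

For a root site with the children /news (which contains /news/archive) and /products, excluding /news/archive yields site:/, site:/news and subtree:/products.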

Frequency

This setting allows the schedule for the content deployment job to be defined. This option is disabled by default which means that the content deployment job has to be executed manually. The UI provides various different options to schedule the execution of a content deployment job.

It should be noted that content deployment does not work reliably if multiple export or import operations for the same site collection run at the same time. You should therefore schedule different jobs for the same source or target site collection so that they do not overlap.
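A simple way to spot overlapping schedules is to compare each pair of planned runs. The C# sketch below is hypothetical: ScheduledRun and its expected duration are assumptions (you have to estimate run times yourself, for example from past job reports), and two runs are treated as conflicting when they share a source or target site collection and their time windows overlap.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of one planned job run.
public class ScheduledRun
{
    public string JobName;
    public string SourceSite;
    public string TargetSite;
    public DateTime Start;
    public TimeSpan ExpectedDuration;

    public DateTime End => Start + ExpectedDuration;
}

public static class ScheduleChecker
{
    // Returns the job name pairs whose runs overlap in time while sharing
    // a source or a target site collection.
    public static List<string> FindConflicts(IList<ScheduledRun> runs)
    {
        var conflicts = new List<string>();
        for (int i = 0; i < runs.Count; i++)
        {
            for (int j = i + 1; j < runs.Count; j++)
            {
                ScheduledRun a = runs[i], b = runs[j];
                bool sharesSite = a.SourceSite == b.SourceSite || a.TargetSite == b.TargetSite;
                // Windows [Start, End) overlap if each starts before the other ends.
                bool overlaps = a.Start < b.End && b.Start < a.End;
                if (sharesSite && overlaps)
                {
                    conflicts.Add(a.JobName + " <-> " + b.JobName);
                }
            }
        }
        return conflicts;
    }
}
```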

It should also be noted that the schedule is not stored in the content deployment job itself. For each Content Deployment Job there is an assigned Timer Job (the timer job is an instance of the class Microsoft.SharePoint.Publishing.Administration.ContentDeploymentJobDefinition) which does the actual work. The schedule will be assigned to the Schedule property of this timer job.

The content deployment job object itself holds a reference to the timer job.

Deployment Options

This setting defines if the deployment job will perform a full or an incremental deployment. This setting is equivalent to the SPExportMethodType property passed into the content deployment and migration API.

Notification

This setting defines if an email is sent when a content deployment job has succeeded and/or failed. The UI requires that an email address be provided when one of the options is enabled.

Settings configurable through the web.config file

Some settings which affect a content deployment job have to be configured in the web.config of the central administration Web application.

Currently the following options are available:

RetryOption

This is a new option that has been added with build 12.0000.6315.5000.

It has been implemented to mitigate the problems that can occur when content deployment jobs are executed in parallel or if authoring activities badly interact with the export phase.

To overcome this limitation, the retry feature allows the export to be retried if the previous attempt failed. It can either perform a simple retry (run the same export again) or exclude the sites (SPWeb objects) that caused the problem from the next attempt. This allows you to use content deployment in a limited manner in situations where persistent problems in specific sites would otherwise cause content deployment jobs to fail.

This setting defines the default value which applies to new jobs. It is possible to specify different values for individual jobs using additional settings. We will cover this later.

Possible values for this option are None (=0), SkipFailedWebs (=1) and SimpleRetry (=2).

The default value is None which means that no retry attempt is performed.

ExportRetries

This option is tied to the previous option. If a retry option is enabled, this setting defines how many times content deployment retries the export before the job finally ends with a failure.

To ensure that the deployment job does not run forever if a persistent problem occurs, you should avoid configuring this to a number higher than 5.

This setting defines the default value which applies to new jobs. It is possible to specify different values for individual jobs using additional settings. We will cover this later.

Possible values: 1-999. A small number (3-5) is recommended.
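The behavior of the three RetryOption values together with ExportRetries can be modeled as follows. This C# sketch is a hypothetical simulation, not the product code: the export is represented by a delegate that returns the URL of the failing web (or null on success), and the retry count is assumed to exclude the first attempt, so a job runs at most 1 + ExportRetries times.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum mirroring the RetryOption values listed above.
public enum RetryOption
{
    None = 0,
    SkipFailedWebs = 1,
    SimpleRetry = 2
}

public class RetryResult
{
    public bool Succeeded;
    public int Attempts;
    public List<string> SkippedWebs = new List<string>();
}

public static class ExportRetryModel
{
    // exportAttempt receives the webs to export and returns the URL of the
    // web that made the attempt fail, or null if the export succeeded.
    public static RetryResult Run(
        IEnumerable<string> webs,
        RetryOption option,
        int retries,
        Func<ICollection<string>, string> exportAttempt)
    {
        var remaining = new List<string>(webs);
        var result = new RetryResult();
        int maxAttempts = option == RetryOption.None ? 1 : 1 + retries;

        while (result.Attempts < maxAttempts)
        {
            result.Attempts++;
            string failedWeb = exportAttempt(remaining);
            if (failedWeb == null)
            {
                result.Succeeded = true;
                return result;
            }
            bool anotherAttemptFollows = result.Attempts < maxAttempts;
            if (option == RetryOption.SkipFailedWebs && anotherAttemptFollows)
            {
                // Exclude the problematic site from the next attempt.
                remaining.Remove(failedWeb);
                result.SkippedWebs.Add(failedWeb);
            }
        }
        return result; // all attempts used up: the job ends with a failure
    }
}
```

With a persistent problem in one site, SimpleRetry fails on every attempt, while SkipFailedWebs succeeds on the second attempt after excluding that site.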

The following lines have to be added to the web.config to enable the retry options:

<configuration>
   <configSections>
      <sectionGroup name="SharePoint">
...
         <section name="ExportRetrySettings" type="Microsoft.SharePoint.Publishing.Administration.RetrySectionHandler, Microsoft.SharePoint.Publishing, Version=12.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" />
...
      </sectionGroup>
   </configSections>
   <SharePoint>
...
      <ExportRetrySettings DefaultOption="<option>" DefaultRetries="<number-of-retries>">
         <Job Name="<Name-of-Deployment-Job1>" Option="<option>" Retries="<number-of-retries>"/>
         <Job Name="<Name-of-Deployment-Job2>" Option="<option>" Retries="<number-of-retries>"/>
...
      </ExportRetrySettings>
...
   </SharePoint>
...
</configuration>

<option> can be one of the following: None, SkipFailedWebs, SimpleRetry

<number-of-retries> can be an integer between 1 and 999

If you don't add any <Job...> elements, then all new jobs will get the values configured as DefaultOption and DefaultRetries. Using the <Job...> element you can configure different jobs with different values.
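The precedence between a <Job> element and the defaults can be sketched with System.Xml.Linq. This C# code is a hypothetical resolver that assumes exact, case-sensitive matching on the Name attribute; the actual RetrySectionHandler may behave differently, for example when a <Job> element omits an attribute.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RetrySettingsResolver
{
    // Returns the option and retry count that apply to jobName: a matching
    // <Job> element wins, otherwise DefaultOption/DefaultRetries are used.
    public static (string Option, int Retries) Resolve(XElement settings, string jobName)
    {
        XElement job = settings.Elements("Job")
            .FirstOrDefault(j => (string)j.Attribute("Name") == jobName);

        if (job != null)
        {
            return ((string)job.Attribute("Option"), (int)job.Attribute("Retries"));
        }
        return ((string)settings.Attribute("DefaultOption"), (int)settings.Attribute("DefaultRetries"));
    }
}
```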

DefaultOption and DefaultRetries are also the entries that get added to the Content Deployment general settings which we discussed in an earlier section.

After you have added the above listed configuration to the web.config of the Central Administration you have to open the content deployment job definition once in the central administration and save it again. During this save the configured options will be silently added to the content deployment job.

In addition you need to open and save the content deployment general settings page to ensure that the default settings are updated as well. New jobs created after you changed the web.config will automatically get the configured values.

Potential problems related to Configuration

Problems caused by not saving the content deployment general settings 

As outlined earlier in this article, the settings visible on the content deployment general settings page are not persisted in the database until they are explicitly saved using the OK button.

Problems occur when the settings have never been saved (so the default settings are in use) and an option is later adjusted through the object model, for example when the RemoteTimeout value is adjusted.

In this situation the settings are populated in the context of the current machine and the current user. That means that the machine where the code is executed may become the export and import server of the farm. In addition, the path for the temporary files will not be populated with c:\windows\temp but with c:\documents and settings\<username>\local settings\temp.

Often the user adjusting the timeout value is not the farm account of the SharePoint farm. When content deployment later tries to create the temporary files, it will try to create them in a location where the farm account has no permissions, and the deployment will fail.

Problems caused by insufficient permissions in the source site collection

If the user configuring the content deployment job does not have read permission in the source site collection, he cannot configure the job for selective deployment. The reason is that the treeview which allows a user to select the sites that should be deployed is created by enumerating the site structure in the source database in the context of the current user. Sites the user is not allowed to see are not included in the treeview. If the user has no rights at all in the source site collection, the dialog shows an access denied message.

Content Deployment – The complete Guide – Part 2 – The Basics continued

Content Deployment Phases

The content deployment process is divided into several different phases which are partly performed on the source and partly on the target server.

Preparing

This phase is unique to jobs that are started through the Central Admin UI. In this phase, the code in the UI of the exporting server creates the one-time timer job that will perform the export.

Exporting

This phase is performed on the exporting server of the source farm. It uses the WSS content deployment and migration API to export the relevant content for the specific content deployment job. The content will be exported into a flat file structure in a temporary directory on the exporting server.

Compressing

This phase is performed on the exporting server of the source farm. It is done internally in the WSS content deployment and migration API and compresses the exported files into one or multiple cabinet files (cab files). These files will be stored again in a temporary folder on the exporting server.

Transferring

This phase is performed by the exporting server on the source farm. It uses functionality provided by the content deployment feature that ships with MOSS 2007. This phase transfers the cabinet files via HTTP upload to the importing server on the target farm, where they end up in a temporary folder.

Decompressing

This phase is performed by the importing server on the target farm. It is done internally in the WSS content deployment and migration API and decompresses the cabinet files into a temporary folder on the importing server.

Importing

This phase is performed by the importing server on the target farm. The import is performed using the WSS content deployment and migration API which reads the content from the temporary directory where the decompressed files are located and generates the content in the target site collection.

Content Deployment Path and Job

Content Deployment is built on two underlying conceptual objects: content deployment path and job.

Content Deployment Path

Content Deployment allows the deployment of content from exactly one site collection on the source farm to exactly one site collection on the target farm.

The link between the source and target site collection is called a content deployment path. You can have multiple content deployment paths in your farm as long as you don't have multiple paths that reference the same source and target site collection.

A content deployment path is always defined on the source farm.

It is possible to have two content deployment paths linking one site collection on the source farm with two different site collections on the target farm.

It should be noted that the source and target site collection can reside in the same farm. So a path can point from one site collection to a second site collection in the same farm as long as the two site collections reside in different databases.

Important:

You cannot have source and target site collection in the same database. Performing a deployment along such a path would cause exceptions during import as it would cause items with the same ID to exist twice in the same database which is not supported by SharePoint.

For the same reason you also cannot have multiple target site collections which receive content from the same source site collection in the same database.
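These rules can be expressed as a validation routine. In this C# sketch, SiteRef, PathDefinition and PathRules are hypothetical types, and DatabaseId is assumed to identify a content database uniquely (for example SQL server plus database name), since two farms may use identical database names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified description of a path endpoint.
public class SiteRef
{
    public string Url;
    public string DatabaseId; // assumed unique, e.g. "SQLSERVER/DatabaseName"

    public SiteRef(string url, string databaseId)
    {
        Url = url;
        DatabaseId = databaseId;
    }
}

public class PathDefinition
{
    public SiteRef Source;
    public SiteRef Target;

    public PathDefinition(SiteRef source, SiteRef target)
    {
        Source = source;
        Target = target;
    }
}

public static class PathRules
{
    // Returns the rule violations of a proposed path, given the existing paths.
    public static List<string> Validate(PathDefinition proposed, IEnumerable<PathDefinition> existing)
    {
        var problems = new List<string>();
        if (proposed.Source.DatabaseId == proposed.Target.DatabaseId)
        {
            problems.Add("Source and target site collection share a content database.");
        }

        foreach (PathDefinition path in existing)
        {
            bool sameSource = path.Source.Url == proposed.Source.Url;
            if (sameSource && path.Target.Url == proposed.Target.Url)
            {
                problems.Add("A path between these site collections already exists.");
            }
            else if (sameSource && path.Target.DatabaseId == proposed.Target.DatabaseId)
            {
                problems.Add("Another target of the same source is in this content database.");
            }
        }
        return problems;
    }
}
```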

Content Deployment Job

It is possible to create one or more content deployment jobs for a single content deployment path. Each job can be of a different type (full, incremental or quick deployment) and can be configured with a different schedule.

For all jobs except quick deploy jobs it is possible to configure different content deployment jobs for the same path to deploy different parts of the source site collection. This is done by selecting specific sites that should be deployed with this specific job. Using this method it would be possible to deploy one part of the site daily and a second part which contains more frequently updated information hourly.

Quick Deploy jobs cannot be created manually. They will be created automatically when saving the content deployment path if the source site collection has the publishing feature enabled. There can only be one Quick Deploy job per path.

The content deployment job is always defined on the source farm. During the deployment to the target farm a "shadow" content deployment job will be created on the target farm to perform the import operation. This job will automatically be deleted after the import has been completed.

Also be aware that this "shadow" job will not show up in the content deployment paths and jobs page in the target farm as this job will not be bound to a content deployment path.

Content Deployment – The complete Guide – Part 1 – The Basics

A while ago I created a deep dive article series covering the WSS content deployment and migration API which helped many people to develop their own applications to do export and import in a customized manner.

Today I will start a new article series which will discuss all aspects of Content Deployment, in other words the MOSS feature sitting on top of the WSS API.

Most customers see this feature as a monolithic implementation which does not allow any customization, but that is not the case, as you will see in future chapters of this article series.

But before we can start with the internals we first have to start with the basics.

Content Deployment – The Idea Behind

MOSS 2007 contains Content Deployment as a new feature which has been added to fulfill the requirements of companies which plan to use SharePoint as a Web server to host public facing Web sites.

The main purpose is to allow authors and reviewers to modify and evaluate content on a separate farm before it is finally pushed to the public facing server farm, but also to have a single authoring environment and then push the content to multiple different farms of different departments, potentially on different continents.

A similar concept (site deployment) was already available in Microsoft Content Management Server 2002 but required additional programming for automated deployments. With MOSS 2007 this can now be automated and customized easily through a built-in UI.

Content Deployment – The Feature

The Content Deployment section shown on the Operations tab of the Central Administration Website is provided by the "Content Deployment" feature which is shipped with MOSS 2007.

You can find this feature in the following directory: …\12\template\features\DeploymentLinks

This feature is activated on the Root Web of the Central Admin site collection. Be aware that this is a hidden feature so you cannot see it in the site feature list in the UI.

Deactivating this feature will remove the Content Deployment section from the central admin Web site and also disable the underlying functionality of Content Deployment as several methods verify internally if this feature has been enabled on the central admin Web site or not.

You can easily try this on your own using the following command:

STSADM -o deactivatefeature -url http://central-admin-url -name DeploymentLinks

After executing this command you will see that the content deployment section is gone from the operations tab. To re-enable the functionality, use this command:

STSADM -o activatefeature -url http://central-admin-url -name DeploymentLinks

Content Deployment – The different flavors

Content Deployment as it is available in MOSS 2007 comes in three different flavors: full deployment, incremental deployment and quick deployment. Here is a short overview of the three different flavors:

Full Deployment

Full Deployment allows all the content of a site to be deployed to the target site collection independent of when the content was created. This is usually done when deploying a site collection to the target farm for the very first time.

Limitations

Full deployment will only deploy the current state of the site. It will not deploy information about historical actions performed on the site (for example, if an item has been moved or has been deleted).

Benefits

Full deployment does not rely on information about what has been previously deployed to the target. It will always export the current state of the source site collection.

Caveats

Full deployment into a database that has already received content through a content deployment job can cause inconsistencies between the source and the target farm as information about historical actions is not deployed. For example, if a list item is deleted on the source farm between the two content deployment runs you will notice that the item will not get deleted on the target farm.

Incremental Deployment

Incremental Deployment deploys all content that has changed since the last successful run of a content deployment job. Only modified content will be deployed to the target farm. Incremental deployment also transports the information about historical actions such as move, delete and rename to the target farm.

Limitations

The complexity of an incremental deployment is higher than the complexity of a full deployment job. This is due to the additional logic included to deploy only the modified content and items directly linked to the modified content. This increases the chance for problems which can cause incremental content deployment to fail. A second limitation is that an incremental deployment will only look at the source system to determine the data that has to be deployed. It will not analyze source and target and based on the difference decide which content to deploy.

Benefits

Due to the fact that only modified content is deployed, the time for an incremental deployment is usually much shorter than the initial full deployment. In addition, the feature to deploy information about historical actions allows the target system and the source system to keep in sync beyond what is possible with full deployment.

Caveats

If an incremental deployment job fails repeatedly it might be required to perform a new full deployment to the target to re-sync the content on the source farm with the target farm. Since a full deployment does not transport information about historical actions, this will only work reliably if this full deployment is performed into an empty database.

Quick Deployment

Quick Deployment allows authors to pick specific pages for content deployment. This is vital for companies which need to quickly update specific pages on their site which need to be independently updated from other content that would be deployed with an incremental deployment.

Limitations

Quick deployment will deploy only the selected publishing pages and resources bound to these pages. In addition, it is not possible to select items in a regular document library or list for quick deployment.

Benefits

Quick Deployment enables authors to register specific items for quick deploy which will be deployed to the target farm independent from any other changes that have been done on the source farm since the last full or incremental deployment.

Caveats

Quick deployment only deploys publishing pages and resources bound to the selected pages. Sites and lists are not deployed through quick deployment. Because of this, all target folders and libraries for all selected and dependent items must already exist on the target farm; otherwise quick deployment will fail. For example, consider the scenario where a new picture library has been created and a picture from this picture library has been bound to a field in a publishing page that needs to be quick deployed. Quick deployment will fail if the picture library has not been previously deployed to the target site collection.

Myths and Facts about Full, Incremental and Quick Deployment

Myth: I need at least one full and one incremental deployment job per path, as I have to run at least one full deployment at the beginning before I can run an incremental deployment

Fact: The above statement is not correct. The very first incremental deployment will always be a full deployment. So there should be no need to create a full deployment job.
 

Myth: Full deployment deploys more than incremental deployment

Fact: The above statement is not correct. Full deployment only deploys the current state of the site, while incremental deployment deploys all changes made to the site since the last deployment, including delete and move operations. An extreme example is a site collection that contained 100,000 items when it was last deployed. After 90,000 items are deleted, a full deployment would deploy only the 10,000 remaining items and would not deploy the delete operations. So after the full deployment, the number of items in the target database would not have changed (assuming that no operations other than the deletions were performed on the source database). An incremental deployment, on the other hand, would deploy only the 90,000 delete operations to ensure that these items are also deleted on the target. In other words: after the full deployment the source and target databases are out of sync, while incremental deployment keeps them in sync.
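The example can be reproduced with a toy model in C#: a site collection is a set of item IDs, a full deployment copies the current state, and an incremental deployment replays the recorded changes. The DeploymentModel class is hypothetical and ignores moves, renames and item contents.

```csharp
using System;
using System.Collections.Generic;

// Toy model: a site collection is represented as a set of item IDs.
public static class DeploymentModel
{
    // Full deployment copies the current state of the source. Deleted items
    // are not part of that state, so nothing is ever removed from the target.
    public static void FullDeploy(ISet<int> source, ISet<int> target)
    {
        target.UnionWith(source);
    }

    // Incremental deployment replays the changes recorded since the last run,
    // including deletions.
    public static void IncrementalDeploy(
        IEnumerable<int> addedOrChanged, IEnumerable<int> deleted, ISet<int> target)
    {
        target.UnionWith(addedOrChanged);
        target.ExceptWith(deleted);
    }
}
```

After the 90,000 deletions, the full deployment leaves 100,000 items on the target, while the incremental deployment leaves exactly the 10,000 items that still exist on the source.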
 

Myth: It is best practice to run incremental deployments during the week and a full deployment during weekend to ensure that my content database is in sync

Fact: The above statement is not correct. Every single full deployment carries the risk that the target site collection gets out of sync with the source, because delete and move operations are not applied to the target. These inconsistencies between source and target can in turn cause future content deployment jobs to fail.
 

Myth: If incremental deployment fails I can do a full deployment to fix the problem

Fact: This statement is not correct. Full deployment indeed often succeeds where incremental deployment failed, and subsequent incremental deployments often work again afterwards. But this is only because the next incremental deployment after the full deployment deploys only content changed since the full deployment. In this situation you usually do not resolve the underlying issue; you only avoid it. The full deployment adds all other content that exists in the source site collection to the package, so the inconsistencies in the source database that caused the incremental deployment to fail no longer show up. Often, however, the same problem occurs again a couple of days later, when incremental deployment again has to deploy the problematic items. And the full deployment has made things worse: there are now potentially additional inconsistencies between source and target, because the delete and move operations that would have been deployed with the failing incremental deployment will never be deployed. These add to the inconsistency that caused the original failure and can cause further problems with future content deployments (both incremental and full).
 

Myth: Quick deploy is an easy way to get important content live quickly, independently of other content.

Fact: The above statement is not entirely correct. Quick deployment has to be used very carefully. For every single quick-deployed item, and also for all dependent items such as referenced images and documents, it must be guaranteed that the site, the list or library, and the folder containing the item have previously been deployed to the target with an incremental deployment. If this is not the case, the quick deploy job and all future runs of the quick deploy job will fail until the next incremental deployment has been performed. This means that a single user who quick deploys a page referencing a document in a document library that has not yet been deployed to the target system can prevent all other users from performing quick deployments. The main problem here is that quick deployment is triggered by authors, while incremental deployment is controlled by farm administrators. Authors therefore usually do not know when the last successful incremental deployment ran and which sites and lists have been deployed.
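The dependency rule can be expressed as a simple precondition check. The sketch below is a hypothetical Python model, not the SharePoint implementation: the function `quick_deploy`, its parameters and the container paths are illustrative assumptions, and it models only the precondition, not the way a failed item keeps blocking later quick deploy runs.

```python
# Hypothetical model of the quick deployment precondition (not SharePoint code).

def quick_deploy(selected, dependencies, container_of, target_containers):
    """Return the set of items a quick deployment would ship.

    selected          -- items chosen by the author (e.g. publishing pages)
    dependencies      -- dict: item -> items it references (images, documents)
    container_of      -- dict: item -> site/list/folder path on the source
    target_containers -- set of container paths already deployed to the target
                         by an earlier incremental deployment

    Quick deployment does not deploy sites or lists, so if any container is
    missing on the target the whole job fails.
    """
    to_ship = set(selected)
    for item in selected:
        to_ship.update(dependencies.get(item, []))
    missing = sorted({container_of[i] for i in to_ship} - set(target_containers))
    if missing:
        raise RuntimeError("containers not yet deployed: " + ", ".join(missing))
    return to_ship


deps = {"/Pages/news.aspx": ["/NewPictures/logo.png"]}
where = {"/Pages/news.aspx": "/Pages", "/NewPictures/logo.png": "/NewPictures"}

try:
    quick_deploy(["/Pages/news.aspx"], deps, where, {"/Pages"})
except RuntimeError as err:
    print(err)  # containers not yet deployed: /NewPictures

# after an incremental deployment has created /NewPictures on the target:
print(sorted(quick_deploy(["/Pages/news.aspx"], deps, where, {"/Pages", "/NewPictures"})))
```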

x64 version of the Debug Diag 1.1 troubleshooting tool was released to the web yesterday

After a long beta period, the 64-bit version of Debug Diag 1.1 has been made available for public download from the Microsoft Download Center.

The Debug Diagnostic Tool (DebugDiag) is designed to assist in troubleshooting issues such as hangs, slow performance, memory leaks or fragmentation, and crashes in any user-mode process. The tool includes additional debugging scripts focused on Internet Information Services (IIS) applications, web data access components, COM+ and related Microsoft technologies.

The whitepaper about the tool has been updated as well.

Posted by Stefan_Gossner

October 2009 CU for WSS and MOSS was released yesterday

The October 2009 cumulative update for WSS 3.0 and MOSS 2007 was released yesterday. Here are the KB article links for these fixes:

The fixes can be downloaded using the following links:

Modifying SharePoint RSS Feeds using a custom control adapter - Implementing Delta Encoding

We frequently get questions from customers asking whether it is possible to modify the data sent to the client in an RSS feed. SharePoint itself allows you to specify the columns included in the Description field, the maximum number of items to return, and the maximum number of days to include in the RSS feed.

One feature that SharePoint does not support is Delta Encoding for feeds.

In short: Delta Encoding reduces the data sent to the client in an RSS feed to exactly the information the client needs to become up to date. Items that have already been sent to the client earlier are not sent a second time.

Delta Encoding requires that the client sends an If-Modified-Since HTTP header to the server, which allows the server to determine when this specific client received its last update.

SharePoint actually checks the If-Modified-Since header and returns 304 (Not Modified) if the content of the RSS feed has not changed. If the content has changed, however, it returns the whole RSS feed.
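For reference, the conditional-GET behavior described above can be sketched as follows. This is a minimal Python model of a server that honours If-Modified-Since (the function `serve_feed` is hypothetical, not SharePoint code): it answers 304 with an empty body when nothing has changed, and otherwise always returns the complete feed, never a delta.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def serve_feed(feed_xml, feed_last_modified, if_modified_since=None):
    """Answer a feed request as described above (hypothetical helper).

    feed_last_modified must be a timezone-aware UTC datetime.
    Returns (status, headers, body): 304 with an empty body if the feed has
    not changed since If-Modified-Since, otherwise 200 with the whole feed.
    """
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: treat as unconditional request
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if feed_last_modified <= since:
                return 304, {}, b""
    headers = {"Last-Modified": format_datetime(feed_last_modified, usegmt=True)}
    return 200, headers, feed_xml


last_modified = datetime(2009, 10, 28, 12, 0, tzinfo=timezone.utc)
print(serve_feed(b"<rss/>", last_modified, "Wed, 28 Oct 2009 12:00:00 GMT")[0])  # 304
print(serve_feed(b"<rss/>", last_modified, "Tue, 27 Oct 2009 12:00:00 GMT")[0])  # 200
```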

This implementation is correct, because otherwise scenarios involving caching proxy servers would fail: the content returned for exactly the same URL would differ depending on the If-Modified-Since header sent.

For example:

  • User 1 requests the RSS feed and receives content with items A, B and C, which is then also cached by the proxy
  • Item D is added
  • Later, after the cached copy has expired, User 1 requests the RSS feed again; as D is the delta for this user, the cached content is replaced with content containing only D
  • Now User 2 requests the RSS feed, retrieves content D from the cache and never receives A, B and C

So Delta Encoding is only useful if there are no caching proxy servers between the clients and the SharePoint server.
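The proxy scenario from the list above can be reproduced in a few lines of Python. Both the delta-encoding server (`delta_feed`) and the proxy (`CachingProxy`) are hypothetical; the proxy is assumed to key its cache on the URL alone and to ignore request headers, and items carry a simple integer timestamp instead of a real date.

```python
def delta_feed(items, since=None):
    """Hypothetical delta-encoding server: only items newer than `since`
    (None = client has nothing yet, so everything is returned)."""
    return [name for name, modified in items if since is None or modified > since]


class CachingProxy:
    """Hypothetical proxy that caches responses by URL only; the
    If-Modified-Since value is not part of the cache key."""

    def __init__(self):
        self.cache = {}

    def get(self, url, items, since=None, expired=False):
        if url in self.cache and not expired:
            return self.cache[url]          # served from cache, server not asked
        body = delta_feed(items, since)     # forwarded to the server
        self.cache[url] = body
        return body


proxy = CachingProxy()
feed_url = "http://server/feed"             # placeholder URL
items = [("A", 1), ("B", 2), ("C", 3)]

print(proxy.get(feed_url, items))                         # User 1: ['A', 'B', 'C']
items.append(("D", 4))                                    # item D is added
print(proxy.get(feed_url, items, since=3, expired=True))  # User 1 again: ['D']
print(proxy.get(feed_url, items))                         # User 2: ['D'] from cache
```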

Still, some customers have requested this feature, so I will now explain how it can be implemented using an ASP.NET ControlAdapter.

Control adapters allow you to modify the behavior of existing controls without subclassing them: the adapter is registered through a simple configuration file in the App_Browsers directory. To modify the behavior of the SharePoint RSS feeds, we need to implement a custom PageAdapter that consumes the XML response generated by the RSS feed and filters it against the If-Modified-Since header.

Usually, a control adapter that needs to modify the content rendered by a control would create a custom HtmlTextWriter, pass it to the Render method of the control, capture the content generated by the control and modify it before sending it to the original HtmlTextWriter, which then sends the content to the client:

protected override void Render(System.Web.UI.HtmlTextWriter output) 
{
    // catch the output of the original control
    // (TextWriter and StringWriter live in System.IO)
    TextWriter tempWriter = new StringWriter();
    base.Render(new System.Web.UI.HtmlTextWriter(tempWriter)); 
    string origHtml = tempWriter.ToString(); 

    // adjust the content as required
    string newHtml = origHtml.Replace(...); 

    // send the adjusted output to the client
    output.Write(newHtml); 
}

Unfortunately, this approach does not work with the SharePoint RSS feeds, because the RSS code does not use the HtmlTextWriter passed in to send the RSS response. Instead it uses Response.Write, which bypasses the HtmlTextWriter.

So a different method is required to capture and modify the RSS response. ASP.NET supports this through a custom stream filter.

In .NET, a stream filter is nothing more than a Stream object that chains itself between the filtered stream and the receiver.

Once the custom stream filter has been assigned to the Filter property of the HTTP response object, it receives all content written to the response and can modify it in any way it likes. We will use this method to implement Delta Encoding for the RSS feed.
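To illustrate the idea in a language-neutral way, here is a minimal Python sketch of such a filter. The class name `BufferingFilter` and the `transform` callback are assumptions for illustration, not an ASP.NET API. The filter sits in front of a base stream, collects all writes, applies the transformation only when flushed, and falls back to the original bytes if the transformation fails.

```python
import io


class BufferingFilter:
    """Sketch of a response filter: chained in front of `base`, it collects
    every write and applies `transform` to the complete content on flush.
    If the transformation raises, the original bytes are passed through."""

    def __init__(self, base, transform):
        self.base = base
        self.transform = transform
        self.chunks = []

    def write(self, data):
        # copy the data: the caller may reuse its buffer after write() returns
        self.chunks.append(bytes(data))
        return len(data)

    def flush(self):
        if self.chunks:
            original = b"".join(self.chunks)
            self.chunks = []
            try:
                output = self.transform(original)
            except Exception:
                output = original           # never lose the response
            self.base.write(output)
        self.base.flush()


wire = io.BytesIO()                         # stands in for the real output stream
f = BufferingFilter(wire, lambda body: body.replace(b"old", b"new"))
f.write(b"<rss>ol")                         # "old" is split across two writes,
f.write(b"d</rss>")                         # so it is only found after buffering
f.flush()
print(wire.getvalue())                      # b'<rss>new</rss>'
```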

The stream filter has to implement all the abstract members of the Stream class. Here is the basic stream filter without the custom logic that modifies the content:

public class RSS_Filter : Stream
{
        private Encoding enc = null;
        private bool closed;
        Stream BaseStream; 

        public RSS_Filter(Stream baseStream, Encoding encoding)
        {
            // here we keep track of the underlying base stream and the encoding.
            BaseStream = baseStream;
            enc = encoding;

            // stream is not closed when we create it.
            closed = false;
        }

        public override void Write(byte[] buffer, int offset, int count)
        { 
            // ensure that the stream has not been closed before 
            if (Closed) throw new ObjectDisposedException("RSS-Stream-Filter");

            // -- our logic to consume the written data needs to be added here --
        }

        public override bool CanRead
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return !closed; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override void Close()
        {
            closed = true;
            BaseStream.Close();
        }

        protected bool Closed
        {
            get { return closed; }
        }

        public override void Flush()
        {
            // -- our logic to modify the written data before sending it on the wire needs to be added here --

            BaseStream.Flush();
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException();
        }

        public override long Length
        {
            get { throw new NotSupportedException(); }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException();
        }

        public override void SetLength(long value)
        {
            throw new NotSupportedException();
        }

        public override long Position
        {
            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }
}

As you can see in the above code, there are two methods where we have to add custom logic: Write and Flush. This is required because the RSS response can be larger than a single write buffer, so it is not guaranteed that the whole RSS response arrives in the first buffer we get. And because the Write method receives raw bytes, it is also not guaranteed that a UTF-8 character (which can occupy more than one byte) is not split across two buffers.

So to process the RSS feed correctly, we have to store all buffers received in the Write method until the Flush method is called, which indicates the end of the stream.
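The UTF-8 issue is easy to demonstrate. In the following Python sketch (the helper names are hypothetical), the two-byte character 'ü' is split across two buffers: decoding each buffer on its own corrupts it, while joining the buffers first, as the filter does, decodes correctly. An incremental decoder is shown as an alternative; for the RSS filter it would not be sufficient on its own, because the XML has to be parsed as a whole anyway.

```python
import codecs

text = "Grüße"
data = text.encode("utf-8")         # 'ü' and 'ß' take two bytes each
chunks = [data[:3], data[3:]]       # the first buffer ends in the middle of 'ü'


def decode_each_chunk(chunks, encoding="utf-8"):
    """Naive: decode every buffer separately (corrupts split characters)."""
    return "".join(c.decode(encoding, errors="replace") for c in chunks)


def decode_buffered(chunks, encoding="utf-8"):
    """What the stream filter does: join all buffers, then decode once."""
    return b"".join(chunks).decode(encoding)


def decode_incremental(chunks, encoding="utf-8"):
    """Alternative: an incremental decoder keeps incomplete byte sequences
    between calls until the rest of the character arrives."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(c) for c in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


print(decode_each_chunk(chunks) == text)   # False
print(decode_buffered(chunks) == text)     # True
print(decode_incremental(chunks) == text)  # True
```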

In the Flush method we can then add the logic to post-process the RSS content.

The following code fragment demonstrates how the buffers are consumed:

        Queue byteQueue = new Queue();
        int responseLength=0;

        public override void Write(byte[] buffer, int offset, int count)
        { 
            // ensure that the stream has not been closed before
            if (Closed) throw new ObjectDisposedException("RSS-Stream-Filter");

            // copy the relevant part of the given buffer. We always copy
            // (instead of storing the caller's array) because only the range
            // [offset, offset + count) belongs to this write, and the caller
            // may reuse its buffer after Write returns
            byte[] newBuf = new byte[count];
            Buffer.BlockCopy(buffer, offset, newBuf, 0, count);

            // we store the buffer in a queue which ensures that we can easily 
            // process the content in the same sequence
            byteQueue.Enqueue(newBuf);

            // current buffered response
            responseLength += count;
        }

In the Flush method we now add the logic that handles the delta encoding. To allow easy modification of the RSS feed, we first convert the collected bytes into a string, which can then be loaded into an XmlDocument and filtered using the XML APIs. After the modification is done, we convert the resulting XML back into bytes to send them to the client. To ensure that the original content is still sent to the client if an exception occurs, we put the relevant logic into a try/catch/finally block:

        public override void Flush()
        { 
            // nothing buffered (e.g. Flush called a second time): just flush the base stream
            if (byteQueue.Count == 0)
            {
                BaseStream.Flush();
                return;
            }

            // here we copy the different buffers into one large buffer
            // we need this in a single buffer for the conversion to string
            byte[] fullBuffer = new byte[responseLength];
            int index = 0;
            while (byteQueue.Count > 0)
            {
                byte[] buf = byteQueue.Dequeue() as byte[];
                buf.CopyTo(fullBuffer, index);
                index += buf.Length;
            }

            // now we convert the byte buffer into a string object
            string content = enc.GetString(fullBuffer, 0, responseLength);

            // before we convert the content we have to remove all characters 
            // which are not part of the Xml content (for example a byte order mark)
            int xmlStart = content.IndexOf('<');
            string nonXml = xmlStart > 0 ? content.Substring(0, xmlStart) : string.Empty;
            if (nonXml.Length > 0)
                content = content.Substring(nonXml.Length); 

            try
            {
                // retrieve the reference time from the If-Modified-Since header;
                // this is done inside the try block so that a malformed header
                // results in the unmodified feed being sent
                DateTime ifModSince = Convert.ToDateTime(HttpContext.Current.Request.Headers["If-Modified-Since"]);

                // -- here we add our code to manipulate the XML --
            }
            catch (Exception)
            {
                // our code raised an exception - there is not much we can do about it,
                // so the unmodified content is sent in the finally block
            }
            finally
            {
                // ensure to write the data we have to the wire 
                byte[] newBuffer = enc.GetBytes(nonXml + content);
                BaseStream.Write(newBuffer, 0, newBuffer.Length);

                // reset the counter so that data written after this Flush starts from zero
                responseLength = 0;
            }

            // now we flush the base stream
            BaseStream.Flush();
        }

The next step is to implement the logic that allows us to filter the RSS feed using Xml methods:

        // now we create a new XmlDocument and pass in the rss content as InnerXml 
        XmlDocument doc = new XmlDocument(); 
        doc.InnerXml = content; 

        // to get the Urls to the different items in the RSS feed we filter the Xml  
        // for the link nodes 
        XmlNodeList itemNodes = doc.GetElementsByTagName("link"); 
        Stack nodesToRemove = new Stack(); 

        // loop over each link in the RSS 
        foreach (XmlNode node in itemNodes) 
        { 
            // we are only interested in links that reside in item nodes of the RSS feed 
            if (node.ParentNode.Name == "item")
            { 
                // the link to the item is in the InnerText of the current node 
                string link = node.InnerText; 
                if (!string.IsNullOrEmpty(link)) 
                { 
                // let's retrieve the SharePoint object identified by the link 
                    Uri uri = new Uri(link); 

                    // -- here we need to add the code to verify the last modified date        --
                    // -- of the item identified by the url against the If-Modified-Since date --
                } 
            } 
        } 
        while (nodesToRemove.Count > 0)
        { 
            // get the item node we need to remove 
            XmlNode node = (nodesToRemove.Pop() as XmlNode).ParentNode; 
            node.ParentNode.RemoveChild(node); 
        } 
        content = doc.InnerXml;

After this code has executed, the RSS feed contains only the items that were modified after the date in the If-Modified-Since header. The only missing piece is the code that checks the last modified date of each item against the If-Modified-Since header:

        using (SPSite site = new SPSite(uri.AbsoluteUri.ToString())) 
        { 
            // to avoid exceptions we need to cut the URL if we find a reserved name. Reserved names start with an "_" char. 
            string modUrlWithoutReservedNames = uri.AbsolutePath.Substring(0, (uri.AbsolutePath + "/_").IndexOf("/_")); 
            using (SPWeb web = site.OpenWeb(modUrlWithoutReservedNames, false)) 
            { 
                // there are two possible ways to encode the URL in SharePoint RSS feeds depending on how it is configured: 
                // using the ID and using the direct item URL 
                if (HttpUtility.ParseQueryString(uri.Query)["ID"] == null) 
                { 
                    // here we handle the items identified using a direct Url 
                    // check if we can retrieve the object and verify if it is a List Item 
                    SPListItem item = web.GetObject(uri.GetComponents(UriComponents.PathAndQuery, 
                                                    UriFormat.UriEscaped)) as SPListItem; 

                    if (item != null) 
                    { 
                        // remove the item if its last modified date is not later than "If-Modified-Since" 
                        DateTime modTime = (DateTime)item["Modified"]; 
                        if (DateTime.Compare(modTime, ifModSince) <= 0) 
                        { 
                            nodesToRemove.Push(node); 
                        } 
                    } 
                } 
                else 
                { 
                    // here we handle the items identified by ID 

                    // first we try to get the list 
                    SPList list = web.GetList(uri.AbsolutePath); 
                    if (list != null) 
                    { 
                        // to retrieve the list item we need to take the item id from the query string in the rss 
                        int itemId = int.Parse(HttpUtility.ParseQueryString(uri.Query)["ID"]); 
                        SPListItem item = list.GetItemById(itemId); 

                        // remove the item if its last modified date is not later than "If-Modified-Since" 
                        DateTime modTime = (DateTime)item["Modified"]; 
                        if (DateTime.Compare(modTime, ifModSince) <= 0) 
                        { 
                            // we cannot remove it directly as this would modify the collection foreach is running over 
                            // so we buffer it 
                            nodesToRemove.Push(node); 
                        } 
                    } 
                } 
            } 
        }
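The complete filtering step can also be tried outside SharePoint. The Python sketch below uses xml.etree.ElementTree and replaces the SharePoint lookups (SPWeb.GetObject, SPList.GetItemById) with a hypothetical `last_modified_of` callback; unlike the C# code, it keeps items whose link cannot be resolved. Like the C# version, it collects stale items first and removes them afterwards, so the collection being iterated is never modified.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def filter_feed(rss_xml, last_modified_of, if_modified_since):
    """Remove every <item> whose linked object was last modified at or before
    if_modified_since. last_modified_of(link) is a hypothetical stand-in for
    the SharePoint lookup; it returns None if the link cannot be resolved,
    in which case the item is kept."""
    root = ET.fromstring(rss_xml)
    for channel in list(root.iter("channel")):
        stale = []                          # collect first, remove afterwards
        for item in channel.findall("item"):
            link = (item.findtext("link") or "").strip()
            if not link:
                continue
            modified = last_modified_of(link)
            if modified is not None and modified <= if_modified_since:
                stale.append(item)
        for item in stale:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


feed = ("<rss><channel><title>News</title>"
        "<item><link>http://server/Pages/a.aspx</link></item>"
        "<item><link>http://server/Pages/b.aspx</link></item>"
        "</channel></rss>")
modified = {"http://server/Pages/a.aspx": datetime(2009, 10, 1),
            "http://server/Pages/b.aspx": datetime(2009, 10, 20)}

# only the item for b.aspx (modified after October 10) remains
print(filter_feed(feed, modified.get, datetime(2009, 10, 10)))
```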

We now have all the pieces of the stream filter together. To activate it, we register it from within our control adapter as follows:

        protected override void Render(System.Web.UI.HtmlTextWriter writer)
        {
            if (HttpContext.Current.Request.Headers.AllKeys.Contains("If-Modified-Since"))
            {
                // add our custom stream filter if we receive an If-Modified-Since header
                base.Page.Response.Filter = new RSS_Filter(base.Page.Response.Filter, base.Page.Response.ContentEncoding);
            }
            base.Render(writer);
        }

For your convenience, here is the complete code:

using System;
using System.Xml;
using System.IO;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Text;
using System.Web;
using System.Web.UI.Adapters;
using Microsoft.SharePoint;

namespace StefanG.ControlAdapters
{
    public class RSS_Filter : Stream
    {
        private Encoding enc = null;
        private bool closed;
        Stream BaseStream;
        Queue byteQueue = new Queue();
        int responseLength=0;

        public RSS_Filter(Stream baseStream, Encoding encoding)
        {
            // here we keep track of the underlying base stream and the encoding. 
            BaseStream = baseStream;
            enc = encoding;

            // stream is not closed when we create it.
            closed = false;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            // ensure that the stream has not been closed before 
            if (Closed) throw new ObjectDisposedException("RSS-Stream-Filter");

            // copy the relevant part of the given buffer. We always copy
            // (instead of storing the caller's array) because only the range
            // [offset, offset + count) belongs to this write, and the caller
            // may reuse its buffer after Write returns
            byte[] newBuf = new byte[count];
            Buffer.BlockCopy(buffer, offset, newBuf, 0, count);

            // we store the buffer in a queue which ensures that we can easily  
            // process the content in the same sequence 
            byteQueue.Enqueue(newBuf);

            // current buffered response 
            responseLength += count;
        }

        public override bool CanRead
        {
            get { return false; }
        }

        public override bool CanWrite
        {
            get { return !closed; }
        }

        public override bool CanSeek
        {
            get { return false; }
        }

        public override void Close()
        {
            closed = true;
            BaseStream.Close();
        }

        protected bool Closed
        {
            get { return closed; }
        }

        public override void Flush()
        {
            // nothing buffered (e.g. Flush called a second time): just flush the base stream
            if (byteQueue.Count == 0)
            {
                BaseStream.Flush();
                return;
            }

            // here we copy the different buffers into one large buffer
            // we need this in a single buffer for the conversion to string
            byte[] fullBuffer = new byte[responseLength];
            int index = 0;
            while (byteQueue.Count > 0)
            {
                byte[] buf = byteQueue.Dequeue() as byte[];
                buf.CopyTo(fullBuffer, index);
                index += buf.Length;
            }

            // now we convert the byte buffer into a string object
            string content = enc.GetString(fullBuffer, 0, responseLength);

            // before we convert the content we have to remove all characters 
            // which are not part of the Xml content (for example a byte order mark)
            int xmlStart = content.IndexOf('<');
            string nonXml = xmlStart > 0 ? content.Substring(0, xmlStart) : string.Empty;
            if (nonXml.Length > 0)
                content = content.Substring(nonXml.Length);

            try
            {
                // we retrieve the reference time from the If-Modified-Since header;
                // this is done inside the try block so that a malformed header
                // results in the unmodified feed being sent
                DateTime ifModSince = Convert.ToDateTime(HttpContext.Current.Request.Headers["If-Modified-Since"]);

                // now we create a new XmlDocument and pass in the rss content as InnerXml
                XmlDocument doc = new XmlDocument();
                doc.InnerXml = content;

                // to get the Urls to the different items in the RSS feed we filter the Xml 
                // for the link nodes
                XmlNodeList itemNodes = doc.GetElementsByTagName("link");
                Stack nodesToRemove = new Stack();

                // loop over each link in the RSS
                foreach (XmlNode node in itemNodes)
                {
                    // we are only interested in links that reside in item nodes of the RSS feed
                    if (node.ParentNode.Name == "item")
                    {
                        // the link to the item is in the InnerText of the current node
                        string link = node.InnerText;
                        if (!string.IsNullOrEmpty(link))
                        {
                            // let's retrieve the SharePoint object identified by the link
                            Uri uri = new Uri(link);
                            using (SPSite site = new SPSite(uri.AbsoluteUri.ToString()))
                            {
                                // to avoid exceptions we need to cut the URL if we find a reserved name. Reserved names start with an "_" char.
                                string modUrlWithoutReservedNames = uri.AbsolutePath.Substring(0, (uri.AbsolutePath + "/_").IndexOf("/_"));
                                using (SPWeb web = site.OpenWeb(modUrlWithoutReservedNames, false))
                                {
                                    // there are two possible ways to encode the URL in SharePoint RSS feeds depending on how it is configured:
                                    // using the ID and using the direct item URL
                                    if (HttpUtility.ParseQueryString(uri.Query)["ID"] == null)
                                    { 
                                        // here we handle the items identified using a direct Url 
 
                                        // check if we can retrieve the object and verify if it is a List Item
                                        SPListItem item = web.GetObject(uri.GetComponents(UriComponents.PathAndQuery, UriFormat.UriEscaped)) as SPListItem;

                                        if (item != null)
                                        {
                                            // remove the item if its last modified date is not later than "If-Modified-Since"
                                            DateTime modTime = (DateTime)item["Modified"];
                                            if (DateTime.Compare(modTime, ifModSince) <= 0)
                                            {
                                                nodesToRemove.Push(node);
                                            }
                                        }
                                    }
                                    else
                                    { 
                                        // here we handle the items identified by ID 

                                        // first we try to get the list
                                        SPList list = web.GetList(uri.AbsolutePath);
                                        if (list != null)
                                        {
                                            // to retrieve the list item we need to take the item id from the query string in the rss
                                            int itemId = int.Parse(HttpUtility.ParseQueryString(uri.Query)["ID"]);
                                            SPListItem item = list.GetItemById(itemId);

                                            // remove the item if its last modified date is not later than "If-Modified-Since"
                                            DateTime modTime = (DateTime)item["Modified"];
                                            if (DateTime.Compare(modTime, ifModSince) <= 0)
                                            {
                                                // we cannot remove it directly as this would modify the collection foreach is running over
                                                // so we buffer it
                                                nodesToRemove.Push(node);
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
                while (nodesToRemove.Count > 0)
                {
                    // get the item node we need to remove
                    XmlNode node = (nodesToRemove.Pop() as XmlNode).ParentNode;
                    node.ParentNode.RemoveChild(node);
                }
                content = doc.InnerXml;
            }
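            catch (Exception e)
            {
                // our code raised an exception - the unmodified content is sent in
                // the finally block; write the error to the trace for troubleshooting
                System.Diagnostics.Trace.WriteLine(e.ToString());
            }
            finally
            {
                // ensure to write the data we have to the wire 
                byte[] newBuffer = enc.GetBytes(nonXml + content);
                BaseStream.Write(newBuffer, 0, newBuffer.Length);

                // reset the counter so that data written after this Flush starts from zero
                responseLength = 0;
            }

            BaseStream.Flush();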
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException();
        }

        public override long Length
        {
            get { throw new NotSupportedException(); }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            throw new NotSupportedException();
        }

        public override void SetLength(long value)
        {
            throw new NotSupportedException();
        }

        public override long Position
        {
            get { throw new NotSupportedException(); }
            set { throw new NotSupportedException(); }
        }
    }

    public class RssControlAdapter : PageAdapter
    {
        protected override void Render(System.Web.UI.HtmlTextWriter writer)
        {
            if (HttpContext.Current.Request.Headers.AllKeys.Contains("If-Modified-Since"))
            {
                // add our custom stream filter if we receive an If-Modified-Since header
                base.Page.Response.Filter = new RSS_Filter(base.Page.Response.Filter, base.Page.Response.ContentEncoding);
            }
            base.Render(writer);
        }
    }
}

To implement this control adapter, perform the following steps:

  1. create a new class library project in Visual Studio 2008
  2. add references to System.Web and Microsoft.SharePoint (Windows SharePoint Services), which the code uses
  3. replace the code in Class1.cs with the code listed above
  4. add a key file to allow signing of the generated assembly
  5. build the DLL and add it to the Global Assembly Cache (GAC)

Afterwards, you need to register this control adapter with your existing SharePoint web application. To do this, create a new file with the extension ".browser" (e.g. RssControlAdapter.browser) in the App_Browsers directory of the web application and add the following XML:

<browsers> 
  <browser refID="Default"> 
    <controlAdapters> 
      <adapter  
        controlType="Microsoft.SharePoint.ApplicationPages.ListFeed, Microsoft.SharePoint.ApplicationPages"
        adapterType="StefanG.ControlAdapters.RssControlAdapter, AssemblyName, Version=1.0.0.0, Culture=neutral, PublicKeyToken=f180aea0269836ba"
      /> 
    </controlAdapters> 
  </browser> 
</browsers> 

Be sure to replace the public key token f180aea0269836ba and AssemblyName with the public key token and name of your own assembly.

To ensure that the new .browser file is used after the next application domain recycle, you also need to delete the content of the following directory:

  • C:\windows\Microsoft.NET\Framework\v2.0.50727\Temporary ASP.NET Files\root (on 64-bit servers, the corresponding directory under Framework64)

If this directory is not purged, the additional .browser file is usually not picked up.

Important: Be careful when installing WSS August Cumulative Update

Last week we discovered a problem with the WSS August CU.

Due to schema and stored procedure changes coming with the August CU databases with a patch level older than the August CU cannot be upgraded to August CU through database attach method.

That means that you need to be careful when upgrading your WSS or MOSS farm to August CU as the usual recommended steps to detach the content database before installing the hotfix will cause problems later when you are going to reattach the databases. In case you need to install the August CU now to resolve an issue that is covered by the fix you should avoid detaching the content databases during the hotfix installation. If the databases are attached the hotfix installation will correctly upgrade the database when running PSCONFIG.

If you do need to detach the databases, they have to be upgraded in a separate farm before they are attached back to the production farm. For example, install a test farm at the same build level as the databases, attach the databases there, and then upgrade the test farm to the August CU. Afterwards you can detach the upgraded databases from the test farm and attach them back to the production farm.

Also be aware that this also affects database backups taken before installing the August CU.

The product group is actively investigating this issue to find a more elegant way to avoid this problem.

[Update: the WSS hotfix was re-released on October 1st. The version of the WSS August 2009 CU available now no longer has this problem.]

How to deal with invalid characters in SOAP responses from ASP.NET web services

ASP.NET web services use XML 1.0, which restricts the allowed character set to the following:

[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
(Source: http://www.w3.org/TR/REC-xml/#charsets)

As you can see, most characters below 0x20 are not allowed in XML 1.0. This includes characters like the vertical tab (0x0B), which is used fairly frequently.
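To make production [2] concrete, here is a small C# sketch that tests a single Unicode code point against it. The class and method names are hypothetical; starting with .NET Framework 4, XmlConvert.IsXmlChar offers a comparable check for a single UTF-16 char.

```csharp
// Hypothetical helper implementing production [2] Char of XML 1.0 for one code point.
public static class Xml10Chars
{
    public static bool IsXml10Char(int codePoint)
    {
        return codePoint == 0x9                                  // TAB
            || codePoint == 0xA                                  // LF
            || codePoint == 0xD                                  // CR
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)        // stops before the surrogate block
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)      // excludes FFFE and FFFF
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);  // supplementary planes
    }
}
```

With this check, 0x0B (vertical tab) is rejected while 0x09 (tab) and 0x20 (space) are accepted.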

For backward compatibility reasons, .NET web services do not support XML 1.1, which would allow these characters (as character references), as explained in the following article:

http://msdn.microsoft.com/en-us/xml/bb291067.aspx
W3C Recommendations NOT Supported at This Time
...
XML 1.1 - Microsoft has deliberately chosen not to support the XML 1.1 Recommendation unless there is significant customer demand.

Usually the XML 1.0 limitation causes no harm, except when the XML response sent back to the client would include one of the forbidden characters, such as the vertical tab.

Interestingly, the web service stub (server) routines implemented in the .NET Framework do not check for invalid characters when encoding the XML response. They encode the invalid characters as numeric character references, like &#11; for the vertical tab. The problem occurs in the web service proxy (client) routines, which raise an exception when they receive a character reference that is not allowed in XML 1.0:

There is an error in XML document (8, 1314).
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.ReadResponse(SoapClientMessage message, WebResponse response, Stream responseStream, Boolean asyncCall)
   at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
   ...

So there is a slight discrepancy between the handling of the invalid characters on the web service proxy and stub side.
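The client-side half of this discrepancy can be reproduced without a web service. The following C# sketch feeds a document to an XmlReader created with default settings (XmlReaderSettings.CheckCharacters is true by default), which rejects &#11; with an XmlException. It illustrates the same class of error, not the exact reader configuration used internally by SoapHttpClientProtocol; the class and method names are hypothetical.

```csharp
using System.IO;
using System.Xml;

// Hypothetical demo: does a default-configured XmlReader accept the document?
public static class CharRefDemo
{
    // Returns true if the whole document can be read, false if an XmlException is thrown.
    public static bool ParsesWithDefaultReader(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

ParsesWithDefaultReader("<a>&#11;</a>") returns false, while the same document with &#32; or &#9; in place of &#11; is read without problems.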

It would be great if the ASP.NET web service classes avoided sending invalid numeric character references to the client in the first place, but this is not implemented in the .NET Framework. As the client cannot get hold of the content before the exception is raised, the issue has to be fixed in the application logic of the web service itself.

This means each web service method would have to replace the invalid characters before sending the XML content to the client.

The problem is that the XML response is usually created by the automatic XML serialization of managed objects, so fixing the issue inside the web service is not trivial either.

This issue also affects the standard SharePoint web services that provide access to SharePoint content.

To overcome this problem, the invalid characters have to be removed from the XML response (e.g. replaced with a space) after the response has been serialized to XML and before it is sent over the wire to the caller of the web service.

The good news is that ASP.NET provides a way to achieve this: SoapExtensions. A SoapExtension can inspect and modify the SOAP messages sent from client to server and vice versa.

Below is a SoapExtension which replaces the invalid control characters with a blank character (0x20) using Regular Expressions:

///
///  This source code is freeware and is provided on an "as is" basis without warranties of any kind, 
///  whether express or implied, including without limitation warranties that the code is free of defect, 
///  fit for a particular purpose or non-infringing.  The entire risk as to the quality and performance of 
///  the code is with the end user.
///

using System;
using System.IO;
using System.Web;
using System.Web.Services;
using System.Web.Services.Protocols;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace StefanG.SoapExtensions
{
    public class XmlCleanupSoapExtension : SoapExtension
    {
        private Regex replaceRegEx;

        private Stream oldStream;
        private Stream newStream;

        // to modify the content we redirect the stream to a memory stream to allow 
        // easy consumption and modification
        public override Stream ChainStream(Stream stream)
        {
            // keep track of the original stream
            oldStream = stream;

            // create a new memory stream and configure it as the stream object to use as input and output of the webservice
            newStream = new MemoryStream();
            return newStream;
        }

        public override object GetInitializer(LogicalMethodInfo methodInfo, SoapExtensionAttribute attribute)
        {
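            // only called when the extension is applied to individual methods via a SoapExtensionAttribute.
            // This extension is registered in web.config for all methods, so this overload is not used.
            throw new NotSupportedException("This SoapExtension must be registered in web.config.");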
        }

        public override object GetInitializer(Type serviceType)
        {
            // create a compiled instance of the Regular Expression for the chars we would like to replace
            // add all code points between 0 and 31 excluding the allowed white spaces (9=TAB, 10=LF, 13=CR)
            // start with code point 0 in decimal and hex notation
            StringBuilder RegExp = new StringBuilder("&#(0|x0");
            for (int i = 1; i <= 31; i++)
            {
                // ignore allowed white spaces
                if (i == 9 || i == 10 || i == 13) continue;

                // add other control characters
                RegExp.Append("|");
                RegExp.Append(i.ToString()); 

                // add hex representation as well 
                RegExp.Append("|x");
                RegExp.Append(i.ToString("x")); 
            }
            RegExp.Append(");");
            string strRegExp = RegExp.ToString();

            // create regular expression assembly 
            Regex regEx = new Regex(strRegExp, RegexOptions.Compiled | RegexOptions.IgnoreCase);

            // return the compiled RegEx to all further instances of this class
            return regEx;
        }

        public override void Initialize(object initializer)
        {
            // each instance retrieves the compiled regular expression created once by GetInitializer
            replaceRegEx = initializer as Regex;
        }

        public override void ProcessMessage(SoapMessage message)
        {
            if (message.Stage == SoapMessageStage.AfterSerialize)
            {
                // process the response sent back to the client - means ensure it is XML 1.0 compliant
                ProcessOutput(message);
            }
            if (message.Stage == SoapMessageStage.BeforeDeserialize)
            {
                // copy the incoming SOAP request unchanged into the memory stream
                ProcessInput(message);
            }
        }

        public void ProcessInput(SoapMessage message)
        {
            // no manipulation required on input data
            // copy content from http stream to memory stream to make it available to the web service

            TextReader reader = new StreamReader(oldStream);
            TextWriter writer = new StreamWriter(newStream);
            writer.WriteLine(reader.ReadToEnd());
            writer.Flush();

            // set position back to the beginning to ensure that the web service reads the content we just copied
            newStream.Position = 0;
        }

        public void ProcessOutput(SoapMessage message)
        {
            // rewind stream to ensure that we read from the beginning
            newStream.Position = 0;

            // copy the content of the stream into a memory buffer
            byte[] buffer = (newStream as MemoryStream).ToArray();

            // shortcut if stream is empty to avoid exception later
            if (buffer.Length == 0) return;

            // convert buffer to string to allow easy string manipulation
            string content = Encoding.UTF8.GetString(buffer);

            // replace invalid XML entities using regular expression
            content = replaceRegEx.Replace(content, "&#32;");

            // convert back to byte buffer
            buffer = Encoding.UTF8.GetBytes(content);

            // stream byte buffer to the client app
            oldStream.Write(buffer, 0, buffer.Length);
        }

    }
}
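The regular expression built in GetInitializer only covers the spellings it enumerates (decimal or hexadecimal without leading zeros), so a reference such as &#x0B; or &#011; passes through unchanged, as do references to other forbidden code points such as &#xFFFE;. As a variant, the cleanup step could use a MatchEvaluator that parses every numeric character reference and checks it against production [2]. This is a sketch of an alternative approach, not the code from the download; XmlCharRefCleaner and Clean are hypothetical names.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical alternative to the enumerated regular expression in GetInitializer.
public static class XmlCharRefCleaner
{
    // Matches decimal (&#11;) and hexadecimal (&#xB;, &#x0b;) numeric character references.
    private static readonly Regex CharRef =
        new Regex("&#(x[0-9a-fA-F]+|[0-9]+);", RegexOptions.Compiled);

    // XML 1.0 production [2] Char
    private static bool IsXml10Char(long cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every numeric character reference that does not denote an
    // XML 1.0 character with "&#32;" (a space), leaving all other text untouched.
    public static string Clean(string xml)
    {
        return CharRef.Replace(xml, m =>
        {
            string value = m.Groups[1].Value;
            long cp;
            bool parsed = value[0] == 'x'
                ? long.TryParse(value.Substring(1), NumberStyles.AllowHexSpecifier,
                                CultureInfo.InvariantCulture, out cp)
                : long.TryParse(value, NumberStyles.None,
                                CultureInfo.InvariantCulture, out cp);

            // references too large to parse are treated as invalid as well
            return parsed && IsXml10Char(cp) ? m.Value : "&#32;";
        });
    }
}
```

Inside the SoapExtension, a method like Clean could take the place of the replaceRegEx.Replace call in ProcessOutput; the trade-off is that every character reference is parsed instead of being matched against a precompiled list.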

The above code should be compiled into a C# class library project and signed with a strong name so that the DLL can be placed into the GAC.

Afterwards the SoapExtension can be registered in the web.config of the affected web service. For SharePoint webservices this would be the web.config in the following directory:

C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\ISAPI\

The following entry needs to be added to the web.config (adjust the PublicKeyToken to match your own assembly):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
    <system.web>
        <webServices> 
            <soapExtensionTypes>
                <add type="StefanG.SoapExtensions.XmlCleanupSoapExtension, XmlCleanupSoapExtension, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0e15300fe8a7b210" priority="1" group="0" />
            </soapExtensionTypes>
 
...
        </webServices>
...
    </system.web>
...
</configuration>

Afterwards all responses from webservice methods in the affected web application will automatically be cleaned up.

The complete source code can also be downloaded from here: http://code.msdn.microsoft.com/XmlCleanupSoapExtens

 

Interesting content deployment problem when doing partial deployment - hyperlinks in the User Information List

A colleague (Patrick Heyde) recently contacted me on an interesting content deployment issue he had.

His customer had configured a content deployment job that should deploy only the public part of the site collection. Consider this setup:

root
  +-- private
         +-- subsites
  +-- public
         +-- subsites 

The job was configured to export the root site (this cannot be avoided) and the public subsite including all its subsites.

During import, the deployment job failed because it tried to import content belonging to a subsite of the private site, which had not been deployed. At first we suspected the common issue that the customer had links from the public content to the private content, which would explain why the private content was exported as a dependency. But the customer assured us that this was not the case.

After some research we found the problem. And indeed the customer did not have any links from content in the public site to the private tree.

What caused the problem was that the customer had created some SharePoint groups to administer content in the private site. In the "About Me" field he entered a comment like "this rights group has authoring rights on http://servername/private/subsite".

Unfortunately, the "About Me" field, which is internally the Notes field, is a RichText field that allows hyperlinks. This caused a forward link to be created for the list item related to this SharePoint group in the User Information List.

This list resides in the root site of the site collection and is exported and deployed as well. This in turn causes all dependencies of the items in the User Information List to be deployed, including the /private/subsite site.

To resolve the problem, we had to edit all affected SharePoint groups and remove the hyperlinks to the parts of the site collection that should not be exported.
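When hunting for such hidden dependencies, it can help to scan the text of RichText fields for links into the part of the site collection that should not be deployed. The following C# sketch assumes you already have the field values as strings (for example the Notes field of the items in the User Information List, read through the SharePoint object model, which is not shown here). LinkScanner and FindLinksInto are hypothetical names, and the URL pattern is deliberately rough.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds links that point into an excluded part of the site collection.
public static class LinkScanner
{
    // Rough matcher for http/https URLs in plain text or HTML (stops at whitespace, quotes, < and >).
    private static readonly Regex UrlRegex =
        new Regex(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase);

    // Returns all URLs in 'fieldValue' that start with 'excludedPrefix' (case-insensitive).
    public static List<string> FindLinksInto(string fieldValue, string excludedPrefix)
    {
        List<string> hits = new List<string>();
        if (string.IsNullOrEmpty(fieldValue))
        {
            return hits;
        }

        foreach (Match m in UrlRegex.Matches(fieldValue))
        {
            if (m.Value.StartsWith(excludedPrefix, StringComparison.OrdinalIgnoreCase))
            {
                hits.Add(m.Value);
            }
        }
        return hits;
    }
}
```

Note that a prefix like http://servername/private also matches http://servername/privateer; append a trailing slash to the prefix if that distinction matters.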

Updated Version of SharePoint SP2 is now available which contains a fix for the Trial issue

As discussed earlier, the original SP2 release reverted the MOSS license to trial during installation. Microsoft has now released an updated version of SP2 that does not have this problem.

More details can be found on the SharePoint Team blog.

Those of you who have already installed SP2 can download the separate fix for this issue here: http://support.microsoft.com/kb/971620

Posted by Stefan_Gossner | 0 Comments

The June cumulative update for WSS V3 and MOSS 2007 was released yesterday

As discussed by the Office Sustained Engineering group, cumulative updates for all Office products, including WSS 3.0 and SharePoint 2007, will be released every second month.

Yesterday we released the so-called June Cumulative Update. The so-called "Uber" packages for MOSS will be released in a couple of days. Currently you can download the individual global and localized fixes for your specific language packs.

This is the second post-SP2 hotfix, so it is highly recommended to install Service Pack 2 before installing the June CU.

As usual, you can find details about the fixes and the download locations in the blog post published by my colleague Joerg Sinemus:

http://blogs.msdn.com/joerg_sinemus/archive/2009/07/01/moss-and-wss-june-cu.aspx
