Friday, September 14, 2012

SharePoint Deployments Cause Disruptions in "Real World" Scenarios

[Here is research I did a few years ago on WSP deployments in MOSS 2007.  I needed to understand which deployments would disrupt users.]

I’ve concluded a series of tests to determine whether it’s possible to deploy a SharePoint Solution Package (WSP) for a given web application without disrupting other web applications in a SharePoint farm. I looked at the behavior of various stsadm deployment commands (AddSolution, DeploySolution, RetractSolution, DeleteSolution, ActivateFeature, DeactivateFeature), as well as the components within the Solution Packages (WSP files). The results were not promising. In almost all tests, the deployment was disruptive, due to either an IISRESET (the most disruptive) or an App Pool recycle (less disruptive than an IISRESET).
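For reference, here is a hedged sketch of how these stsadm operations can be scripted from a small C# test harness. The harness itself, the solution name (ContosoBranding.wsp), and the web application URL are hypothetical placeholders; stsadm.exe lives in the 12 hive as shown, and execadmsvcjobs forces the pending administrative timer jobs (including solution deployment) to run.

    using System;
    using System.Diagnostics;

    // Hypothetical test harness: shells out to stsadm.exe to add, deploy,
    // retract, and delete a WSP. Solution name and URL are placeholders.
    class StsadmHarness
    {
        private const string StsadmPath =
            @"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\stsadm.exe";

        private static void RunStsadm(string arguments)
        {
            ProcessStartInfo psi = new ProcessStartInfo(StsadmPath, arguments);
            psi.UseShellExecute = false;
            psi.RedirectStandardOutput = true;

            using (Process p = Process.Start(psi))
            {
                Console.WriteLine(p.StandardOutput.ReadToEnd());
                p.WaitForExit();
            }
        }

        static void Main()
        {
            RunStsadm(@"-o addsolution -filename C:\WSPs\ContosoBranding.wsp");
            RunStsadm("-o deploysolution -name ContosoBranding.wsp -url http://brand1 -immediate -allowgacdeployment");
            RunStsadm("-o execadmsvcjobs");   // run the pending deployment timer job now

            RunStsadm("-o retractsolution -name ContosoBranding.wsp -url http://brand1 -immediate");
            RunStsadm("-o execadmsvcjobs");
            RunStsadm("-o deletesolution -name ContosoBranding.wsp");
        }
    }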

The following actions were verified to cause disruptions:
  • Feature Activations/Deactivations that use SPWebConfigModification methods recycle all App Pools.
  • Globally Deployed Solutions recycle all App Pools.
  • Retractions (of any Global or Web Application targeted WSP) trigger an IISRESET.
  • Web Application targeted Solutions deployed or retracted with the -allcontenturls switch trigger an IISRESET.
  • Web Application targeted Solutions that deploy DLLs to the GAC recycle the App Pool associated with the Web Application.
  • Running stsadm -o upgradesolution triggers an IISRESET.
I looked at 4 Solution Package scenarios:

Scenario 1 represents a minimal solution package, to emphasize how the stsadm commands behave when a WSP contains almost nothing. Scenario 2 is more realistic, and reflects the typical WSP files we deploy at BigCompany. Here is more detail:

Scenario 1a - The WSP is minimal, containing a single “Custom Action” that adds a menu item to the Site Actions menu. No DLL, no web.config changes, no feature receiver or event handlers.

Scenario 1b - Same as Scenario 1a, but with a “do nothing” DLL added that gets deployed to the GAC.

Scenario 2a - This represents a “codeless” deployment of a Master Page and related UI elements - with no DLL, no event handlers, and no feature receiver. This is a typical BigCompany WSP scenario.

Scenario 2b - Another typical BigCompany WSP scenario - like Scenario 2a, but with a DLL added for feature receiver code that updates web.config.
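To make Scenario 2b concrete, here is a hedged sketch of what such a feature receiver might look like. The class name, owner string, and appSettings entry are hypothetical placeholders (not BigCompany’s actual code), and it assumes a WebApplication-scoped feature. Keep an eye on the ApplyWebConfigModifications() call at the end - it figures prominently in the results.

    using Microsoft.SharePoint;
    using Microsoft.SharePoint.Administration;

    // Hypothetical Scenario 2b receiver: registers one web.config change
    // (an appSettings entry) when the feature is activated.
    public class BrandingWebConfigReceiver : SPFeatureReceiver
    {
        private const string ModificationOwner = "BigCompany.Branding";

        public override void FeatureActivated(SPFeatureReceiverProperties properties)
        {
            // WebApplication-scoped feature, so Feature.Parent is the SPWebApplication.
            SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

            SPWebConfigModification mod = new SPWebConfigModification();
            mod.Name = "add[@key='BrandTheme']";
            mod.Path = "configuration/appSettings";
            mod.Owner = ModificationOwner;
            mod.Sequence = 0;
            mod.Type = SPWebConfigModification.SPWebConfigModificationType.EnsureChildNode;
            mod.Value = "<add key='BrandTheme' value='Brand1' />";

            webApp.WebConfigModifications.Add(mod);
            webApp.Update();

            // Persists the change to the Configuration DB and pushes it out to the WFEs.
            // As the test results show, this is the call that touches every web.config
            // in the farm and recycles all the App Pools.
            SPWebService.ContentService.ApplyWebConfigModifications();
        }

        public override void FeatureDeactivating(SPFeatureReceiverProperties properties)
        {
            // Cleanup (removing the modification by Owner) omitted for brevity.
        }

        // WSS 3.0 declares all four receiver methods abstract, so they must be overridden.
        public override void FeatureInstalled(SPFeatureReceiverProperties properties) { }
        public override void FeatureUninstalling(SPFeatureReceiverProperties properties) { }
    }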
 
Test Results
The test results are shown in the two tables below. The only cases that did not cause disruptions were the initial Add/Deploy of a Web Application targeted solution package, and Feature Activation/Deactivation that does not update web.config via SPWebConfigModification methods. Of course, a feature receiver can always perform an IISRESET through code, but that is not a recommended practice.



Disruptions Defined
In IIS 6 worker process isolation mode (the mode used in all BigCompany SharePoint farms), each IIS App Pool contains a single worker process (w3wp.exe). An App Pool hosts one or more ASP.NET web applications, which are isolated from each other by running in separate App Domains (a .NET concept) within the w3wp process.

When a web.config file is changed, the associated web application shuts down, and all new HTTP requests to the “shut down” web application are queued by IIS until a new instance of the web application is started within the w3wp process. The w3wp process keeps running, along with the other web applications it hosts. The disruption perceived by users is minimal - delayed responses (I’m not sure whether HTTP requests already awaiting a response from the WFE are dropped).
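If you want to observe this behavior for yourself, here is a hedged diagnostic sketch - a hypothetical HTTP handler (registered under <httpHandlers> in a test web application’s web.config, not anything we deployed at BigCompany). The App Domain ID and start time reset when the App Domain recycles; the w3wp process ID only changes when the App Pool itself recycles or stops.

    using System;
    using System.Diagnostics;
    using System.Web;

    // Hypothetical diagnostic: reports the current App Domain ID, when this
    // App Domain loaded, and the hosting w3wp process ID.
    public class AppDomainInfoHandler : IHttpHandler
    {
        private static readonly DateTime DomainStartUtc = DateTime.UtcNow;

        public void ProcessRequest(HttpContext context)
        {
            context.Response.ContentType = "text/plain";
            context.Response.Write(string.Format(
                "AppDomain Id:   {0}\r\nDomain started: {1:u}\r\nw3wp PID:       {2}",
                AppDomain.CurrentDomain.Id,
                DomainStartUtc,
                Process.GetCurrentProcess().Id));
        }

        public bool IsReusable
        {
            get { return true; }
        }
    }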

When an App Domain shuts down due to a web.config change, ASP.NET records an application lifetime event in the Windows Application event log noting that the application is shutting down because the configuration changed.
 
When an App Pool is Stopped - the w3wp.exe process (and all ASP.NET apps running within it) is destroyed, and users see the “Service Unavailable” browser error message until the App Pool is started again. This disruption is more likely to be noticed by users. Recycling the App Pool is a “kinder, gentler” version of a stoppage - because IIS will launch a new w3wp process, allow the “old” w3wp process to drain its requests, and redirect new HTTP requests to the “new” w3wp process. Users will likely see a delay, but without the “Service Unavailable” message.


IISRESET is similar to an App Pool Stop - but on a broader scale. All App Pools on the IIS machine are immediately stopped, and users will see “Service Unavailable” in their browser.

(note that while the diagram above shows the effect of IISRESET on a single WFE, IISRESETs generated by WSP Deployment timer jobs are done on every WFE).

Finally, the diagram below shows how SPWebConfigModification ends up recycling all the App Pools – an unintended “nuclear option”.





Saturday, July 2, 2011

Multi-Tenant SharePoint 2007 Farm Deployments - Part 1

Introduction
Hosting a SharePoint 2007 farm with multiple tenants presents unique challenges. Multi-tenancy amplifies the everyday challenges of a single-tenant farm – with increased resource demands, more frequent code updates, and larger numbers of web.config files to manage. By “multiple tenants”, I mean different customers (or brands), each with its own MOSS 2007 Web Application within a single MOSS farm. I had the pleasure of working with a multi-tenant farm recently for a Fortune 100 Corporation I'll refer to as BigCompany.
The BigCompany Public-facing Internet farm hosts several MOSS 2007 Web Applications for many well-known household brands. Each brand is a tenant - with its own MOSS 2007 Web Application, extended into three zones (one public-facing Internet zone, and two internal zones), with one IIS web site per zone.

One of the biggest issues with any multi-tenant farm is ensuring that code deployments for one brand don't disrupt the other brands. This is no small feat, since SharePoint's stsadm.exe commands for code deployments typically cause AppDomain restarts and App Pool recycling, which disrupt users across the entire farm.

So how can we isolate SharePoint tenants in a way that minimizes deployment disruptions?  Put each tenant in their own App Pool?  Combine them in some ideal fashion? Let's explore our options.

Wearing your SharePoint Architect hat, suppose you have 16GB of RAM available on each WFE for Web Applications. How many App Pools are ideal to host 15 Brand sites? It depends on your criteria, of course. Consider the three scenarios depicted in the figure below:
Figure 1 - Three App Pool scenarios for Multi-tenant SharePoint Web Applications

Scenario A – 15 App Pools (one Brand per App Pool) offers the best isolation between Brand sites, but the worst relative performance (due to CPU context switching between multiple App Pools). Scenario B – a single App Pool - is the opposite, with the best performance and worst isolation. And Scenario C is somewhere in between. BigCompany opted for Scenario A, because it best isolates each brand, which helps us diagnose issues and minimize App Pool recycling-related disruptions.

Or so we thought… until phone calls poured in from a beleaguered SharePoint Operations Team, informing us that web.config changes to one brand site were disrupting all the brand sites by causing all the App Pools to recycle. Huh? How is that possible? Our investigation found a bug (how wonderful) in a SharePoint API used for web.config modifications. Read on, dear reader, and I’ll explain.

One of the primary ways to configure a SharePoint Web Application is through its web.config file(s). When managing only a few files, notepad (and a good memory of your earlier changes) may suffice. Beyond that, you quickly realize that notepad (and your over-taxed memory) is not a viable option.

The BigCompany farm hosts 15 brands load-balanced across six web front ends (WFEs). At three zones per brand, that translates into 45 IIS web sites per WFE, or 270 IIS web sites across the farm. If nothing else, that's a lot of web.config files to manage.

Microsoft anticipated this management issue with web.config files, and provided the SPWebConfigModification class to help you.

In keeping with SharePoint’s “all WFEs are clones of each other” philosophy, it’s essential that web.config changes are propagated to every WFE in the farm. The SPWebConfigModification class helps with this propagation by storing the modifications in the Configuration Database. Every SharePoint farm, you’ll recall, has one Configuration DB, which contains all of the settings and configurations that define your farm and the components within it. This architecture allows SharePoint to treat all WFEs as clones of each other, and propagate configurations stored in the Configuration DB to new WFEs as they are added to the farm.
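You can see this central storage for yourself. Here is a hedged sketch of a small console utility (hypothetical - not part of our deployment tooling) that lists the modifications registered for each content Web Application; run it on any farm server, and the data comes back from the Configuration DB rather than from the local web.config files.

    using System;
    using Microsoft.SharePoint.Administration;

    // Hypothetical utility: dumps the SPWebConfigModification entries stored
    // in the Configuration DB for every content Web Application in the farm.
    class ListWebConfigModifications
    {
        static void Main()
        {
            foreach (SPWebApplication webApp in SPWebService.ContentService.WebApplications)
            {
                Console.WriteLine(webApp.DisplayName);
                foreach (SPWebConfigModification mod in webApp.WebConfigModifications)
                {
                    Console.WriteLine("  {0}/{1}  (owner: {2})", mod.Path, mod.Name, mod.Owner);
                }
            }
        }
    }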
In a perfect world, you would have perfect knowledge of all your configurations ahead of time – before you deploy a web app. That way, you could package all your web.config modifications into one or more SharePoint Features, deploy and activate them, and you’re done. In fact, many custom SharePoint Features – including those provided by the MS Commerce Server 2009 Web Part Extensibility Kit - use this technique. But in the real world, you’ll modify web.config files many times throughout the lifecycle of a Web Application, so using Features as your update mechanism gets expensive because it requires developers to perform the upgrades. A cheaper alternative is a GUI tool that anyone can use. Unfortunately, Microsoft doesn’t provide one.
With this background, we are almost prepared to understand the bug I alluded to earlier. You see, for all the wonderfulness that the SPWebConfigModification class provides, it comes with a nasty secret. It messes up your nice App Pool isolation strategy by “changing” every web.config file in the farm, regardless of your intentions. And as every ASP.NET developer knows, whenever a web.config file is changed, the App Domain housing it immediately recycles. [Technically, SPWebConfigModification doesn’t actually change the contents of every web.config file; rather, it opens and closes every one. But that is sufficient to trigger ASP.NET to recycle the App Domains.]

There are several ways you can get an App Pool to recycle – causing intended as well as unintended disruptions to your Web Applications. Let’s illustrate with a sample farm with two WFEs, three Web Applications, and three App Pools, as shown below.

The first three diagrams show well-known (intended) ways to recycle the Web Applications, in varying degrees of granularity:

Our final diagram shows how SPWebConfigModification ends up recycling all the App Pools – an unintended “nuclear option”.
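And in case you are hoping that cleaning up after yourself avoids the blast radius: it doesn’t. Here is a hedged sketch of the usual removal-by-Owner pattern (the helper class and owner string are hypothetical); notice that it ends with the very same ApplyWebConfigModifications() call, so retracting a change is just as “nuclear” as adding one.

    using System.Collections.ObjectModel;
    using Microsoft.SharePoint.Administration;

    // Hypothetical cleanup helper: removes a feature's own web.config entries
    // (matched by Owner) and re-applies the remaining modifications.
    public static class WebConfigCleanup
    {
        public static void RemoveByOwner(SPWebApplication webApp, string owner)
        {
            Collection<SPWebConfigModification> mods = webApp.WebConfigModifications;

            // Walk backwards so RemoveAt() doesn't shift entries we haven't checked yet.
            for (int i = mods.Count - 1; i >= 0; i--)
            {
                if (mods[i].Owner == owner)
                {
                    mods.RemoveAt(i);
                }
            }

            webApp.Update();

            // Same farm-wide apply as before - every web.config gets touched again.
            SPWebService.ContentService.ApplyWebConfigModifications();
        }
    }

    // Example usage (owner string is a placeholder):
    //   WebConfigCleanup.RemoveByOwner(webApp, "BigCompany.Branding");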

My sources tell me Microsoft is working on a bug fix for SPWebConfigModification. In the meantime, the BigCompany team is looking at workarounds. One option is to schedule a maintenance window for all updates. A more involved option is a rolling update procedure where WFEs are removed from the load-balancing pool, updated individually, and returned to the pool. Stay tuned.