Saturday, July 2, 2011

Multi-Tenant SharePoint 2007 Farm Deployments - Part 1

Introduction
Hosting a SharePoint 2007 farm with multiple tenants presents unique challenges. Multi-tenancy amplifies the everyday challenges of a single-tenant farm: increased resource demands, more frequent code updates, and more web.config files to manage. By “multiple tenants”, I mean different customers (or brands), each with its own MOSS 2007 Web Application within a single MOSS farm. I recently had the pleasure of working on a multi-tenant farm for a Fortune 100 corporation I'll refer to as BigCompany.
The BigCompany Public-facing Internet farm hosts several MOSS 2007 Web Applications for many well-known household brands. Each brand is a tenant, with its own MOSS 2007 Web Application extended into three zones (one public-facing Internet zone and two internal zones), with one IIS web site per zone.

One of the biggest issues with any multi-tenant farm is ensuring that code deployments for one brand don't disrupt the other brands. This is no small feat, since SharePoint's stsadm.exe deployment commands typically trigger AppDomain restarts and App Pool recycles, causing widespread disruption across the farm.

So how can we isolate SharePoint tenants in a way that minimizes deployment disruptions? Put each tenant in its own App Pool? Combine them in some ideal fashion? Let's explore our options.

Wearing your SharePoint Architect hat, suppose you have 16GB of RAM available on each WFE for Web Applications. How many App Pools are ideal to host 15 Brand sites? It depends on your criteria, of course. Consider the three scenarios depicted in the figure below:
Figure 1 - Three App Pool scenarios for Multi-tenant SharePoint Web Applications

Scenario A (15 App Pools, one Brand per App Pool) offers the best isolation between Brand sites, but the worst relative performance, due to CPU context switching between the many App Pools. Scenario B (a single App Pool) is the opposite: the best performance and the worst isolation. Scenario C sits somewhere in between. BigCompany opted for Scenario A because it isolates each brand best, which helps us diagnose issues and minimize App Pool recycling-related disruptions.

Or so we thought… until phone calls poured in from a beleaguered SharePoint Operations Team, informing us that web.config changes to one brand site were disrupting all the brand sites by causing every App Pool to recycle. Huh? How is that possible? Our investigation found a bug (how wonderful) in a SharePoint API used for web.config modifications. Read on, dear reader, and I’ll explain.

One of the primary ways to configure a SharePoint Web Application is through its web.config file(s). When managing only a few files, Notepad (and a good memory of your earlier changes) may suffice. Beyond that, you quickly realize that Notepad (and your overtaxed memory) is not a viable option.

The BigCompany farm hosts 15 brands load-balanced across six web front ends (WFEs). With three zones per brand, that translates into 45 IIS web sites per WFE, or 270 IIS web sites across the farm. If nothing else, that's a lot of web.config files to manage.

Microsoft anticipated this management issue with web.config files, and provided the SPWebConfigModification class to help you.

In keeping with SharePoint’s “all WFEs are clones of each other” philosophy, it’s essential that web.config changes are propagated to every WFE in the farm. The SPWebConfigModification class helps with this propagation by storing the modifications in the Configuration Database. Every SharePoint farm, you’ll recall, has one Configuration DB, which contains all of the settings and configurations that define your farm and the components within it. This architecture allows SharePoint to treat all WFEs as clones of each other, and propagate configurations stored in the Configuration DB to new WFEs as they are added to the farm.
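
Here’s a minimal sketch of typical usage, assuming a hypothetical brand URL, appSettings key, and Owner string (none of these come from the BigCompany farm). The pattern is: define the modification, add it to the Web Application’s collection, call Update() to persist it to the Configuration DB, then call ApplyWebConfigModifications() to push it out to every WFE.

using System;
using Microsoft.SharePoint.Administration;

class AddBrandSetting
{
    static void Main()
    {
        // Hypothetical brand URL and appSettings key, for illustration only.
        SPWebApplication webApp =
            SPWebApplication.Lookup(new Uri("http://brand1.bigcompany.com"));

        SPWebConfigModification mod = new SPWebConfigModification();
        mod.Path = "configuration/appSettings";
        mod.Name = "add[@key='BrandTheme']";   // XPath-style name, unique under Path
        mod.Value = "<add key='BrandTheme' value='Summer2011' />";
        mod.Type = SPWebConfigModification.SPWebConfigModificationType.EnsureChildNode;
        mod.Owner = "BigCompany.BrandTheme";   // lets you find and remove your own entries later
        mod.Sequence = 0;

        // Persist the modification to the Configuration DB...
        webApp.WebConfigModifications.Add(mod);
        webApp.Update();

        // ...then push it out to the web.config files on every WFE.
        webApp.WebService.ApplyWebConfigModifications();
    }
}
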
In a perfect world, you would have perfect knowledge of all your configurations ahead of time, before you deploy a web app. That way, you could package all your web.config modifications into one or more SharePoint Features, deploy and activate them, and you’re done. In fact, many custom SharePoint Features (including those provided by the MS Commerce Server 2009 Web Part Extensibility Kit) use this technique. But in the real world, you’ll modify web.config files many times throughout the lifecycle of a Web Application, so using Features as your update mechanism gets expensive, because it requires developers to perform the upgrades. A cheaper alternative is a GUI tool that anyone can use. Unfortunately, Microsoft doesn’t provide one.
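
To sketch the Feature-based approach, here is roughly what a Web Application-scoped Feature receiver might look like. The class name, Owner string, and appSettings entry are hypothetical placeholders; the useful habit is having FeatureDeactivating remove only the entries it owns before re-applying, so deactivating one brand’s Feature doesn’t disturb anyone else’s modifications.

using Microsoft.SharePoint;
using Microsoft.SharePoint.Administration;

// Hypothetical Web Application-scoped feature receiver packaging a web.config change.
public class BrandConfigFeatureReceiver : SPFeatureReceiver
{
    private const string Owner = "BigCompany.BrandConfig";

    public override void FeatureActivated(SPFeatureReceiverProperties properties)
    {
        SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

        SPWebConfigModification mod = new SPWebConfigModification();
        mod.Path = "configuration/appSettings";
        mod.Name = "add[@key='BrandFeatureEnabled']";
        mod.Value = "<add key='BrandFeatureEnabled' value='true' />";
        mod.Type = SPWebConfigModification.SPWebConfigModificationType.EnsureChildNode;
        mod.Owner = Owner;
        mod.Sequence = 0;

        webApp.WebConfigModifications.Add(mod);
        webApp.Update();
        webApp.WebService.ApplyWebConfigModifications();
    }

    public override void FeatureDeactivating(SPFeatureReceiverProperties properties)
    {
        SPWebApplication webApp = (SPWebApplication)properties.Feature.Parent;

        // Remove only the entries this feature owns, then re-apply.
        for (int i = webApp.WebConfigModifications.Count - 1; i >= 0; i--)
        {
            if (webApp.WebConfigModifications[i].Owner == Owner)
            {
                webApp.WebConfigModifications.RemoveAt(i);
            }
        }
        webApp.Update();
        webApp.WebService.ApplyWebConfigModifications();
    }

    // Required overrides in WSS 3.0; nothing to do here.
    public override void FeatureInstalled(SPFeatureReceiverProperties properties) { }
    public override void FeatureUninstalling(SPFeatureReceiverProperties properties) { }
}
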
With this background, we are almost prepared to understand the bug I alluded to earlier. You see, for all the wonderfulness that the SPWebConfigModification class provides, it comes with a nasty secret: it messes up your nice App Pool isolation strategy by “changing” every web.config file in the farm, regardless of your intentions. And as every ASP.NET developer knows, whenever a web.config file is changed, the App Domain housing it immediately recycles. [Technically, SPWebConfigModification doesn’t actually change the contents of every web.config file; rather, it opens and closes every one, which is sufficient to trigger ASP.NET to recycle the App Domains.] There are several ways you can get an App Pool to recycle, causing intended as well as unintended disruptions to a Web Application. Let’s illustrate with a sample farm with two WFEs, three Web Applications, and three App Pools, as shown below.

The first three diagrams show well-known (intended) ways to recycle the Web Applications, in varying degrees of granularity:

Our final diagram shows how SPWebConfigModification recycles all the App Pools at once, an unintended “nuclear option”.
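
One quick way to see this “nuclear option” for yourself is to compare the timestamps of every SharePoint web.config on a WFE before and after applying a modification to a single Web Application. The console sketch below is only a diagnostic aid; it assumes the default SharePoint virtual-directory path, so adjust it for your farm. If the timestamps moved, those files were touched, and their App Domains will have recycled.

using System;
using System.IO;

// Diagnostic sketch: list the last-write time of every SharePoint web.config on this WFE.
// Run it before and after ApplyWebConfigModifications() to see which files were touched.
class WebConfigTouchCheck
{
    static void Main()
    {
        // Default SharePoint virtual directory root (assumed; adjust if your farm differs).
        string root = @"C:\Inetpub\wwwroot\wss\VirtualDirectories";

        foreach (string siteDir in Directory.GetDirectories(root))
        {
            string config = Path.Combine(siteDir, "web.config");
            if (File.Exists(config))
            {
                Console.WriteLine("{0,-60} {1}", config, File.GetLastWriteTime(config));
            }
        }
    }
}
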

My sources tell me Microsoft is working on a bug fix for SPWebConfigModification. In the meantime, the BigCompany team is looking at workarounds. One option is to schedule a maintenance window for all updates. A more difficult option is a rolling update procedure, in which WFEs are removed from the load-balancing pool, updated individually, and returned to the pool. Stay tuned.



