As we began work on a new CMS for msnbc.com, one of the major pain points with the existing system that we wanted to address was its database-oriented architecture – or DOA. No, not the popular video game (searching for DOA is borderline NSFW). Rather, “dead on arrival” – the morbid term for an ambulance patient who doesn’t survive the trip to the hospital.
A multitude of dependencies on a single central database is, indeed, DOA. First, it doesn’t scale to billions of page views per month, nor does it make redundancy across data centers easy to achieve. Doing so requires all sorts of magic with clusters, mirroring, replication, etc. – in other words, expensive hardware and even more expensive people. (The reason it has worked up to now is our top-notch operations team.)
Second, all of those dependencies mean that changes to the database schema are effectively impossible.
A CMS like ours isn’t a single application. It’s an ecosystem of applications, built by different teams over a long period of time and performing a wide variety of functions: filtering and aggregating hundreds of inbound news wires in different formats, encoding all the video from one of the largest news organizations in the world, supporting journalists on tight deadlines around the clock, and – oh, yeah – serving up all that content to millions of people a day.
Most of those applications create or manipulate content that is stored in the central database. Thus, any change to the database is likely to have side-effects. A change to support one application, simple on its own, turns out to require a major refactoring effort in an unrelated application. Fixing bugs becomes a game of whack-a-mole; new features become more expensive and take longer and longer to release. Worse, a great new feature may not get built at all.
This isn’t really a problem with databases, of course. It’s a problem with dependency management. But it is exacerbated by the fact that accessing a database is inherently a blocking operation – you have to wait for the result. Any single resource that must be available all the time in order for any other part of the system to function correctly is going to cause these types of problems.
For all of these reasons, we opted instead for a service-oriented architecture (SOA). That doesn’t mean simply exposing everything as a web service. It means partitioning the system into “business services” – coarse-grained, autonomous sets of functionality. Being autonomous includes having one’s own private database (assuming there is some data for that service to persist). Private means private – the database is not shared with other services.
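To make the “private means private” idea concrete, here’s a minimal sketch of what a business service owning its own database looks like. The `ArticleService` class, its methods, and the SQLite store are all hypothetical illustrations, not our actual implementation – the point is only that other services interact through the service’s public interface, never through its tables.

```python
import sqlite3

class ArticleService:
    """A hypothetical business service that owns its data outright.

    No other service holds a connection to this database; everything
    goes through the public methods below (in a real deployment,
    a web-service endpoint in front of them).
    """

    def __init__(self):
        # Private store: created, migrated, and queried only by this service.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE articles (id INTEGER PRIMARY KEY, headline TEXT)"
        )

    def publish(self, headline):
        cur = self._db.execute(
            "INSERT INTO articles (headline) VALUES (?)", (headline,)
        )
        self._db.commit()
        return cur.lastrowid

    def headline(self, article_id):
        row = self._db.execute(
            "SELECT headline FROM articles WHERE id = ?", (article_id,)
        ).fetchone()
        return row[0] if row else None

svc = ArticleService()
article_id = svc.publish("Service boundaries, not shared tables")
print(svc.headline(article_id))
```

Because the schema inside `_db` is invisible to every other service, this service can refactor its tables whenever it likes – the whack-a-mole problem described above simply can’t reach it.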
SOA is not exactly a radical choice, but it does require a radical change in thinking – for developers, for admins, even for end users. It requires letting go of 100% consistency 100% of the time in favor of eventual consistency, letting go of “one database to rule them all,” and accepting that maintaining multiple, service-specific versions of the data won’t be the end of the world.
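What does letting go of “100% consistency 100% of the time” look like in code? Here’s a toy sketch of eventual consistency between two services, using an in-process queue to stand in for whatever messaging layer a real system would use. The service names, the `update_video` function, and the event format are all invented for illustration.

```python
from collections import deque

# Stand-in for a message bus: producers append, consumers drain.
event_queue = deque()

# Source of truth, owned by the (hypothetical) video service.
video_metadata = {}

def update_video(video_id, title):
    """Write to the video service's own store, then announce the change."""
    video_metadata[video_id] = title
    event_queue.append(("video.updated", video_id, title))

# The (hypothetical) front-end service keeps its own copy of the data,
# which is stale until it gets around to processing the events.
frontend_cache = {}

def sync_frontend():
    while event_queue:
        _event, video_id, title = event_queue.popleft()
        frontend_cache[video_id] = title

update_video(1, "Election night")
# Between the update and the sync, the two copies disagree.
# Under eventual consistency, that window is acceptable.
assert frontend_cache.get(1) is None
sync_frontend()
assert frontend_cache[1] == "Election night"
```

The key mental shift is in that window between `update_video` and `sync_frontend`: the system is briefly inconsistent and still correct, because the front end will converge once it processes the event.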
It takes a lot of time and constant effort to completely change your world view. It’s not unlike a procedural developer being introduced to object orientation. It takes a lot of practice, and dozens of “Aha!” moments to cross the conceptual chasm. As an organization, we’re still in mid-air. I expect we’ll eventually land safely on the other side, and not wind up DOA.
In a future post, I’ll write about how we drew the boundaries between our business services.