Poor documentation is the root of many problems in IT operations. Without formal processes and procedures that are well-documented and accessible, even highly skilled professionals can struggle to achieve consistent results.
This guide takes you inside an effective runbook in the network operations center (NOC) to see how high-performing support teams document their processes. These lessons apply to runbooks used by just about any team responsible for IT service management, whether it’s a formalized NOC or not.
Having spent the last 20+ years crafting runbooks for use in hundreds of support environments, we’ll deconstruct the “anatomy” of our own runbooks to give you a model to assess your own against. Use this guide to articulate your need for better runbooks, create your own, or refine your requirements before bringing them to an external service provider who can develop effective runbooks for you.
- What a NOC runbook is
- What makes them effective
- What a runbook should include
- How to maintain them
- Tips for writing them
- Outsourcing runbook development to third-party NOC experts
Need expert-driven runbook development? We can help. Learn more about our runbook services and our broader NOC operations consulting services, and get in touch with us to schedule a free NOC consultation to explore possible solutions.
Before we jump in, let’s level-set on the basics.
What is a NOC Runbook?
A NOC runbook is a set of standardized documents, references, and procedures used to describe common ITSM tasks carried out in the NOC. A runbook walks staff through the steps necessary to accomplish a specific task or troubleshoot a particular issue.
Runbooks are useful for seasoned professionals and novice engineers alike. They can refresh an engineer’s memory of a task they haven’t encountered in a while or provide the critical step-by-step guidance a new engineer needs to execute processes they’re not familiar with.
At INOC, we view the runbook as the single source of truth for everyone inside and outside the NOC. It provides clarity into processes and galvanizes teams to coordinate around the same sets of instructions so actions are consistent no matter who is executing them.
What Makes a NOC Runbook Effective?
Runbooks are tools, and to be effective, they have to do a few things well:
In short, a good runbook ensures everything is fully documented and presented for clear, consistent, quick action.
What Should a NOC Runbook Include?
Before getting into the contents of a NOC runbook, we should acknowledge that every runbook is different. The best practices we prescribe here won’t apply in every situation. Approach this guide as a generic framework to mold around your specific needs.
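Broadly speaking, an effective NOC runbook typically has four key parts: infrastructure documentation, process documentation, links to tools, contacts, and data, and alarm-to-action guides.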
A few critical pages within the runbook
A NOC runbook typically contains multiple pages, each having its own purpose as a reference. Again, any specific runbook might contain different pages based on what’s needed, but generally speaking, most of our runbooks include the following pages: Client Information, Escalations, Device Access, Maintenance Support, and Response.
“You almost always need to have a page for escalation contacts in one spot. And then either one or multiple pages that provide clear instructions for the kinds of alarms the NOC will receive. The key questions the runbook should answer with respect to troubleshooting are, What are we getting? How are we reacting to it? And what does the NOC need to do with it?”
— Skylar Carlino, INOC
Here's a simplified example runbook page for alarm response that visualizes the points we've been covering with some key call-outs:
Maintaining a NOC Runbook
Change is a constant in almost every IT environment, so processes have to change, too.
Because the NOC engineers are the ones using these processes every day, they’re usually the first to notice something is out of date, broken, or changed. In that way, runbook maintenance is always reactive to some degree, so an open line of communication to make sure changes are made is key.
NOC engineers should be invited to pass observed changes off to those responsible for runbook management and have an easy way to do so.
Expiration dates and regularly scheduled reviews can go a long way in catching these changes proactively. Here at INOC, we set an annual expiration date on each knowledge article we develop, and a report notifies us a month before an article is set to expire. This way, we have a rolling review program across all our runbooks.
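As a rough illustration of the expiration report described above, here is a minimal Python sketch. The `Article` model, the function names, the 30-day notice window, and the sample dates are all assumptions made for this example; they do not describe any particular knowledge base platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Article:
    """A runbook knowledge article with a review expiration date (hypothetical model)."""
    title: str
    expires_on: date


def expiring_soon(articles, today, notice_days=30):
    """Return articles that expire within `notice_days` of `today`, soonest first.

    Articles that have already expired are included so they are not silently dropped.
    """
    cutoff = today + timedelta(days=notice_days)
    due = [a for a in articles if a.expires_on <= cutoff]
    return sorted(due, key=lambda a: a.expires_on)


def renew(article, reviewed_on, valid_days=365):
    """After a review, set a new expiration date roughly one year out."""
    return Article(article.title, reviewed_on + timedelta(days=valid_days))


# Sample data, invented for illustration.
articles = [
    Article("Escalations", date(2024, 3, 20)),
    Article("Device Access", date(2024, 9, 1)),
    Article("Alarm Response", date(2024, 2, 15)),
]

report = expiring_soon(articles, today=date(2024, 3, 1))
for article in report:
    print(f"{article.expires_on}  {article.title}")
```

Running a report like this on a schedule, and renewing each article once it has been reviewed, is one way to get the rolling review effect without relying on anyone's memory.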
Tips for Writing Excellent Runbooks
Here are a few additional pieces of advice from our runbook team.
1. Anticipate possible outcomes and write them into your processes.
One of the hallmarks of a truly effective runbook is the absence of any “dead-ends” in the processes. Each process anticipates every reasonable outcome of a given action or status and provides instructions accordingly.
“If the node is up, do this; if the node is down, do this.”
Every reasonable possibility is accounted for so engineers don’t get stuck without instruction—possibly extending an outage.
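One way to picture the "no dead-ends" idea is to treat a process as a set of steps where every outcome points to another documented step. The Python sketch below is only an illustration under that assumption: the step names, instructions, and the `find_dead_ends` helper are hypothetical, not part of any specific runbook tool.

```python
# Each step maps every outcome an engineer might observe to the next step.
# A step with no outcomes is terminal: its instruction ends the process.
# Step names and wording are invented for illustration.
PROCESS = {
    "check_node": {
        "instruction": "Ping the node.",
        "outcomes": {"up": "check_alarm", "down": "dispatch"},
    },
    "check_alarm": {
        "instruction": "Check whether the alarm has cleared.",
        "outcomes": {"cleared": "close_ticket", "active": "escalate"},
    },
    "dispatch": {"instruction": "Open a dispatch request.", "outcomes": {}},
    "escalate": {"instruction": "Escalate per the Escalations page.", "outcomes": {}},
    "close_ticket": {"instruction": "Document findings and close the ticket.", "outcomes": {}},
}


def find_dead_ends(process, start):
    """Return sorted (step, outcome, target) tuples whose target step is not documented."""
    dead_ends = set()
    seen = set()
    stack = [start]
    while stack:
        step = stack.pop()
        if step in seen:
            continue
        seen.add(step)
        for outcome, target in process[step]["outcomes"].items():
            if target in process:
                stack.append(target)
            else:
                # The engineer would be told to go somewhere that does not exist.
                dead_ends.add((step, outcome, target))
    return sorted(dead_ends)


print(find_dead_ends(PROCESS, "check_node"))

# Simulate a missing page: the "active" outcome now leads nowhere.
broken = {name: step for name, step in PROCESS.items() if name != "escalate"}
print(find_dead_ends(broken, "check_node"))
```

A check like this only catches outcomes that point to missing steps. Deciding which outcomes are reasonable to expect in the first place still takes knowledge of the network.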
“It can be a challenge to know what you’re going to see from a network at the outset of establishing support for it. That's where we draw on our existing knowledge to start mapping what might happen in that network based on our clients’ devices and what they want the NOC to do.
We often work closely with our Advanced Technology Services department. This team will actually comb through the network and determine what we need to be seeking to accomplish what we need to deliver. That includes cataloging the important alarm information and filtering out truly unactionable noise.”
— Eric Idler, Director of Shared NOC, INOC
2. Carefully consider format and platform.
Today’s digital knowledge base platforms are a far cry from the static, “unsophisticated” documentation systems of years past. Of course, if a legacy system is working well, there may be no need to replace it. But we’ve seen countless IT teams dragged down by systems that get in the way of efficiency rather than enabling it, and modern platforms solve many of the problems those teams used to be more or less stuck with.
Here at INOC, for example, we’ve embraced the concept of the modern knowledge base and applied it to the way we structure, write, and manage our runbooks. Rather than forcing staff to frantically search long, unwieldy documents, our runbooks are shorter, more skimmable separate pages that use a simple one-click linking structure to make it fast and easy to navigate between them.
In addition to paginating our runbooks within a knowledge base system, we’ve also refined the format into the example shown above in this guide.
One of the most important elements of our runbook template is the “boilerplate” info boxes that appear at the top of each page within a given runbook. These are the same from page to page and present information we’ve found engineers routinely need at their fingertips no matter where they are in the runbook. Rather than constantly going back to a separate “info” page, that information is present on every page.
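To make the "same info box on every page" idea concrete, here is a small Python sketch that renders each page with a shared header. The field names, page titles, and `render_page` function are assumptions for illustration only; the actual contents of an info box depend on what engineers in a given NOC routinely need.

```python
def render_page(info_box, title, body):
    """Return the page text with the shared info box at the top."""
    header = "\n".join(f"{label}: {value}" for label, value in info_box.items())
    return f"{header}\n\n{title}\n{body}"


# Hypothetical info-box fields, defined once and reused on every page.
INFO_BOX = {
    "Client": "Example Corp",
    "Support hours": "24x7",
    "Escalation contacts": "see Escalations page",
}

# Hypothetical page bodies.
pages = {
    "Escalations": "Who to contact, and when.",
    "Device Access": "How to reach each managed device.",
}

rendered = {title: render_page(INFO_BOX, title, body) for title, body in pages.items()}
print(rendered["Device Access"])
```

Defining the info box in one place means a change, such as a new escalation contact, shows up on every page at once instead of needing a page-by-page update.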
3. Run the process as you document it—and pressure-test it afterward.
If possible, give your process writer access to whatever they need to step through the process themselves, so they can capture critical details that might otherwise be missed. Tools and platforms are rife with quirks that need to be worked around, and there’s no better way to capture these details than running the process.
Also, after a process is documented, don’t assume it can’t be immediately refined. Hand it off to someone who would be responsible for actually executing it and have them pressure-test it. Use their feedback as input for improvement before it ships to the team.
INOC’s Skylar Carlino explains:
“If I'm doing an alarm response article, I want to be able to execute the process as I'm writing it to capture everything in detail. And when I think I'm done with an article, I'm running it by someone in the NOC to make sure that they can run it. I've worked in the NOC before, which gives me an advantage as a process writer, but processes should still be validated by those doing the work.”
— Skylar Carlino, INOC
4. Take the opportunity to eliminate “assumed knowledge.”
Whether you call it “tribal” knowledge, “assumed” knowledge, or any other term, we’re talking about one of the worst problems in IT operations: letting important knowledge live only inside someone’s head.
Some teams operate almost exclusively on assumed knowledge. There are no runbooks. Other teams document their processes but bake some assumed knowledge into them. They skip writing down the steps that are “obvious.”
This works—until it doesn’t. Whether it’s an employee leaving the team and taking that knowledge with them, or it’s time to outsource or augment some of the work to an external team, everything that wasn’t written down can very quickly become “missing information” and generate headaches for all involved. Simply put, don’t leave anything to assumption.
5. Deal with length thoughtfully.
While some runbooks are too “light” on info, others go overboard, including unactionable details in an effort to be exhaustive. A runbook that’s too long can get in its own way. Typically, overlong runbooks contain peripheral information, such as the history of a device, that someone thought might be useful in some cases but that won’t be needed most of the time.
Rather than obstruct the process with that detail, stick it on the reference sheet so it’s there if and when it’s needed, but not imposed on everyone trying to execute a process.
Outsourcing Runbook Development to Third-Party NOC Experts
Here at INOC, we routinely deliver expert-driven runbook development as a professional service component of our NOC operations consulting services. We work closely with teams looking to radically improve their support workflows by understanding and documenting their processes into a single source of truth for everyone inside and outside of the NOC.
Clients turned up on our 24x7 NOC support service also receive detailed runbooks as part of that service, documenting all the work steps our NOC will execute for managing operations, troubleshooting, and escalations. (While the runbook is primarily an internal-facing document, our NOC clients get full visibility into our processes and how our team and tools will interface with theirs.)
If you’re looking for expert-driven runbook development, here are a few tips that can make an engagement go smoothly:
Final Thoughts and Next Steps
The effectiveness of a NOC runbook boils down to a few key qualities:
- Key contents: An effective NOC runbook includes Infrastructure Documentation, Process Documentation, Links to Tools/Contacts/Data, and Alarm-to-Action Guides. This information is expressed across a few key pages, including Client Information, Escalations, Device Access, Maintenance Support, and Response.
- Intentional management and maintenance: Those responsible for keeping runbooks current should invite NOC engineers to report the changes they notice across the supported environment, and should also work proactively to keep processes fresh through expiration dates and regularly scheduled reviews.
- Format and structure: The runbook platform and its layout should make the runbook easier to use, not harder.
Want to learn more about our NOC runbook services and how we can help you achieve peak performance while saving your team valuable time and resources? Contact us or use our consultation request form to tell us a little about yourself, your infrastructure, and your challenges. We'll follow up within one business day by phone or email to schedule a time to learn more and explore solutions.