ITIL Service Operation and the NOC: A Quick-Guide and Checklist

The Information Technology Infrastructure Library (ITIL),* like other IT service management (ITSM) frameworks, is often applied to the Network Operations Center (NOC) to guide and document its processes, functions, and roles.

ITIL helps IT functions like the NOC organize and operationalize themselves. It offers a playbook of prescriptive and guiding practices that help teams answer questions like:

  • “How should we handle incidents?” 
  • “How should we handle events?”
  • “How should we handle configuration items or service requests?”

ITIL’s processes, procedures, tasks, and checklists are neither organization-specific nor technology-specific. Rather, they’re flexible, scalable, and versatile enough to offer generic instruction you can apply to develop your own strategies, deliver your own services, and maintain your own competencies.

While the NOC intersects to some degree with all five stages of the ITIL service lifecycle, here we explore that intersection in greater detail for one stage in particular: ITIL Service Operation.

Use this guide to understand how ITIL Service Operation is relevant to the NOC, what challenges typically hinder a NOC’s ability to execute ITIL’s processes, and which best practices to look for in a NOC when ITIL alignment is critical.

Understanding the NOC’s Role in ITIL Service Operation

ITIL Service Operation’s objective overlaps neatly with the NOC’s: Ensuring services are delivered within the agreed-upon service levels.

ITIL offers a clear, consistent, and repeatable workflow NOCs can follow to achieve that objective—the “plays” in the playbook.

When looking for a NOC support partner capable of delivering on ITIL practices, ITIL’s “maturity” levels—grades that signal where an organization, team, or person falls on the ITIL Maturity Model—can be useful for two critical evaluative tasks:

  • Determining if a NOC support provider has integrated ITIL to the degree you need; and
  • Comparing prospective vendors based on the sophistication of their service operation in ways that directly impact the support delivered

ITIL defines five levels of maturity that correspond to a one- to five-level rating, with level zero indicating a process or function that is entirely absent:

  • Initial
  • Repeatable
  • Defined
  • Managed
  • Optimized

Since each maturity level corresponds with specific definitions and characteristics, this simple rating system can tell you a lot about a NOC’s capabilities and its adherence to the ITIL framework.

For many enterprises and large service providers, the most crucial distinction is the one between levels 3 and 4. This is the “dividing line” between an operation that measures and continually optimizes itself and one that does not.

This distinction can significantly impact the quality of support you can expect a NOC to deliver. Simply put, NOCs operating at level 3 or below can’t demonstrate that they have the systems in place to continually improve themselves and pass those improvements on to their clients through better service. As a result, these less “mature” NOCs typically remain “stuck in place,” unable to see their shortcomings and opportunities, let alone address them.

The takeaway here is especially important for enterprises and service providers: Without a continual improvement program, an immature NOC will likely struggle to meet its current service demands, much less any more stringent demands required of it in the future.

A more “ITIL-mature” NOC, by contrast, can demonstrate that its processes and functions are, in the case of level 4 maturity, “under constant improvement,” or in the case of level 5 maturity, subject to a “self-contained continuous process of improvement.” 

You can expect to find a robust Continual Service Improvement (CSI) program integrated into the organization at these higher maturity levels. Some providers even take this a step further, making CSI a component of an even larger Customer Experience Management (CEM) program that stretches continual improvement to all service dimensions.
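To make the level 3 versus level 4 dividing line concrete, here is a minimal Python sketch of how a vendor shortlist might be filtered on it. The level names come from the list above; the vendor names, their ratings, and the function names are hypothetical and only illustrate the idea.

```python
from enum import IntEnum


class MaturityLevel(IntEnum):
    """ITIL maturity ratings as listed above; 0 means the process is absent."""
    ABSENT = 0
    INITIAL = 1
    REPEATABLE = 2
    DEFINED = 3
    MANAGED = 4
    OPTIMIZED = 5


def has_continual_improvement(level: MaturityLevel) -> bool:
    """True at level 4 and above, the 'dividing line' discussed in the text."""
    return level >= MaturityLevel.MANAGED


def shortlist(vendors: dict[str, MaturityLevel]) -> list[str]:
    """Return vendors that clear the dividing line, highest level first."""
    qualified = [name for name, lvl in vendors.items()
                 if has_continual_improvement(lvl)]
    # Sort by descending level, then by name so ties are deterministic.
    return sorted(qualified, key=lambda name: (-vendors[name], name))


if __name__ == "__main__":
    # Hypothetical vendors and ratings, for illustration only.
    candidates = {
        "Vendor A": MaturityLevel.DEFINED,
        "Vendor B": MaturityLevel.OPTIMIZED,
        "Vendor C": MaturityLevel.MANAGED,
    }
    print(shortlist(candidates))
```

Treating the levels as ordered integers keeps the comparison to a single threshold check, which is all the dividing line asks for.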

We’ve broken down ITIL Service Operation into its component processes and functions, explaining how the NOC applies to each and, perhaps more importantly, what to look for in a NOC to ensure it provides the capability and level of ITIL alignment you require.

Event Management

In large organizations, many events must be managed across the IT infrastructure each day. ITIL clearly states that event management’s objective is to detect these events, analyze them, and determine what action is necessary, if any. Event management also happens to be one of the NOC’s primary responsibilities.

Here, and in each applicable section below, we identify the common challenges NOCs face and the best practices to look for in a potential NOC support partner.

Common NOC Challenges:

  • No “single pane of glass”: Perhaps the most significant problem preventing NOCs from optimal event management is the lack of a “single pane of glass” through which information and activities can be processed and managed. Engineers simply can’t do their best when they spend their precious time chasing down data from disparate sources.
  • A weak (or non-existent) configuration management database (CMDB): Another common problem hindering many ITIL functions (event management especially) is the lack of an effective CMDB. Simply put, ITIL’s success relies on a robust CMDB. The NOC must have a list of all assets it’s monitoring and managing and the relationships between them so the impact of any event can be anticipated. Creating and maintaining a robust CMDB is no small task and often requires the attention of experts.
  • Unnecessarily high event volume: High event volume is often a symptom of poor event correlation (which, ironically, can be due to a lack of a robust CMDB). When the NOC can’t appropriately correlate event information, either with human staff or AIOps, it can result in a deluge of unnecessary events that drags down morale and threatens the NOC’s ability to meet desired service levels.

NOC Best Practices Checklist:

  • All NMSs and EMSs integrated into a single platform/view: The NOC should be able to demonstrate it uses a centralized platform to receive, acknowledge, and process events all within your SLA windows.
  • Advanced event correlation: Especially at the enterprise and service provider level, your NOC should take every advantage of the advanced correlation, machine learning, and automation tools available today to improve the accuracy of its work, identify issues faster, and respond to incidents wherever possible to reduce resolution times. Read our white paper, The Role of AIOps in Enhancing NOC Support, for a deeper dive into these tools and capabilities.
  • A robust CMDB: Every ITIL-aligned NOC should have a robust CMDB capable of managing all asset relationships, assuring you that the impact of any change or activity will be known and addressed.
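As a rough illustration of why correlation leans on the CMDB, the Python sketch below suppresses events from assets whose upstream dependency is also alarming, leaving only the likely root causes for an engineer to work. The asset names, the `depends_on` mapping, and the function itself are assumptions made for this example; real correlation engines and AIOps platforms apply far richer logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ci: str       # configuration item (asset) that raised the event
    message: str


def root_cause_events(events: list[Event],
                      depends_on: dict[str, set[str]]) -> list[Event]:
    """Keep only events whose asset has no alarming upstream dependency.

    `depends_on` maps each asset to the assets it directly relies on,
    which is the relationship data a CMDB would supply (hypothetical shape).
    """
    alarming = {e.ci for e in events}

    def has_alarming_upstream(ci: str) -> bool:
        seen = {ci}  # never count an asset as its own upstream
        stack = list(depends_on.get(ci, ()))
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue
            seen.add(parent)
            if parent in alarming:
                return True
            # Walk further up the dependency chain.
            stack.extend(depends_on.get(parent, ()))
        return False

    return [e for e in events if not has_alarming_upstream(e.ci)]


if __name__ == "__main__":
    # Invented topology: server-1 -> switch-1 -> router-1
    topology = {"server-1": {"switch-1"}, "switch-1": {"router-1"}}
    raw = [
        Event("router-1", "interface down"),
        Event("switch-1", "unreachable"),
        Event("server-1", "unreachable"),
        Event("server-9", "disk almost full"),
    ]
    for event in root_cause_events(raw, topology):
        print(event.ci, "-", event.message)
```

Without the dependency map, all four events would reach the queue as separate items. With it, the two downstream "unreachable" events fold into the router failure that caused them.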

Incident Management

ITIL defines an incident as an unplanned interruption to a service or a reduction in the quality of a service. The failure of a configuration item that has not yet impacted the service to the customer or user also counts as an incident.

ITIL also states that incident management aims “to minimize the negative impact of incidents by restoring normal service operations as quickly as possible.” It’s another core function of the NOC, which relies on recognizing and following incident management best practices to fix what is broken as quickly as possible.

Common NOC Challenges:

  • No knowledge base: Many NOCs simply “wing it” instead of having a good knowledge base their staff can reference when an incident is received. Without a knowledge base, hard-won lessons learned through experience can’t be captured and shared for all to use.
  • A lack of runbook automation: Absent or inadequate runbook automation can force human engineers to waste valuable time retrieving basic information, thereby delaying incident processing. Runbook automation allows machines to step in and do the data-digging so humans can focus on analyzing the issue.
  • A lack of incident management workflow: Simply put, many NOCs don’t have clear processes to follow, even for essential functions like incident management.

NOC Best Practices Checklist:

  • A robust knowledge base: “Robust” in this case means more than just having a knowledge base. It should be a centralized source for all knowledge and documentation. It should be accessible to the entire team. It should be continuously updated as new experiences bring new lessons that can be documented and shared for future reference.
  • Alarm-to-action guides: The NOC’s runbooks should group the top incident-generating alarms and make the process for resolution clear. Here at INOC, for example, we often prioritize the alarms that make up the top 90% of alarm volume. Most of these issues can be handled at Tier 1, thereby lightening the load on advanced engineers. More on that here.
  • Effective and efficient incident management workflows: A high-performing incident management workflow deserves a whole discussion unto itself, but in short, a NOC should be able to demonstrate that, first, a clear incident workflow has been established, and second, that the workflow is as streamlined and effective as it needs to be to achieve or exceed service levels.
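One way to decide which alarms deserve an alarm-to-action guide first is a simple volume ranking. The Python sketch below returns the most frequent alarm types that together account for a chosen share of total volume. It assumes the 90% figure refers to cumulative alarm volume, and the alarm names and counts are invented for illustration.

```python
from collections import Counter


def alarms_to_document(alarm_log: list[str], coverage: float = 0.9) -> list[str]:
    """Return the most frequent alarm types that together reach `coverage`
    of total alarm volume, most frequent first."""
    counts = Counter(alarm_log)
    total = sum(counts.values())
    selected: list[str] = []
    covered = 0
    # Descending by count, then by name so ties are deterministic.
    for alarm_type, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered / total >= coverage:
            break
        selected.append(alarm_type)
        covered += count
    return selected


if __name__ == "__main__":
    # Hypothetical month of alarms: 100 in total across five types.
    log = (["link_down"] * 50 + ["high_cpu"] * 30 + ["bgp_flap"] * 12
           + ["fan_fail"] * 5 + ["temp_warn"] * 3)
    print(alarms_to_document(log))
```

In this made-up log, three alarm types cover 92 of 100 alarms, so those three would be the first candidates for Tier 1 runbooks.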

Request Fulfillment

Request fulfillment generally falls outside the NOC’s purview and belongs instead to the Help Desk or a similar function. Everyday tasks here include resetting passwords, upgrading laptops, and taking on other typical IT requests.

However, NOCs do handle some specific request fulfillment functions, such as obtaining a carrier’s reason for outage (RFO). Following the resolution of an incident, for example, a NOC will often be asked to provide a root-cause analysis or RFO, which requires a call to the carrier. This is a request for additional information about why something happened and a prime example of the very narrow types of request fulfillment that fall to the NOC.

Access Management

Access Management is another area with a somewhat limited intersection with the NOC. 

For certain tightly regulated financial institutions and government agencies, depending on the nature of the support relationship, the NOC must be the conduit through which these bodies access a particular service. In these situations, access management is handled through the NOC.

Problem Management

Problem management is the ITIL process that solves trending incidents at their root and, done well, provides greater long-term stability.

Although problem management is less “visible” than incident management, it’s just as critical. Very often, problem management isn’t referred to by name but instead asked for by function: “We’re seeing the same set of incidents over and over again—can you help?”

Problem management is an advanced support service you can expect to find with a level 4 or 5 ITIL-mature NOC, one that offers it as part of a broader CSI or CEM program. Again, this capability is valuable, especially at the enterprise and service provider levels, as it proactively reduces the number of incidents that occur in the future.

Common NOC Challenges:

  • No problem management function: Many NOCs, especially the less ITIL-mature, simply don’t offer problem management, and therefore aren’t equipped to address trending problems that manifest as repeat incidents.
  • A lack of deep expertise: Among the NOCs that do offer problem management, differences often boil down to technical expertise. Even the best problem management processes won’t work unless they’re carried out by highly skilled Tier 3 engineers who have seen and solved many problems and know how to identify and remediate genuine root causes.

NOC Best Practices Checklist:

  • A NOC structure that can support effective problem management: Most NOC activities, including problem and incident management, are 24×7 activities that require dedicated resources. A NOC’s operational support structure should enable managers to assign routine activities to lower-cost first-level teams and enable higher-level technical teams to focus on more advanced issues, like diagnosing and resolving problems.
  • A proactive approach to problem management: A level 4 or 5 ITIL-mature NOC is likely practicing proactive problem management regularly to reveal new opportunities for improvement. A high-performing NOC should be able to demonstrate investment in proactive problem management.
  • An investment in NOC staff: Technical training and career progression opportunities are both indicators that a NOC’s problem management function is top-notch. A training program should cover both initial onboarding and ongoing training, and should offer clear paths for employees to advance from one level to the next or move into other departments within the organization.
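The “same set of incidents over and over again” pattern can be surfaced with a simple count. The Python sketch below flags asset and category pairs that recur within a time window as candidates for problem investigation. The `Incident` fields, the 30-day window, and the threshold of three repeats are assumptions for this example, not ITIL requirements.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    ci: str            # affected asset (hypothetical field names)
    category: str      # symptom or classification recorded on the ticket
    opened: datetime


def problem_candidates(incidents: list[Incident],
                       now: datetime,
                       window: timedelta = timedelta(days=30),
                       min_repeats: int = 3) -> list[tuple[str, str]]:
    """Return (asset, category) pairs seen at least `min_repeats` times
    within `window` of `now`, sorted alphabetically."""
    cutoff = now - window
    recent = Counter((i.ci, i.category) for i in incidents if i.opened >= cutoff)
    return sorted(key for key, n in recent.items() if n >= min_repeats)


if __name__ == "__main__":
    # Invented incident history for illustration.
    now = datetime(2024, 1, 31)
    history = [
        Incident("router-1", "link_flap", datetime(2024, 1, 5)),
        Incident("router-1", "link_flap", datetime(2024, 1, 12)),
        Incident("router-1", "link_flap", datetime(2024, 1, 27)),
        Incident("server-2", "disk_full", datetime(2024, 1, 20)),
    ]
    print(problem_candidates(history, now))
```

A count like this only nominates candidates. Finding and fixing the genuine root cause still depends on the experienced engineers described above.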

Read also: How Problem Management Benefits NOC Support

IT Operations Management

IT Operations Management is mostly outside the NOC’s scope, falling instead to a systems infrastructure team or similar function.

The NOC monitors and controls IT operations, but in an efficient operation with well-defined roles, the work of patching and maintenance is handled by other teams.

Technical Management Function

ITIL’s Technical Management objective is to provide technical expertise and support for managing the IT infrastructure. This is where a NOC with Tier 2 and 3 staff offers something distinct from Tier 1-only NOCs.

But merely having advanced technical expertise doesn’t necessarily confer any enhanced capabilities. An operational support structure must be in place to ensure those resources are utilized when and where they should be. 

Here at INOC, for example, we’ve developed a NOC operations structure to radically transform where and how support activities are managed—both by tier and category. In a matter of months, the value of a useful operational framework like this becomes abundantly clear as support activities steadily migrate to their appropriate tiers, lightening the load on advanced engineers while working and resolving issues faster and more effectively.

Common NOC Challenges:

  • No advanced technical expertise: Many NOCs simply don’t staff advanced engineers to carry out higher-tier support.
  • An absent or inadequate operational structure: Even when technical expertise is present, many NOCs aren’t operationalized to use it properly. Advanced engineers are busy working on issues that, with the right systems in place, Tier 1 staff can handle.

NOC Best Practices Checklist:

  • A deep bench of technical expertise: To effectively carry out technical management commensurate with your IT infrastructure’s demands, your NOC should offer a full range of technical knowledge.
  • A tiered operational support structure: A NOC should offer a multi-tier operational support structure that enables managers to leverage the lower-cost first-level or Tier 1 team to perform routine activities, so high-level technical teams are free to focus on more advanced support issues. By following an operational methodology that utilizes a tiered support structure in full alignment with the ITIL framework, the NOC can rapidly respond to incidents and events and continue to implement changes as needed.

Final Thoughts and Next Steps

Like other proven ITSM frameworks, ITIL helps teams design and deliver services as effectively and efficiently as possible. Whether you’re implementing ITIL internally or integrating with an ITIL-aligned external support function, achieving success requires participation at every level of the organization and guidance from experts with extensive first-hand experience.

By working with a NOC service provider that has seen and solved many ITIL implementation challenges, you can radically improve the performance of your in-house NOC function or turn up outsourced support on a NOC that consistently delivers outstanding service results.

Here at INOC, we help organizations with both of these critical needs through award-winning outsourced NOC support and NOC operations consulting services.

  • NOC Support Services: Our NOCs monitor tens of thousands of infrastructure elements around the clock. High-level NOC management expertise and custom-built systems ensure you and your customers achieve the infrastructure performance and availability needed to grow and thrive no matter how your IT environment evolves or what new challenges arise. By following an operational methodology that utilizes a tiered support structure in full alignment with the ITIL framework, our NOC can rapidly respond to incidents and events and continue to implement changes as needed, all under a more cost-effective service model.
  • NOC Operations Consulting: We also deliver comprehensive best practices consulting for designing and building new NOCs and helping existing NOCs significantly improve the support provided to you and your customers. Our approach to high-quality support aligns and integrates each function of NOC support operations to enable more informed, consistent decision-making in line with the ITIL framework.

Interested in learning more about ITIL-aligned NOC operations support? Contact us to see how we can help you improve your IT service strategy and NOC support or download our free white paper below.

FREE WHITE PAPER

A Practical Guide to Running an Effective NOC

Download our free white paper and learn how to build, optimize, and manage your NOC to maximize performance and uptime.

*Originally developed by the UK government’s Office of Government Commerce (OGC), now part of the Cabinet Office, and currently managed and developed by AXELOS, ITIL is a framework of best practices for delivering efficient and effective support services.
