Efficient processes are the backbone of an effective NOC. They’re one of the first things companies must consider when building or scaling a support operation.
A well-defined process framework offers much-needed consistency to NOC operations by providing a specific set of procedures for handling various support situations.
Years ago, a NOC’s performance was largely determined by the type of organization it served; for example, whether it was an enterprise or a service provider. Today, IT service management based on the Information Technology Infrastructure Library (ITIL*) provides a standard set of best practices for developing, maintaining, and improving IT services in any type of NOC.
Here, we discuss the five most essential ITIL processes that make for a functional NOC:
- Event Monitoring and Management
- Incident Management
- Problem Management
- Capacity Management
- Change Management
📄 This post is excerpted and adapted from our free white paper: A Practical Guide to Running an Effective NOC
The Importance of an Efficient Operational Framework, Like ITIL
Operational frameworks like ITIL are used to guide and document processes, functions, and roles in order to organize and operationalize your NOC, providing a playbook of prescriptive and guiding practices.
While there are several process and management frameworks to choose from, such as MOF and FCAPS, ITIL is widely used and can help organizations achieve ISO/IEC 20000 certification, an international standard for IT service management.
ITIL is neither organization-specific nor technology-specific. Instead, it provides flexible, scalable, and versatile instruction that can be applied to developing your own strategies, delivering your own services, and maintaining your own competencies.
This post offers an overview of each of the five key processes. Check out our other post for a much deeper dive into applying ITIL to the NOC, specifically: ITIL Service Operation and the NOC: A Quick-Guide and Checklist.
1. Event Monitoring and Management
Event Monitoring and Management is the process by which a NOC monitors the performance of infrastructure and systems, detects interruptions in service or other issues (events) that must be resolved, and processes them.
“Events,” in this context, are communications about issues and can take the form of system alarms, calls, emails, or chats. Once an event is detected, it’s evaluated, correlated, and acknowledged; if further management is needed, it’s logged as an incident or ticket.
To manage events, the NOC uses tools such as Network Management Systems (NMSs), Element Management Systems (EMSs), and Application Performance Management (APM) tools, which filter messages from infrastructure using protocols such as SNMP, TL1, WMI, and, more recently, gRPC and gNMI.
- High-performing NOCs are capable of consolidating alarm sources and event information as well as integrating essential runbooks, documentation, customer portals, knowledge bases, and other tools into single views, such as a dashboard.
- This makes it much easier to report on SLA performance metrics and optimize performance compared to managing multiple screens and manually collecting information from different platforms.
In building your own NOC, consider integrating alarm monitoring, ticketing systems, and communications systems—and how you can leverage the power of machine learning and automation via AIOps to achieve significant efficiencies.
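The evaluate, correlate, acknowledge, and log flow described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular NMS or AIOps platform works: the `Event` fields, the numeric severity scale (1 = critical, 5 = informational), the ticketing threshold, and the rule that repeats of the same event type from the same source inside a time window belong to one incident are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    source: str          # device or system that raised the event (hypothetical field)
    kind: str            # event type, e.g. "link_down" (hypothetical label)
    severity: int        # assumed scale: 1 = critical ... 5 = informational
    timestamp: datetime


@dataclass
class Ticket:
    key: tuple[str, str]                  # (source, kind) used for correlation
    first_seen: datetime
    events: list[Event] = field(default_factory=list)


class EventManager:
    """Hypothetical sketch of event handling: acknowledge, evaluate, correlate, log."""

    def __init__(self, window: timedelta, ticket_threshold: int = 3) -> None:
        self.window = window                      # correlation window
        self.ticket_threshold = ticket_threshold  # severities <= this need action
        self.acknowledged: list[Event] = []
        self.open_tickets: dict[tuple[str, str], Ticket] = {}

    def process(self, event: Event) -> Ticket | None:
        # Acknowledge: record that every detected event has been seen.
        self.acknowledged.append(event)

        # Evaluate: low-severity events need no further management.
        if event.severity > self.ticket_threshold:
            return None

        # Correlate: a repeat of the same event type from the same source inside
        # the window is attached to the existing ticket instead of opening a new one.
        key = (event.source, event.kind)
        ticket = self.open_tickets.get(key)
        if ticket is not None and event.timestamp - ticket.first_seen <= self.window:
            ticket.events.append(event)
            return ticket

        # Log: otherwise, open a new ticket for this incident.
        ticket = Ticket(key=key, first_seen=event.timestamp, events=[event])
        self.open_tickets[key] = ticket
        return ticket


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    mgr = EventManager(window=timedelta(minutes=10))
    first = mgr.process(Event("core-rtr-1", "link_down", 1, t0))
    repeat = mgr.process(Event("core-rtr-1", "link_down", 1, t0 + timedelta(minutes=2)))
    noise = mgr.process(Event("core-rtr-1", "cpu_high", 5, t0))
    print(first is repeat, len(first.events), noise)
```

A production NOC would typically correlate on richer signals, such as topology, maintenance windows, and upstream dependencies; the single `(source, kind)` key here just keeps the idea visible.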
2. Incident Management
ITIL states that Incident Management aims “to minimize the negative impact of incidents by restoring normal service operations as quickly as possible.” In short, effective incident management enables the NOC to fix what is broken as quickly as possible.
Incident Management involves the creation and processing of tickets. When an alarm sounds in the NOC and a network, system, or application event requires action, a ticket containing details about the incident is created in the service management platform or ticketing system. NOC engineers then work to resolve the issue and may request further action from colleagues by email, phone, or message.
Typically, NOCs contain engineers of different levels or “tiers” of skill and experience, and tickets may be passed from lower to higher tiers when less-experienced engineers are unable to resolve the issue. Operationally, these tickets not only assign work to different individuals but also serve as a record of completed work that can later be analyzed in reports to manage and optimize operations.
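To make the tiered escalation and the idea of a ticket as a record of completed work more concrete, here is a minimal Python sketch. The three-tier structure, the `IncidentTicket` class, its methods, and the `resolutions_by_tier` report are hypothetical illustrations, not the data model of any specific ticketing system.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

TIERS = ("Tier 1", "Tier 2", "Tier 3")  # assumed three-tier support structure


@dataclass
class IncidentTicket:
    ticket_id: str
    summary: str
    tier: int = 0                        # index into TIERS; new tickets start at Tier 1
    history: list[str] = field(default_factory=list)
    resolved: bool = False

    def log(self, note: str) -> None:
        # Every action is recorded, so the ticket doubles as a record of completed work.
        self.history.append(f"[{TIERS[self.tier]}] {note}")

    def escalate(self, reason: str) -> None:
        # Pass the ticket to the next, more experienced tier.
        if self.tier + 1 >= len(TIERS):
            raise RuntimeError(f"{self.ticket_id} is already at the highest tier")
        self.log(f"escalated: {reason}")
        self.tier += 1

    def resolve(self, resolution: str) -> None:
        self.log(f"resolved: {resolution}")
        self.resolved = True


def resolutions_by_tier(tickets: list[IncidentTicket]) -> Counter[str]:
    """Report how many resolved tickets each tier closed (input for staffing and training)."""
    return Counter(TIERS[t.tier] for t in tickets if t.resolved)


if __name__ == "__main__":
    ticket = IncidentTicket("INC-1001", "BGP session flapping on edge router")
    ticket.log("checked interface counters and recent alarms")
    ticket.escalate("requires routing expertise")
    ticket.resolve("corrected BGP timer mismatch with peer")
    print(*ticket.history, sep="\n")
    print(resolutions_by_tier([ticket]))
```

Keeping the history as an append-only list is a deliberate choice here: reports can be built later from the same records engineers wrote while working the ticket.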
👉 Read our other posts for a deeper dive into Incident Management in the NOC, including the process’s lifecycle and a few best practices you can implement yourself:
- Incident Management: The Foundation of a Successful NOC
- 5 ITIL Incident Management Best Practices [+ Checklist]
3. Problem Management
Problem Management is the process of diagnosing the root causes of incidents and requesting changes to resolve these issues. It can be easy to confuse with incident management. However, where incident management seeks to resolve specific incidents, problem management seeks to investigate the underlying problems of incidents in order to prevent future incidents.
Problem Managers analyze data for trends, search logs for likely causes of failure, create plans to prevent future incidents, and keep track of problems and workarounds for incident managers. Such sophisticated work requires individuals with high levels of expertise and analytical ability. In this way, problem management optimizes NOC support by preventing incidents before they occur.
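The trend analysis and workaround tracking described above can be approximated with a simple sketch: count how often the same symptom recurs on the same configuration item across closed incidents, and flag anything that recurs often enough as a problem candidate. The `IncidentRecord` fields, the `min_occurrences` threshold, and the `KNOWN_ERRORS` dictionary are assumptions for illustration; real problem management also draws on logs, change records, and expert analysis.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentRecord:
    incident_id: str
    config_item: str  # affected device or service (hypothetical field)
    symptom: str      # normalized symptom label (hypothetical field)


def problem_candidates(
    incidents: list[IncidentRecord], min_occurrences: int = 3
) -> list[tuple[tuple[str, str], int]]:
    """Return (config_item, symptom) pairs that recur at least min_occurrences times,
    most frequent first; each is a candidate for root-cause investigation."""
    counts = Counter((i.config_item, i.symptom) for i in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= min_occurrences]


# Known errors: problems with a documented workaround that incident handlers can
# apply while the root cause is still being fixed. Entries here are hypothetical.
KNOWN_ERRORS: dict[tuple[str, str], str] = {
    ("edge-sw-4", "port_flap"): "Reset the affected port; root cause under investigation",
}


def workaround_for(incident: IncidentRecord) -> str | None:
    """Look up a documented workaround for an incident, if one exists."""
    return KNOWN_ERRORS.get((incident.config_item, incident.symptom))
```

Grouping by a `(config_item, symptom)` pair is only one possible correlation key; the point is that problem management works across many incidents, while incident handling works on one at a time.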
It’s important to recognize that Incident Management and Problem Management are different.
- The goal of Incident Management is to restore services as quickly as possible.
- Problem Management aims to determine and address the root cause of an incident or a series of incidents by identifying, tracking, and resolving the underlying problems.
Problem Management tends to be less visible than incident management. Whereas users feel the direct impact of incidents, they are unlikely to be aware of problem management work, because the ultimate objective is to stop incidents before they happen. This lack of visibility can cause organizations to spend the bulk of their NOC resources resolving incidents, rather than investing in problem management to prevent incidents from occurring.
👉 Read our other post to explore more about Problem Management in the NOC: How Problem Management Benefits NOC Support
4. Capacity Management
Capacity Management is the process of making sure the NOC’s business, service, and component capacity needs continue to be met so that SLAs are upheld.
This often involves reviewing reports on alarm thresholds and utilization against desired business outcomes, then addressing any capacity gaps they reveal.
While it may seem like a relatively minor point, capacity and staff utilization are frequent problems for many NOCs, due to staffing shortages and turnover, as well as varying levels of incident volume on certain days or times.
Analyzing performance metrics, such as how long it takes to resolve an incident, and identifying recurring busy or slow periods can help NOCs meet capacity needs. That, in turn, helps avoid burning out employees and prevents avoidable turnover.
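As a rough illustration of that analysis, the sketch below computes mean time to resolve, finds the hours of the day when the most tickets are opened, and turns those numbers into a simple staffing estimate. The ticket format and the staffing formula (hourly workload divided by a target utilization below 100% to leave headroom) are simplifying assumptions, not a full workforce-management model such as Erlang C.

```python
from __future__ import annotations

import math
from collections import Counter
from datetime import datetime
from statistics import mean

# Each ticket is an (opened, resolved) pair of timestamps (assumed format).
Ticket = tuple[datetime, datetime]


def mean_time_to_resolve_minutes(tickets: list[Ticket]) -> float:
    """Average minutes from ticket creation to resolution."""
    return mean((resolved - opened).total_seconds() / 60 for opened, resolved in tickets)


def busiest_hours(tickets: list[Ticket], top: int = 3) -> list[tuple[int, int]]:
    """Return (hour_of_day, tickets_opened) pairs for the busiest hours."""
    return Counter(opened.hour for opened, _ in tickets).most_common(top)


def engineers_needed(
    tickets_per_hour: float, handle_minutes: float, target_utilization: float = 0.75
) -> int:
    """Simplified estimate: hours of work arriving per hour, divided by the share of
    each engineer's time that should be spent on tickets, rounded up."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    workload_hours = tickets_per_hour * handle_minutes / 60
    return math.ceil(workload_hours / target_utilization)
```

For example, 10 tickets per hour at 30 minutes of handling each is 5 hours of work per hour; at a 75% utilization target, that rounds up to 7 engineers. The utilization target is a policy choice, and keeping it below 100% is what leaves room to avoid burnout.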
5. Change Management
Change Management serves to reduce risk from inevitable changes in the supported infrastructure environment, from routine or planned changes like password resets or operating system upgrades to emergency changes like rerouting network traffic after a primary WAN uplink becomes unstable.
It’s also a systematic approach to managing planned changes, such as shifts in goals, processes, or technologies, or changes prompted by insights gained from incident or problem management.
It may not surprise you that change management requires acceptance from every part of the organization in order to be effective.
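ITIL commonly distinguishes standard (pre-authorized, low-risk), normal, and emergency changes, which maps onto the range of examples in this section. The sketch below assumes that three-way classification; the `ChangeRequest` flags and the approval-path wording are hypothetical, and treating an operating system upgrade as a normal change is an illustrative choice rather than a rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    STANDARD = "standard"    # low-risk, pre-authorized, follows a documented procedure
    NORMAL = "normal"        # planned; assessed and authorized before implementation
    EMERGENCY = "emergency"  # implemented urgently to restore or protect service


@dataclass
class ChangeRequest:
    summary: str
    matches_standard_template: bool  # hypothetical flag: a pre-authorized change model exists
    urgent_service_risk: bool        # hypothetical flag: waiting would cause or prolong an outage


# Hypothetical approval routing; each organization defines its own change authorities.
APPROVAL_PATH = {
    ChangeType.STANDARD: "implement using the documented procedure",
    ChangeType.NORMAL: "assess risk, schedule, and obtain change-authority approval",
    ChangeType.EMERGENCY: "obtain expedited approval, implement, then review afterward",
}


def classify(change: ChangeRequest) -> ChangeType:
    # Urgency wins: a change needed to stop an outage is treated as an emergency.
    if change.urgent_service_risk:
        return ChangeType.EMERGENCY
    if change.matches_standard_template:
        return ChangeType.STANDARD
    return ChangeType.NORMAL


if __name__ == "__main__":
    for req in (
        ChangeRequest("Password reset", True, False),
        ChangeRequest("Operating system upgrade", False, False),
        ChangeRequest("Reroute traffic off unstable primary WAN uplink", False, True),
    ):
        kind = classify(req)
        print(f"{req.summary}: {kind.value} -> {APPROVAL_PATH[kind]}")
```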
Final Thoughts and Next Steps
Like other proven ITSM frameworks, ITIL helps teams design and deliver services as effectively and efficiently as possible. As you’re implementing ITIL internally, achieving success requires participation at every level of the organization and guidance from experts with extensive first-hand experience.
By working with a NOC service provider that has seen and solved many ITIL implementation challenges, you can radically improve the performance of your in-house NOC function or turn up outsourced support on a NOC that consistently delivers outstanding service results.
Here at INOC, we help organizations with both of these critical needs through award-winning outsourced NOC support and NOC operations consulting services.
- NOC Support Services: Our NOCs monitor tens of thousands of infrastructure elements around the clock. High-level NOC management expertise and custom-built systems ensure you and your customers achieve the infrastructure performance and availability needed to grow and thrive no matter how your IT environment evolves or what new challenges arise. By following an operational methodology that utilizes a tiered support structure in full alignment with the ITIL framework, our NOC can rapidly respond to events and incidents and continue to implement changes as needed, all under a more cost-effective service model.
- NOC Operations Consulting: We also deliver comprehensive best practices consulting for designing and building new NOCs and helping existing NOCs significantly improve the support provided to you and your customers. Our approach to high-quality support aligns and integrates each function of NOC support operations to enable more informed, consistent decision-making in line with the ITIL framework.
Interested in learning more about ITIL-aligned NOC operations support? Contact us to see how we can help you improve your IT service strategy and NOC support or download our free white paper below.
*Originally developed by the UK government’s Office of Government Commerce (OGC), now part of the Cabinet Office, and currently managed and developed by AXELOS, ITIL is a framework of best practices for delivering efficient and effective support services.