Despite being critical to the success of a technical support operation, many network operations centers (NOCs) fail to meet desired service levels.
Rather than delivering meaningful ROI by maximizing infrastructure uptime and performance, ineffective NOCs consume management and financial resources, leaving business services vulnerable.
Most of the time, the root cause of an underperforming NOC is the lack of a centralized support framework that incorporates and implements best practices. Such a structure is essential to the success of a NOC, as it makes decisions and actions consistent across the people, processes, and platforms that comprise it.
Without an authoritative blueprint for operating, costly inefficiencies are inevitable and serious risks will continue to pose constant threats to performance and availability—a problem that will only worsen as business services and the technologies they rely on scale in size and complexity.
Here, we explore ten NOC best practices for keeping even the largest, most complex infrastructure environments up and running at peak performance 24/7.
📄 Download our white paper, "Top 10 Challenges to Running a Successful NOC," for a handy reference that discusses each of these best practices and their corresponding challenges in greater depth.
📄 Read our other white paper, "A Practical Guide to Running an Effective NOC," for a set of actionable steps you can take to put these best practices to use.
Need help putting these best practices into action? Let's talk NOC. Schedule a free NOC consultation and connect with our Solutions Engineers about improving your current support operation or getting exactly the level of third-party support you need.
One of the biggest hurdles to success is organizing your NOC activities and workflows based on your specific technologies and skill levels.
Once you’ve cleared this hurdle, however, you’ll almost certainly be able to handle events and service requests and resolve incidents at the appropriate tier, and faster than before. Based on data collected across our NOCs, we found this structure can enable a NOC to resolve 65% to 75% of incidents at the Tier 1 level while reserving Tier 2 and 3 staff for more advanced issues.
Classifying NOC activities is often the first step in implementing a tiered structure. Use the following model for developing your own classification system:
Figure 1 below illustrates a well-organized tiered NOC support structure in action. Here, the Tier 1 team uses monitoring tools and interacts with end-user help desks, as well as Tier 2 and 3 engineers and third parties. Information flows between the various entities within a well-defined process framework.
Having such a structure for properly managing your workflow can prevent your NOC from being overwhelmed by the “wall of red” that NOC teams strive to avoid at all costs. In most NOCs, issues should be prioritized and organized into a set of queues so the appropriate group can handle each of them.
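To make the idea of prioritized queues concrete, here is a minimal sketch of routing incoming issues to the group that should handle them. The category names, queue names, and priority scale are hypothetical placeholders, not the example queues from the white paper.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import count

# Hypothetical mapping from issue category to the group that owns it.
ROUTING = {
    "alarm": "tier1",
    "service_request": "tier1",
    "escalation": "tier2",
    "vendor": "third_party",
}

@dataclass(order=True)
class Issue:
    priority: int                        # 1 = most urgent (assumed scale)
    seq: int                             # arrival order, used as a tie-breaker
    summary: str = field(compare=False)
    category: str = field(compare=False)

class QueueBoard:
    """Keeps one priority queue per support group."""

    def __init__(self):
        self._queues = defaultdict(list)
        self._seq = count()

    def submit(self, summary, category, priority):
        # Unknown categories fall back to Tier 1 for triage.
        queue = ROUTING.get(category, "tier1")
        issue = Issue(priority, next(self._seq), summary, category)
        heapq.heappush(self._queues[queue], issue)
        return queue

    def next_issue(self, queue):
        # Returns the most urgent issue in the queue, or None if it is empty.
        pending = self._queues[queue]
        return heapq.heappop(pending) if pending else None

board = QueueBoard()
board.submit("Link flap on edge router", "alarm", priority=2)
board.submit("Core switch down", "alarm", priority=1)
board.submit("Firmware bug needs vendor case", "vendor", priority=3)
first = board.next_issue("tier1")        # the priority-1 alarm is served first
```

A real NOC platform would add acknowledgement, ownership, and escalation timers on top of this, but the core idea is the same: each group works its own queue, most urgent item first.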
Download our white paper, "Top 10 Challenges to Running a Successful NOC," for a set of example workflow queues you can use to break up issues and assign them to groups based on skillset.
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
Modern NOC tools make it easy to generate metrics, but tracking meaningful metrics takes diligent work. Metrics are essential for continuous improvement from both a technical and motivational standpoint—helping teams recognize successes and keep morale high.
Anyone who works in a NOC likely hears things like, “We’re always busy,” or “I feel like we can never catch up,” or “My coworkers are not pulling their weight.” These sentiments are understandable given the fast-paced environment of a NOC and the constant multitasking that is required of those who work in it.
To ensure accomplishments are recognized, it’s important to set performance objectives and evaluate them on a daily, weekly, and monthly basis. Since the amount of data available to a NOC is daunting, choose the metrics that are most applicable to, and actionable for, your specific operation. These should reflect the size and scale of your operation and include the key performance indicators (KPIs) that measure performance against relevant organizational objectives.
KPIs to consider include first-call resolution, percentage of abandoned calls, mean time to restore, and the number of tickets and calls handled.
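As an illustration, a few of these KPIs can be computed from basic ticket and call records. The sketch below assumes a simplified, hypothetical `Ticket` record; the field names in your ticketing system will differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Ticket:
    opened: datetime
    restored: Optional[datetime]         # None while service is not yet restored
    resolved_on_first_contact: bool

def first_contact_resolution_rate(tickets):
    """Share of closed tickets that were resolved on the first call or contact."""
    closed = [t for t in tickets if t.restored is not None]
    if not closed:
        return 0.0
    return sum(t.resolved_on_first_contact for t in closed) / len(closed)

def mean_time_to_restore(tickets):
    """Average time from ticket open to service restoration, or None if nothing is closed."""
    durations = [t.restored - t.opened for t in tickets if t.restored is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)

def abandoned_call_rate(calls_offered, calls_abandoned):
    """Share of offered calls that were abandoned before being answered."""
    return calls_abandoned / calls_offered if calls_offered else 0.0

t0 = datetime(2024, 1, 1, 8, 0)
tickets = [
    Ticket(t0, t0 + timedelta(minutes=30), True),
    Ticket(t0, t0 + timedelta(minutes=90), False),
    Ticket(t0, None, False),             # still open, excluded from both metrics
]
fcr = first_contact_resolution_rate(tickets)
mttr = mean_time_to_restore(tickets)
abandoned = abandoned_call_rate(200, 10)
```

Excluding open tickets from both calculations is a design choice made for this sketch; whichever convention you pick, apply it consistently so period-over-period comparisons stay meaningful.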
Aside from KPIs, there's another category of metrics that is a blind spot for many NOCs: utilization metrics.
These metrics reveal when and why the NOC is or isn't busy—and what it's busy with—so staffing levels can be fine-tuned for peak efficiency.
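One simple way to approximate utilization is to compare the time engineers spend on tracked work against the time they were available, hour by hour. The following is a rough sketch; the work-log format and the staffing table are assumptions made for illustration.

```python
from collections import defaultdict

def hourly_utilization(work_log, staff_on_shift):
    """Return the fraction of available engineer time spent on tracked work per hour.

    work_log: iterable of (hour_of_day, minutes_spent) entries (hypothetical format).
    staff_on_shift: dict mapping hour_of_day to the number of engineers on duty.
    """
    busy_minutes = defaultdict(float)
    for hour, minutes in work_log:
        busy_minutes[hour] += minutes
    return {
        hour: busy_minutes[hour] / (engineers * 60)
        for hour, engineers in staff_on_shift.items()
        if engineers > 0
    }

log = [(9, 45), (9, 50), (9, 25), (3, 12)]
staff = {3: 2, 9: 2}
utilization = hourly_utilization(log, staff)
# Hour 9 is fully loaded while hour 3 is mostly idle, which hints at a staffing imbalance.
```

Breaking the same calculation down by day of week or by work type (alarms, calls, emails) would show what the NOC is busy with, not just when.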
Read our post for a deep dive on these metrics: NOC Performance Metrics: How to Measure and Optimize Your Operation
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
Running a 24x7 NOC requires staffing three shifts a day, 365 days a year. Consider the following factors when developing a staffing strategy:
Effectively staffing a 24x7 NOC starts with a well-organized structure. The tiered NOC support structure and workflow queues discussed in our first best practice are good starting points for determining the skill level required of your NOC staff.
Download our white paper, "Top 10 Challenges to Running a Successful NOC," for an example of a skills-based NOC structure that can support 24x7 NOC requirements and provide a growth plan to maximize employee retention.
Consider the overall activity of your NOC, including the volume of calls, emails, and alarms handled by hour of day, day of week, and type of support engineer, as well as the duration of incidents.
Consider the benefits that your company provides for employees in the context of the needs of your operation to ensure understaffing isn’t a risk. For example, if your company provides 10 holidays and four weeks of PTO per employee, these hours need to be accounted for to ensure that your NOC runs smoothly.
A NOC training program should cover initial onboarding as well as ongoing training. A truly comprehensive training program can take up to six months of various classes and on-the-job instruction before an engineer is ready to take on NOC support responsibilities. After work has begun, monthly or quarterly training sessions should be scheduled to keep engineers’ skills fresh and to update the support team on new types of services, new customer requirements, and new equipment.
Based on your historical data and industry standards, a certain attrition rate within the NOC should be taken into account. Factors that affect retention rates include company culture and NOC organization (i.e., whether there’s a clear path for employee growth from one level to the next or to other departments within the organization).
By making these calculations, you can better plan for staffing and training needs. For example, assuming that a typical engineer works five years in your NOC (a retention rate of 80%), you’d need to hire an additional 20% of staff each year.
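The arithmetic behind these staffing estimates can be sketched as follows. The holiday, PTO, and tenure figures come from the examples above; the 40-hour week, 8-hour day, and 52 paid weeks are assumptions you should replace with your own numbers.

```python
import math

HOURS_PER_YEAR = 24 * 365                # hours one always-staffed seat must be covered

def available_hours_per_engineer(holidays=10, pto_weeks=4,
                                 hours_per_week=40, hours_per_day=8):
    """Hours per year one engineer can actually spend on shift (assumed 52 paid weeks)."""
    paid_hours = 52 * hours_per_week
    time_off = holidays * hours_per_day + pto_weeks * hours_per_week
    return paid_hours - time_off

def engineers_needed(seats):
    """Headcount required to keep `seats` positions staffed around the clock."""
    return math.ceil(seats * HOURS_PER_YEAR / available_hours_per_engineer())

def annual_hires(headcount, average_tenure_years):
    """Replacement hires per year, assuming steady-state attrition of 1 / tenure."""
    return math.ceil(headcount / average_tenure_years)

hours = available_hours_per_engineer()   # 2080 paid hours minus 240 hours off
team = engineers_needed(seats=2)         # two seats staffed 24x7
hires = annual_hires(team, average_tenure_years=5)
```

This sketch ignores training time, sick leave, and shift overlap for handoffs, all of which push the required headcount higher.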
Read our post for a deeper discussion on staffing a NOC: Staffing a 24x7 NOC: Costs, Challenges, and Key Considerations
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
Inconsistency is one of the main reasons NOCs don’t perform at optimal levels. Being reliably consistent requires a standardized process framework that arms your NOC with specific procedures for handling various support situations.
There are several process and management frameworks to choose from, including MOF, FCAPS, and ITIL. The ITIL* (Information Technology Infrastructure Library) service framework, in particular, has become immensely popular, as it’s useful in achieving the ISO 20000 certification and provides its own set of best practices to follow when delivering technology support services. It also offers the flexibility to include your organization’s custom procedures under its umbrella of lifecycle stages.
Process frameworks can be overwhelming when considered in their entirety. When aligning with one of these standards, we recommend first tackling the specific areas that are your organization's biggest challenge. Typically, these are incident management, problem management, and the service desk. Once these functions are standardized, you can move on to other priority areas, such as change management and service continuity management.
It’s critical to get your whole organization involved in implementing the process framework and in ongoing education. Training is essential to get all staff talking the same language and following the same guidelines. Comprehensive information and training are available for ITIL, ISO 20000, FCAPS, and other methodologies.
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
A business continuity plan (BCP) is essential for managing risk in your NOC operations—a fact made very clear by the COVID-19 pandemic, which exposed serious shortcomings in many firms’ readiness.
(Download our white paper, "A 5-Step Strategy for NOC Business Continuity Planning in Response to COVID-19," for a comprehensive guide to ensuring your BCP is prepared for such a disruption.)
The BCP provides a blueprint for NOC staff and management to follow when recovering from a disaster or other adverse situation. When properly executed, it ensures that operations recover quickly and effectively so any negative impact on the business is minimized.
Without an effective BCP in place, your NOC will almost certainly remain vulnerable to the following problems if a disaster or significant workforce disruption affects your operation:
Key representatives from a cross-section of the organization need to be involved in creating a BCP. This may include outside vendors. Whether you’re developing one from scratch or want to evaluate an existing BCP, make sure it contains the following at the bare minimum:
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
NOC teams must measure service quality and provide quality assurance continuously or risk damaging customer satisfaction and compromising the NOC’s reputation. Effectively and consistently executing a runbook (i.e., processes and procedures) is paramount to meeting a NOC’s service level requirements.
The detailed monitoring of key network and IT assets and services is core to meeting these objectives. This monitoring, data collection, and correlation—typically accomplished using a variety of protocols and tools—is the entry point into incident handling and problem management processes.
Calls and emails, among other channels, are additional sources of data. The NOC runbook, created during onboarding and updated regularly, is key to what happens next. Documenting agreed-upon processes and procedures for the specific customer environment provides the NOC team with an essential operational reference.
A good quality control program monitors and measures primary aspects of your NOC service via its KPIs. These KPIs provide much-needed visibility into NOC support activity, responsiveness, and effectiveness. NOC management can use this information to ensure, for instance, that stated objectives for event-to-action times and first-level incident resolution are being met for each customer.
Quality control also detects chronic issues so management can find appropriate solutions—for example, correcting relevant runbook procedures, ensuring complete documentation is available to the NOC, or providing additional staff training. A monthly audit of a subset—say, 10%—of all tickets created is an important part of an ongoing review. Staff mentoring is also key to quality control and helps ensure high levels of customer satisfaction.
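Selecting the audit subset can be as simple as drawing a random sample of the month's tickets. Here is a minimal sketch; the function name and the fixed seed are illustrative assumptions, and the 10% figure is the example used above.

```python
import random

def audit_sample(ticket_ids, percent=10, seed=None):
    """Pick a random subset of tickets for the monthly quality audit.

    percent is a whole number (10 means 10%). The sample size is rounded up
    so that even a small month produces at least one ticket to review.
    """
    ticket_ids = list(ticket_ids)
    if not ticket_ids:
        return []
    size = max(1, (len(ticket_ids) * percent + 99) // 100)   # integer ceiling
    return random.Random(seed).sample(ticket_ids, size)

# Hypothetical month with 250 tickets numbered 1 through 250.
monthly_tickets = range(1, 251)
to_review = audit_sample(monthly_tickets, percent=10, seed=42)
```

Passing a seed makes the draw reproducible, which can help if auditors need to show how the sample was selected. A stratified sample (for example, by customer or by severity) may be more appropriate when ticket volume is uneven.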
A quality or service assurance program enables your NOC to identify and resolve problems before they significantly impact customers or the business.
A quality assurance review begins when a customer reports dissatisfaction with any aspect of the NOC service. NOC management follows up with an internal review of the service, evaluating responsiveness metrics, adherence to runbook procedures, customer interaction, and technical troubleshooting, to name a few.
Such quantitative and qualitative measures and the resulting feedback lower the chance of the same problem recurring. Monthly and quarterly reviews of the service with stakeholders ensure that customer expectations remain met.
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
NOCs operating at peak efficiency can receive and process alarm or event information from multiple sources and present it in a consolidated view for staff to act on. This consolidated view is commonly called a “single pane of glass.”
Most NOCs need to bring voice, email, text, customer portals, knowledge bases, documentation, and workflow management tools into the NOC—each potentially with its own platform. Without proper integrations connecting these tools and platforms, NOC personnel are faced with tracking and managing multiple screens for event information, manually collecting information from multiple sources for documentation, notification, and escalation, and then attempting to manage workflow toward service restoration.
This makes monitoring and reporting on SLA metrics nearly impossible, let alone optimizing performance. The results inevitably include operational inefficiencies, missed SLAs, and undue stress on staff.
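At its simplest, the integration behind a consolidated view means normalizing events from each source into one common shape and presenting them in a single ordered list. The sketch below assumes two hypothetical payload formats (a monitoring alarm and a parsed email); real integrations depend on each tool's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "info": 3}

@dataclass(frozen=True)
class Event:
    source: str
    device: str
    severity: str
    message: str
    received: datetime

def from_monitoring(alarm: dict) -> Event:
    # Hypothetical monitoring payload with keys: host, level, text, epoch.
    return Event("monitoring", alarm["host"], alarm["level"].lower(), alarm["text"],
                 datetime.fromtimestamp(alarm["epoch"], tz=timezone.utc))

def from_email(mail: dict) -> Event:
    # Hypothetical parsed email; it carries no severity, so default to "info".
    return Event("email", mail["device"], "info", mail["subject"], mail["received"])

def consolidated_view(events):
    """Order events by severity first, then by arrival time."""
    unknown = len(SEVERITY_RANK)         # unrecognized severities sort last
    return sorted(events, key=lambda e: (SEVERITY_RANK.get(e.severity, unknown), e.received))

events = [
    from_email({"device": "fw-02", "subject": "User reports slow VPN",
                "received": datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)}),
    from_monitoring({"host": "core-sw-01", "level": "CRITICAL",
                     "text": "Interface down", "epoch": 1704099900}),
    from_monitoring({"host": "edge-rtr-03", "level": "Minor",
                     "text": "High CPU", "epoch": 1704099600}),
]
view = consolidated_view(events)
```

Once every source maps to the same `Event` shape, downstream steps such as ticket creation, notification, and SLA reporting only need to understand one format.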
Download our white paper, "Top 10 Challenges to Running a Successful NOC," to see the various tools that are required for a NOC to function and that should be integrated into a single NOC platform.
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
Documentation is essential to a NOC’s ability to function well over the long term. This process includes building runbooks, documenting workflow processes, creating structured databases for storing and retrieving information, and recording business results for analysis and optimization.
Too often, however, services are added or changes are made without proper documentation to support them. This limits the ability of the NOC to resolve an issue when it arises, wasting time and creating avoidable risks.
Poor documentation often stems from a lack of the resources and expertise required to map out processes and create work instructions and documents. Instead, key people simply “know what to do,” and new staff learn by “seeing and doing” alongside an experienced mentor.
NOC teams also often overlook performance metrics that can be obtained from network and monitoring systems, ticketing systems, and back-office tools. These metrics are critical for analyzing performance, predicting failure, and laying the groundwork for ongoing quality control and process improvement.
Without an understanding of alarm activity, ticket activity, and common causes for outages and trends, management is limited to responses that are reactive and tactical, rather than proactive and strategic.
Beginning with the service catalog, it is necessary to document the tools and procedures needed to deliver NOC services successfully. Technical writers can often be invaluable in this process.
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
A NOC’s scalability is a measure of its ability to handle a growing amount of work without compromising the level of service. Typically, business plans include initial funding, sales and marketing, system build-out, operations support, and the business guidance needed to meet the projected growth. What business plans sometimes don’t consider is predictable growth and process planning.
Often, for example, sales for a young company take off, with key managers focused on new clients and getting technical services delivered to meet service launch dates. The same technical and operations resources are then tasked with the ongoing support of these services—severely impeding the organization’s ability to manage its growth. The result is predictable: customer dissatisfaction.
The ability to grow or absorb expansion requires careful consideration of the following factors:
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
There are several components that make up the cost of running a 24x7 NOC. When budgeted appropriately, these items combine into a powerful investment that has the potential to deliver a value that far exceeds its cost.
All of these components present a formidable operating expense but have to be considered in building a successful NOC. Too often, NOCs are built considering only a subset of the above components, and as a result, they struggle to scale and deliver on the required service and financial objectives of the organization.
This best practice addresses the following problem indicators. Talk to us to explore a NOC solution if you're experiencing any of them:
For years, top-tier NOCs have applied automation to the repetitive, low-risk tasks that pull technical specialists away from more important (and frankly more exciting) work. Only more recently, however, have NOCs started arming themselves with vastly better data processing and machine learning power to augment and replace a growing number of increasingly complex manual tasks traditionally handled by humans.
Perhaps the most impactful recent advancement is AI-driven event correlation. NOCs can now let machines correlate event data much faster than humans ever could and identify the subtle indicators of approaching issues within a torrent of otherwise noisy data. The outcome can be measured in significantly faster and more proactive response rates—and thus, happier customers and end-users.
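To give a feel for what correlation does, the sketch below groups alarms from the same site that arrive close together in time, so a burst of related alarms becomes one candidate incident. This is a deliberately simple rule-based stand-in, not machine learning and not a description of any production AIOps engine; the alarm fields and the five-minute window are assumptions.

```python
from datetime import datetime, timedelta

def correlate(alarms, window=timedelta(minutes=5)):
    """Group alarms that share a site and arrive within `window` of the
    previous alarm in that site's most recent group."""
    groups = []
    latest_group = {}                    # site -> index of its most recent group
    for alarm in sorted(alarms, key=lambda a: a["time"]):
        site = alarm["site"]
        idx = latest_group.get(site)
        if idx is not None and alarm["time"] - groups[idx][-1]["time"] <= window:
            groups[idx].append(alarm)    # same burst: extend the existing group
        else:
            groups.append([alarm])       # new burst: start a new group
            latest_group[site] = len(groups) - 1
    return groups

t0 = datetime(2024, 1, 1, 12, 0)
alarms = [
    {"site": "A", "time": t0, "text": "Link down"},
    {"site": "A", "time": t0 + timedelta(minutes=2), "text": "BGP peer lost"},
    {"site": "B", "time": t0 + timedelta(minutes=3), "text": "High CPU"},
    {"site": "A", "time": t0 + timedelta(minutes=20), "text": "Link down"},
]
incidents = correlate(alarms)
```

Machine learning approaches go further by learning which alarms tend to occur together across topology and time, rather than relying on a fixed site key and window.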
Here are the major ways we've implemented AIOps within our NOC to unlock dramatically better efficiency and performance:
This combination of automation and machine learning brings the power and promise to genuinely transform how IT operations teams organize and operate. And as time goes on, automation will steadily continue to replace even more manual activities better suited for machines.
Given the complexity of integrating machine learning and automation into a NOC operation, there are no convenient action items to point to here. Such an undertaking, while transformative, requires a great deal of highly specialized work and significant investment, more than would make sense for most teams to build internally.
Here at INOC, our clients simply inherit the powerful AIOps capabilities we've spent years and many resources building and refining. The core of our alarm and event management system is our AIOps engine, which utilizes machine learning to automate low-risk tasks and extract actionable insights from the vast amounts of data gathered across clients' supported environments.
While it’s easy to talk about best practices, it’s another thing entirely to bring those practices to life within your organization. Success requires careful planning and attention, which is why expertise is so critical at the outset of building or optimizing your NOC.
Here at INOC, we help organizations with these critical needs through award-winning outsourced NOC support (sometimes referred to as NOC as a Service) and NOC operations consulting services.
Hal Baylor, Solutions Executive at INOC, recently sat down with JSA TV on the ITW 2022 expo floor to talk about INOC's growing NOC Operations Consulting service. Watch the full interview below.
Want to learn how to put these best practices to use in your NOC? Contact us or schedule a free NOC consultation with our Solutions Engineers to see how we can help you improve your IT service strategy and NOC support, or download our free white paper below.
Download our free white paper and learn how to build, optimize, and manage your NOC to maximize performance and uptime.
*Originally developed by the UK government’s Office of Government Commerce (OGC), which has since been absorbed into the Cabinet Office, and currently managed and developed by AXELOS, ITIL is a framework of best practices for delivering efficient and effective support services.