Companies that manage a network operations center (NOC) tend to run into the same issues. Fortunately, these common problems are solvable. This guide offers 10 tips for managing the modern NOC that we’ve gathered over more than 20 years of doing just that for enterprises, service providers, and OEMs.
Want to talk NOC? Schedule a free NOC consultation with our Solutions Engineers for a focused conversation on aligning NOC support with your specific business needs.
Operational design is the key to unlocking a NOC’s full capability and value. It provides a central framework with informed, documented guidance for each operational decision and action. A look under the hood of any top-performing NOC will reveal such a framework. Most often, it’s the factor that separates truly outstanding NOCs—those that have a measurable business impact—from those that only deliver basic, reactive support.
The first decision in building a NOC operational framework is how the framework itself will be developed. Rather than assembling a team to hash out a framework from scratch, start with a proven framework that an experienced NOC design team can shape to fit the requirements of your business and its IT infrastructure. The resulting design will typically be far more robust and effective.
Whether you’re taking on NOC operational design yourself or putting it in the hands of capable NOC experts, each component should take into account the three Ps:
Since these three Ps encompass everything that could be impacted by the NOC, this approach ensures nothing is left out of each operation you develop.
A tiered ops support model delegates tasks based on skill level, or tier. Lower-cost, less experienced engineers handle simple break/fix issues, while more complex problems are escalated to more experienced engineers, rather than having the same individuals handle everything. The model also takes SLAs, technology, and urgency into account, organizing and prioritizing alarms to prevent an overwhelming “wall of red” and help the NOC perform at peak efficiency.
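To make the idea concrete, here is a minimal sketch of tier assignment and alarm prioritization. The severity scale, the `known_fix` flag (whether a runbook documents the fix), and the routing rules are all illustrative assumptions, not a prescribed model; a real NOC would derive its rules from its own SLAs and runbooks.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Alarm:
    alarm_id: str
    severity: int      # assumed scale: 1 = critical ... 4 = low
    sla_minutes: int   # minutes left before the SLA would be breached
    known_fix: bool    # True if a runbook documents the fix (hypothetical flag)

def assign_tier(alarm: Alarm) -> int:
    """Route an alarm to Tier 1, 2, or 3 using illustrative rules."""
    if alarm.known_fix:
        # Documented fixes: minor ones stay at Tier 1, serious ones go to Tier 2.
        return 1 if alarm.severity >= 3 else 2
    # No documented fix: minor issues go to Tier 2, serious ones to Tier 3.
    return 2 if alarm.severity >= 3 else 3

def triage(alarms: List[Alarm]) -> List[Alarm]:
    """Order alarms most severe first, breaking ties by least SLA time remaining."""
    return sorted(alarms, key=lambda a: (a.severity, a.sla_minutes))

alarms = [
    Alarm("A1", severity=4, sla_minutes=240, known_fix=True),
    Alarm("A2", severity=1, sla_minutes=30, known_fix=False),
    Alarm("A3", severity=2, sla_minutes=15, known_fix=True),
]
for a in triage(alarms):
    print(a.alarm_id, "-> Tier", assign_tier(a))
```

Sorting on severity first and SLA time second is one simple way to keep the most urgent work at the top of the queue instead of presenting every alarm with equal weight.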
This model is beneficial for a number of reasons.
When developing specific elements of the NOC operation, your framework should offer well-defined process flows and incorporate tools to support each type of input into the NOC, such as phone calls, emails, and events.
Phone and email tools should focus on helping the NOC achieve its desired response-time service levels. With an operational framework that clearly identifies issues and offers processes to work through them quickly and easily, most issues that arrive by phone and email should be handled and initially routed or resolved by Tier 1 NOC engineers, freeing higher-tier engineers to focus their attention elsewhere.
Here are a few things to keep in mind when designing processes and tools for handling NOC inputs:
Many NOCs track metrics to meet their client SLAs, but not all choose KPIs that provide visibility into operations, reflect the NOC’s size and scale, and clearly demonstrate performance against a set of organization-wide objectives. Useful examples include first-call resolution, percentage of abandoned calls, mean time to restore (MTTR), and the number of tickets and calls handled.
As a result, a service provider’s clients may be dissatisfied with the NOC’s performance even when it meets its SLAs, because they don’t see meaningful results.
Even in enterprise NOCs outside the OEM and support provider spaces, a lack of concrete KPIs can hurt staff morale. Without metrics that quantify their achievements, engineers struggle to benchmark their own performance and that of their peers, which leads to a sense of relentless busyness and of always falling behind.
To remedy this, a NOC must choose the most relevant and meaningful metrics for its environment and evaluate these daily, weekly, and monthly.
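As a rough illustration of evaluating KPIs over different periods, the sketch below computes three of the metrics mentioned above (first-call resolution, abandoned-call percentage, and MTTR) and groups tickets by day or ISO week. The `Ticket` fields and the sample data are hypothetical; in practice these values would come from your ticketing and telephony systems.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Ticket:
    opened: datetime
    restored: datetime
    first_contact_resolution: bool  # resolved during the first call/contact

def fcr_rate(tickets):
    """Share of tickets resolved on first contact (0.0 to 1.0)."""
    return sum(t.first_contact_resolution for t in tickets) / len(tickets)

def abandoned_call_pct(calls_offered, calls_abandoned):
    """Percentage of inbound calls abandoned before they were answered."""
    return 100.0 * calls_abandoned / calls_offered

def mttr_minutes(tickets):
    """Mean time to restore, in minutes."""
    return mean((t.restored - t.opened).total_seconds() / 60 for t in tickets)

def group_by_period(tickets, period_key):
    """Bucket tickets by a period derived from their open time (day, ISO week, month...)."""
    buckets = defaultdict(list)
    for t in tickets:
        buckets[period_key(t.opened)].append(t)
    return dict(buckets)

# Illustrative sample data only.
tickets = [
    Ticket(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 8, 30), True),
    Ticket(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 30), False),
    Ticket(datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 2, 9, 0), True),
    Ticket(datetime(2024, 1, 2, 13, 0), datetime(2024, 1, 2, 13, 20), True),
]
daily_mttr = {day: mttr_minutes(ts)
              for day, ts in group_by_period(tickets, lambda d: d.date()).items()}
weekly_fcr = {wk: fcr_rate(ts)
              for wk, ts in group_by_period(tickets, lambda d: tuple(d.isocalendar()[:2])).items()}
print(daily_mttr, weekly_fcr)
```

Swapping the `period_key` function is all it takes to move between daily, weekly, and monthly views of the same data.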
Consider tracking and reporting on the following metrics to measure the NOC's utilization and efficiency:
And consider tracking the following commonly ignored KPIs to measure outward NOC performance:
Once you’ve established a tiered operational structure and meaningful performance metrics, you can create a data-driven staffing plan.
Assigning each engineer to a tier helps you keep track of the number of employees you have at each skill level, while metrics help you identify key areas and times when you are short on staff.
Together, this data can help you identify:
It’s also important to consider days off, such as PTO and holidays, for scheduling purposes to ensure your NOC is always appropriately staffed. Likewise, it’s wise to create a training regime that ensures your engineers are kept up to date.
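To show why time off matters so much in 24x7 scheduling, here is a back-of-the-envelope sketch of how many full-time engineers it takes to keep one “seat” (a position that must be filled around the clock) covered all year. The 40-hour week, 15 PTO days, 10 holidays, 5 training days, and the seats-per-tier figures are illustrative assumptions, not benchmarks, and the math ignores shift overlap, handoffs, and sick leave.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of coverage per always-staffed seat

def productive_hours_per_fte(weekly_hours=40, pto_days=15, holidays=10,
                             training_days=5, hours_per_day=8):
    """Hours one full-time engineer is actually available per year (assumed inputs)."""
    scheduled = weekly_hours * 52
    time_off = (pto_days + holidays + training_days) * hours_per_day
    return scheduled - time_off

def fte_required(seats, **fte_kwargs):
    """Engineers needed to keep `seats` positions staffed 24x7, rounded up."""
    per_fte = productive_hours_per_fte(**fte_kwargs)
    return math.ceil(seats * HOURS_PER_YEAR / per_fte)

# Hypothetical plan: two Tier 1 seats, one Tier 2 seat, one Tier 3 seat.
seats_by_tier = {1: 2, 2: 1, 3: 1}
plan = {tier: fte_required(seats) for tier, seats in seats_by_tier.items()}
print(plan)
```

Under these assumptions each engineer is available about 1,840 hours a year, so a single 24x7 seat needs roughly 4.8 people and two seats need ten.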
📄 Read our other guide to learn more about staffing a 24x7 NOC team: Staffing a 24x7 NOC: Costs, Challenges, and Key Considerations
Different types of organizations require different outcomes from their NOC. Until recently, this meant incorporating different standards into the NOC framework depending on whether it was designed for an enterprise or service provider type of organization.
But consistency is key to peak performance, and the best way to get it is to implement a standardized process framework like ITIL, MOF, or FCAPS that provides a best practices “playbook” for operationalizing and documenting your NOC’s processes, functions, and roles.
Today, the ITIL service framework has gained momentum. ITIL provides significant guidance for developing, maintaining, and improving IT services, which makes it particularly useful for designing any type of NOC operation. ITIL has proven effective in a variety of applications and industries, thereby making the need for separate standards for enterprises and service providers largely obsolete.
ITIL is a widely used framework that can help organizations achieve ISO/IEC 20000 certification. It provides best practices for delivering technology support services and allows you to include your organization’s custom procedures under its umbrella of life cycle stages.
To use the framework, get everyone in your organization trained and involved in the process. To ease into it, consider implementing the framework first in the areas of your operation that challenge you most, then moving on to others.
📄 Read our other guide to see some best practices for applying ITIL to your incident management process: 5 ITIL Incident Management Best Practices [+ Checklist] (2022)
A business continuity plan (BCP) is a formal plan for the management team to continue operations in the event of an emergency that interrupts service.
This could be anything from a short-term emergency, such as a regional power outage, to a fire that permanently destroys the NOC facility or a natural disaster that prevents access to the facility for a prolonged period of time.
A BCP should include the following:
BCPs should be rehearsed at least quarterly, regularly audited for possible improvements, and designed to include failover of all critical assets.
Maintaining a high quality of service is critical for NOC service providers, particularly for protecting their reputation and retaining customers. To do so, we recommend implementing a quality assurance program.
The key ingredients of a quality assurance program are up-to-date runbooks outlining procedures for handling customer complaints and other consistently carried-out processes, as well as accurate and effective reporting (as we discussed earlier) and monitoring.
Metrics drawn from monitoring activities can be used to identify chronic issues and provide quantitative evidence when a customer complains. Proactive measures, such as staff mentoring, regular audits, and quarterly stakeholder reviews, help identify problems before they worsen.
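One simple way to turn monitoring data into evidence of chronic issues is to count incidents per device over a trailing window and flag repeat offenders. This is a minimal sketch; the 30-day window, the threshold of three incidents, and the device names are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

def chronic_issues(incidents, now, window_days=30, threshold=3):
    """Return {device: count} for devices with >= `threshold` incidents in the window.

    `incidents` is an iterable of (device_id, opened_at) pairs.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(device for device, opened in incidents if opened >= cutoff)
    return {device: n for device, n in counts.items() if n >= threshold}

# Hypothetical incident history.
now = datetime(2024, 3, 31)
incidents = [
    ("edge-rtr-01", datetime(2024, 3, 2)),
    ("edge-rtr-01", datetime(2024, 3, 10)),
    ("edge-rtr-01", datetime(2024, 3, 28)),
    ("core-sw-02", datetime(2024, 3, 15)),
    ("core-sw-02", datetime(2024, 1, 5)),
    ("core-sw-02", datetime(2024, 1, 20)),
]
print(chronic_issues(incidents, now))
```

Widening the window can surface slower-burning problems: here `core-sw-02` only shows up as chronic when looking back about four months.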
ITIL Continual Service Improvement (CSI) provides IT organizations with best practices and structures for improving their service and service management processes.
With it, teams can constantly re-examine what’s working and what’s not and make ongoing, incremental improvements to their processes while keeping service aligned with the business’s changing needs. An effective CSI program constantly looks for ways to improve process efficiency and cost-effectiveness throughout the entire ITIL Lifecycle.
📄 Read our other guide to learn about bringing a CSI program to life in the NOC: ITIL CSI: A Guide and Checklist for IT Support and the NOC
Wrangling disparate tools and platforms can quickly create a stressful mess that is not only challenging to use but also difficult to track and report on.
Engineers often find themselves tracking and managing multiple screens for event information, manually collecting information from multiple sources for documentation, notification, and escalation, and then attempting to manage workflow toward service restoration.
The more convenient, efficient, and less stressful alternative is consolidating all of these tools and platforms into one view: “a single pane of glass.” This includes bringing voice, email, text, customer portals, knowledge bases, documentation, and workflow management tools (and potentially their respective platforms) all into one convenient dashboard.
This can help NOCs not only perform their duties more efficiently but also ensure more accurate reporting and prevent missed SLAs.
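Under the hood, a single pane of glass depends on normalizing inputs from different channels into one common record so they can be prioritized together. The sketch below assumes hypothetical payload shapes for monitoring alerts, parsed emails, and phone-call notes; real integrations would map each tool’s actual fields.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class NocEvent:
    source: str        # "monitoring", "email", or "phone"
    received: datetime
    summary: str
    priority: int      # assumed common scale: 1 = highest

def from_monitoring(alert: dict) -> NocEvent:
    # Hypothetical alert payload with "ts", "msg", and "sev" keys.
    return NocEvent("monitoring", datetime.fromisoformat(alert["ts"]), alert["msg"], alert["sev"])

def from_email(msg: dict) -> NocEvent:
    # Hypothetical parsed email; emails default to priority 3 until triaged.
    return NocEvent("email", datetime.fromisoformat(msg["date"]), msg["subject"], 3)

def from_phone(call: dict) -> NocEvent:
    # Hypothetical call record; calls default to priority 2 unless set.
    return NocEvent("phone", datetime.fromisoformat(call["start"]), call["notes"], call.get("priority", 2))

def unified_queue(*streams):
    """Merge normalized events from every channel into one priority-ordered queue."""
    events = [e for stream in streams for e in stream]
    return sorted(events, key=lambda e: (e.priority, e.received))

queue = unified_queue(
    [from_monitoring({"ts": "2024-05-01T10:05:00", "msg": "BGP session down on edge-rtr-01", "sev": 1})],
    [from_email({"date": "2024-05-01T09:50:00", "subject": "Slow VPN from branch office"})],
    [from_phone({"start": "2024-05-01T10:00:00", "notes": "Customer reports outage"})],
)
for e in queue:
    print(e.priority, e.source, e.summary)
```

Once every input shares one shape, reporting and SLA tracking can run against a single data set instead of several disconnected tools.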
Here are a few of our own capabilities as an outsourced NOC support partner that have proven to be massive value-adds for organizations struggling to make their tools work for them, rather than the other way around:
📄 Read our other guide for a look at some of the common tools used in the NOC and the operational considerations key for each: NOC Tools and Software in 2022: An Operational Perspective
Poor documentation is the source of many problems throughout ITOps. Without formal processes and procedures, even highly skilled professionals can struggle to achieve consistent desired results when outages occur.
Out-of-date runbooks are a common issue, and they can undermine quality assurance, service improvement, and issue resolution while generally impeding a NOC’s performance. To address this strategically, management must develop comprehensive runbooks and keep them updated as changes affect the NOC and the supported environment.
NOC teams should start by documenting the tools and procedures necessary to deliver quality NOC services for each service in their catalog with the aid of a competent technical writer. Runbooks should be the single source of truth for everyone inside and outside the NOC.
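Keeping runbooks current is easier when staleness is checked automatically. This minimal sketch flags services in the catalog that have no runbook, runbooks whose service changed after the last review, and runbooks past a maximum review age. The service names, change dates, and 180-day limit are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Runbook:
    service: str
    last_reviewed: date

def missing_runbooks(catalog, runbooks):
    """Services in the catalog with no runbook at all."""
    return sorted(set(catalog) - {rb.service for rb in runbooks})

def stale_runbooks(runbooks, last_change_by_service, today, max_age_days=180):
    """Runbooks needing review, with the reason each was flagged."""
    stale = []
    for rb in runbooks:
        changed = last_change_by_service.get(rb.service)
        if changed is not None and changed > rb.last_reviewed:
            stale.append((rb.service, "environment changed since last review"))
        elif today - rb.last_reviewed > timedelta(days=max_age_days):
            stale.append((rb.service, "review overdue"))
    return stale

# Hypothetical catalog, runbooks, and change records.
today = date(2024, 6, 1)
catalog = ["mpls-wan", "sd-wan", "voip", "wifi"]
runbooks = [
    Runbook("mpls-wan", date(2024, 5, 1)),
    Runbook("sd-wan", date(2023, 10, 1)),
    Runbook("voip", date(2024, 4, 1)),
]
last_change = {"mpls-wan": date(2024, 5, 20), "voip": date(2024, 3, 1)}
print(missing_runbooks(catalog, runbooks))
print(stale_runbooks(runbooks, last_change, today))
```

Tying the check to change records, not just to a calendar, catches the case where a runbook is recent but the environment it describes has already moved on.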
Need help developing or improving runbooks? We deliver expert-driven runbook development as a professional service and as a core component for our NOC support clients. We work closely with you to understand and document your processes, creating a single source of truth for everyone inside and outside your NOC.
Your NOC should be able to scale with your business. Scalability, the ability to absorb an increase in work without compromising quality of service, is something your NOC should plan for before business growth affects performance (and leaves customers unhappy).
Certain aspects of scalability will likely have been accounted for in your organization’s business plan, such as initial funding, sales and marketing, system build-out, operations support, and the business guidance needed to meet the projected growth. However, predictable growth and process planning are often overlooked.
When planning for growth, consider these factors:
Shared NOC Support and the Economy of Scale
Here at INOC, our shared support model allows for service to scale across a large team of shared resources to meet periods of expected or unexpected demand—a capability that simply wouldn’t be possible in a dedicated support arrangement. This group of shared resources is sized to ensure roughly 65% utilization in order to provide a safe buffer of capacity to handle unexpected spikes in activity. Using company-wide metrics, changes in utilization are reflected in staffing decisions to ensure this balance is maintained at all times.
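The arithmetic behind a utilization target is straightforward: size the team so that expected work fills only the target fraction of available hours. The sketch below is a generic illustration, not INOC’s actual staffing method; the 8-hour shift and the workload figures are assumptions.

```python
import math

def engineers_needed(workload_hours, shift_hours=8.0, target_utilization=0.65):
    """Engineers per shift so expected work fills ~target_utilization of their time."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(workload_hours / (shift_hours * target_utilization))

def utilization(workload_hours, engineers, shift_hours=8.0):
    """Fraction of available engineer-hours consumed by the workload."""
    return workload_hours / (engineers * shift_hours)

staff = engineers_needed(30)       # 30 hours of expected work in one shift (assumed)
print(staff, utilization(30, staff))
print(utilization(36, staff))      # a 20% spike is still absorbed
```

At a 65% target, workload can grow by roughly 54% (1 / 0.65 ≈ 1.54) before engineers are fully booked, which is the buffer that absorbs unexpected spikes.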
In short, our shared NOC support model enables organizations to benefit from economies of scale. Rather than being based on the number of resources, the shared support model is based on the number of assets (such as devices) and workload (the expected volume of NOC activity in a given period of time). The shared NOC is a timely and reliable resource pool that is constantly triaging and working through queues containing tickets from many clients. This model is tailored to offer standardized and templatized support. While service pricing will naturally fluctuate with significant changes in workloads, the increments are typically far more subtle compared to adding even one additional dedicated resource.
📄 Read our other guide to learn more about shared vs. dedicated NOC support models: Shared vs. Dedicated NOC Support: A Quick-Guide
Here are some general tips when setting budgets for your NOC:
Staffing and platform costs are two of the biggest financial factors when considering building and maintaining an in-house NOC vs. outsourcing it.
Given that most NOCs require a team of at least ten to provide reliable 24/7/365 support, comparing total in-house staffing costs with the cost of a much smaller team of outsourced FTEs working in a fully mature NOC environment often reveals a stark difference.
For most companies, staffing a NOC in-house is a needlessly high expenditure compared to outsourcing that support. A plan that doesn’t consider this option might, for example, call for a staff of 12 full-time employees when the same or better support could be provided by an outsourced service that takes full advantage of economies of scale to deliver it at a far lower cost.
Apart from staffing, the cost of acquiring, implementing, and integrating a full suite of NOC tools often tips the scale further in favor of outsourcing.
Monitoring, ticketing, knowledge centralization, and reporting are just a few essential NOC functions that require tools. Together, these can constitute a massive expenditure, even though in most homegrown NOCs their low utilization doesn’t justify their high price tags. Newer technologies like machine learning and automation (AIOps) only add to the cost, not to mention the difficulty of implementing them.
It’s not uncommon for companies to learn that given the payroll and overhead costs of building a NOC in-house, electing for outsourced support can cut their total cost of ownership in half.
There’s no getting around it—optimizing your NOC for peak performance is a lot of work, but implementing these best practices can pay dividends over time, or even in the short term, in the form of greater efficiency, higher employee and client satisfaction, and even cost reductions or reallocations toward more valuable investments.
Here at INOC, we help organizations with these critical needs through award-winning outsourced NOC support (sometimes referred to as NOC as a Service) and NOC operations consulting services.
Want to learn how to put these NOC management practices to use in your NOC? Contact us or schedule a free NOC consultation with our Solutions Engineers to see how we can help you improve your IT service strategy and NOC support, and download our free white paper below.