Despite being such a foundational component of the technical support operation that keeps organizations running, many NOCs, in both the service provider and enterprise markets, fail to deliver the desired service levels while consuming significant management and financial resources.
By far the biggest cause of this failure is the lack of any authoritative blueprint for the NOC to follow—a documented framework that establishes clear, standardized rules that govern how the NOC team should operate.
Since no two organizations share the same business strategies, technology infrastructures, tools, or service requirements, the factors that make a NOC successful or unsuccessful are different for each organization. Therefore, each NOC deserves its own operational blueprint that takes into account the specific conditions and requirements that pertain to it.
Although the lack of an operational blueprint is the most common root cause of NOC problems, it’s certainly not the only one. Below, we briefly summarize the ten NOC challenges we see most often.
Challenge #1: Overutilized technology staff and exploding support costs due to a lack of a tiered organizational structure to manage workflows
This is the operational blueprint we just mentioned. Failing to organize NOC activities and subsequent workflows by technology and skill level is one of the biggest hurdles in building a successful NOC. When a NOC can’t manage its workflow, it often finds itself overwhelmed by the “wall of red.”
A tiered operational support structure enables managers to leverage the lower-cost first-level or Tier 1 team to perform routine activities and free up higher-level or Tier 2 and 3 technical teams to focus on more advanced support issues.
The figure below lays out the basics of a tiered NOC support structure. Central to this structure is the Tier 1 team, which uses monitoring tools and interacts with end-user help desks, Tier 2 and 3 engineers, and third parties. Information flows between the various entities within a well-defined process framework.
Issues coming into the NOC should also be prioritized and organized into a set of queues, each of which can be handled by the appropriate group. These can be organized by important variables such as service level agreements (SLAs), technologies, and technician skill levels.
The figure below shows how a set of issues can be broken up into queues and assigned to groups based on skillset.
These visuals are intended to be instructive not only in building a framework but also in recognizing the distance between what a NOC should have and what it actually has. Especially for NOCs supporting enterprises or communication service providers, the further a NOC’s operations are from a structure like this, the more value it can expect to gain from implementing one.
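To make the queueing idea concrete, here is a minimal sketch of how incoming issues might be sorted into queues by technology and skill tier. The `Ticket` fields, the severity scale, and the escalation rules are assumptions made for illustration; a real NOC would derive them from its own SLAs and skill matrix.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    technology: str                  # hypothetical categories, e.g. "network", "server"
    severity: int                    # hypothetical scale: 1 = most urgent
    needs_escalation: bool = False   # True when Tier 1 cannot resolve it


def queue_for(ticket: Ticket) -> str:
    """Pick a queue name from the ticket's technology and required skill tier."""
    if ticket.needs_escalation and ticket.severity == 1:
        tier = "tier3"
    elif ticket.needs_escalation:
        tier = "tier2"
    else:
        tier = "tier1"
    return f"{ticket.technology}-{tier}"


def build_queues(tickets):
    """Group tickets by queue, most urgent first within each queue."""
    queues = defaultdict(list)
    for ticket in tickets:
        queues[queue_for(ticket)].append(ticket)
    for items in queues.values():
        items.sort(key=lambda t: t.severity)
    return dict(queues)


tickets = [
    Ticket("T-1", "network", 3),
    Ticket("T-2", "network", 1, needs_escalation=True),
    Ticket("T-3", "server", 2, needs_escalation=True),
    Ticket("T-4", "network", 2),
]
queues = build_queues(tickets)
for name in sorted(queues):
    print(name, [t.ticket_id for t in queues[name]])
```

Routine tickets stay with Tier 1, and only escalated work reaches the higher tiers, which is the cost lever the tiered structure is meant to provide.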
Challenge #2: Blindness to issues and opportunities due to insufficient operational metrics
Anyone working in a NOC is likely to hear statements like these on a routine basis:
- “Why are we always busy?”
- “I feel like we can never catch up,” and
- “My coworkers are not pulling their weight.”
These sentiments are understandable given the fast-paced environment of a NOC and the constant multitasking that is required. Meaningful operational metrics are vital not only in running a successful NOC but also in keeping staff morale high.
In many NOCs, however, important metrics aren’t being measured, and the ones that are being measured aren’t being evaluated on a daily, weekly, and monthly basis. In either case, the early indicators of potential issues will almost certainly be ignored and allowed to evolve into more resource-intensive problems.
For a quick self-evaluation on this point, consider whether you’re tracking first-call resolution, percentage of abandoned calls, mean time to restore, and number of tickets and calls handled. If you have blind spots in any of these areas, we can almost guarantee there’s an operational vulnerability affecting the NOC’s efficiency or effectiveness. Be aware, however, that this shortlist is by no means exhaustive. Even a brief consultation with our Solutions Engineering Team typically reveals a number of metric gaps companies can put on their radar.
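As a rough illustration, the shortlist above can be computed from basic call and incident records. The record fields below are hypothetical, and the sketch assumes first-call resolution is measured against answered calls only; your own definitions may differ.

```python
from datetime import datetime, timedelta


def noc_metrics(calls, incidents):
    """Compute the shortlist metrics from simple call and incident records."""
    answered = [c for c in calls if not c["abandoned"]]
    restore_times = [i["restored"] - i["opened"] for i in incidents]
    return {
        # Share of answered calls resolved on first contact (assumed definition).
        "first_call_resolution": (
            sum(c["resolved_first_call"] for c in answered) / len(answered)
            if answered else None
        ),
        # Share of all inbound calls that were abandoned.
        "abandoned_rate": (len(calls) - len(answered)) / len(calls) if calls else None,
        # Average time from incident open to service restoration.
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times else None
        ),
        "calls_handled": len(answered),
        "tickets_handled": len(incidents),
    }


# Hypothetical sample data.
calls = [
    {"abandoned": False, "resolved_first_call": True},
    {"abandoned": False, "resolved_first_call": False},
    {"abandoned": True, "resolved_first_call": False},
    {"abandoned": False, "resolved_first_call": True},
]
incidents = [
    {"opened": datetime(2024, 1, 1, 9, 0), "restored": datetime(2024, 1, 1, 9, 30)},
    {"opened": datetime(2024, 1, 1, 13, 0), "restored": datetime(2024, 1, 1, 14, 30)},
]
metrics = noc_metrics(calls, incidents)
print(metrics)
```

Running the same calculation on daily, weekly, and monthly windows is what turns these numbers into early indicators rather than after-the-fact reports.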
Challenge #3: High turnover, low morale, and difficulty in hiring, training, and retaining staff due to a lack of a staffing strategy
Great NOCs are a function of great people. But very often, the absence of both a support structure and a skills-based structure can handicap a company’s ability to attract and retain great talent.
Consider the overall activity of your NOC, including the volume of calls, emails, and alarms handled by hour-of-day, day-of-week, and type of support engineer, as well as the duration of incidents. This data should be translated into a working schedule for each type of support engineer needed to satisfy the staffing requirements of your NOC. Beyond scheduling from utilization metrics, make sure benefits, training, and employee growth plans are all in place.
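One simple way to translate activity data into a schedule is to estimate headcount per hour from event volume and average handling time. The sketch below is a back-of-the-envelope calculation, not a full workforce model; the 15-minute handling time and 70% target utilization are assumed values you would replace with your own measurements.

```python
import math


def engineers_needed(events_per_hour, avg_handle_minutes=15, target_utilization=0.7):
    """Estimate engineers required for one hour of a shift.

    avg_handle_minutes and target_utilization are illustrative assumptions.
    At least one engineer is always scheduled for 24x7 coverage.
    """
    workload_minutes = events_per_hour * avg_handle_minutes
    productive_minutes_per_engineer = 60 * target_utilization
    return max(1, math.ceil(workload_minutes / productive_minutes_per_engineer))


# Hypothetical average event volume (calls + emails + alarms) by hour of day.
hourly_volume = {0: 2, 9: 10, 14: 6}

schedule = {hour: engineers_needed(volume) for hour, volume in hourly_volume.items()}
print(schedule)
```

Repeating the calculation per day of week and per engineer type gives a first draft of the staffing grid, which can then be adjusted for incident duration and shift overlap.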
Challenge #4: Inconsistent responsiveness to issues or difficulty troubleshooting due to a lack of a standardized process framework
A lack of consistency is one of the main reasons NOCs don’t perform at optimal levels.
The best way to achieve consistency is through a standardized process framework. Such a framework provides a NOC with a set of specific procedures for handling various support situations. There are several process and management frameworks to choose from, including MOF, FCAPS, and ITIL.
Process frameworks can be overwhelming when considered in their entirety, so it’s best to start with the areas that challenge the organization the most. Usually, those are incident management, problem management, and the service desk.
Challenge #5: A constant state of vulnerability due to a lack of a business continuity plan
Many NOCs simply don’t have a documented plan that outlines the functions of the business, identifies the critical systems that enable the organization to run, and prescribes specific actions to maintain these systems during a disruption. Others have a plan—perhaps limited to disaster recovery only—but don’t adequately protect against all potential disasters and disruptions.
For a quick gap assessment, compare your own plan against the following essentials for a NOC business continuity plan:
- Infrastructure redundancy: Regardless of whether your NOC’s data centers are physical or virtual, the disaster recovery plan should include at least two identical data centers running identical software with fully synchronized databases.
- Operational redundancy: An effective disaster recovery plan should also ensure that the NOC can continue to operate in the event of a site failure.
- Technical redundancy: Your NOC’s disaster recovery plan should also prepare for a major technical outage that poses a disaster-level threat to NOC service despite affecting only a portion of your facility. Consider scenarios like loss of a single server or network element, loss of much or all of the data center, and loss of a network link.
Challenge #6: Recurring problems and an inability to emerge out of a reactive state due to a lack of quality management
Without continuous quality assurance, NOCs risk hurting customer satisfaction and compromising their reputation. There are two components of quality management in this case: quality assurance and quality control.
A good quality control program monitors and measures primary aspects of the NOC service—the key performance indicators referenced earlier. These KPIs provide much-needed visibility into NOC support activity, responsiveness, and effectiveness. NOC management can use this information to ensure, for instance, that stated objectives for event-to-action times and first-level incident resolution are being met for each customer.
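A quality control check of this kind can be sketched as a comparison of measured KPIs against each customer's stated objectives. The customer names, target values, and record layout below are hypothetical and only illustrate the shape of such a review.

```python
from statistics import mean

# Hypothetical per-customer objectives.
objectives = {
    "customer_a": {"max_event_to_action_min": 5.0, "min_tier1_resolution": 0.60},
    "customer_b": {"max_event_to_action_min": 10.0, "min_tier1_resolution": 0.50},
}

# Hypothetical incident records:
# (customer, minutes from event to first action, resolved at first level?)
incidents = [
    ("customer_a", 3.0, True),
    ("customer_a", 9.0, False),
    ("customer_a", 6.0, True),
    ("customer_b", 4.0, True),
    ("customer_b", 8.0, False),
]


def review(objectives, incidents):
    """Report, per customer, whether each stated objective is being met."""
    report = {}
    for customer, goal in objectives.items():
        rows = [r for r in incidents if r[0] == customer]
        if not rows:
            continue  # nothing to measure for this customer in the period
        avg_event_to_action = mean(r[1] for r in rows)
        tier1_rate = sum(r[2] for r in rows) / len(rows)
        report[customer] = {
            "event_to_action_ok": avg_event_to_action <= goal["max_event_to_action_min"],
            "tier1_resolution_ok": tier1_rate >= goal["min_tier1_resolution"],
        }
    return report


report = review(objectives, incidents)
print(report)
```

In this sample, customer_a misses its event-to-action objective (a 6-minute average against a 5-minute target), which is the kind of gap a quality control program should surface for follow-up.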
A good quality or service assurance program allows the NOC to identify and resolve problems before they impact customers or the business in a significant way. A quality assurance review begins when a customer reports dissatisfaction with any aspect of the NOC service. NOC management follows up with an internal review of the service—responsiveness metrics, adherence to runbook procedures, customer interaction, and technical troubleshooting, to name a few.
Such quantitative and qualitative measures and the resulting feedback lower the probability of the same problem recurring. Monthly and quarterly reviews of the service with stakeholders ensure that customer expectations continue to be met.
Challenge #7: Lots of data, but little actionable insight due to disparate tools and platforms
Especially among enterprises and communication service providers, the NOC has to be able to receive and process alarm or event information from multiple sources and present it in a single, consolidated view for staff to act on—a “single pane of glass.”
Without integration between these tools and platforms, NOC personnel are faced with tracking and managing multiple screens for event information; manually collecting information from multiple sources for the purposes of documentation, notification, and escalation; and then attempting to manage workflow toward service restoration. This makes it nearly impossible to monitor and report on SLA metrics, let alone optimize performance. The results inevitably include operational inefficiencies, missed SLAs, and undue stress on staff.
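The core of a consolidated view is normalizing events from each tool into one common shape and ordering them by urgency. The sketch below assumes two made-up source formats (an SNMP-style tool and a cloud monitor); the field names and severity mappings are hypothetical and do not represent any specific product's API.

```python
from dataclasses import dataclass
from datetime import datetime

SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Event:
    source: str
    device: str
    severity: str
    message: str
    timestamp: datetime


def from_snmp_tool(raw):
    """Normalize a record from a hypothetical SNMP-based monitoring tool."""
    severity = {1: "critical", 2: "major", 3: "minor"}[raw["sev"]]
    return Event("snmp", raw["host"], severity, raw["text"],
                 datetime.fromisoformat(raw["ts"]))


def from_cloud_monitor(raw):
    """Normalize a record from a hypothetical cloud monitoring service."""
    severity = {"ERROR": "critical", "WARNING": "major", "INFO": "minor"}[raw["level"]]
    return Event("cloud", raw["resource"], severity, raw["summary"],
                 datetime.fromisoformat(raw["time"]))


def consolidated_view(events):
    """Order events by severity first, then oldest first."""
    return sorted(events, key=lambda e: (SEVERITY_RANK[e.severity], e.timestamp))


snmp_feed = [
    {"host": "core-sw-1", "sev": 1, "text": "Link down", "ts": "2024-01-01T10:05:00"},
    {"host": "edge-rtr-2", "sev": 3, "text": "Fan warning", "ts": "2024-01-01T09:58:00"},
]
cloud_feed = [
    {"resource": "vm-42", "level": "WARNING", "summary": "High CPU",
     "time": "2024-01-01T10:01:00"},
]

events = [from_snmp_tool(r) for r in snmp_feed] + [from_cloud_monitor(r) for r in cloud_feed]
view = consolidated_view(events)
for event in view:
    print(event.severity, event.device, event.message)
```

Once every source lands in the same structure, documentation, notification, and SLA reporting can all read from one list instead of several screens.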
Challenge #8: Persistent operational problems due to out-of-date documentation and runbooks
Failure to build runbooks, document workflow processes, create structured databases for storage and retrieval of information, and record business results for later analysis and optimization will severely impede the ability of a NOC to function well over the long term. Too often, services are added and changes are made without proper documentation. This limits the ability of the NOC to resolve an issue when it arises.
Poor documentation often stems from a lack of resources and the expertise required to map out processes and create work instructions and documents. Instead, key people simply “know what to do” and new staff learn by “seeing and doing” alongside an experienced mentor.
Challenge #9: Business growth stymied due to a rigid, unscalable NOC
Many NOCs aren’t designed to be scalable; that is, able to handle a growing amount of work as the company grows without compromising the level of service.
Typically, business plans include initial funding, sales and marketing, system build-out, operations support, and the business guidance needed to meet the projected growth. What business plans sometimes don’t take into consideration are predictable growth and process planning. The ability to grow or absorb expansion requires careful consideration of staffing, systems and network, tools, process standardization, and training.
Challenge #10: Unreasonably high operational costs
There are several components that make up the cost of running a 24x7 NOC. Take staff for example. The staff required to support a 24x7 NOC include not only front-line technicians and engineers but also back-end support groups such as systems and network engineering, service transition, human resources, and customer advocacy.
Resources also need to be allocated for training NOC staff when they are initially hired, when new customers are onboarded, and whenever changes are made to existing support or new technologies are introduced. Systems, network connectivity, and security controls need to be deployed in either a data center or the cloud to house the various tools and applications required by the NOC to operate. Resources for the ongoing support of these systems also need to be included.
All of these components present a formidable operating expense but have to be considered in building a successful NOC. Too often, NOCs are built considering only a subset of the above components, and as a result, they struggle to scale and deliver on the required service and financial objectives of the organization.