Despite being critical to the success of a technical support operation, most network operations centers (NOCs) fail to meet desired service levels.
Rather than delivering meaningful ROI by maximizing infrastructure uptime and performance, ineffective NOCs consume management and financial resources while leaving business services exposed and vulnerable.
Most of the time, the root cause of an underperforming NOC is a lack of a centralized support framework that incorporates best practices and puts them into action. Such a structure is essential to the success of a NOC, as it makes decisions and actions consistent across the people, processes, and platforms that comprise it.
Without an authoritative blueprint for operating, costly inefficiencies are inevitable and serious risks will keep threatening performance and availability. The problem only worsens as business services and the technologies they rely on scale in size and complexity.
Here, we explore ten NOC best practices that keep even the largest, most complex infrastructure environments up and running at peak performance, 24x7.
Download our white paper, "Top 10 Challenges to Running a Successful NOC," for a handy reference that goes into greater depth on each of these best practices and their corresponding challenges. Read our other white paper, "A Practical Guide to Running an Effective NOC," for a set of actionable steps you can take to put these best practices to use.
1. Implement a Tiered Organization/Workflow
Organizing your NOC activities and workflows based on your specific technologies and skill levels is one of the biggest hurdles to success.
Once you’ve cleared this hurdle, however, you’ll almost certainly be able to handle events and service requests and resolve incidents at the appropriate tier, and do so faster than before. Based on data collected across our NOCs, we found this structure can enable a NOC to resolve 65% to 75% of incidents at the Tier 1 level while reserving Tier 2 and 3 staff for more advanced issues.
Classifying NOC activities is often the first step in implementing a tiered structure. Use the following model for developing your own classification system:
- Monitoring events from technology infrastructure and facilities — e.g., Layer 1, 2 and 3 networks, circuits and servers (physical, virtual, and cloud), applications, databases, and power and building systems
- Managing support requests from customers and technical staff in the form of phone calls, emails and tickets
- Managing incidents resulting from events and support requests
- Managing configurations and changes, provisioning equipment, services and circuits, and maintaining documentation
- Reviewing periodic service reports
Figure 1 below illustrates a well-organized tiered NOC support structure in action. Here, the Tier 1 team uses monitoring tools and interacts with end-user help desks, as well as Tier 2 and 3 engineers and third parties. Information flows between the various entities within a well-defined process framework.
Having such a structure for properly managing your workflow can prevent your NOC from being overwhelmed by the “wall of red” NOC teams strive to avoid at all costs. In most NOCs, issues should be prioritized and organized into a set of queues, so each of them can be handled by the appropriate group.
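As a rough illustration of the queue idea above, the sketch below routes issues to queues by type and sorts each queue by priority. The queue names, issue types, and routing rules are hypothetical placeholders, not a prescribed layout; substitute the groups and skill sets of your own NOC.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical queue names and routing rules, for illustration only.
ROUTING_RULES = {
    "event": "tier1_monitoring",
    "service_request": "tier1_service_desk",
    "incident": "tier1_incident",
    "change": "tier2_change",
}
ESCALATION_QUEUE = "tier2_escalation"
DEFAULT_QUEUE = "tier1_incident"


@dataclass
class Issue:
    issue_id: str
    kind: str               # expected to be a key in ROUTING_RULES
    priority: int           # 1 = most urgent
    escalated: bool = False


def route(issue: Issue) -> str:
    """Return the name of the queue that should handle this issue."""
    if issue.escalated:
        return ESCALATION_QUEUE
    # Unknown kinds fall back to a Tier 1 queue so they still get triaged.
    return ROUTING_RULES.get(issue.kind, DEFAULT_QUEUE)


def build_queues(issues):
    """Group issues by queue, most urgent first within each queue."""
    queues = defaultdict(list)
    for issue in issues:
        queues[route(issue)].append(issue)
    for items in queues.values():
        items.sort(key=lambda item: item.priority)
    return dict(queues)
```

Calling `build_queues` on a mixed list of issues returns one prioritized list per queue, which is the kind of view that keeps a single undifferentiated backlog from forming.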
Download our white paper, "Top 10 Challenges to Running a Successful NOC," for a set of example workflow queues you can use to break up issues and assign them to groups based on skill set.
2. Track Meaningful Operational Metrics
Modern NOC tools make it easy to generate metrics, but tracking meaningful metrics takes diligent work. Metrics are essential for continuous improvement from both a technical and motivational standpoint—helping teams recognize successes and keep morale high.
Anyone who works in a NOC likely hears things like, “We’re always busy,” or “I feel like we can never catch up,” or “My coworkers are not pulling their weight.” These sentiments are understandable given the fast-paced environment of a NOC and the constant multitasking that is required of those who work in it.
To ensure accomplishments are recognized, it’s important to set performance objectives and evaluate them on a daily, weekly, and monthly basis. Since the amount of data available to a NOC is daunting, choose the metrics that are most applicable and actionable for your specific operation. These should reflect the size and scale of your operation and the key performance indicators (KPIs) that measure performance against relevant organizational objectives.
KPIs to consider include first-call resolution, percentage of abandoned calls, mean time to restore, and the number of tickets and calls handled.
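To make these KPIs concrete, here is a minimal sketch of how three of them might be computed from call and incident records. The record fields and the exact definitions are assumptions for illustration (for example, first-call resolution is measured here against answered calls only); definitions vary between organizations, so align them with your own reporting standards.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Call:
    answered: bool                       # False means the caller hung up first
    resolved_on_first_call: bool = False


@dataclass
class Incident:
    opened: datetime
    restored: Optional[datetime] = None  # None while service is still down


def first_call_resolution(calls):
    """Share of answered calls resolved without a follow-up contact."""
    answered = [c for c in calls if c.answered]
    if not answered:
        return 0.0
    return sum(c.resolved_on_first_call for c in answered) / len(answered)


def abandoned_call_rate(calls):
    """Share of all calls that were abandoned before being answered."""
    if not calls:
        return 0.0
    return sum(not c.answered for c in calls) / len(calls)


def mean_time_to_restore(incidents):
    """Average open-to-restore duration across restored incidents."""
    durations = [i.restored - i.opened for i in incidents if i.restored]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

The number of tickets and calls handled is simply the length of each record list for the reporting period.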
3. Develop a Strategy for Hiring, Training, and Retaining Top Talent
Running a 24x7 NOC requires staffing three shifts a day, 365 days a year. Consider the following factors when developing a staffing strategy:
NOC Organization Structure
Effectively staffing a 24x7 NOC starts with a well-organized structure. The tiered NOC support structure and workflow queues discussed in our first best practice are good starting points for determining the skill level required of your NOC staff.
Download our white paper, "Top 10 Challenges to Running a Successful NOC," for an example of a skills-based NOC structure that can support 24x7 NOC requirements and provide a growth plan to maximize employee retention.
Consider the overall activity of your NOC, including the volume of calls, emails, and alarms handled by hour of day, day of week, and type of support engineer, as well as the duration of incidents.
Consider the benefits that your company provides for employees in the context of the needs of your operation to ensure understaffing isn’t a risk. For example, if your company provides 10 holidays and four weeks of PTO per employee, these hours need to be accounted for to ensure that your NOC runs smoothly.
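The arithmetic behind this example can be sketched as follows. The 40-hour week, 8-hour day, and 52-week year are assumptions, and the sketch ignores sick leave, training time, and shift overlap, all of which would raise the real number.

```python
import math

HOURS_PER_YEAR = 24 * 365  # hours one seat must be covered around the clock


def available_hours(holidays=10, pto_weeks=4,
                    hours_per_day=8, hours_per_week=40, weeks_per_year=52):
    """Hours one engineer actually works per year after holidays and PTO."""
    scheduled = hours_per_week * weeks_per_year
    time_off = holidays * hours_per_day + pto_weeks * hours_per_week
    return scheduled - time_off


def engineers_per_seat(**benefits):
    """Engineers needed to keep a single seat staffed 24x7 all year."""
    return HOURS_PER_YEAR / available_hours(**benefits)


def required_headcount(seats, **benefits):
    """Minimum whole number of engineers for the given number of 24x7 seats."""
    return math.ceil(seats * engineers_per_seat(**benefits))
```

With 10 holidays and four weeks of PTO, each engineer is available for 1,840 of the 2,080 scheduled hours, so a single always-staffed seat needs a little under five engineers rather than the 4.2 that a naive 8,760 / 2,080 calculation suggests.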
A NOC training program should cover initial on-boarding as well as ongoing training. A truly comprehensive training program can take up to six months of various classes and on-the-job instruction before an engineer is ready to take on NOC support responsibilities. After work has begun, monthly or quarterly training sessions should be scheduled to keep engineers’ skills fresh and to update the support team on new types of services, new customer requirements, and new equipment.
A certain attrition rate within the NOC should be taken into account, based on your historical data and on industry standards. Factors that affect retention rates include company culture as well as NOC organization (i.e., whether there’s a clear path for employee growth from one level to the next or to other departments within the organization).
By making these calculations, you can better plan for staffing and training needs. For example, assuming that a typical engineer works five years in your NOC (a retention rate of 80%), you’d need to hire an additional 20% of staff each year.
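A minimal sketch of that calculation, assuming a steady-state team in which the annual attrition rate is the reciprocal of average tenure. The `growth_rate` parameter is an added assumption for teams that also plan to expand.

```python
def annual_attrition_rate(average_tenure_years):
    """Steady-state share of staff who leave each year."""
    if average_tenure_years <= 0:
        raise ValueError("average tenure must be positive")
    return 1 / average_tenure_years


def expected_annual_hires(headcount, average_tenure_years, growth_rate=0.0):
    """Hires needed per year to replace leavers and cover planned growth."""
    replacements = headcount / average_tenure_years
    growth = headcount * growth_rate
    return replacements + growth
```

For a 20-person NOC with a five-year average tenure, `annual_attrition_rate(5)` gives 0.2 (an 80% retention rate) and `expected_annual_hires(20, 5)` gives 4.0 hires per year before any growth is planned.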
4. Implement a Standardized Framework for Process Management
Inconsistency is one of the main reasons NOCs don’t perform at optimal levels. Being reliably consistent requires a standardized process framework that arms your NOC with specific procedures for handling various support situations.
There are several process and management frameworks to choose from, including MOF, FCAPS, and ITIL. The ITIL* (Information Technology Infrastructure Library) service framework in particular has become immensely popular because it’s useful in achieving ISO/IEC 20000 certification and provides its own set of best practices to follow when delivering technology support services. It also offers the flexibility to include your organization’s custom procedures under its umbrella of lifecycle stages.
Process frameworks can be overwhelming when considered in their entirety. When aligning with one of these standards, we recommend first tackling the specific areas that are the biggest challenge for your organization. Typically, these are incident management, problem management, and the service desk. Once these functions are standardized, you can move on to other priority areas such as change management and service continuity management.
It’s critical to get your whole organization involved in the implementation of the process framework as well as in ongoing education. Training is essential to get all staff talking the same language and following the same guidelines. Comprehensive information and training are available for ITIL, ISO 20000, FCAPS and other methodologies.
5. Develop and Maintain a Business Continuity Plan
A business continuity plan (BCP) is essential for managing risk in your NOC operations—a fact made very clear by the COVID-19 pandemic, which exposed serious shortcomings in many firms’ readiness.
(Download our white paper, "A 5-Step Strategy for NOC Business Continuity Planning in Response to COVID-19," for a comprehensive guide to ensuring your BCP is prepared for such a disruption.)
The BCP provides a blueprint for NOC staff and management to follow when recovering from a disaster or other adverse situation. When properly executed, it ensures that operations recover quickly and effectively so any negative impact to the business is minimized.
Without an effective BCP in place, your NOC will almost certainly remain vulnerable to the following problems if a disaster or significant workforce disruption affects your operation:
- Loss of business
- Damage to reputation/brand
- Loss of customers
- Loss of staff
- Loss of or damage to property and premises
- Negative impact on insurance
Key representatives from a cross-section of the organization need to be involved in creating a BCP. This may include outside vendors. Whether you’re developing one from scratch or want to evaluate an existing BCP, make sure it contains the following at the bare minimum:
- An analysis of all organizational threats
- A list of action items required to maintain operations, both for short-term and long-term interruptions
- Easily accessible contact information for key stakeholders
- An explanation of where/how personnel should relocate if there is an interruption in operations
- The steps required to make the backup site(s) operational
- How all the areas within the organization need to collaborate in executing the plan
6. Develop an Effective Customer Experience Management Program
NOC teams must measure service quality and provide quality assurance on a continuous basis or risk damaging customer satisfaction and compromising the NOC’s reputation. Effectively and consistently executing a runbook (i.e., processes and procedures) is paramount to meeting a NOC’s service level requirements.
Core to meeting these objectives is the detailed monitoring of key network and IT assets and services. This monitoring, data collection and correlation—typically accomplished using a variety of protocols and tools—is the entry point into incident handling and problem management processes.
Calls and emails are further sources of data, among others. The NOC runbook, created during the on-boarding process and updated regularly, governs what happens next. Documenting agreed-upon processes and procedures for the specific customer environment provides the NOC team with an essential operational reference.
NOC Quality Control
A good quality control program monitors and measures primary aspects of your NOC service via its KPIs. These KPIs provide much-needed visibility into NOC support activity, responsiveness, and effectiveness. NOC management can use this information to ensure, for instance, that stated objectives for event-to-action times and first-level incident resolution are being met for each customer.
Quality control also detects chronic issues so management can find appropriate solutions—for example, correcting relevant runbook procedures, ensuring complete documentation is available to the NOC or providing additional staff training. A monthly audit of a subset—say, 10%—of all tickets created is an important part of ongoing review. Staff mentoring is also key to quality control and helps ensure high levels of customer satisfaction.
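One simple way to pick the audit subset is a random sample, sketched below. The 10% default mirrors the example figure above; the function name and the optional `seed` parameter (which makes a given month's selection reproducible) are assumptions for illustration.

```python
import random


def select_audit_sample(ticket_ids, fraction=0.10, seed=None):
    """Pick a random subset of tickets for the monthly quality audit."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    ticket_ids = list(ticket_ids)
    if not ticket_ids:
        return []
    # Always audit at least one ticket, even in a very quiet month.
    sample_size = max(1, round(len(ticket_ids) * fraction))
    rng = random.Random(seed)
    return rng.sample(ticket_ids, sample_size)
```

Sampling at random, rather than letting reviewers choose tickets, keeps the audit from drifting toward the easiest or most familiar cases.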
NOC Quality Assurance
A good quality or service assurance program enables your NOC to identify and resolve problems before they impact customers or the business in a significant way.
A quality assurance review begins when a customer reports dissatisfaction with any aspect of the NOC service. NOC management follows up with an internal review of the service, evaluating responsiveness metrics, adherence to runbook procedures, customer interaction, and technical troubleshooting, to name a few.
Such quantitative and qualitative measures and the resulting feedback lower the chance of the same problem recurring. Monthly and quarterly reviews of the service with stakeholders ensure that customer expectations continue to be met.
7. Develop Platform Integrations and Consolidate Data for Action
NOCs that operate at peak efficiency can receive and process alarm or event information from multiple sources and present it in a single, consolidated view for staff to act on. This consolidated view is commonly referred to as a “single pane of glass.”
Most NOCs need to bring voice, email, text, customer portals, knowledge bases, documentation, and workflow management tools into the NOC—each potentially with its own platform. Without proper integrations connecting these tools and platforms, NOC personnel are faced with tracking and managing multiple screens for event information; manually collecting information from multiple sources for the purposes of documentation, notification and escalation; and then attempting to manage workflow toward service restoration.
This makes it nearly impossible to monitor and report on SLA metrics, let alone optimize performance. The results inevitably include operational inefficiencies, missed SLAs, and undue stress on staff.
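The core of such an integration is normalizing records from each source into one common shape so they can be sorted and displayed together. The sketch below assumes two hypothetical feeds, monitoring alarms and emails, with made-up field names; real tools expose their own schemas and APIs, and emails are given a default severity here because they carry none until triaged.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical severity scale; lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "info": 3}


@dataclass(frozen=True)
class Event:
    source: str
    received: datetime
    severity: str
    summary: str


def from_alarm(alarm: dict) -> Event:
    """Normalize a monitoring alarm (assumed field names) into an Event."""
    return Event(
        source="monitoring",
        received=datetime.fromisoformat(alarm["timestamp"]),
        severity=alarm["level"].lower(),
        summary=f'{alarm["device"]}: {alarm["message"]}',
    )


def from_email(email: dict) -> Event:
    """Normalize an inbound email; severity is a placeholder until triage."""
    return Event(
        source="email",
        received=datetime.fromisoformat(email["received_at"]),
        severity="minor",
        summary=email["subject"],
    )


def consolidated_view(alarms, emails):
    """Merge all sources into one list, most severe and oldest first."""
    events = [from_alarm(a) for a in alarms] + [from_email(e) for e in emails]
    return sorted(
        events,
        key=lambda e: (SEVERITY_RANK.get(e.severity, len(SEVERITY_RANK)),
                       e.received),
    )
```

Adding another source, such as phone calls or a customer portal, then only requires one more adapter function rather than another screen for staff to watch.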
Download our white paper, "Top 10 Challenges to Running a Successful NOC," to see the various tools that are required for a NOC to function and that should be integrated into a single NOC platform.
8. Support Each NOC Function with Proper Documentation
Documentation is essential to a NOC’s ability to function well over the long term. This process includes building runbooks, documenting workflow processes, creating structured databases for storing and retrieving information, and recording business results for analysis and optimization.
Too often, however, services are added or changes are made without proper documentation to support them. This limits the ability of the NOC to resolve an issue when it arises, wasting time and creating avoidable risks.
Poor documentation often stems from a lack of resources and the expertise required to map out processes and create work instructions and documents. Instead, key people simply “know what to do” and new staff learn by “seeing and doing” alongside an experienced mentor.
NOC teams also often overlook performance metrics that can be obtained from network and monitoring systems, ticketing systems, and back office tools. These metrics are critical for analyzing performance, predicting failure, and laying the groundwork for ongoing quality control and process improvement.
Without an understanding of alarm activity, ticket activity, and common causes for outages and trends, management is limited to responses that are reactive and tactical, rather than proactive and strategic.
Beginning with the service catalog, it is necessary to document the tools and procedures needed to deliver NOC services successfully. Technical writers can often be invaluable in this process.
9. Design Your NOC Operation for Scalability
A NOC’s scalability is a measure of its ability to handle a growing amount of work without compromising the level of service. Typically, business plans include initial funding, sales and marketing, system build-out, operations support, and the business guidance needed to meet the projected growth. What business plans sometimes don’t consider is predictable growth and process planning.
Often, for example, sales for a young company take off, with key managers focused on new clients and getting technical services delivered to meet service launch dates. The same technical and operations resources are then tasked with the ongoing support of these services—severely impeding the organization’s ability to manage its growth. The result is predictable: customer dissatisfaction.
The ability to grow or absorb expansion requires careful consideration of the following factors:
- Staffing: It is essential to measure the staff utilization percentage derived from various NOC activities (described in our second best practice). Keeping this below 80% enables your NOC to absorb growth while allowing enough lead time for recruiting additional resources.
- Systems and Network: A distributed redundant architecture allows for systems to grow and expand. The ability to easily deploy additional server resources enables you to handle sudden spikes in growth. The performance of the systems and network (bandwidth, CPU, memory, etc.) needs to be monitored closely to make sure there is enough capacity to handle growth.
- Tools: Tools used by the NOC (e.g., monitoring tools, ticketing systems, knowledge base) to deliver the service must have additional capacity built into them to handle the projected growth. If tools aren’t designed for growth, it’s not uncommon for their performance to suffer dramatically, resulting in service-level degradation and a loss in productivity.
- Process Standardization and Training: A consistent process framework and methodology for delivering high-quality service is one of the key features of a scalable NOC. Management should choose and adopt a process standard that fits their product and industry needs. NOC staff can then be trained to follow the established company standards.
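The staffing check in the first factor above can be sketched as a simple ratio against the 80% ceiling. How busy hours are measured (for example, time logged against tickets, calls, and alarms) is an assumption that each NOC needs to define for itself.

```python
UTILIZATION_CEILING = 0.80  # threshold suggested in the staffing factor above


def staff_utilization(busy_hours, available_hours):
    """Fraction of available staff hours spent on NOC activities."""
    if available_hours <= 0:
        raise ValueError("available_hours must be positive")
    return busy_hours / available_hours


def has_growth_headroom(utilization, ceiling=UTILIZATION_CEILING):
    """True while utilization stays below the ceiling."""
    return utilization < ceiling


def hours_of_headroom(busy_hours, available_hours,
                      ceiling=UTILIZATION_CEILING):
    """Extra workload, in hours, the team can absorb before the ceiling."""
    return max(0.0, ceiling * available_hours - busy_hours)
```

For example, a team that logs 1,200 busy hours against 1,600 available hours runs at 75% utilization and can absorb roughly 80 more hours of work per period before recruiting should begin.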
10. Budget Your NOC Operation Appropriately
There are several components that make up the cost of running a 24x7 NOC. When budgeted for appropriately, these items combine into a powerful investment that has the potential to deliver a value that far exceeds its cost.
- Staff: The staff required to support a 24x7 NOC include not only front-line engineers, but also back-end support groups such as systems and network engineering, service transition, human resources, and customer advocacy.
- Training: Resources need to be allocated for training NOC staff when they are initially hired, when on-boarding new customers, and whenever changes are made to existing support or new technologies are introduced.
- Quality Assurance: An objective quality assurance program is needed to address customer concerns and maintain service-level agreements.
- Systems, Networking and Security: Systems, network connectivity, and security controls need to be deployed in either data centers or the cloud to house the various tools and applications required by the NOC to operate. Resources for ongoing support need to be included.
- Software Licensing: A NOC requires various tools for monitoring, troubleshooting, and resolving issues. These include network and element management systems (NMS/EMS), trouble ticketing systems, knowledge bases, portals and configuration management databases (CMDBs).
- Infrastructure and Facilities: A NOC must be designed and maintained to enable smooth workflow and communication among staff. Redundancy and business continuity are essential to mitigate risk.
- Compliance: NOC services must comply with various regulatory and industry standard requirements.
All of these components present a formidable operating expense but have to be considered in building a successful NOC. Too often, NOCs are built considering only a subset of the above components, and as a result, they struggle to scale and deliver on the required service and financial objectives of the organization.
While it’s easy to talk about best practices, it’s another thing entirely to bring those practices to life within your organization. Success requires careful planning and sustained attention, which is why expertise is so critical at the outset of building or optimizing your NOC.
Want to learn how to put these best practices to use in your NOC? Contact us to see how we can help you improve your IT service strategy and NOC support, or download our free white paper, "A Practical Guide to Running an Effective NOC," to learn how to build, optimize, and manage your NOC to maximize performance and uptime.
*Originally developed by the UK government’s Office of Government Commerce (OGC), which has since been absorbed into the Cabinet Office, and currently managed and developed by AXELOS, ITIL is a framework of best practices for delivering efficient and effective support services.