Service Level Management Explained: Basics and Best Practices (2024)

This guide explains basic and advanced concepts in modern service level management presented through the lens of network operations professionals.


A Brief Introduction to Service Level Management

Service Level Management (SLM) is a critical component of the IT Service Management (ITSM) framework and one of the five key elements in the ITIL* Service Delivery area. It plays a pivotal role in structuring, defining, and agreeing on the service levels that business operations require.

SLM is essential in creating Service Level Agreements (SLAs) and Service Level Objectives (SLOs), as well as in determining the costs of service delivery. Its primary focus is to design, monitor, and manage the agreed service levels to ensure that the quality of IT services meets business operation requirements.

Today, however, SLM has expanded to include more than just SLA management. It now fundamentally involves understanding customer business requirements, establishing appropriate expectations, and consistently meeting the established SLOs.

By adopting SLM processes, IT teams can deliver the agreed service levels more efficiently and cost-effectively. These processes help define the responsibilities and roles of both the business and IT departments, thereby promoting smooth, reliable business operations.

It's important to note that business units, not solely the IT department, bear the responsibility of justifying the required service levels to senior management in order to support their respective business processes. SLM integrates continuous improvement procedures to ensure that any changes in business requirements are promptly reflected in IT services. This guarantees a continuous alignment of IT services with the evolving needs of the business.

1 Service Level Management Basics



It’s important to start any discussion of modern SLM by establishing definitions, specifically how SLM relates to the Service Level Agreement (SLA), the Service Level Objective (SLO), and the Service Level Indicator (SLI).


What is Service Level Management?

At a high level, Service Level Management is the practice of getting everyone to agree on how IT service should work, and then ensuring that service levels are measured and reported. Practically speaking, SLM involves defining, documenting, and managing service levels.

What is a Service Level Agreement?

To carry out SLM, one has to define the performance measures in an SLA. An SLA, at its simplest level, is an agreement between an IT service provider (internal or external) and the customer that defines the characteristics used to measure the performance of the service. It also establishes responsibilities, the means of measurement, and the reporting cadence for actual outcomes relative to those agreements.

An SLA can contain one or more performance measures (SLOs) for which the service provider is responsible. The SLA also contains reporting responsibilities, credits, and penalties.

What is a Service Level Objective?

SLOs specify the service, responsibilities, and service level targets that make up an SLA. In other words, they’re the “substance” or component parts of an SLA.

Here’s an example: A NOC service provider may establish an SLO that sets the response time for phone calls. Here, a substantive SLO may be answering the phones in an average of 30 seconds, measured over a month. Another SLO, this one for call handling, might specify that no call waits more than five minutes to be answered.
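As a minimal sketch of how the two call-handling SLOs above could be checked, assume a month of call answer times is available as a plain list of seconds. The function name, result keys, and sample values are hypothetical and are not taken from any INOC tooling.

```python
from statistics import mean

def check_call_slos(answer_times_s, avg_target_s=30, max_wait_s=300):
    """Evaluate one month of call answer times (seconds) against two SLOs:
    a monthly average target and a maximum wait for any single call."""
    avg = mean(answer_times_s)
    longest = max(answer_times_s)
    return {
        "average_s": avg,
        "longest_s": longest,
        "average_met": avg <= avg_target_s,
        "max_wait_met": longest <= max_wait_s,
    }

# Hypothetical month: 19 calls answered in 10 seconds, one call waited 320 seconds.
result = check_call_slos([10] * 19 + [320])
print(result)
```

In this made-up sample the monthly average is comfortably met while the maximum-wait SLO is missed, which is why the two objectives are worth tracking separately.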

 

What is a Service Level Indicator?

SLIs are the components of an SLO. 

Here’s an example: 

(SLO) Time to Notify for alarms = (SLI) Time elapsed between alarm generation and alarm receipt + (SLI) Time delay until either automation or NOC engineer starts addressing the alarms + (SLI) Time duration to create the Incident record

Each SLI can be measured, and in total, they reflect the SLO. These measures provide actual insight into how well the NOC is complying with the SLO.
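The Time to Notify breakdown above can be sketched in a few lines. This is an illustration only: the class, its field names, and the sample timings are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class AlarmTimings:
    """Seconds spent in each SLI stage for a single alarm (hypothetical fields)."""
    generation_to_receipt_s: float   # alarm generated -> alarm received
    receipt_to_handling_s: float     # alarm received -> automation or engineer starts
    incident_creation_s: float       # time taken to create the Incident record

    def time_to_notify_s(self) -> float:
        # The SLO value is the sum of its three SLIs.
        return (self.generation_to_receipt_s
                + self.receipt_to_handling_s
                + self.incident_creation_s)

alarm = AlarmTimings(generation_to_receipt_s=20,
                     receipt_to_handling_s=240,
                     incident_creation_s=90)
total = alarm.time_to_notify_s()

# Identify which SLI contributes most to the SLO for this alarm.
largest = max(vars(alarm), key=vars(alarm).get)
print(f"Time to Notify: {total} s, largest contributor: {largest}")
```

Keeping the SLIs separate makes it possible to see which stage dominates the total, rather than only whether the total met its target.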

A Few Key Considerations

  • Beyond managing SLAs, how well are you doing at understanding your customers' or end-users' business requirements, setting expectations, and establishing/consistently meeting SLOs?
  • How well is your SLM integrated within your overall IT Service Management framework, specifically service delivery?
  • Are the correct departments and business units in your organization involved in SLM processes? Is there clear communication and agreement on the required service levels?
  • How well are you measuring and tracking your service levels? Are you consistently meeting them? If not, what measures do you have in place to rectify the situation?

See some room for improvement? Get in touch and let's discuss possible solutions.

Talk with us

2 Why Implementing Service Level Management is Critical in ITOps


Implementing SLM comes with an array of immediate benefits that can significantly improve the alignment of IT services with business needs and enhance overall service quality. 

Here are some of those key benefits—some perhaps more obvious than others—along with clarifying examples from our own experience in providing NOC support services.

A better understanding between IT and business units

Implementing SLM facilitates better communication and understanding between IT and business units. This results in a better alignment of IT services with business requirements, contributing to more effective operations and decision-making.

Example from INOC: Through SLM, we gain a thorough understanding of our clients’ business needs so we can tailor our services to them. For instance, a client might require more support resources during peak business hours. Understanding this requirement, we can adjust our staffing and resource allocation accordingly.

 

More accurate service quality expectations

SLM allows teams to set more precise expectations regarding service quality. It provides mechanisms for effectively measuring, monitoring, and reporting service quality, ensuring that services are consistently delivered at the required standards and avoiding the misunderstandings that often arise when communication is lacking at the outset.

Example from INOC: SLM is critical for us to set accurate expectations with our clients regarding each critical metric and service level we deliver.

For instance, it’s not uncommon for the teams we work with to assume SLAs/SLOs apply per ticket, which results in misunderstandings down the road. In a 24/7 NOC environment, particularly in the shared NOC model we use with many clients, guaranteeing that every individual ticket is addressed within the SLA/SLO timeframe is impractical.

There may be situations when multiple high-priority incidents (P1s) occur simultaneously, but due to the limitations in staffing, not all can be attended to with equal urgency. In such scenarios, while every effort is made to respond swiftly, handling incidents at the same speed is not always feasible. Setting this expectation from the start is critical.
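To make the per-ticket versus aggregate distinction concrete, here is a small sketch. It assumes a service level expressed as a monthly average, as the SLO definitions later in this guide are; the 15-minute target and the sample values are hypothetical.

```python
from statistics import mean

TARGET_MIN = 15  # hypothetical monthly-average target, in minutes

# Hypothetical time-to-notify values (minutes) for one month of P1 incidents.
# Two incidents arrived while others were already being worked and waited longer.
notify_minutes = [6, 8, 7, 22, 9, 5, 27, 8]

monthly_average = mean(notify_minutes)
over_target = [m for m in notify_minutes if m > TARGET_MIN]

print(f"monthly average: {monthly_average} min")
print(f"target met on average: {monthly_average <= TARGET_MIN}")
print(f"tickets individually over target: {len(over_target)} of {len(notify_minutes)}")
```

Here the monthly figure is within target even though two individual tickets exceeded it, which is the expectation worth agreeing on before service starts.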

 

Clearer definition of roles and responsibilities

SLM processes clearly delineate the roles and responsibilities of all involved parties. This clarity aids in avoiding confusion, managing expectations, and ensuring that all parties are held accountable for their part in the service delivery.

Example from INOC: SLM clearly outlines each party’s responsibilities around how NOC support is delivered. For instance, we might be responsible for monitoring a client’s network and responding to incidents at Tier 1, while the client might want to take responsibility for Tier 2 and for providing access to systems.

 

Flexibility in business operations

SLM provides the flexibility businesses need to react quickly to changing market conditions. By effectively managing service levels, teams can adapt their IT services to meet evolving needs and opportunities.

Example from INOC: When one of our clients needs to scale up their operations rapidly due to growth, the flexibility built into SLM allows us to accommodate these changes and actually enable that growth. The NOC could ramp up resources or reconfigure its monitoring and other tooling to ensure that the network can handle the increased load.

 

Cost-efficiency

SLM helps avoid or mitigate the costs of excess or insufficient capacity. By accurately sizing the infrastructure and continuously monitoring service levels, businesses ensure they’re paying only for what they need, leading to significant cost savings.

Example from INOC: If a client's business is seasonal and has low network usage during certain months, the NOC can scale down services during those periods, reducing unnecessary costs.

 

A Few Key Considerations

  • Are you setting precise expectations about service quality? How well are you measuring, monitoring, and reporting on service quality?
  • Is SLM helping you avoid or mitigate costs of over- or under-capacity? Do you continuously monitor your service levels to ensure cost efficiency?
  • Are you actively seeking to improve your SLM processes? What areas have you identified for future improvement?


3 Service Level Management Activities



Let's briefly unpack the activities that comprise SLM today:

  • Identifying business requirements
    This involves close communication with business units to understand their specific needs and expectations, including what IT services they require, when they require them, and the desired quality of these services.
  • Establishing Scope of Services
    Here, the specific services, their delivery timelines, operating hours, recovery mechanisms, and performance standards are defined. These aspects help establish clear expectations for the service provider and customer.
  • Translating business requirements into IT requirements
    This process involves converting the needs of the business into specific technical requirements that IT can implement and manage. This ensures IT services align with and support business needs.
  • Performing a capability gap analysis
    A gap analysis involves comparing current service capabilities with the business requirements to identify any gaps that need to be addressed. This helps to ensure that IT services can meet the needs of the business.
  • Determining service costs
    This activity involves assessing all costs associated with providing the services (such as labor, technology, and infrastructure costs), ensuring that service goals can be achieved at a price that the business considers affordable.
  • Drafting, negotiating, and refining SLAs
    SLAs are formal contracts between the IT service provider and the customer, outlining the service scope, performance standards, responsibilities, and pricing. Drafting and negotiating SLAs involves working closely with business units to ensure the agreement meets their needs and obtaining approval from all parties.
  • Implementing SLAs
    Once the SLA has been agreed upon, it's implemented. This may involve configuring IT systems, training staff, and communicating the terms of the SLA to all relevant parties.
  • Measuring SLA/SLO performance
    This involves monitoring and measuring the performance of the services against the standards set out in the SLA/SLO, reporting on these results, and adjusting service delivery as necessary to ensure continuous improvement. Regular reporting helps to ensure transparency and that all parties are aware of the service performance.
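As one illustration of the activities above, a capability gap analysis can be reduced to comparing required values with current ones. The sketch below assumes each capability is a single number and that you know whether higher or lower is better; every name and figure in it is hypothetical.

```python
# Each entry: (required value, current value, True if a higher value is better).
# All capability names and numbers are made up for illustration.
capabilities = {
    "coverage_hours_per_day": (24, 16, True),
    "avg_time_to_notify_min": (15, 12, False),
    "avg_time_to_restore_hours": (4, 6, False),
}

def find_gaps(caps):
    """Return the capabilities whose current value does not meet the requirement."""
    gaps = {}
    for name, (required, current, higher_is_better) in caps.items():
        meets = current >= required if higher_is_better else current <= required
        if not meets:
            gaps[name] = {"required": required, "current": current}
    return gaps

gaps = find_gaps(capabilities)
for name, detail in gaps.items():
    print(f"gap: {name} requires {detail['required']}, currently {detail['current']}")
```

A real analysis would also weigh qualitative capabilities (skills, tooling, process maturity) that do not reduce to one number.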

4 Implementing Service Level Management



Implementing SLM involves a series of steps that require careful planning, execution, and monitoring.

Here’s a high-level look at how implementation works step-by-step.

1. Identify and gather service requirements

The first step in implementing SLM is to identify the service requirements of the business units or customers. This involves engaging with stakeholders to understand their needs and expectations, the business outcomes they want to achieve, and the IT services required to support these outcomes. This understanding forms the basis for defining and designing the IT services.

2. Establish IT services

Once the service requirements are identified, the next step is to define and document which specific IT services will be provided to meet those requirements. This involves outlining the scope, features, quality, and performance standards of each service. 

3. Establish service levels

Service levels define the expected performance of an IT service. These targets should be aligned with the business requirements identified in the first step and should be measurable, achievable, relevant, and time-bound.

4. Develop SLAs and SLOs

Again, an SLA is the contractual agreement between service providers and clients outlining the level and quality of services that will be provided. An SLO is a specific, measurable target for the level and quality of a service. On their own, SLOs aren’t contractually binding and don’t include penalties; they tend to be the component parts of SLAs.

5. Monitor and report on service performance

Once the SLAs and SLOs are in place and service is live, the next step is to continuously monitor the performance of the IT services against the agreed-upon service levels. This involves tracking KPIs, preparing service level reports, and providing these reports to the relevant stakeholders, ideally through regular reporting sessions—not just “as needed.”

Here at INOC for example, our reporting criteria are exhaustive—painting a complete picture of service delivery across all its dimensions:

  • Review tickets, incidents, and escalations
  • Review chronic alarms and incidents
  • Coordinate maintenance calendar
  • Report on service and operations metrics
  • Assess contractor performance
  • Forecast planned activities
  • Review quantity and causes for NOC support activities
  • Identify root causes and preventive actions
  • Review client escalations and open items
  • Review staffing and proficiency levels
  • Validate assumptions for contract
  • Identify and mitigate risks within infrastructure and operations
  • Review previous action items and status

Read our other guide, NOC Service Level Reporting, for a deep dive into service level reporting specifically.

 

6. Manage relationships with all stakeholders

Successful SLM also involves managing relationships with customers and stakeholders. This includes managing customer expectations, communicating effectively about service performance, and ensuring customer satisfaction.

Here at INOC, QA/QC is a dedicated function of our NOC services. Our quality control measures keep most concerns off everyone’s radar entirely. We pull large samples of both specific and random tickets to understand how they were created, worked, and closed to learn from successes, identify opportunities for improvement, and understand any potential impact on service.

We also evaluate team-to-client interactions to ensure each communication is professional, efficient, and actionable for both parties. Our dedicated quality team evaluates adherence to procedures, quality of work delivered, response times for notifications and escalations, and a variety of other criteria that factor into service. Each lesson, large and small, is applied to continuously improve our workforce, our platform, and our processes so we can pass the value onto our customers.

7. Run a Continuous Improvement Plan

While SLM never ends, the final ongoing component in implementation is the continuous improvement plan. This involves identifying opportunities for improving the IT services and the SLM process, implementing improvement actions, and monitoring the results to ensure that the improvements are effective.

IT environments are anything but static. We work with all of our customers to identify changing business needs and do everything we can to ensure satisfaction. That includes ongoing QA/QC analysis and reporting as well as in-depth quarterly business reviews that look both back and forward.

5 Addressing Common Service Level Management Challenges


Implementing an SLM framework to manage NOC service levels isn’t a straightforward process. There’s no handy, universal guidebook to follow.

As a result, many teams face the following challenges:

Challenge #1: Limiting themselves to the most basic operational service levels

This is especially true in NOCs, but applies to ITOps and network operations more generally, too. Some teams only focus on the most basic operational service levels like uptime, response time, and resolution time. While these are certainly important, this approach neglects a wider array of potentially beneficial service levels that could better align IT services with business objectives.

For example: Service levels should consider customer satisfaction, the business impact of incidents, or the success rate of changes. By focusing too narrowly on basic metrics, teams might not fully optimize their services or meet the comprehensive needs of their clients or end-users. 

 

Challenge #2: Taking an approach where service level compliance doesn’t reflect the quality of the service

This is a common reason organizations switch their NOC service to us from another service provider: They were getting glowing reports while experiencing sub-par service. The reports didn’t align with the real world. SLM shouldn’t be abused to distract a company from quality problems simply by focusing on other metrics that don’t actually capture that quality.

For example: A NOC might meet its target of restoring service for 95% of incidents within an agreed timeframe but still deliver poor service if the same issues keep recurring or if solutions are temporary patches rather than permanent fixes. Here, although the NOC is technically in compliance with its SLAs, the service quality is poor. This challenge calls for a more balanced approach that values service quality as well as compliance with service level targets.
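The gap between compliance and quality in the example above can be shown with two numbers computed from the same incident list. This is a sketch under stated assumptions: the 60-minute restore target, the root-cause labels, and the incident data are all hypothetical.

```python
from collections import Counter

RESTORE_TARGET_MIN = 60  # hypothetical agreed restore timeframe, in minutes

# Hypothetical month of incidents: (root cause label, minutes to restore).
incidents = [("router-a flapping", 30)] * 15 + [
    ("dns timeout", 45),
    ("bgp reset", 20),
    ("disk full", 55),
    ("cert expiry", 40),
    ("fiber cut", 200),
]

# Share of incidents restored within the target (the reported compliance figure).
within_target = sum(1 for _, minutes in incidents if minutes <= RESTORE_TARGET_MIN)
compliance = within_target / len(incidents)

# Share of incidents whose root cause occurred more than once in the month.
cause_counts = Counter(cause for cause, _ in incidents)
repeat_incidents = sum(n for n in cause_counts.values() if n > 1)
repeat_share = repeat_incidents / len(incidents)

print(f"restore compliance: {compliance:.0%}")
print(f"incidents from recurring causes: {repeat_share:.0%}")
```

With this sample the report would show 95% compliance while three quarters of the incidents trace back to one unresolved problem, so a recurrence measure is a useful companion to the compliance figure.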

 

Challenge #3: Rendering themselves unable to monitor and report on service levels effectively

An effective SLM framework requires the ability to accurately monitor and report on service levels. However, some teams may struggle with this, possibly due to a lack of appropriate tools, skills, or processes. 

For example: A NOC might not have the right monitoring tools to track certain service level metrics, or it might lack the reporting capabilities to effectively analyze and communicate service level performance. This inability can lead to a lack of visibility into service performance, making it difficult to manage service levels effectively and demonstrate value to clients.

 

A more comprehensive approach to SLM enables teams to paint a complete and accurate picture of the quality of service provided, which in turn enables continual service improvement (CSI).

Applying an SLM framework at this level takes thoughtful planning and diligent management. Here at INOC, we’ve tuned this powerful framework to unleash its full potential not only for establishing SLAs but measuring and improving service proactively.

A Few Key Considerations

  • Are you limiting yourself to the most basic operational service levels (uptime, response time, resolution time), or are you considering a broader range of service levels that align better with business objectives? How could you expand your scope of service levels?
  • Does your current approach to SLM truly reflect the quality of your service? Have you ever experienced a situation where your reported service level performance did not match actual service quality?
  • Do you have the appropriate tools, skills, and processes in place to accurately monitor and report on service levels?
  • Are you addressing service issues by providing permanent solutions, or are you facing repeat incidents due to temporary fixes? Can you adjust your SLM approach to improve this?
  • Are you applying a comprehensive approach to SLM that provides a complete picture of your service quality? If not, what areas are you missing, and how can you improve?
  • Are you using SLM to identify areas for improvement and drive continuous service improvement?


6 Taking Service Level Management to the Next Level

Here at INOC, we complement standard KPI reporting, which includes monthly SLA measurements, with an array of additional SLOs to better measure performance and keep both teams aligned on success.

In our view, limiting reporting to just a handful of rigid service levels rarely tells the full story about the quality of NOC service being provided. Limited reporting also ignores important operational signals that serve as inputs for continual improvement. 

We offer four SLA options with several SLOs within each SLA, as shown below to provide a real-world example.

SLA: Standard

SLOs:

  • Answer calls within 60 seconds (average) with a maximum wait time of 10 minutes
  • Acknowledge email requests within 120 minutes
  • Notify within 15 minutes for all priorities

SLA: Custom

SLOs:

  • Supports custom time to notification, time to action, dispatch times, follow-up times with carrier, field support, OEMs and other third parties.
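For readers who track targets like these in their own tooling, the Standard SLA targets listed above could be captured as a small configuration object and compared with measured values. This is a sketch only: the class, field names, and measured figures are hypothetical, and only the four target values come from the list above.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class StandardSla:
    """Targets from the Standard SLA above; field names are hypothetical."""
    avg_call_answer_s: int = 60    # answer calls within 60 seconds (average)
    max_call_wait_s: int = 600     # maximum call wait time of 10 minutes
    email_ack_min: int = 120       # acknowledge email requests within 120 minutes
    notify_min: int = 15           # notify within 15 minutes for all priorities

def evaluate(sla, measured):
    """Return {slo_name: True/False}, where True means measured <= target."""
    return {name: measured[name] <= target for name, target in asdict(sla).items()}

# Hypothetical measurements for one month, in the same units as the targets.
measured = {
    "avg_call_answer_s": 42,
    "max_call_wait_s": 655,
    "email_ack_min": 95,
    "notify_min": 11,
}
report = evaluate(StandardSla(), measured)
print(report)
```

A Custom SLA would use the same shape with different fields and targets agreed with the client.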

 

We also maintain several additional SLOs, described below. Keep in mind that variances in networks, types of alerts, vendors, contractors, and other factors can impact these definitions.

SLO

Targets

Time to Notify (Events)

The time elapsed from receiving an actionable Event as indicated in our systems until the time we send an automated electronic notification to a client or customer. The actionable Event results in creating an Incident record in our ticketing system. The automated notification can be in the form of email or SMS or auto-call with an option to speak to a NOC representative. The specific form depends on a client’s Service Catalog. This is measured monthly and averaged over all the Incidents.

Time to Notify (Email)

The time elapsed from the receipt of an actionable Email until the time we send an automated electronic notification to a client. The actionable Email results in the creation of an Incident record in our ticketing system. The automated notification will be in the form of an email. This is measured monthly and averaged over all the Incidents.

Time to Acknowledge (TTA)

"Response Time" is measured from the time an Incident record is created in our ticketing system until the time that a client is advised their issue is being addressed by assigned INOC personnel. The client will be contacted either by phone or email based on the client’s Service Catalog and the Incident marked "In Progress". Specifically, Response Time is measured from the time of Incident creation until the status is updated to "In Progress".  This is measured monthly and averaged over all the Incidents.

Response Time is the time it takes to acknowledge, in a non-automated way, that a customer's issue is being addressed.

Mean Time to Impact Assessment (TTIA)

The time period commencing upon the creation of an Incident in our ticketing system and ending when we provide technical information isolating the probable cause of the fault, impacted services, and an action plan to restore. This will in many cases be equal to Time to Initial Investigation Completed. This is measured monthly and averaged over all the Incidents.

Update Frequency

How often the NOC updates a ticket, unless the NOC receives an external (non-NOC) update or the Update Frequency is paused due to a calendar entry. If the NOC receives an external update, the ticket is placed into the active queue immediately and processed based on Priority.

Mean Time to Notify Carrier (TTNC)

The time elapsed from completing the impact assessment isolating the fault to a carrier until the time we create a ticket with the carrier. This is measured monthly and averaged over all Incidents that are carrier related. This is dependent on phone hold times with the carrier. If there is a carrier portal or integration with the carrier’s ticketing system, this can speed things up considerably.

Mean Time to Restore 

Time to Restore is the time period commencing upon the creation of an Incident in our ticketing system and ending when we provide, as applicable, (i) remote restoration or (ii) the technical information which, when implemented, will restore the affected service or site to a usable level of functionality. Mean Time to Restore is measured as an average across all Incidents during a calendar month.


NOC Time to Restore

The time segment of Time to Restore that is attributable to the NOC (where the ticket is assigned to the NOC) and includes at a minimum the time to create an Incident, perform initial investigation and diagnosis and remote restoration activity.


Client Time to Restore

The time segment of Time to Restore that is attributable to a client (where the ticket is assigned to the client). These times are tracked for reporting purposes only.


Third-Party Time to Restore (FLM, Equipment Vendor, Carrier, Power Company)

The time segment of Time to Restore that is attributable to the Third Party (where the ticket is assigned to a Third Party such as a carrier, FLM, equipment vendor, power company or other). These times are tracked for reporting purposes only.
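The three Time to Restore segments above can be derived from a ticket's assignment history. The sketch below assumes that history is available as a list of owner and duration pairs; the data layout, owner labels, and durations are hypothetical.

```python
from collections import defaultdict

# Hypothetical assignment history for one Incident: (ticket owner, minutes held).
assignment_history = [
    ("NOC", 20),          # Incident creation, initial investigation and diagnosis
    ("Third Party", 95),  # ticket assigned to a carrier
    ("NOC", 10),          # ticket back with the NOC
    ("Client", 30),       # ticket assigned to the client
    ("NOC", 5),           # NOC completes remote restoration activity
]

def restore_segments(history):
    """Sum the minutes attributable to each owner across the whole Incident."""
    totals = defaultdict(int)
    for owner, minutes in history:
        totals[owner] += minutes
    return dict(totals)

segments = restore_segments(assignment_history)
time_to_restore = sum(segments.values())

print(f"Time to Restore: {time_to_restore} min")
for owner, minutes in segments.items():
    print(f"  {owner} Time to Restore: {minutes} min")
```

Splitting the total this way shows how much of a long restore time sat with the NOC versus the client or a third party.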

 

So, how does this approach to SLM translate into tangible value for a client?

Put simply, it drives a constant state of continual improvement. We want to take every opportunity to make processes and activities as efficient as possible. That means closely examining each component of an SLO, spotting those opportunities, and, for example, adding automation to make incremental improvements that contribute to greater availability and less downtime.

With this expanded approach to SLM, each monthly report we produce presents both precise reporting around key service levels and a big-picture perspective that can inform proactive enhancements and optimization. (Read our guide to SLM reporting here.)

Final Thoughts and Next Steps

SLM is instrumental in tracking, reporting, and reviewing the performance of IT services so that teams can better understand every dimension of the service being provided and improve how the next incident is handled.

For IT leaders considering SLM in the context of their own IT service, the following questions can be illustrative of the need for action:

  • When was the last time you reviewed your SLAs with your service providers?
  • Does your or your service provider’s current approach to SLM go far enough to paint a complete picture of performance?
  • Does your or your service provider’s current approach to SLM provide sufficient reporting to contribute to continual improvement activities?
  • Do you have enough of the correct SLOs in place to fully track the performance of your operation? Are there performance blindspots?
  • Do plans for future IT service projects require you to revisit and potentially enhance your approach to SLM?

Not satisfied with answers to these questions, or need help working through the correct Service Level Management components for your organization? Schedule a free NOC consultation below to see how we can help you improve your IT service strategy and NOC support.

Book a free NOC consultation

Connect with an INOC Solutions Engineer for a free consultation on how we can help your organization maximize uptime and performance through expert NOC support.

Our NOC consultations are tailored to your needs, whether you’re looking for outsourced NOC support or operations consulting for a new or existing NOC. No matter where our discussion takes us, you’ll leave with clear, actionable takeaways that inform decisions and move you forward. Here are some common topics we might discuss:

  • Your support goals and challenges
  • Assessing and aligning NOC support with broader business needs
  • NOC operations design and tech review
  • Guidance on new NOC operations
  • Questions on what INOC offers and if it’s a fit for your organization
  • Opportunities to partner with INOC to reach more customers and accelerate business together
  • Turning up outsourced support on our 24x7 NOC

Contact us

Have general questions or want to get in touch with our team? Drop us a line.

GET IN TOUCH

Free white paper

Download our free white paper and learn how to overcome the top challenges in running a successful NOC.

Download

*ITIL is a framework of best practices for delivering efficient and effective support services. It was originally developed by the UK government’s Office of Government Commerce (OGC), now part of the Cabinet Office, and is currently managed and developed by AXELOS.

 

Contributors to this guide

 

Prasad Ravi
Co-Founder/CEO, INOC
Prasad Rao
Co-Founder/President/COO, INOC
Jim Martin
VP of Technology, INOC
Hal Baylor
Director of Business Development, INOC
Ben Cone
Senior Solutions Engineer, INOC
Liz Jones-Queensland
Communications and Learning Manager, INOC

 
