The platform of tools and systems for delivering service is a crucial factor in the success of any high-performance NOC. It’s what enables the NOC’s speed, accuracy, and consistency. At INOC, we’ve significantly invested in developing and refining our platform to deliver comprehensive NOC services to clients and partners 24x7x365.
Our latest and third major iteration of the platform is INOC Ops 3.0.
This platform serves as our operating system—the intersection of technology, operations, and service delivery. Its design enables us to ingest alarm feeds from various sources, auto-correlate those events into a single ticket, and present that ticket through a single pane of glass for efficient Incident, Problem, and Capacity Management.
In short, the Ops 3.0 platform enables our team to increase its accuracy and speed while reducing delays in human involvement. It frees NOC engineers to spend less time in the runbook and more time on strategic client projects.
Before digging into the details, let's start by showing what Ops 3.0 is.
Below is a high-level schematic of the platform and how it connects to — and integrates with — a client's infrastructure and existing NMS, ITSM, and communications tooling.
Open larger image in new tab »
The workflow generally moves from the left to the right of the diagram as monitoring tools output alarm and event information from a client NMS or ours into our platform, where a number of tools process and correlate that data, generate incidents and tickets enriched with critical information from our CMDB, and triage and work them through a combination of machine learning and human engineering resources. ITSM platforms are integrated to bring activities back into the client's support environment and the system is integrated with client communications.
Our own Jim Martin puts is much more elegantly:
While our platform has many parts, its overall function can be summed up in just a few sentences:
Here's a simplified visual representation of the platform to reflect the incident lifecycle:
Our platform's five standout capabilities translate into immense value for our supported customers and partners.
Here's a brief overview of each of them.
Our platform uses machine learning to significantly reduce the time from initial alarm to incident ticket creation. This ensures that tickets are created for every event deserving of one, and nothing is missed. Our alarm experts carefully and continually monitor and fine-tune the platform’s machine learning, making us faster and more accurate when identifying issues as we gather more data.
Our platform ingests alarms via our AIOps engine and enriches and correlates them with similar alarms to create a single incident ticket. Then, automated workflows assign or attach impacted CIs and differentiate the affected services or CIs from the likely cause of an incident. The system also attaches relevant knowledge articles for incident resolution. This level of incident automation is available on day one of service. Over time, we further automate repeatable tasks within the alarm-to-action guides and runbooks as we collect more actionable information, such as interface and log data.
Our Service Portal offers several self-service functions, enabling you to interact directly with our platform. Ops 3.0 customers can request changes to their CIs and CMDBs, request portal accounts and updates to knowledge articles and procedures, and submit Scheduled Maintenances. We're continuously exploring additional opportunities to empower you to make direct changes to your information and data where appropriate.
Momentary disruptions can cause incidents that quickly resolve themselves, which can be a useless distraction for NOC engineers. To address this, we've added automation that automatically resolves any ticket that has alarms within a few minutes. This provides faster updates to our supported clients and reduces non-productive work for the NOC, allowing us to focus on critical ongoing issues. For clients utilizing our Problem Management service, all alarms that fall under this category are still reported on and reviewed by our Advanced Technical Services team as part of Problem Management.
Ops 3.0 boasts an advanced multi-tenant architecture that provides customized solutions for each client's security needs. This architecture ensures the strict isolation of client data and network access, enabling efficient resource usage, swift deployment and updates, and a consistent client experience. Adhering to ISO 27001 standards, our platform offers a security-first approach, ensuring that our clients' data is efficiently managed and securely guarded.
Our multi-tenant architecture offers several advantages:
Here's a brief dissection of each major component that makes up the Ops 3.0 platform.
Our platform allows for the ingestion of alarm and event information from your NMS infrastructure, enabling us to receive alarms from a simple network monitoring tool or a whole suite of monitoring tools (everything from application management to traditional network management to optical or physical layer management systems).
If you don’t currently use an NMS or aren’t satisfied with your instance, hosted solutions like LogicMonitor, New Relic, or iMonitor (our headless alarm management platform) are available.
Integrating these NMS tools with the AIOps platform ensures seamless alarm and event management—a key service differentiator.
Our platform integrates seamlessly with all major NMS platforms, including:
The core of our alarm and event management system is our AIOps engine, which utilizes machine learning to automate low-risk tasks and extract actionable insights from the vast amounts of data gathered across clients' supported environments.
Our AIOps tools correlate alarms from multiple sources, perform deep inspection of those alarms, and enrich them with additional metadata from our CMDB to expedite informed action.
Importantly, you can seamlessly integrate your existing infrastructure monitoring system with our AIOps toolset to retain the systems you already use while further maximizing those investments. This integration feeds alarms from your NMSs (or ours if needed) into our platform, streamlining alarm correlation, enrichment, and automatic ticket creation.
After a ticket is generated, our platform automatically identifies and attaches CIs from our CMDB, giving NOC engineers clear direction for investigation before any human intervention. The platform also supplies relevant knowledge articles and runbooks, facilitating fast, accurate diagnosis and action plan development.
The INOC CMDB is the backbone of our NOC support service, enabling operational workflows and platform automations. Unlike basic CMDBs that contain only device or asset data, the INOC CMDB includes all essential information for AIOps to correlate events, create incident tickets, assess impact, and guide NOC engineers in resolving issues. This is why we prioritize creating and maintaining it throughout the service lifecycle.
Learn more about our CMDB in our explainer.
We ensure seamless integration of our CMDB with our clients' configurations, allowing us to associate the necessary meta information with every alarm and ticket. This equips our NOC engineers with the actionable information they need to make informed decisions. Additionally, we leverage our years of experience to enhance our clients’ existing CMDB structures and capabilities, further improving efficiency and effectiveness.
Our platform's ITSM component, primarily ServiceNow, enhances automation capabilities by attaching CIs and records from the CMDB to incident tickets created by AIOps. This process automates initial impact assessment and provides NOC engineers with a likely set of issues and impacted service areas even before they touch the ticket.
The platform also automatically attaches knowledge articles and runbooks, enabling the NOC engineer to quickly access reference material for diagnosis and action plans, further increasing their efficiency.
TicketViewer is an essential queue management system designed specifically for NOC engineers, helping them efficiently identify pending tasks in the queue and understand the corresponding SLAs in place.
TicketViewer also provides NOC management with a comprehensive overview, enabling effective resource allocation and ensuring that critical incidents receive the attention they deserve through a color-coded staging system that highlights ticket urgency.
The system also empowers NOC management with the capability to monitor and manage the performance of NOC engineers directly. It provides insights into individual and team response rates and problem-solving efficiency, enabling managers to identify bottlenecks and implement targeted improvements.
In addition to integrating with your existing monitoring infrastructure, our platform also integrates with your existing ITSM tools, which improves the efficiency of incident resolution by identifying and attaching relevant configuration items and knowledge articles.
We integrate ticketing platforms bidirectionally so incidents can be passed from our platform to yours, where they can be actioned if needed. This creates a seamless, integrated support experience, allowing you to interact with tickets in your platform while receiving critical updates from ours.
We also integrate with communication tools to inform you of high-priority tickets and other critical updates.
All data collected from our AIOps engine, NMS instances, and ITSM toolset are sent to our data platform, where systems like Snowflake and Amazon Athena process and normalize it.
This data is then displayed in a client portal powered by our bespoke ServiceNow, with embedded Tableau reports providing actionable insights and comprehensive visibility into NOC performance and other related metrics.
Our runbooks integrate various functions and features of support operations, making them repeatable and easy to follow. These runbooks include process flow diagrams and detailed work instructions for various processes such as incident management, scheduled maintenance, change enablement, problem management, etc. Documentation about sites, infrastructure, connectivity, and other tools supports operational support processes and the CMDB.
We also develop alarm-to-action guides that focus on top alarms the NOC will receive, creating consistent knowledge articles and repeatable processes based on those alarms and alarm signatures. These resources enable us to achieve a high percentage of Tier 1 incident resolution.
As our support operation matures and learns more about each supported client environment, we’re continuously improving our automation capabilities—turning alarm-to-action guides and runbook procedure steps into automatable tasks that the platform can perform independently. Automating tasks such as collecting interface statistics or log data can further improve impact assessment and incident resolution times.
We also recognize that documentation is not static and must be updated as circumstances, personnel, regulations, business needs, and technologies change. That’s why we prioritize keeping our runbooks and documentation up-to-date, reflecting the organization's changing business and technology environments in a cyclical process to ensure effective and efficient support operations.
In addition to our platform, one of the key value components of our approach is our structured NOC. Our Tier 1 Service Desk handles inbound and outbound calls and emails and works towards resolving issues. But unlike other service providers, we position an Advanced Incident Management team (AIM), upstream to perform initial troubleshooting, even for Tier 1 service.
The AIM team is an extended function of our Tier 1 Service Desk, comprising senior troubleshooting staff within the Tier 1 team. This team conducts initial troubleshooting and creates an action plan after completing an investigation and impact analysis. This action plan drives everything that happens downstream. Every update we perform in the ticket is effectively its own handoff of that incident.
By filtering everything through the AIM team, we avoid mistakes that may be made by less experienced engineers and prevent our Tier 1 staff from potentially spending hours trying to troubleshoot an issue that requires a more advanced engineer. The AIM team's role is critical in ensuring that incidents are managed effectively, and our approach ensures that issues are resolved without mistakes and that resources are allocated appropriately.
The graphic below illustrates how the Structured NOC radically transforms where and how support activities are managed—both by tier and category. In a matter of months, the value of an effective operational framework becomes abundantly clear as support activities steadily migrate to their appropriate tiers, lightening the load on advanced engineers while working and resolving issues faster and more effectively.
The onboarding process is the foundational step towards ensuring that our platform aligns perfectly with our clients' unique needs and operational frameworks. We typically get clients up and running in just six to nine weeks.
At a high level, this process involves:
Here's an expanded look at each step of our typical onboarding process. Read our onboarding guide for more details.
A well-populated CMDB is crucial for effective incident management. It provides our engineers with necessary insights into each component's role within the client's IT environment, enabling quicker and more accurate issue resolution.
Our CMDB enriches alarm data with critical metadata, allowing for a more nuanced understanding of incidents, and leading to more informed and effective responses. Understanding the business impact of each component and service allows the platform to prioritize incidents based on their potential impact on the client's operations, ensuring that critical issues are addressed promptly and efficiently.
The integration of business impact analysis within the CMDB framework ensures that every incident is not just seen through a technical lens but also from a business perspective. This approach ensures that the response to each incident is aligned with its potential impact on the client's business operations and objectives.
Our CMDB is not static; it evolves continuously with the client's changing IT landscape. Regular updates and reviews ensure that the CMDB accurately reflects the current state of the client's IT environment, maintaining the efficacy of the incident response and resolution processes.
Here are just a few actual client results we've achieved with Ops 3.0 to date.
The Ops 3.0 Platform actively monitors infrastructure and applications in real time, 24x7, to identify faults, make actionable correlations, and prepare tickets for resolution without disrupting your team, your business, or your customers.
By applying AIOps to the NOC operations environment, we’ve removed the filters required to make analysis manageable for human engineers. With vastly superior data processing and machine learning power, we can instantly understand issues and identify the subtle indicators of approaching problems within a torrent of noisy data. For the first time, we’re able to start listening to what all of your data has to say about your environment and use those insights to deliver genuinely proactive NOC support.
Want to learn more about how the INOC Platform can help you achieve peak performance while saving your team valuable time and resources? Use our contact form or schedule a NOC consult to tell us a little about yourself, your infrastructure, and your challenges. We'll follow up within one business day by phone or email.
No matter where our discussion takes us, you’ll leave with clear, actionable takeaways that inform decisions and move you forward. Here are some common topics we might discuss: