Network Operations Centre (NOC) Scope of Work

SERVER MANAGE

1.0       INTRODUCTION  

 

This document specifies the scope of the services delivered to the end customer by the Reality Bytes NOC Services Team.

 

2.0       PURPOSE

 

The purpose of this document is to provide a detailed description of the services delivered by the RBI NOC under SERVER MANAGE Services. Before services can start, the client must sign this document, which then serves as the agreement on the scope and deliverables of the RBI NOC service.

 

3.0       SCOPE

 

This document specifies the scope and schedule of the services delivered to the end customer. It details the managed devices, the services provided for those devices, and the delivery schedule under the Reality Bytes Managed Services package.

 

Any items not explicitly covered within this document are considered out-of-scope.  

 

RBI MANAGED Service Package includes:

  • Monitoring and alerting for servers
  • Troubleshooting and fix on servers
  • Preventive maintenance on servers
  • Monitoring and alerting of network devices
  • Troubleshooting and fix of network issues

 

Note: This scope of work is subject to change and modification as needed and without notice.

 

4.0        DETAILED SCOPE OF MANAGE SERVICES

4.1       WINDOWS SERVER SERVICES

4.1.1              24X7 SERVER MONITORING  

 

Below is the standard monitoring included in Windows server monitoring.  

  • Server up/down monitoring
  • Windows Service(s) availability
  • Performance monitoring: CPU, Memory, Paging File & Disk Space monitoring
  • Event log monitoring
  • Website up/down monitoring

 

  • All monitoring templates applied to the servers are customized by the NOC. Any new performance counters, event IDs and service monitors that are found outside of the existing default templates will be researched by the NOC and applied to all of the client’s servers as appropriate.
  • Monitors might vary depending on the capability of the hardware vendor.

 

 

Sample Monitoring Thresholds & Frequency

Monitor Name                  Default Threshold Value                     Polling Frequency
Availability (Ping)           100% packet loss or timeout                 Every 5 minutes
CPU                           Critical: 90%                               Every 15 minutes
Memory                        Critical: 90%                               Every 15 minutes
Disk                          Warning: 90%; Critical: 98% or < 1 GB free  Every 15 minutes
Event log monitoring          Specific event IDs per OS and application   Every 30 minutes
Service/Process Availability  Specific services per application           Every 5 minutes
Web Site Availability         Unavailable or timeout                      Every 5 minutes
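
The sample thresholds above can be read as a simple rule set. The following Python sketch is illustrative only: the function and constant names are hypothetical, it is not the RMM tool's actual logic, and it assumes that a reading equal to a threshold counts as reaching it, which the table does not state.

```python
from typing import Optional

# Sample thresholds transcribed from the table above (percent of capacity used).
CPU_CRITICAL_PCT = 90.0
MEMORY_CRITICAL_PCT = 90.0
DISK_WARNING_PCT = 90.0
DISK_CRITICAL_PCT = 98.0
DISK_CRITICAL_FREE_GB = 1.0


def cpu_or_memory_severity(used_pct: float, critical_pct: float = CPU_CRITICAL_PCT) -> Optional[str]:
    """Return "critical" when usage reaches the critical threshold, else None."""
    # Assumption: the comparison is inclusive (>=).
    return "critical" if used_pct >= critical_pct else None


def disk_severity(used_pct: float, free_gb: float) -> Optional[str]:
    """Return "critical", "warning" or None for a disk reading."""
    # Critical: 98% used OR less than 1 GB free, whichever is hit first.
    if used_pct >= DISK_CRITICAL_PCT or free_gb < DISK_CRITICAL_FREE_GB:
        return "critical"
    # Warning: 90% used.
    if used_pct >= DISK_WARNING_PCT:
        return "warning"
    return None
```

For example, a small volume at 50% usage with only 0.5 GB free is still treated as critical, because the free-space condition applies independently of the percentage.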

 

 

MSP Responsibility

It is our responsibility to verify that the following items have been completed at each supported client site to ensure that services are delivered uninterrupted. 

  • The on-site team should approve and assist the NOC in resolving any WMI, SNMP or Group Policy issues in the network.
  • The MSP should inform the RBI NOC On-Boarding team about the services, ports, event IDs, and any interfaces that need to be monitored in addition to the monitors already applied.
  • The RBI NOC team will validate and add additional monitors if appropriate. The RBI NOC may reject the request based on the availability of technical expertise or priorities.

 

RBI NOC SCOPE / SLA (Servers): 

Issues will be alerted to the MSP escalation contacts based on the priority of the incidents.

  • P0 incidents are responded to within thirty minutes.
  • P1 incidents are responded to within two hours.
  • P2 incidents are responded to within four hours.
  • P3 incidents are responded to within twenty-four hours.
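
As an illustration of how these response targets translate into deadlines, the Python sketch below computes the latest response time for a ticket. The names are hypothetical, and it assumes the SLA clock starts when the ticket is opened, which this document does not specify.

```python
from datetime import datetime, timedelta

# Response targets per priority, as listed above.
RESPONSE_SLA = {
    "P0": timedelta(minutes=30),
    "P1": timedelta(hours=2),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=24),
}


def response_due(priority: str, opened_at: datetime) -> datetime:
    """Latest time by which the incident should receive a response."""
    # Assumption: the clock starts at ticket creation time.
    return opened_at + RESPONSE_SLA[priority]


def is_breached(priority: str, opened_at: datetime, responded_at: datetime) -> bool:
    """True when the response came after the SLA deadline."""
    return responded_at > response_due(priority, opened_at)
```

Under these assumptions, a P0 ticket opened at 09:00 should receive a response by 09:30, and a P3 ticket opened at the same time by 09:00 the following day.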

 

 

4.1.2                  REMOTE TROUBLESHOOT & FIX (WINDOWS SERVERS)

 

The RBI NOC will remotely troubleshoot and fix issues or alerts that are generated for parts of an existing configuration on the server. Below are some examples of such incidents.

 

Domain Account Maintenance (Active Directory)

Move, add and change user accounts in Active Directory; AD replication issues.

 

Windows Server

Active Directory, Exchange and the operating system are supported on SBS 2003/Windows 2003 and above.

 

Print & File sharing

Server Printer Issues (Queues), Access/restrictions to shared folders

 

Email Issues

Exchange send/receive issues, Exchange database size and other issues, mailbox size/quota.

 

Backup

The RBI NOC will monitor the backup job failures, analyze issues (backup job configurations, or source and target access issues, etc.) and make sure that the next backup job schedule runs successfully. 

 

Issues such as media not being available at backup run time, cleaning tapes, or bad drives will be escalated to the appropriate team for further action.

 

Root Cause Analysis (RCA)

RCA is done by the RBI NOC for unexpected shutdown/reboot of servers only. The RBI NOC will mark the ticket for RCA when an unexpected shutdown/reboot is identified during monitoring of the server. The RBI NOC will start the RCA after the server is back online and reachable remotely.

 

RBI NOC RCA includes:

  • Findings from event logs
  • Findings from memory dump (if generated)
  • Findings from hardware diagnostic logs

 

The RBI NOC will update the findings in the ticket created for the unexpected reboot or in the ticket that requested the RCA.

 

 

RBI NOC SCOPE / SLA: 

A ticket is opened up with the RBI NOC and the tasks are performed on a set schedule.  

  • P0 incidents are responded to within thirty minutes.
  • P1 incidents are responded to within two hours.
  • P2 incidents are responded to within four hours.
  • P3 incidents are responded to within twenty-four hours.

  

Out of Scope Items

  • Restorations from backups or server rebuilds are not part of the remediation and fix. Examples: restoring a corrupt Exchange database, offline defragmentation of an Exchange database that has reached its maximum size, and restoration of mailboxes.
  • Service requests (SRs) which are more than 8 hours in effort level go beyond service requests and are considered as projects beyond the scope of the monthly project. Customer authorization will be obtained for projects.

 

OnSite Team Responsibility

It is the Reality Bytes on-site team’s responsibility to verify that the following tasks have been completed at each supported client site to ensure services are delivered uninterrupted:

  • The MSP will approve and assist the RBI NOC tools team in resolving any WMI, SNMP & Group Policy issues in the network.
  • Backup job policies and configurations should be managed with the customer by the Prime MSP Contact
  • Backup product license and support agreement renewals should be managed by the Prime MSP Contact

 

Deliverable: 

  • The RBI NOC will create tickets for the alerts received from monitoring and incidents generated by the RMM tools. These tickets are assigned to the respective domain teams in the RBI NOC for resolution.

 

4.1.3               WINDOWS SERVERS MAINTENANCE

 

The RBI NOC will perform maintenance activities on the Windows servers on a scheduled basis. Below is the list of maintenance activities.

 4.1.3.1 WINDOWS PATCH MANAGEMENT

 

The RBI NOC will by default install security & critical patches immediately whenever possible. A maintenance window is to be determined based on the client’s needs for minor patches and fixes.

  • Define and approve the reboot time for workstations and servers, e.g. 10 PM to 5 AM PST.
  • It is important to note that a device will be rebooted following any patch which requires rebooting. Therefore, patching time windows and approvals must anticipate the possibility of a device reboot.
  • All end users are informed of the scheduled maintenance window to ensure the devices are online.

 

RBI NOC SCOPE / SLA: 

  • Schedule the patch ticket submitted by RMM tools
  • If installation of the patch fails, a corrective action will be taken by the RBI NOC and the failed patches will be re-installed in the next schedule.
  • If the patch update event caused system related issues, the RBI NOC team will be engaged within the defined SLA.

 

Sanity Checks: 

The NOC will perform sanity checks on the Windows servers where the NOC has installed Windows patches, after the post-patch reboot.

 

Sanity Check Process

Sanity checks start after the Windows patches are installed and the server has rebooted.

After the server reboot, the sanity checks include the following:

  • Windows services: Checked for Startup type = Automatic & status = Started. The NOC will start the Windows service if Startup type = Automatic & status = Stopped.
  • Event logs: Application & System event logs with severity level = Error will be checked.

 

If the server has not come back online, the NOC will wait 30 minutes and then escalate the incident with P0 priority to an on-site contact.
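
The sanity check rules above can be sketched in Python as follows. This is a minimal illustration over already-collected data: the `ServiceState` record and both function names are hypothetical, and the sketch does not query Windows itself or represent the NOC's actual tooling.

```python
from dataclasses import dataclass
from typing import List

# Wait time before an offline server is escalated as P0, per the text above.
ESCALATION_WAIT_MINUTES = 30


@dataclass
class ServiceState:
    name: str
    startup_type: str  # e.g. "Automatic", "Manual", "Disabled"
    status: str        # e.g. "Started", "Stopped"


def services_to_start(services: List[ServiceState]) -> List[str]:
    """Names of services set to start automatically that are currently stopped."""
    return [
        s.name
        for s in services
        if s.startup_type == "Automatic" and s.status == "Stopped"
    ]


def should_escalate_p0(server_online: bool, minutes_since_reboot: float) -> bool:
    """True when the server is still offline after the 30-minute wait."""
    return (not server_online) and minutes_since_reboot >= ESCALATION_WAIT_MINUTES
```

Only services with an Automatic startup type are restarted; a stopped service with a Manual or Disabled startup type is left as found.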

 

4.1.3.2 ANTIVIRUS/ANTISPYWARE UPDATES  

 

The RBI NOC maintains current knowledge of available definition updates for antivirus/antispyware solutions such as Trend, McAfee, Symantec and Endpoint Security, and updates the definitions on a scheduled basis.

 

MSP Responsibility:

The MSP will verify that the following items have been completed at the client environment to ensure these services above can be delivered during a defined schedule.

  • End customers are informed of the scheduled maintenance windows to ensure that the devices are online
  • The MSP should communicate to the end customers that the RBI NOC will log in to the servers to resolve definition update issues or any antivirus application corruption issues.

 

RBI NOC SCOPE / SLA: 

Depending on the automation schedule, by default, Virus / Spyware definitions should be updated on a daily basis. 

  • Any issues (corruption or license expiry) that are observed with the Virus / Spyware application or definition update will be alerted via ticket. The RBI NOC will create a ticket and resolve the issue.
  • If the antivirus / spyware update event failed during the scheduled time, the RBI NOC team will check the servers. If the machines have failed two (2) consecutive scheduled events or the definition versions are older than 2 days, then the RBI NOC staff will remedy the issues within the defined SLA. 
  • If the antivirus / spyware update event caused system related issues, the RBI NOC team will be engaged within the defined SLA. (Appendix A)
  • Endpoint antivirus scans are scheduled to run daily at 3 A.M.
  • Endpoint Security Antivirus will run a real-time scan on all active data in memory.
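
The remediation rule for failed definition updates can be expressed as a single condition. The Python sketch below is illustrative: the function name and parameters are hypothetical, and it assumes "older than 2 days" means strictly more than two days.

```python
def needs_remediation(consecutive_failed_updates: int, definition_age_days: float) -> bool:
    """True when the NOC should remedy the machine under the rule above.

    Either condition is sufficient: two or more consecutive failed scheduled
    update events, or definitions older than 2 days.
    """
    # Assumption: exactly 2 days old does not yet count as "older than 2 days".
    return consecutive_failed_updates >= 2 or definition_age_days > 2
```

A single failed update with one-day-old definitions would therefore not trigger remediation, while a machine with three-day-old definitions would, even with no recorded failures.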

 

Deliverable:

  • On-Demand reports are available in the RMM tools and MSP can view/export them when required.

 

                  

4.1.3.3 PRO-ACTIVE MAINTENANCE FOR EXCHANGE AND AD

 

The RBI NOC will run scheduled health checks on Exchange and Active Directory servers once every thirty (30) days to check for possible issues, and will alert the MSP if critical issues on the server are found. Based on the critical issues identified from the report, the RBI NOC will create a ticket and request approval from MSP to resolve the issue. If the issue requires more than one hour of effort to resolve, the ticket will then be assigned to the MSP.

 

Examples of tasks that are not covered in the MANAGE scope:

  • Exchange Offline Defragmentation
  • Database corruption
  • Any restorations

 

Exchange Health Check:  The RBI NOC will run ExBPA on the Exchange server once every thirty (30) days to identify any critical issues.  If any critical issues are identified, the RBI NOC will alert the client with details of the issues.

 

Note: The RBI NOC will install ExBPA on the Exchange server(s) if the application is not installed.

 

Active Directory (AD): The RBI NOC will run an AD health check on the Active Directory server once every thirty (30) days to identify any critical issues. The AD health check task will check AD replication. Based on the issues identified, the RBI NOC will create a ticket and request approval to resolve the issue.

 

Note: AD Replication will not be checked for Windows SBS Edition.

 

Reality Bytes On Site Team MSP Responsibility:

The MSP should verify the following items have been completed at the client environment to ensure that these services can be delivered during a defined schedule.

  • Support tools (E.g. ExBPA) are installed on the servers to execute these tasks.

 

RBI NOC SCOPE / SLA: 

  • Depending on the automation schedule, by default, maintenance activities will run once every thirty (30) days. The RBI NOC will create a ticket and assign it for resolution of any issue found.

 

Deliverable:

If errors are found during the proactive maintenance task, a ticket will be created by the RBI NOC and worked on until the issue is resolved.

 

4.2       VMware VIRTUAL SERVERS

4.2.1               24/7 MONITORING AND ALERTING

 

  • 24/7 availability monitoring, alerting, analysis and notification on VMware ESX servers & VC Servers
  • Ping monitor (server availability)
  • Resource (CPU & Memory) utilization monitoring and analysis on VMware ESX Servers & VC Servers
  • Monitoring of Virtual Machines' resource utilization
  • Fault detection and notification
  • Monitoring of backup jobs of Virtual Machines

4.2.2              ADMINISTRATIVE ACTIVITIES

 Administrative activities include the following

  • Deploying & Managing virtual Machines
  • Customization of Virtual Machines
  • Cloning virtual Machines
  • ESX Server troubleshooting on alerts
  • Performance analyzing and troubleshooting
  • Performing VMotion
  • Patch deployment
  • Virtual Machines Health Check
  • Performing Backups
  • Managing clusters

4.2.3              REMOTE TROUBLESHOOT AND FIX

 RBI NOC will remotely troubleshoot and fix any issues or alerts generated by VMware Servers which are managed by RBI within the SLA time mentioned in this SOW. 

 

MSP Responsibility

Access to servers should be provided to the NOC team

4.2.4             PROACTIVE MAINTENANCE

Patch Management

 

Tasks include

Patches will be reviewed and installed on all the VMware Servers.  Change Management Process will be followed for patch management.

 

Scope/SLA

See section 4.1.3.1 WINDOWS PATCH MANAGEMENT

 

MSP Responsibility

See section 4.1.3.1 WINDOWS PATCH MANAGEMENT

 

4.2.5                 VENDOR COORDINATION FOR TROUBLE TICKETS

 

RBI NOC will perform the following vendor coordination activities:

 

  • NOC will coordinate with vendors in case of Server Hardware issues. NOC will create tickets with the vendor and escalate to the customer as required.

 

  • NOC will also work with Storage vendors where there are issues with the hardware, for device fault isolation, upgrades, hardware replacement or for special configuration changes.

 

 

MSP & Client Responsibility: 

MSP & Client should have valid support contracts with vendors and must authorize RBI NOC to act on its behalf.

4.2.6 Remote System Administration (RSA) – Project Requests

 

RSA project requests (SRs) are requests that do not arise from a disruption, i.e. requests that are not due to an incident, a monitored event, or a change request resulting from root cause analysis. Examples include:

  • Cloning VMs
  • Deployment of additional VMs
  • VM VMotion Migration onto other ESX Servers
  • Network Configuration changes
  • Disk Expansion on VMs.
  • VM Customization
  • Adding Storage to ESX or VMs

 

Scope

Examples for Project Services

  • Server Consolidation - Physical to Virtual, Virtual to Virtual.
  • VMware Servers upgrade
  • Virtual Center Server upgrade
  • Migration from other Virtualization products.
  • Consolidation of VMware Environment.
  • Deploying ESX Servers
  • Deploying VMware Clusters
  • Performing Guided consolidation

4.3    LINUX SERVER SERVICES

4.3.1 24/7 MONITORING AND ALERTING 

 

Tasks include: NOC will monitor the Linux Servers 24/7 and alert the customer contact based on the priority of the alert and Server. At the same time, the NOC Linux team will troubleshoot and fix the reported issue remotely. 

 

Typical Monitors that will be enabled on all the Linux servers:

  • Uptime
  • CPU
  • Disk
  • Memory
  • Linux Services

 

Note:

All monitoring templates applied to the servers are customized by the NOC. Any new performance counters, event IDs and service monitors that are found outside of the existing default templates will be researched by the NOC and applied to all of the client’s servers as appropriate.

Monitors might vary depending on capability of the RMM tool and/or hardware vendor

It is our responsibility to validate the following items at each supported client site to ensure services are delivered uninterrupted.

  • The RBI on-site team should approve and assist the NOC in resolving SNMP, WMI or Group Policy issues in the network.
  • The RBI key contact should validate the list of managed devices, confirm that all required devices are being monitored, and inform the NOC of any special monitoring requirements.
  • The RBI key contact should inform the NOC onboarding team about the services, ports, event IDs, and any interfaces that need to be monitored in addition to the monitors already applied. The NOC will validate the request and add the monitors if appropriate. The NOC may reject the request based on technical feasibility.

 

RBI NOC SLA: 

 

A ticket is opened up with the NOC and the tasks are performed on a set schedule.  

  • P0 incidents are responded to within thirty minutes.
  • P1 incidents are responded to within two hours.
  • P2 incidents are responded to within four hours.
  • P3 incidents are responded to within twenty-four hours.

NOC Scope 

  • Issues will be alerted to the RBI escalation contacts based on the priority of the incidents.
  • Alerts will be first analyzed by the NOC and actionable alerts will be followed up for resolution with the RBI KEY CONTACT.

4.3.2 Remote Troubleshoot & Fix (Linux)

 

The NOC will remotely troubleshoot and fix issues or alerts that are generated for parts of an existing configuration on the server. Examples of the items covered under OS level troubleshooting are:

  • Server not responding/reachable
  • OS halt situation due to OS panic, resource crunch, hardware or application problems causing OS failures. (OS panics may need vendor assistance)
  • Network failures, physical connectivity issues (With the help of onsite technician from RBI)
  • Fault Isolation, identification of problem external to the managed server
  • System Resource Issues
  • Troubleshoot disk full issues
  • Application level issues
  • Sendmail/SMTP related issues such as SMTP connection port, Sendmail process issues, mail queue issues like number of mails sent and mails pending
  • DNS related issues and DNS client configuration settings issue

 

RBI Responsibility:

 

It is our responsibility to verify that the following tasks have been completed at each supported client site to ensure services are delivered uninterrupted:

  • The RBI KEY CONTACT should approve and assist the NOC team in resolving any SNMP & Group Policy issues in the network

  • Backup job policies and configurations should be managed with the customer by the RBI KEY CONTACT
  • Backup product license and support agreement renewals should be managed by the RBI KEY CONTACT

 

NOC Out of scope:

  • Restorations from backups or server rebuilds are not part of the remediation and fix

 

Deliverable

  • The NOC will create tickets for the alerts received from monitoring and the incidents generated by the RMM tools, and assign them to Linux engineers.

4.3.3   Linux Project Services 

 

Requests which require less than 60 minutes of effort per request are included in the monthly contract. For requests exceeding the 60 min effort level, customer authorization will be obtained because these SRs will be considered as service requests and charged additionally to the customer.

  • SRs which are more than 8 hours in effort level go beyond service requests and are considered as projects beyond the scope of the monthly project. Customer authorization will be obtained for projects.
  • Virtualization of Servers.
  • Linux component installation & management (i.e. DNS, ZFS/AFS, etc.)
  • New Print Queue configurations
  • Linux OS reinstallation
  • Apache / Tomcat configuration as per the customer requirements
  • Application Installation provided by customer.
  • Application configurations as per the customer requirements
  • Disk quota implementation and management
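
The effort bands described at the start of this section can be sketched as a small classifier. The Python below is illustrative only and uses hypothetical names. The document says requests of "less than 60 minutes" are included and those "exceeding" 60 minutes are chargeable, so the handling of a request of exactly 60 minutes is an assumption here (treated as a chargeable service request).

```python
# Effort bands taken from the text above.
INCLUDED_LIMIT_MINUTES = 60       # below this: covered by the monthly contract
PROJECT_LIMIT_MINUTES = 8 * 60    # above this: treated as a project


def classify_request(effort_minutes: float) -> str:
    """Classify a request by estimated effort.

    Returns "included", "service_request" (customer authorization, charged
    additionally) or "project" (customer authorization required).
    """
    if effort_minutes < INCLUDED_LIMIT_MINUTES:
        return "included"
    # Assumption: exactly 60 minutes falls into the chargeable band.
    if effort_minutes <= PROJECT_LIMIT_MINUTES:
        return "service_request"
    return "project"
```

Under these assumptions a 30-minute request is included, a 90-minute request is a chargeable service request, and anything over 8 hours is handled as a project.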

5.0        SUPPORT BENCHMARKS

5.1       INCIDENT CLASSIFICATION

 

Priority  Response SLA  Description
Critical  30 Min        This is an EMERGENCY condition that significantly restricts the use of an application, system or network to perform any critical business function. This could mean that several departments of the client are impacted.
High      2 Hours       The reported issue may severely restrict use of key devices in the network. This could mean that a single department is impacted but the overall network and servers are functioning.
Medium    4 Hours       The reported issue may restrict the use of one or more features of the system, but the business or financial impact is not severe.
Low       24 Hours      The reported anomaly in the system does not substantially restrict the use of one or more features of the product to perform necessary business functions.

5.2        SUPPORT WINDOW

 

Maintenance Activity  Frequency          Schedule                          Delivery Mechanism
Business Hours        Monday to Friday   8:00 am to 6:00 pm                Remote
Off Business Hours    Monday to Friday   6:00 pm to 8:00 am                Remote
Weekend Hours         Saturday & Sunday  6:00 pm Friday to 8:00 am Monday  Remote

 Note: All hours mentioned are client local hours.
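
To show how a given client-local time maps onto these windows, the Python sketch below classifies a timestamp. It is an illustration with hypothetical names. It assumes that the weekend window takes precedence where it overlaps the off-business-hours window (Friday evening and early Monday), and that each window includes its start time and excludes its end time.

```python
from datetime import datetime, time

BUSINESS_START = time(8, 0)   # 8:00 am client local time
BUSINESS_END = time(18, 0)    # 6:00 pm client local time


def support_window(local_dt: datetime) -> str:
    """Return the support window for a client-local timestamp."""
    day = local_dt.weekday()  # Monday = 0 ... Sunday = 6
    t = local_dt.time()

    # Weekend Hours: 6:00 pm Friday to 8:00 am Monday.
    if day in (5, 6):
        return "Weekend Hours"
    if day == 4 and t >= BUSINESS_END:
        return "Weekend Hours"
    if day == 0 and t < BUSINESS_START:
        return "Weekend Hours"

    # Business Hours: Monday to Friday, 8:00 am to 6:00 pm.
    if BUSINESS_START <= t < BUSINESS_END:
        return "Business Hours"

    # Everything else on a weekday is Off Business Hours.
    return "Off Business Hours"
```

The timestamp passed in is expected to be in the client's local time already, in line with the note above.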

6.0          RBI NOC ESCALATION PROCEDURES

For a P0 issue, the RBI NOC will call the escalation contacts defined by the MSP until someone is reached by phone.

 

Priority      Phone  Email  Ticket
P0: Critical  YES    YES    YES
P1: High      -      -      YES
P2: Medium    -      -      YES
P3: Low       -      -      YES

       

 

  

APPENDIX A – SLA

 

Task:  Server Monitoring

All issues will be alerted to the MSP escalation contacts based on the priority of the incidents.

 

Priority: Critical

Response Time: 30 Min  

Communication Method:  Phone call as per escalation matrix provided, Ticket and email 

 

Priority: High

Response Time: 2 hours  

Communication Method:  Ticket and email.

 

Priority: Medium

Response Time: 4 hours  

Communication Method:  Ticket and email.

 

Priority: Low

Response Time: 24 hours  

Communication Method:  Ticket and email.
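
The Server Monitoring entries above can be held as a small lookup table. The Python sketch below is illustrative, with hypothetical names. It transcribes the response times and communication methods listed for this task only.

```python
from typing import Dict, Tuple

# (response time in minutes, communication methods), transcribed from the
# Server Monitoring task above.
SERVER_MONITORING_SLA: Dict[str, Tuple[int, Tuple[str, ...]]] = {
    "Critical": (30, ("phone", "ticket", "email")),
    "High": (120, ("ticket", "email")),
    "Medium": (240, ("ticket", "email")),
    "Low": (1440, ("ticket", "email")),
}


def notification_plan(priority: str) -> str:
    """Describe the response target and channels for a priority level."""
    minutes, channels = SERVER_MONITORING_SLA[priority]
    return f"{priority}: respond within {minutes} min via {', '.join(channels)}"
```

Only Critical incidents include a phone call, made as per the escalation matrix provided by the MSP.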

 

Task: Antivirus / Spyware Updates (Servers)

If the antivirus / spyware update event failed during the scheduled time, the RBI NOC team will check the servers.  If the machines have failed two (2) consecutive scheduled events or the definition versions are older than 2 days, then the RBI NOC staff will remedy the issues with a Low priority.

Priority: Low

Response Time:  24 Hours

Communication Method:  Ticket & email.

 If the antivirus / spyware update event caused system related issues, the RBI NOC team will be engaged with a High priority.

Priority: High

Response Time: 2 Hours  

Communication Method:  Ticket & email

 

Task: Patch Management (Servers)

Depending on automation schedule, patches should be updated on a weekly basis.  If patches fail, an automated corrective action will be taken by the RBI NOC to re-install patches in the next schedule.

 If the patch update event caused system related issues, the RBI NOC team will be engaged within the defined SLA.

Priority: High

Response Time:  2 Hours

Communication Method: Ticket and email