Tooling and Runbooks

Infrastructure Management Tools

Technicians rely on a suite of digital tools to manage the facility:

  1. DCIM: Platforms like Sunbird or Nlyte for tracking rack layouts, power consumption, and asset inventory
  2. Monitoring: Nagios, Zabbix, or Prometheus for real-time fault detection and alerting
  3. Ticketing: ServiceNow or Jira for work requests and incident tracking
  4. Remote Access: Out-of-band management interfaces such as IPMI or Dell iDRAC for diagnostics and coordination with remote engineers
  5. Knowledge Base: Confluence or SharePoint for storing Standard Operating Procedures (SOPs)

Standard Runbook Examples

Hardware Replacement

  • Verify ticket details and approvals
  • Identify rack and server location via DCIM
  • Coordinate power-down with remote teams
  • Label and disconnect cables
  • Perform hardware swap and reconnect
  • Update ticket and facility documentation
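
The order of these steps matters: cables should not be disconnected before the power-down has been coordinated, for example. As a rough illustration, the runbook could be modeled as an ordered checklist that refuses out-of-order completion. The RunbookChecklist class and its method names are hypothetical and do not correspond to any ticketing or DCIM product's API; only the step names come from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Step names are copied from the Hardware Replacement runbook.
HARDWARE_REPLACEMENT_STEPS = [
    "Verify ticket details and approvals",
    "Identify rack and server location via DCIM",
    "Coordinate power-down with remote teams",
    "Label and disconnect cables",
    "Perform hardware swap and reconnect",
    "Update ticket and facility documentation",
]


@dataclass
class RunbookChecklist:
    """Hypothetical helper: tracks runbook steps and enforces their order."""

    steps: List[str]
    completed: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the next pending step, or None when the runbook is finished."""
        if len(self.completed) < len(self.steps):
            return self.steps[len(self.completed)]
        return None

    def complete(self, step: str) -> None:
        """Mark a step done; raise ValueError if it is not the next one due."""
        expected = self.next_step()
        if expected is None:
            raise ValueError("All steps are already complete")
        if step != expected:
            raise ValueError(f"Out of order: expected {expected!r}, got {step!r}")
        self.completed.append(step)

    def is_done(self) -> bool:
        return len(self.completed) == len(self.steps)
```

A technician-facing script might call next_step() to display the current task and complete() once it has been signed off, so that a skipped step surfaces as an error rather than a silent gap in the ticket history.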

Network Patching

  • Review port mapping diagrams
  • Confirm source and destination ports
  • Execute patching using correct cable types and labeling standards
  • Verify link lights and connectivity
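
The port confirmation and cable-type checks above lend themselves to a pre-patch validation. The following is a minimal sketch, assuming the port mapping diagram is available as a simple source-to-destination dictionary and that each port's expected media type is known. The PatchRequest type, the validate_patch function, and the port and cable names used with them are illustrative assumptions, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PatchRequest:
    """Hypothetical description of one patch cable to be installed."""

    source_port: str
    dest_port: str
    cable_type: str


def validate_patch(
    request: PatchRequest,
    port_map: Dict[str, str],
    port_media: Dict[str, str],
) -> List[str]:
    """Return a list of problems; an empty list means the patch looks consistent.

    port_map   maps each source port to its documented destination port.
    port_media maps a port to the cable type it expects (ports that are
               missing from this dictionary are not checked).
    """
    problems: List[str] = []

    expected_dest = port_map.get(request.source_port)
    if expected_dest is None:
        problems.append(f"{request.source_port} is not in the port map")
    elif expected_dest != request.dest_port:
        problems.append(
            f"{request.source_port} should go to {expected_dest}, "
            f"not {request.dest_port}"
        )

    for port in (request.source_port, request.dest_port):
        media = port_media.get(port)
        if media is not None and media != request.cable_type:
            problems.append(
                f"{port} expects {media}, but request specifies {request.cable_type}"
            )

    return problems
```

A check like this only covers the documentation side of the runbook. Verifying link lights and end-to-end connectivity still has to happen at the rack after the cable is in place.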

Rack and Stack Deployment

  • Prepare rack space and power distribution
  • Mount equipment according to design specs
  • Connect power and network cables
  • Label all components and perform initial checks
  • Hand over formally to the engineering team
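
Preparing rack space and power distribution usually comes down to two sums: the rack units the new equipment occupies and the power it draws, each compared against what the rack has available. A minimal sketch of that arithmetic follows. The Device type, the fits_in_rack function, and the 0.8 headroom default are assumptions made for illustration; the actual derating policy should come from the site's design specs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    """Hypothetical record for one piece of equipment to be mounted."""

    name: str
    height_u: int        # height in rack units
    power_watts: float   # expected power draw


def fits_in_rack(
    devices: List[Device],
    free_units: int,
    power_budget_watts: float,
    headroom: float = 0.8,
) -> bool:
    """Check that the devices fit the free space and the derated power budget.

    headroom is the fraction of the power budget allowed to be used;
    0.8 is an assumed default, not a value taken from any standard here.
    """
    total_u = sum(d.height_u for d in devices)
    total_w = sum(d.power_watts for d in devices)
    return total_u <= free_units and total_w <= power_budget_watts * headroom
```

For example, a 2U server drawing 450 W and a 1U server drawing 300 W need 3U and 750 W in total. With 4U free and a 1000 W budget, the usable power is 800 W at 0.8 headroom, so the deployment fits; with a 900 W budget the usable power is 720 W, so it does not.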