PagerDuty
www.pagerduty.com
December 2016
April 2017
Development
Software Design

PagerDuty runs a SaaS platform for managing and delivering alerts related to incidents triggered across an organization's systems. They provide a range of tools for configuring the set of services monitored, the escalation policies that determine who gets notified of each incident, and the notification preferences for each user configured within the platform. To extend the breadth of their platform, they published a request for a bi-directional integration with the Cherwell Service Management platform, one that would allow events within Cherwell to trigger PagerDuty incidents and updates to a PagerDuty incident to be reflected on the corresponding event within Cherwell.

The Cherwell Service Management platform is a suite of IT service management applications. It enables the rapid development and monitoring of data-centric applications, using a mixture of built-in configuration tools and custom application development supported within the platform.

Having used PagerDuty at multiple previous companies, I had a good sense of the capabilities of their platform, but I had never worked with, or even heard of, Cherwell. After reviewing the available documentation, I identified an approach for integrating the two systems that leveraged the web service One-Step and REST API utilities built into the Cherwell platform and used an AWS Lambda function to translate these requests into the format expected by the PagerDuty platform. I wrote a proposal outlining this approach and submitted it for review.

When my proposal was selected, I produced a set of workflow and sequence diagrams to further validate and communicate my proposed design. Constructing these diagrams required me to dig deeper into each system and to understand the implementation of the integration in much greater detail, which gave me far greater confidence that I would be able to deliver the project efficiently. The diagrams also clarified the requirements for the end deliverable and were a key tool for managing expectations with all project stakeholders. They did not prevent the implementation from being adjusted as we continued to learn more about each system, but they did provide a solid foundation to build on as we addressed these changes.

To support the necessary communication from PagerDuty to Cherwell, I implemented a Node-based AWS Lambda function that was triggered on each change to a PagerDuty incident and relayed these updates into the Cherwell system. PagerDuty's built-in support for configuring these types of Lambda functions meant that this approach fit seamlessly into the platform architecture. The Lambda function used the Cherwell REST API to make a series of HTTP requests, retrieving and updating the corresponding Cherwell incidents as needed. I included full automated test coverage for this functionality, which proved extremely valuable for maintaining high quality as new requirements arose late in the project.
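
The translation step at the heart of such a Lambda function can be illustrated with a minimal sketch. It is not the delivered implementation: the event shape, the Cherwell field names (`Status`, `PagerDutyIncidentId`), the status values, and the function names are all hypothetical stand-ins, and the HTTP calls to the Cherwell REST API are deliberately left out so that only the mapping logic is shown.

```typescript
// Simplified, hypothetical view of a PagerDuty incident change.
// The real webhook payload carries many more fields.
interface PagerDutyIncidentChange {
  incidentId: string;
  status: "triggered" | "acknowledged" | "resolved";
  // Assumed link back to the Cherwell record that raised the incident.
  cherwellRecordId?: string;
}

// Hypothetical update to apply to a Cherwell incident record.
interface CherwellUpdate {
  recordId: string;
  fields: Record<string, string>;
}

// Assumed mapping between PagerDuty statuses and Cherwell status values.
const CHERWELL_STATUS: Record<PagerDutyIncidentChange["status"], string> = {
  triggered: "New",
  acknowledged: "In Progress",
  resolved: "Resolved",
};

// Translate one incident change; returns null when the incident has no
// linked Cherwell record and therefore nothing to update.
function toCherwellUpdate(change: PagerDutyIncidentChange): CherwellUpdate | null {
  if (!change.cherwellRecordId) {
    return null;
  }
  return {
    recordId: change.cherwellRecordId,
    fields: {
      Status: CHERWELL_STATUS[change.status],
      PagerDutyIncidentId: change.incidentId,
    },
  };
}

// Translate a batch of changes, dropping those with no Cherwell counterpart.
function relayIncidentChanges(changes: PagerDutyIncidentChange[]): CherwellUpdate[] {
  return changes
    .map(toCherwellUpdate)
    .filter((update): update is CherwellUpdate => update !== null);
}
```

Keeping the mapping as a pure function, separate from the network calls, is one way to make this kind of relay straightforward to cover with automated tests.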

To support the necessary communication from Cherwell to PagerDuty, I used the built-in Cherwell tools to construct a Blueprint for configuring a client's Cherwell instance with the modifications needed to support this functionality. These changes included updates to the Cherwell data models, additional event listeners to process updates to incidents within the Cherwell service, and custom functions to relay these updates into the PagerDuty REST APIs when appropriate.
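
As a rough illustration of the relay in this direction, the sketch below maps a Cherwell incident update to a request body in the style of the PagerDuty Events API v2 (`routing_key`, `event_action`, `dedup_key`, `payload`). Whether the delivered integration used this particular API is an assumption, as are the Cherwell incident shape, the status values, and the choice of the Cherwell record ID as the deduplication key.

```typescript
// Hypothetical, simplified view of a Cherwell incident record.
interface CherwellIncident {
  recordId: string;
  shortDescription: string;
  status: string;
}

type EventAction = "trigger" | "acknowledge" | "resolve";

// Request body in the style of the PagerDuty Events API v2.
interface PagerDutyEventRequest {
  routing_key: string;
  event_action: EventAction;
  dedup_key: string;
  // Details are only needed when triggering a new incident.
  payload?: {
    summary: string;
    source: string;
    severity: "critical" | "error" | "warning" | "info";
  };
}

// Assumed Cherwell status values and the PagerDuty action each one implies.
const ACTION_BY_STATUS = new Map<string, EventAction>([
  ["New", "trigger"],
  ["In Progress", "acknowledge"],
  ["Resolved", "resolve"],
]);

// Build the event for an incident update; returns null for statuses that
// should not be relayed to PagerDuty.
function toPagerDutyEvent(
  incident: CherwellIncident,
  routingKey: string
): PagerDutyEventRequest | null {
  const action = ACTION_BY_STATUS.get(incident.status);
  if (action === undefined) {
    return null;
  }
  const base: PagerDutyEventRequest = {
    routing_key: routingKey,
    event_action: action,
    // Reusing the record ID lets later updates address the same incident.
    dedup_key: incident.recordId,
  };
  if (action !== "trigger") {
    return base;
  }
  return {
    ...base,
    payload: {
      summary: incident.shortDescription,
      source: "cherwell",
      severity: "error",
    },
  };
}
```

Returning null for unmapped statuses mirrors the "when appropriate" condition above: only updates that correspond to a PagerDuty action are relayed.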

All artifacts supporting this integration were completed and delivered, along with documentation for the steps to deploy and configure the Cherwell Blueprint. In addition, I provided a set of screencasts covering the setup and validation of the integration to give a clear understanding of its operation. With this initial integration completed, the PagerDuty team was able to gather the necessary buy-in from both organizations to proceed with further refining the integration before delivering it to their customers.