[Jenkins-infra] Post-Mortem - 11/09/2017

Olblak me at olblak.com
Fri Nov 10 08:53:58 UTC 2017


Yesterday, from 3:37 PM to 5 PM (UTC) according to the monitoring, four
Jenkins-infra services were down:
* jenkins.io
* accounts.jenkins.io
* plugins.jenkins.io
* repo-azure.jenkins.io
This was caused by a modification to the load balancer in front of those
services, which accidentally generated a new public IP and removed the one
that was in use.
To prevent this issue from happening again, we need to improve the
following two points:
 1) We must ensure a fixed and controlled public IP is assigned to those
 load balancers.
 2) We must ensure alerts are correctly reported by PagerDuty.
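
As an illustration of how point 1 could be verified, here is a minimal
sketch of a check that compares what each affected hostname resolves to
against the IP we expect the load balancer to keep. This is not something
we run today. The function name find_ip_drift, the injectable resolve
parameter and the example address (taken from the 203.0.113.0/24
documentation range) are all hypothetical, and the sketch assumes each
hostname resolves to a single IPv4 address that is the load balancer's
public IP.

```python
import socket

# Hostnames taken from the list of affected services above.
HOSTNAMES = [
    "jenkins.io",
    "accounts.jenkins.io",
    "plugins.jenkins.io",
    "repo-azure.jenkins.io",
]


def find_ip_drift(expected_ip, hostnames=HOSTNAMES, resolve=socket.gethostbyname):
    """Return {hostname: observed} for every host not resolving to expected_ip.

    `resolve` is injectable so the check can be exercised without DNS.
    """
    drift = {}
    for host in hostnames:
        try:
            observed = resolve(host)
        except OSError as exc:
            # socket.gaierror is a subclass of OSError; a failed lookup
            # is reported as drift rather than silently ignored.
            observed = "unresolved: {}".format(exc)
        if observed != expected_ip:
            drift[host] = observed
    return drift


if __name__ == "__main__":
    # 203.0.113.10 is a placeholder, not the real load balancer IP.
    for host, observed in find_ip_drift("203.0.113.10").items():
        print("{} -> {}".format(host, observed))
```

A non-empty result is the condition that should raise an alert, which
ties into point 2.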

