[Jenkins-infra] Post-Mortem - 11/09/2017
me at olblak.com
Fri Nov 10 08:53:58 UTC 2017
Yesterday, from 3:37PM to 5PM (UTC) according to the monitoring, four
Jenkins-infra services were down.
This was caused by a modification to the load balancer in front of those
services, which accidentally generated a new public IP and removed the one
that was in use.
To prevent this issue from happening again, we need to improve the
following two points.
1) We must ensure we assign a fixed and controlled public IP to those
2) We must ensure alerts are correctly reported by PagerDuty
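
As an illustration of the first point, here is a minimal sketch of a drift
check that compares the public IP we intend to keep with what the load
balancer and DNS currently report. Everything named here is an assumption
made for the example: the function names, the hostname and the addresses
(taken from documentation ranges) are hypothetical, and how the load
balancer's current IP is fetched from the cloud provider is left out.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that `hostname` currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def find_ip_drift(pinned_ip, lb_ip, dns_ips):
    """Return a list of human-readable problems; an empty list means no drift.

    pinned_ip: the public IP we intend to keep (hypothetical value).
    lb_ip:     the IP the load balancer reports right now (fetched elsewhere).
    dns_ips:   the addresses the service hostname resolves to.
    """
    problems = []
    if lb_ip != pinned_ip:
        problems.append(
            "load balancer has %s, expected pinned IP %s" % (lb_ip, pinned_ip)
        )
    if lb_ip not in dns_ips:
        problems.append(
            "DNS returns %s, which does not include load balancer IP %s"
            % (", ".join(dns_ips) or "nothing", lb_ip)
        )
    return problems


# Hypothetical usage (hostname and addresses are placeholders):
#   problems = find_ip_drift("203.0.113.10", current_lb_ip,
#                            resolve_ipv4("service.example.org"))
#   if problems: raise an alert with the messages in `problems`
```

A check of this kind could run periodically and raise an alert whenever the
returned list is not empty, which would also exercise the alerting path
mentioned in the second point.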