[Jenkins-infra] Pluginsite: post-mortem 2017-11-29

R. Tyler Croy tyler at monkeypox.org
Thu Nov 30 17:17:56 UTC 2017

(replies inline)

On Thu, 30 Nov 2017, Olblak wrote:

> Hi,
> Yesterday, from 3:10 PM UTC to 5:20 PM UTC (according to Datadog),
> 'plugins.jenkins.io' was down.
> The outage was caused by an uncaught breaking change in the
> ingress controller upgrade.
> We upgraded the ingress container from
> nginx-ingress-controller:0.9.0-beta.15 to
> nginx-ingress-controller:0.9.0-beta.19, but starting with
> nginx-ingress-controller:0.9.0-beta.18 the annotation prefix changed
> from ingress.kubernetes.io to nginx.ingress.kubernetes.io, which
> broke the pluginsite redirect rules.
> It wasn't a big modification (and it was easy to roll back), but
> unfortunately it took a long time to detect.
> To keep this from happening again in the future, we need a
> better way to run Kubernetes regression tests, and to improve downtime
> notification.

One thing that was interesting about this was that plugins.jenkins.io was
responding to requests but with a 404. I've added a monitor which should
hopefully help catch that in the future.
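
To illustrate why a plain "is it responding?" check misses this failure
mode, here is a minimal sketch of a probe that treats a 404 as an outage.
It is not the actual monitor mentioned above; the function names and the
injectable fetch parameter are assumptions made for the example.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def is_healthy(status_code):
    """Only 2xx counts as healthy, so a served 404 is reported as down."""
    return 200 <= status_code < 300


def probe(url, timeout=10):
    """Return the final HTTP status for url, or None if it is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # Covers URLError (DNS, refused connection) and socket timeouts.
        return None


def check(url, fetch=probe):
    """True only when the site answers with a 2xx status."""
    status = fetch(url)
    return status is not None and is_healthy(status)
```

Calling check("https://plugins.jenkins.io/") would then fail both when the
host is unreachable and when it answers with an error page.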

I'm curious what kind of testing you think we could introduce into the Jenkins
Pipeline which would have caught this kind of issue?
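
As one possible starting point, a Pipeline stage could lint the Ingress
manifests for annotations that still use the old prefix. The sketch below
assumes the manifest has already been parsed into a dict (the YAML loading
is left out), and the annotation key in the usage example is hypothetical.
It would only catch this particular rename; a request against a test
deployment would be the more general check.

```python
# Prefixes taken from the post-mortem: the controller stopped honouring
# the first one and now expects the second.
LEGACY_PREFIX = "ingress.kubernetes.io/"
CURRENT_PREFIX = "nginx.ingress.kubernetes.io/"


def legacy_annotations(manifest):
    """Return the annotation keys in manifest that use the old prefix."""
    metadata = manifest.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    # startswith() matters here: the new prefix contains the old one as a
    # substring, so a plain "in" test would flag migrated keys as well.
    return sorted(key for key in annotations if key.startswith(LEGACY_PREFIX))


def suggested_key(key):
    """Map an old-style annotation key to its new-style equivalent."""
    return CURRENT_PREFIX + key[len(LEGACY_PREFIX):]
```

For example, a manifest whose annotations contain the (hypothetical) key
"ingress.kubernetes.io/rewrite-target" would be reported, and the stage
could fail the build with suggested_key() in the error message.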

- R. Tyler Croy

     Code: <https://github.com/rtyler>
  Chatter: <https://twitter.com/agentdero>
     xmpp: rtyler at jabber.org

  % gpg --keyserver keys.gnupg.net --recv-key 1426C7DC3F51E16F