[Jenkins-infra] Postmortem [ci.jenkins.io, plugins.jenkins.io, wiki.jenkins.io] - 2019-12-17

Olblak me at olblak.com
Wed Dec 18 09:58:13 UTC 2019


Hi,

Before going into what went wrong, here is some context.

Yesterday I was working on the Jenkins-infra/azure repository <https://github.com/jenkins-infra/azure/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+updated%3A2019-12-17+>, where multiple things needed to be done.
The main one was updating the DNS records related to INFRA-1797, including jenkins.io, along with a few other things:
https://issues.jenkins-ci.org/browse/INFRA-1797

What I missed while reviewing the changes is that two months ago I manually updated the ci.jenkins.io VM size to have 32GB of RAM without changing the Terraform code.
So yesterday ci.jenkins.io was downsized to 16GB by accident, which led to the following:
* The VM was restarted
* The Jenkins process couldn't start because it didn't have enough memory available
* plugins.jenkins.io stopped working because it depends on https://ci.jenkins.io/job/Infra/job/plugin-site-api/job/generate-data/lastSuccessfulBuild/artifact/plugins.json.gzip

To fix this issue, I updated the Terraform code and then re-applied it.
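
One way a resize like this could be caught during review is to scan the plan for it before applying. The following is only a sketch, not something we run today: it assumes the machine-readable plan that Terraform 0.12+ produces with `terraform show -json`, which exposes a `resource_changes` list, and it assumes the VM size lives in a `vm_size` attribute. The resource addresses and size names in the sample are placeholders.

```python
def changed_attribute(plan, attribute):
    """List (address, before, after) for in-place updates that alter `attribute`."""
    findings = []
    for resource in plan.get("resource_changes", []):
        change = resource.get("change", {})
        # Only in-place updates are of interest here, not creates or deletes.
        if "update" not in change.get("actions", []):
            continue
        before = (change.get("before") or {}).get(attribute)
        after = (change.get("after") or {}).get(attribute)
        if before != after:
            findings.append((resource["address"], before, after))
    return findings


# Hypothetical plan fragment; the addresses and size names are placeholders.
sample_plan = {
    "resource_changes": [
        {
            "address": "azurerm_virtual_machine.ci",
            "change": {
                "actions": ["update"],
                "before": {"vm_size": "size-with-32gb"},
                "after": {"vm_size": "size-with-16gb"},
            },
        },
        {
            "address": "azurerm_dns_a_record.example",
            "change": {"actions": ["no-op"], "before": {}, "after": {}},
        },
    ]
}

for address, before, after in changed_attribute(sample_plan, "vm_size"):
    print(f"{address}: vm_size {before} -> {after}")
```

A check like this only reports the difference; someone still has to decide whether the code or the running VM is the one that is wrong.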

The second issue, which happened at the same time, is due to the way we define our DNS records.
We use a 'hack' in Terraform to get loops. Terraform doesn't correctly keep track of the different resources, so when we add or delete a DNS record, it also deletes and recreates other DNS records. If for some reason something goes wrong before a record is re-created, we simply lose that DNS record, and this is what happened to wiki.jenkins.io.
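
To illustrate why unrelated records get touched, here is a minimal sketch in Python rather than Terraform. It assumes the 'hack' addresses each record by its position in a list (as a count-style loop does) and compares that with addressing each record by its own name. The record names and the `dns_record` address prefix are made up for the example.

```python
def index_addresses(records):
    """Address each record by its position in the list."""
    return {f"dns_record[{i}]": name for i, name in enumerate(records)}


def key_addresses(records):
    """Address each record by its own name."""
    return {f'dns_record["{name}"]': name for name in records}


def plan(before, after):
    """Return the addresses whose content differs between two states."""
    touched = []
    for address in sorted(set(before) | set(after)):
        if before.get(address) != after.get(address):
            touched.append(address)
    return touched


# Hypothetical record list; removing the first entry shifts every index.
old = ["ci", "plugins", "wiki"]
new = ["plugins", "wiki"]

print(plan(index_addresses(old), index_addresses(new)))
print(plan(key_addresses(old), key_addresses(new)))
```

With positional addresses, removing one record changes what every later address points to, so all three addresses show up in the plan. With name-based addresses, only the removed record does.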

So what could we do better?

* plugins.jenkins.io should generate its data on its own and not have a strong dependency on ci.jenkins.io.
 I would be happy to discuss it with someone willing to contribute to that service.
* DNS records: we have to test whether the loop mechanism introduced in Terraform 0.12 correctly handles the different resources generated from an array.
* wiki.jenkins.io: we should get rid of that service.

Cheers