[Jenkins-infra] Postmortem [ci.jenkins.io, plugins.jenkins.io, wiki.jenkins.io] - 2019-12-17

Oleg Nenashev o.v.nenashev at gmail.com
Wed Dec 18 10:47:48 UTC 2019


+1 for making the plugin site self-sufficient (read as: dependent only on
the update center). The wiki is slowly being migrated, including plugin docs
and other foundation documentation; contributions will be much appreciated.

Regarding the outage, it looks like ci.jenkins.io still does not build all
components (see the changelog PRs for jenkins.io).


On Wed, Dec 18, 2019, 12:58 Olblak <me at olblak.com> wrote:

> Hi,
>
> Before going into what went wrong, here is some context.
>
> Yesterday I was working on the Jenkins-infra/azure repository
> <https://github.com/jenkins-infra/azure/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+updated%3A2019-12-17+>,
> where multiple things needed to be done, mainly updating DNS records
> related to INFRA-1797, including jenkins.io, among other things:
> https://issues.jenkins-ci.org/browse/INFRA-1797
>
> What I missed while reviewing the changes is that two months ago I
> manually updated the ci.jenkins.io VM size to 32GB of RAM without
> changing the Terraform code.
> So yesterday ci.jenkins.io was accidentally downsized to 16GB, which led
> to:
> * The VM was restarted
> * The Jenkins process couldn't start because it didn't have enough memory
> available
> * plugins.jenkins.io stopped working because it depends on
> https://ci.jenkins.io/job/Infra/job/plugin-site-api/job/generate-data/lastSuccessfulBuild/artifact/plugins.json.gzip
>
> To fix this issue, I updated the Terraform code and then re-applied it.
>
> The second issue, which happened at the same time, is due to the way we
> define our DNS records.
> We use a 'hack' in Terraform to emulate loops. Terraform doesn't correctly
> keep track of the different resources, so when we add or delete a DNS
> record, it also deletes and recreates other DNS records. If for some reason
> something goes wrong before a record is recreated, we simply lose
> that DNS record, and this is what happened to wiki.jenkins.io.
>
> So what could we do better?
>
> * plugins.jenkins.io should generate its data on its own and not have a
> strong dependency on ci.jenkins.io.
>   I would be happy to discuss it with someone willing to contribute to
> that service.
> * DNS records: we have to test whether the loop mechanism introduced in
> Terraform 0.12 correctly handles the different resources generated from
> an array.
> * wiki.jenkins.io: we should get rid of that service.
>
> Cheers
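
The DNS record churn described in the quoted message can be illustrated with a small simulation. This is only a sketch under assumptions: the record names and the `dns_record` address prefix are hypothetical, and the actual 'hack' used in jenkins-infra/azure is not shown in this thread. It assumes the hack behaves like position-based addressing, where each record is tracked by its index in a list, and compares that with addressing each record by a stable key.

```python
def index_addresses(records):
    """Address each record by its list position (assumed behaviour of the loop hack)."""
    return {f"dns_record[{i}]": name for i, name in enumerate(records)}


def key_addresses(records):
    """Address each record by its own name, so the address survives list edits."""
    return {f'dns_record["{name}"]': name for name in records}


def plan(old, new):
    """Compare two address -> record maps and list what would have to change."""
    return {
        "destroy": sorted(a for a in old if a not in new),
        "create": sorted(a for a in new if a not in old),
        "replace": sorted(a for a in old if a in new and old[a] != new[a]),
    }


# Hypothetical record list; one entry is removed from the middle.
before = ["ci", "plugins", "wiki"]
after = ["ci", "wiki"]

by_index = plan(index_addresses(before), index_addresses(after))
by_key = plan(key_addresses(before), key_addresses(after))

print("by index:", by_index)
print("by key:  ", by_key)
```

With position-based addresses, removing one entry shifts every record after it, so the untouched "wiki" record is destroyed at its old position and has to be recreated at the new one. If that recreation fails, the record is simply gone. With key-based addresses, only the removed record is affected, which is the property worth verifying when testing the Terraform 0.12 loop mechanism.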