[Jenkins-infra] Postmortem [ci.jenkins.io, plugins.jenkins.io, wiki.jenkins.io] - 2019-12-17

Marky Jackson marky.r.jackson at gmail.com
Wed Dec 18 13:55:25 UTC 2019


I would be willing to contribute, but the correct access will need to be granted for testing.
Previously I wanted to onboard and help out, but access to much of the infrastructure was limited, so I ended up paying for my own infrastructure to test on, and that became a financial burden.
So if we can figure that out, I can more than help, given my knowledge in this area.
Thanks kindly.

> On Dec 18, 2019, at 1:58 AM, Olblak <me at olblak.com> wrote:
> 
> Hi,
> 
> Before going into what went wrong, here is some context.
> 
> Yesterday I was working on the Jenkins-infra/azure repository <https://github.com/jenkins-infra/azure/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+updated%3A2019-12-17+>, where multiple things needed to be done,
> mainly updating DNS records related to INFRA-1797, including jenkins.io, among other things:
> https://issues.jenkins-ci.org/browse/INFRA-1797
> 
> What I missed while reviewing the changes is that two months ago I manually updated the ci.jenkins.io VM size to have 32GB of RAM without changing the Terraform code.
> So yesterday ci.jenkins.io was downsized to 16GB by accident, which led to the following:
> * The VM was restarted
> * The Jenkins process couldn't start because it didn't have enough memory available
> * plugins.jenkins.io stopped working because it depends on https://ci.jenkins.io/job/Infra/job/plugin-site-api/job/generate-data/lastSuccessfulBuild/artifact/plugins.json.gzip
> 
> To fix this issue, I updated the Terraform code and then re-applied it.
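The drift described above (a manual resize to 32GB that the code never learned about, so the next apply restored 16GB) can be illustrated with a minimal sketch. This is not how Terraform is implemented; `find_drift` and the `memory_gb` attribute name are hypothetical, and only the 16GB and 32GB values come from the message.

```python
def find_drift(declared, actual):
    """Return {attribute: (declared, actual)} for every attribute that differs."""
    return {
        key: (declared[key], actual.get(key))
        for key in declared
        if declared[key] != actual.get(key)
    }


# Hypothetical attribute name; the values are the ones from the postmortem.
declared = {"memory_gb": 16}  # what the code still said
actual = {"memory_gb": 32}    # what had been set by hand two months earlier

drift = find_drift(declared, actual)
for attribute, (want, have) in drift.items():
    # Applying the declared state would move the live value back to `want`.
    print(f"{attribute}: live value {have} would be changed to {want}")
```

A non-empty result of this kind is what a reviewer would want to notice before applying, since the declared value wins and the manual change is silently undone.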
> 
> The second issue, which happened at the same time, is due to the way we define our DNS records.
> We use a 'hack' in Terraform to get loops, and Terraform doesn't correctly keep track of the different resources. So when we add or delete a DNS record, it also deletes and recreates other DNS records. If for some reason something goes wrong before a record is re-created, we simply lose that DNS record, and this is what happened to wiki.jenkins.io.
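A simplified model may help show why removing one record can disturb others. It assumes the loop 'hack' addresses each record by its position in a list, which the message does not state explicitly. The record names, `by_index`, `by_name`, and `plan` are all hypothetical, and this is not Terraform's actual planner.

```python
def by_index(records):
    """Address each record by its list position (assumed behaviour of the hack)."""
    return {f"dns_record[{i}]": name for i, name in enumerate(records)}


def by_name(records):
    """Address each record by a stable key, here its own name."""
    return {f'dns_record["{name}"]': name for name in records}


def plan(old, new):
    """Return (records destroyed, records created) between two address maps."""
    destroy = sorted(old[addr] for addr in old if new.get(addr) != old[addr])
    create = sorted(new[addr] for addr in new if old.get(addr) != new[addr])
    return destroy, create


# Hypothetical record names; "old-service" is removed from the middle.
before = ["ci", "old-service", "wiki"]
after = ["ci", "wiki"]

index_plan = plan(by_index(before), by_index(after))
name_plan = plan(by_name(before), by_name(after))

print("index-keyed:", index_plan)
print("name-keyed: ", name_plan)
```

With positional addresses, every record after the removed one shifts, so "wiki" is destroyed and then created again, and it stays missing if the create step fails. With name-based keys only the removed record is touched.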
> 
> So what could we do better?
> 
> * plugins.jenkins.io should generate its data on its own and not have a strong dependency on ci.jenkins.io.
>   I would be happy to discuss it with someone willing to contribute to that service.
> * DNS records: we have to test whether the loop mechanism introduced in Terraform 0.12 correctly handles the different resources generated from an array.
> * wiki.jenkins.io: we should get rid of that service.
> 
> Cheers
> 
