[Jenkins-infra] Postmortem [ci.jenkins.io, plugins.jenkins.io, wiki.jenkins.io] - 2019-12-17
Oleg Nenashev
o.v.nenashev at gmail.com
Wed Dec 18 16:47:45 UTC 2019
@Olivier. CI does not seem to be triggered in some cases. See
https://github.com/jenkins-infra/jenkins.io/pull/2722 or
https://github.com/jenkins-infra/jenkins.io/pull/2720
On Wed, Dec 18, 2019, 14:55 Marky Jackson <marky.r.jackson at gmail.com> wrote:
> I would be willing to contribute, but the correct access will need to be
> granted for testing.
> Previously I wanted to onboard and help out, but access to much of the
> infrastructure was limited, and I ended up paying for my own infra to
> test, which became a financial burden.
> So if we can figure that out, I can more than help, given my knowledge in
> this area.
> Thanks kindly.
>
> On Dec 18, 2019, at 1:58 AM, Olblak <me at olblak.com> wrote:
>
> Hi,
>
> Before going into what went wrong, here is some context.
>
> Yesterday I was working on Jenkins-infra/azure
> <https://github.com/jenkins-infra/azure/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+updated%3A2019-12-17+>
> where multiple things needed to be done, mainly updating DNS records
> related to INFRA-1797, including jenkins.io, among other things:
> https://issues.jenkins-ci.org/browse/INFRA-1797
>
> What I missed while reviewing the changes is that two months ago I
> manually updated the ci.jenkins.io VM size to have 32GB of RAM without
> changing the terraform code.
> So yesterday ci.jenkins.io was downsized to 16GB by accident, which led
> to:
> * The VM was restarted
> * The Jenkins process couldn't start because it didn't have enough memory
> available
> * plugins.jenkins.io stopped working because it depends on
> https://ci.jenkins.io/job/Infra/job/plugin-site-api/job/generate-data/lastSuccessfulBuild/artifact/plugins.json.gzip
>
> To fix this issue, I updated the terraform code and then re-applied it.
>
> The second issue, which happened at the same time, is due to the way we
> define our DNS records.
> We use a 'hack' in terraform to use loops. Terraform doesn't correctly
> keep track of the different resources, so when we add or delete a DNS
> record, it also deletes and recreates other DNS records. If for some
> reason something goes wrong before a record is re-created, we simply lose
> that DNS record, and this is what happened to wiki.jenkins.io.
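
The failure mode described above can be sketched with a small model. This
is not Terraform code and not the actual jenkins-infra configuration; it
assumes the 'hack' addresses each record by its position in a list, and the
record names and the plan() helper are hypothetical. It only illustrates why
removing one entry from an index-addressed list touches unrelated records,
while a stable key per record does not.

```python
def plan(old, new):
    """Return the addresses whose record differs between two states.

    old and new map a resource address to the DNS name it manages.
    """
    touched = []
    for address in sorted(set(old) | set(new), key=str):
        if old.get(address) != new.get(address):
            touched.append(address)
    return touched


def by_index(names):
    # Address each record by its position in the list (index-based loop).
    return {i: name for i, name in enumerate(names)}


def by_name(names):
    # Address each record by a stable key, here the record name itself.
    return {name: name for name in names}


# Hypothetical record lists: "old" is removed from the middle.
before = ["www", "old", "wiki", "repo"]
after = ["www", "wiki", "repo"]

index_changes = plan(by_index(before), by_index(after))
name_changes = plan(by_name(before), by_name(after))

print(index_changes)  # every address from the removed position onward
print(name_changes)   # only the removed record
```

With index-based addresses, the entries after the removed one shift down, so
each of them looks like a changed resource. With name-based addresses only
the removed record shows up in the plan.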
>
> So what could we do better?
>
> * plugins.jenkins.io should generate its data on its own and not have a
> strong dependency on ci.jenkins.io.
> I would be happy to discuss it with someone willing to contribute to
> that service.
> * DNS records: we have to test whether the loop mechanism introduced in
> terraform 0.12 correctly handles the different resources generated from
> an array.
> * wiki.jenkins.io: we should get rid of that service.
>
> Cheers
>