[Jenkins-infra] Postmortem [ci.jenkins.io, plugins.jenkins.io, wiki.jenkins.io] - 2019-12-17

Olblak me at olblak.com
Wed Dec 18 12:40:59 UTC 2019


> Regarding the outage, it looks like ci.jenkins.io still does not build all components (see changelog PRs for jenkins.io)

To me, it seems to be working fine, so if you see an error, feel free to share it.


---
gpg --keyserver keys.gnupg.net --recv-key 52210D3D
---


On Wed, Dec 18, 2019, at 11:47 AM, Oleg Nenashev wrote:
> +1 for making the plugin site self-sufficient (read as: depends only on the update center). The wiki is being slowly migrated, including plugin docs and other foundation documentation; contributions will be much appreciated.
> 
> Regarding the outage, it looks like ci.jenkins.io still does not build all components (see changelog PRs for jenkins.io)
> 
> 
> On Wed, Dec 18, 2019, 12:58 Olblak <me at olblak.com> wrote:
>> Hi,
>> 
>> Before going into what went wrong, here is some context.
>> 
>> Yesterday I was working on Jenkins-infra/azure <https://github.com/jenkins-infra/azure/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged+updated%3A2019-12-17+>, where multiple things needed to be done,
>> mainly updating DNS records related to INFRA-1797, including jenkins.io, among other things:
>> https://issues.jenkins-ci.org/browse/INFRA-1797
>> 
>> What I missed while reviewing the changes is that two months ago I manually updated the ci.jenkins.io VM size to 32GB of RAM without changing the Terraform code.
>> So yesterday ci.jenkins.io was downsized to 16GB by accident, which led to the following:
>> * The VM was restarted
>> * The Jenkins process couldn't start because it didn't have enough memory available
>> * plugins.jenkins.io stopped working because it depends on https://ci.jenkins.io/job/Infra/job/plugin-site-api/job/generate-data/lastSuccessfulBuild/artifact/plugins.json.gzip
>> 
>> To fix this issue, I updated the terraform code and then re-applied it
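
The drift described above (a manual resize that the code never learned about) can be illustrated with a minimal sketch. This is a toy comparison, not Terraform's own logic; the attribute names `name` and `memory_gb` are hypothetical, and only the 16GB and 32GB values come from the message.

```python
def find_drift(declared, actual):
    """Return {attribute: (declared, actual)} for every value that differs."""
    keys = declared.keys() | actual.keys()
    return {
        key: (declared.get(key), actual.get(key))
        for key in sorted(keys)
        if declared.get(key) != actual.get(key)
    }


# What the code says (hypothetical attribute names).
declared = {"name": "ci.jenkins.io", "memory_gb": 16}
# What was actually running after the manual resize.
actual = {"name": "ci.jenkins.io", "memory_gb": 32}

drift = find_drift(declared, actual)
print(drift)
```

Any non-empty result means that applying the code as written would change the running machine, which is the point at which a reviewer would want to stop and reconcile the two.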
>> 
>> The second issue, which happened at the same time, is due to the way we define our DNS records.
>> We use a 'hack' in Terraform to emulate loops. Terraform doesn't correctly keep track of the different resources, so when we add or delete a DNS record, it also deletes and recreates other DNS records. If for some reason something goes wrong before a record is re-created, we simply lose that DNS record, and this is what happened to wiki.jenkins.io.
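
A minimal sketch of why list-based tracking can touch unrelated records, under stated assumptions: this is a toy model in Python, not Terraform itself, and the record names and `dns.N` addresses are hypothetical. It compares addressing each record by its position in a list with addressing it by its own name.

```python
def plan(old, new):
    """Return the addresses to destroy, create, or replace between two states."""
    destroy = sorted(k for k in old if k not in new)
    create = sorted(k for k in new if k not in old)
    replace = sorted(k for k in old if k in new and old[k] != new[k])
    return {"destroy": destroy, "create": create, "replace": replace}


def by_index(records):
    # Address each record by its position in the list, e.g. "dns.0".
    return {f"dns.{i}": name for i, name in enumerate(records)}


def by_key(records):
    # Address each record by its own name, e.g. 'dns["wiki"]'.
    return {f'dns["{name}"]': name for name in records}


before = ["ci", "plugins", "wiki"]  # hypothetical record names
after = ["ci", "wiki"]              # "plugins" removed from the middle

index_plan = plan(by_index(before), by_index(after))
key_plan = plan(by_key(before), by_key(after))

print(index_plan)
print(key_plan)
```

With positional addresses, removing the middle entry shifts "wiki" from slot 2 to slot 1, so the plan rewrites slot 1 and destroys slot 2 even though "wiki" itself did not change. With name-based addresses, only "plugins" is affected.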
>> 
>> So what could we do better?
>> 
>> * plugins.jenkins.io should generate its data on its own and not have a strong dependency on ci.jenkins.io.
>>  I would be happy to discuss it with someone willing to contribute to that service.
>> * DNS records: we have to test whether the loop mechanism introduced in Terraform 0.12 correctly handles the different resources generated from an array.
>> * wiki.jenkins.io: we should get rid of that service.
>> 
>> Cheers
>> _______________________________________________
>>  Jenkins-infra mailing list
>> Jenkins-infra at lists.jenkins-ci.org
>> http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra