[Jenkins-infra] Confluence outage post mortem
Larry Shatzer, Jr.
larrys at gmail.com
Mon Apr 6 03:52:06 UTC 2015
I didn't install anything. I tried to access the wiki sometime today but was having other network problems so I gave up.
-------- Original message --------
From: Kohsuke Kawaguchi <kk at kohsuke.org>
Date: 04/05/2015 9:42 PM (GMT-07:00)
To: infra at lists.jenkins-ci.org
Subject: [Jenkins-infra] Confluence outage post mortem

I think Daniel (or maybe someone else) reported this afternoon that Confluence was down.
I then discovered in Datadog that eggplant became inaccessible around 7:55am PT. This didn't trigger a PagerDuty alert because I had incorrectly set up the monitoring to stay silent when no data arrives. (I've since fixed this.)
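The monitoring gap above (no alert when data stops arriving) can be illustrated with a minimal sketch. This is not the actual Datadog configuration; the function name, the 300-second threshold, and the "ok"/"alert" strings are all hypothetical and only show the idea of treating silence as a failure.

```python
from typing import Optional


def evaluate_heartbeat(last_seen: Optional[float], now: float,
                       max_silence_s: float = 300.0) -> str:
    """Return "ok" or "alert"; missing or stale data counts as an alert.

    last_seen is the timestamp (in seconds) of the most recent data point,
    or None if the host has never reported. The threshold is an assumption.
    """
    if last_seen is None:
        # No data at all must not be mistaken for a healthy host.
        return "alert"
    if now - last_seen > max_silence_s:
        # The host reported in the past but has gone quiet for too long.
        return "alert"
    return "ok"
```

With a check shaped like this, a host that stops reporting pages someone instead of silently dropping off the dashboard.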
eggplant was responding to ping, and SSH connections were accepted, but the SSH handshake never completed. I'm not sure exactly what happened to that box, but I filed an OSUOSL support ticket to have the machine reset.
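The symptom above (TCP connection accepted, but no SSH handshake) can be told apart from a fully unreachable host with a small probe. The following is only a sketch using the Python standard library; the function name, return strings, and timeout are hypothetical, and it treats only a first read beginning with "SSH-" as healthy.

```python
import socket


def probe_ssh(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify an SSH endpoint as "unreachable", "accepted_no_banner", or "ok"."""
    try:
        # The timeout applies to the connect and to later reads on this socket.
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"
    with conn:
        try:
            # A healthy SSH server sends its version banner right after accept.
            banner = conn.recv(256)
        except OSError:
            # Covers read timeouts and resets: connected, but nothing useful came back.
            return "accepted_no_banner"
    if banner.startswith(b"SSH-"):
        return "ok"
    return "accepted_no_banner"
```

A result of "accepted_no_banner" corresponds to the state described here: the kernel still accepts connections, but the SSH daemon is not making progress.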
Once the machine came back up, I noticed that the memory footprint of Confluence was lower than normal, and that it wasn't writing as much data as it normally does (thanks Datadog!) In the browser, responses were indeed a bit slower, but I was still able to see pages OK.
I only realized much later that Confluence was actually not responding. Instead, the caching layers were serving every request they could handle, which includes wiki pages and static resources, so the browser appeared to be loading pages.
Confluence was not responding because somebody (probably Larry) had installed the Tomcat manager app, and it was trying to verify its plain-text LDAP connection to ldap.jenkins-ci.org, which was failing. We disabled plain-text LDAP for security reasons a week or so ago, and I didn't realize that this would prevent Confluence from starting, since it didn't affect the running Confluence instance.
--
Kohsuke Kawaguchi