[Jenkins-infra] Confluence outage post mortem

Kohsuke Kawaguchi kk at kohsuke.org
Mon Apr 6 04:03:58 UTC 2015


All right, I'm sorry for the wrong attribution! Then I must have installed it
a long time ago.

Should we just remove the Tomcat manager app? It increases the attack
surface, so if we are not using it, I think it's better to remove it.

2015-04-05 20:52 GMT-07:00 Larry Shatzer, Jr. <larrys at gmail.com>:

> I didn't install anything. I tried to access the wiki sometime today but
> was having other network problems so I gave up.
>
>
> -------- Original message --------
> From: Kohsuke Kawaguchi
> Date: 04/05/2015 9:42 PM (GMT-07:00)
> To: infra at lists.jenkins-ci.org
> Subject: [Jenkins-infra] Confluence outage post mortem
>
> I think Daniel (or maybe someone else) reported this afternoon that
> Confluence was down.
>
> I then discovered in Datadog that eggplant became inaccessible around 7:55am
> PT. This didn't trigger a PagerDuty alert because I had the monitor
> incorrectly set up to stay silent when no data arrives (I've since fixed
> this problem).
>
> eggplant was responding to ping, and SSH connections were accepted, but
> SSH wasn't completing the handshake. I'm not sure exactly what happened to
> that box, but I've filed an OSUOSL support ticket to reset the machine.
>
> Once the machine came back up, I noticed that the memory footprint of
> Confluence was lower than its normal level, and it just wasn't writing as
> much data as it normally does (thanks Datadog!). In the browser, responses
> were indeed a bit slower, but I was still able to see pages OK.
>
> I only realized much later that Confluence was actually not responding.
> Instead, the caching layers were serving all the requests they could
> handle, which included wiki pages and static resources, hence the browser
> appeared to be loading pages.
>
> Confluence was not responding because somebody (probably Larry) had
> installed the Tomcat manager app, and it was trying to verify its plain-text
> LDAP connection to ldap.jenkins-ci.org, which was failing. We disabled
> plain-text LDAP for security reasons a week or so ago, and I didn't realize
> that would prevent Confluence from starting, as it didn't affect a running
> Confluence instance.
>
> --
> Kohsuke Kawaguchi
>

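For reference, the monitoring gap in the quoted post mortem (a monitor that
stays silent when data stops arriving) comes down to treating "no data" as an
alert condition in its own right. A minimal sketch of that rule follows. The
function name, the threshold, and the timestamps are hypothetical; this is
not the Datadog configuration or API, only an illustration of the logic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def monitor_state(last_sample_at: Optional[datetime],
                  now: datetime,
                  max_silence: timedelta) -> str:
    """Return "ALERT" when no sample arrived within max_silence, else "OK".

    Hypothetical helper: a host that stops reporting is treated the same
    way as a host that reports a bad value.
    """
    if last_sample_at is None:
        # Never heard from the host at all.
        return "ALERT"
    if now - last_sample_at > max_silence:
        # Data stopped arriving: this is the case that must not stay silent.
        return "ALERT"
    return "OK"


if __name__ == "__main__":
    # Illustrative timestamps only.
    now = datetime(2015, 4, 5, 15, 30, tzinfo=timezone.utc)
    last = now - timedelta(minutes=35)
    print(monitor_state(last, now, timedelta(minutes=10)))
```

The threshold should sit comfortably above the agent's normal reporting
interval so that a single delayed sample does not page anyone.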

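On the caching point in the quoted post mortem (pages kept loading because
the caching layers answered while Confluence itself was down), one possible
way to probe the backend is to send a request that a cache is unlikely to
answer from a stored copy. The sketch below only builds such a request. The
"probe" query parameter and the example URL are hypothetical, and whether a
given caching layer honors these headers or treats the query string as part
of the cache key is an assumption that would need checking against the
actual setup.

```python
import urllib.parse
import urllib.request
import uuid
from typing import Optional


def build_origin_probe(url: str, nonce: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request meant to reach the backend rather than a cache.

    A unique query parameter makes the URL differ from any cached one, and
    the Cache-Control/Pragma headers ask intermediaries to revalidate.
    """
    nonce = nonce or uuid.uuid4().hex
    parts = urllib.parse.urlsplit(url)
    # Keep any existing query parameters and append the hypothetical nonce.
    query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    query.append(("probe", nonce))
    probe_url = urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(query))
    )
    return urllib.request.Request(
        probe_url,
        headers={"Cache-Control": "no-cache", "Pragma": "no-cache"},
    )


if __name__ == "__main__":
    # Placeholder URL, not the real wiki address.
    req = build_origin_probe("https://wiki.example.org/display/HOME")
    print(req.full_url)
```

Passing the result to urllib.request.urlopen with a short timeout would turn
it into an actual check; a timeout or 5xx response there would then point at
the backend rather than at the cache.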

-- 
Kohsuke Kawaguchi