[Jenkins-infra] Confluence, Nginx and 99 reasons Docker hates me: a report.

Larry Shatzer, Jr. larrys at gmail.com
Mon Jan 18 15:39:59 UTC 2016


And now it appears the wiki spam bot is dead, since there are probably
around a thousand spam pages, all of which it would normally have killed.

On Sun, Jan 17, 2016 at 8:43 PM, R. Tyler Croy <tyler at monkeypox.org> wrote:

>
> We had some wiki availability issues today that were partially my fault and
> partially related to trying to bring "build13" of the docker confluence image
> into production.
>
> KK made the change earlier last week to disable LDAP caching, but for some
> reason Docker wasn't pulling the new container properly. This is what I set
> out to fix about 6 hours ago.
>
>
> First, I discovered that newer versions of Docker had no problem pulling the
> docker container, and that we did not have consistent versions of Docker
> installed across our machines (1.5.0, 1.7.0 and 1.9.1 by my survey). With this
> commit[1] I ensured that we would have 1.9.1 consistently installed. This
> required some changes to the forked version of the garethr-docker puppet
> module we use, since it's been changed quite a bit to accommodate newer
> options in later Docker versions.
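
For anyone wanting to spot the same kind of version drift, here is a minimal sketch of the comparison. The host names are hypothetical; only the three version numbers and the 1.9.1 target come from the message above, and how the versions get collected from each machine is left out.

```python
def parse_version(text):
    """Turn '1.9.1' into (1, 9, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def hosts_needing_upgrade(installed, target="1.9.1"):
    """Return the sorted names of hosts whose Docker is older than target."""
    wanted = parse_version(target)
    return sorted(
        host for host, version in installed.items()
        if parse_version(version) < wanted
    )


# Hypothetical host names; the versions are the three from the survey.
survey = {"host-a": "1.5.0", "host-b": "1.7.0", "host-c": "1.9.1"}
print(hosts_needing_upgrade(survey))
```

Comparing tuples of integers rather than raw strings matters here, since a plain string comparison would sort "1.10.0" before "1.9.1".
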
>
>
> COOL, surely that must have been the end of my day.
>
>
> Second, after rolling out the Docker changes the wiki became unavailable.
> Investigation led to two problems. The first is one I have seen a few times
> already with Docker in our infrastructure: stale iptables routing rules. When
> Docker sets up its networking it installs some rules into a couple of chains
> in the `filter` and `nat` tables; periodically it has failed to clean up
> these rules, leading to requests not being routed between the
> confluence-cache and confluence containers. The second problem I identified
> was that there was an internal IP address hard-coded for the confluence-cache
> container, which no longer existed, so naturally it wasn't finding the right
> confluence container. I addressed *that* with this[2] change.
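
One way to look for leftover rules like these is to compare the addresses that DNAT rules forward to against the addresses the running containers actually hold. The sketch below is only an illustration: the sample rules are made up in the style of `iptables -t nat -S DOCKER` output, the addresses are hypothetical, and gathering the live container IPs (for instance via `docker inspect`) is assumed to happen elsewhere.

```python
import re

# Matches the container address a DNAT rule forwards to.
DNAT_TARGET = re.compile(r"--to-destination (\d+\.\d+\.\d+\.\d+)")


def stale_dnat_rules(rules_text, live_ips):
    """Return DNAT rule lines that forward to an IP no live container has."""
    stale = []
    for line in rules_text.splitlines():
        match = DNAT_TARGET.search(line)
        if match and match.group(1) not in live_ips:
            stale.append(line.strip())
    return stale


# Made-up sample in the style of `iptables -t nat -S DOCKER` output.
SAMPLE_RULES = """
-N DOCKER
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:8080
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8081 -j DNAT --to-destination 172.17.0.9:8081
"""

# Hypothetical set of addresses the running containers actually hold.
print(stale_dnat_rules(SAMPLE_RULES, {"172.17.0.2"}))
```

The function only reports candidates; deleting a rule is left as a manual step, since a false positive there would take a working service offline.
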
>
>
> While debugging this, I noticed another cute behavior of docker with its
> named-container support. Since we name our containers (e.g. `confluence`),
> the docker daemon will actually persist the tag and some of the options
> passed into the `docker run` invocation. For example,
> `docker run -e SOME=foo --name bleepbloop rtyler/myimage` would persist the
> environment variable options (SOME=foo) until I stopped and removed the
> container (e.g. `docker rm bleepbloop`).
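
Given that behavior, changing an option on a named container means removing the old one before running the new one. A rough sketch of that sequence follows, reusing the `bleepbloop` example from the message. The helper name and the new value `bar` are hypothetical, and the function only builds the command lines rather than executing them.

```python
def recreate_commands(name, image, env=None):
    """Build the docker commands that replace a named container from scratch."""
    run = ["docker", "run", "--name", name]
    for key, value in sorted((env or {}).items()):
        run += ["-e", f"{key}={value}"]
    run.append(image)
    return [
        ["docker", "stop", name],  # stop the old container first
        ["docker", "rm", name],    # drop it along with its persisted options
        run,                       # start fresh with the new options
    ]


for command in recreate_commands("bleepbloop", "rtyler/myimage", {"SOME": "bar"}):
    print(" ".join(command))
```

Each command is returned as an argument list, so it could be passed to `subprocess.run` without shell quoting concerns.
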
>
> To remedy this, I nuked all the previous incantations of named containers
> from the host running confluence. That finished, I could FINALLY run
> `build13` of the confluence container which had the LDAP cache setting change
> that KK made earlier. Bringing that up I discovered another issue...
>
>
> Third, lots of spammers and bots are regularly hitting the wiki, which I
> suspected was causing confluence not to come online and stay online, so I
> made this commit[3] to deny those bots at the Apache proxy level (refresher,
> requests go: Apache (ssl termination) -> Nginx (cache) -> Confluence)
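
The message does not say what the deny rule keys on. Assuming it matches on the User-Agent header (an assumption, not something stated above), the decision made at the front of the proxy chain can be sketched like this. The patterns are placeholders and are not taken from the commit.

```python
import re

# Placeholder patterns for illustration only; not taken from the commit.
DENIED_AGENTS = re.compile(r"(examplebot|spamcrawler)", re.IGNORECASE)


def should_deny(user_agent):
    """Return True when the User-Agent header matches a denied pattern."""
    return bool(user_agent) and DENIED_AGENTS.search(user_agent) is not None


print(should_deny("Mozilla/5.0 (compatible; ExampleBot/1.0)"))
print(should_deny("Mozilla/5.0 (X11; Linux x86_64)"))
```

Rejecting at the outermost layer means a denied request never reaches the cache or Confluence itself.
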
>
>
> All that said and done, it still does not appear that the current
> configuration of Confluence can sustain the traffic levels without LDAP
> caching enabled, so I unfortunately have pinned things back down to `build7`.
>
>
>
> You may be asking yourself at this point in the email: "why is he writing all
> this out?" Welp, this is effectively what I spent my Sunday doing, and it
> would be a shame if nobody but me learned from this colossal waste of time. :)
>
>
> Anywho, that's that. Confluence is back online, and I'm probably not going to
> touch it for at least a few days, lest I go crazy.
>
>
> [1] https://github.com/jenkins-infra/jenkins-infra/commit/0107e79b0aa7b5bd9acd3d4d6b268c4178331beb
> [2] https://github.com/jenkins-infra/jenkins-infra/commit/f95c0e67803e9129c54a3f7fe8fce2940f7ad874
> [3] https://github.com/jenkins-infra/jenkins-infra/commit/675e4bdfc7bdd96b34046dc872f73f7f514e4e49
>
>
> Cheers
> - R. Tyler Croy
>
> ------------------------------------------------------
>      Code: <https://github.com/rtyler>
>   Chatter: <https://twitter.com/agentdero>
>
>   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
> ------------------------------------------------------
>
> _______________________________________________
> Jenkins-infra mailing list
> Jenkins-infra at lists.jenkins-ci.org
> http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>
>