[Jenkins-infra] Confluence, Nginx and 99 reasons Docker hates me: a report.

R. Tyler Croy tyler at monkeypox.org
Mon Jan 18 15:44:56 UTC 2016


(replies inline)

On Mon, 18 Jan 2016, Larry Shatzer, Jr. wrote:

> And now it appears the wiki spam bot is dead, since there are probably
> around a thousand spam pages, all ones that would normally be killed by it.


For what it's worth, I didn't touch the backend spam bot at all, due to access
constraints. It may have cratered due to the connectivity issues, though. I'll
ping KK about it as soon as I see him online.


> On Sun, Jan 17, 2016 at 8:43 PM, R. Tyler Croy <tyler at monkeypox.org> wrote:
> 
> >
> > We had some wiki availability issues today that were partially my fault
> > and partially related to trying to bring "build13" of the docker
> > confluence image into production.
> >
> > KK made the change earlier last week to disable LDAP caching, but for some
> > reason Docker wasn't pulling the new container properly. This is what I
> > set out to fix about 6 hours ago.
> >
> >
> > First, I discovered that newer versions of Docker had no problem pulling
> > the docker container, and that we did not have consistent versions of
> > Docker installed across our machines (1.5.0, 1.7.0 and 1.9.1 by my
> > survey). With this commit[1] I ensured that we would have 1.9.1
> > consistently installed. This required some changes to the forked version
> > of the garethr-docker puppet module we use, since it's been changed quite
> > a bit to accommodate newer options in later Docker versions.
> >
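A version survey like the one described above can be sketched in a few lines of Python. This is only an illustration: the host names are made up, and only the three version numbers and the 1.9.1 target come from the message.

```python
from collections import defaultdict


def group_hosts_by_version(host_versions):
    """Group host names by the Docker version each one reports."""
    groups = defaultdict(list)
    for host, version in sorted(host_versions.items()):
        groups[version].append(host)
    return dict(groups)


def hosts_needing_upgrade(host_versions, target="1.9.1"):
    """Return the hosts whose Docker version differs from the target."""
    return sorted(h for h, v in host_versions.items() if v != target)


# Hypothetical inventory; the host names are placeholders, not real machines.
survey = {"host-a": "1.5.0", "host-b": "1.7.0", "host-c": "1.9.1"}
```

In practice the mapping would have to be collected from each machine (for example by running `docker --version` there); how that collection happens is left out here.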
> >
> > COOL, surely that must have been the end of my day.
> >
> >
> > Second, after rolling out the Docker changes the wiki became unavailable.
> > Investigation led to two problems. The first is one I have seen a few
> > times already with Docker in our infrastructure: stale iptables routing
> > rules. When Docker sets up its networking it installs some rules into a
> > couple of chains in the `filter` and `nat` tables. Periodically it has
> > failed to clean up these rules, leading to requests not being routed
> > between the confluence-cache and confluence containers. The second
> > problem I identified was an internal IP address hard-coded for the
> > confluence-cache container. That address no longer existed, so naturally
> > it wasn't finding the right confluence container. I addressed *that* with
> > this[2] change.
> >
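One way to spot the stale-rule problem described above is to compare the DNAT targets in `iptables-save` output against the addresses of containers that are actually running. The sketch below is a rough illustration under assumptions: the sample rules and IP addresses are invented, and the set of live addresses is assumed to come from something like `docker inspect`.

```python
import re

# Matches the target address (and optional port) of a DNAT rule.
DNAT_RE = re.compile(r"--to-destination\s+(\d+\.\d+\.\d+\.\d+)(?::(\d+))?")


def stale_dnat_rules(iptables_save_text, live_ips):
    """Return DOCKER-chain DNAT rules that point at addresses not in live_ips."""
    stale = []
    for line in iptables_save_text.splitlines():
        line = line.strip()
        if not line.startswith("-A DOCKER"):
            continue
        match = DNAT_RE.search(line)
        if match and match.group(1) not in live_ips:
            stale.append(line)
    return stale


# Hypothetical excerpt in iptables-save format; addresses are made up.
SAMPLE = """\
*nat
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:8080
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8081 -j DNAT --to-destination 172.17.0.9:8081
COMMIT
"""
```

Calling `stale_dnat_rules(SAMPLE, {"172.17.0.2"})` flags only the rule pointing at 172.17.0.9. The function only reports candidates; deciding whether to delete a rule is deliberately left to a human.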
> >
> > While debugging this, I noticed another cute behavior of docker with its
> > named containers support. Since we name our containers (e.g.
> > `confluence`), the docker daemon will actually persist the tag and some
> > of the options passed into the `docker run` invocation. I.e.
> > `docker run -e SOME=foo --name bleepbloop rtyler/myimage` would persist
> > the environment variable options (SOME=foo) until I stopped and removed
> > the container (e.g. `docker rm bleepbloop`).
> >
> > To remedy this, I nuked all the previous incarnations of named containers
> > from the host running confluence. That finished, I could FINALLY run
> > `build13` of the confluence container, which had the LDAP cache setting
> > change that KK made earlier. Bringing that up I discovered another
> > issue...
> >
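The stop, remove, and re-run sequence above can be written down as a small helper that only builds the command lines, so nothing touches Docker until someone chooses to execute them. This is a sketch, not what was actually run on the host: the `-d` flag and the helper name are assumptions, while the container name, image, and variable reuse the example from the message.

```python
def recreate_commands(name, image, env=None):
    """Build the docker CLI calls that replace a named container.

    Removing the old container first means the new `docker run` starts
    from the options given here rather than from anything persisted
    under the old name.
    """
    run = ["docker", "run", "-d", "--name", name]  # -d is an assumption
    for key, value in sorted((env or {}).items()):
        run += ["-e", f"{key}={value}"]
    run.append(image)
    return [
        ["docker", "stop", name],
        ["docker", "rm", name],
        run,
    ]
```

Each returned list could be handed to `subprocess.run(cmd, check=True)`. Note that `docker stop` and `docker rm` exit non-zero when no container with that name exists, so a first-time run would need to tolerate those failures.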
> >
> > Third, lots of spammers and bots are regularly hitting the wiki, which I
> > suspected was keeping confluence from coming online and staying online,
> > so I made this commit[3] to deny those bots at the Apache proxy level
> > (refresher, requests go: Apache (ssl termination) -> Nginx (cache) ->
> > Confluence).
> >
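The message does not say how commit [3] recognizes the bots. Assuming it is by User-Agent, which is a common approach but only a guess here, the decision made at the front proxy boils down to something like the following. The patterns are invented placeholders, not the list from the commit.

```python
import re

# Hypothetical patterns for illustration only; not taken from commit [3].
DENIED_AGENTS = re.compile(r"(examplebot|spamcrawler)", re.IGNORECASE)


def should_deny(user_agent):
    """Return True when the User-Agent header matches a denied pattern."""
    return bool(user_agent) and DENIED_AGENTS.search(user_agent) is not None
```

Rejecting these requests at the first hop means they never reach Nginx or Confluence, which is the point of doing it at the Apache level rather than further down the chain.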
> >
> > All that said and done, it still does not appear that the current
> > configuration of Confluence can sustain the traffic levels without LDAP
> > caching enabled, so I unfortunately have pinned things back down to
> > `build7`.
> >
> >
> >
> > You may be asking yourself at this point of the email: "why is he writing
> > all this out?" Welp, this is effectively what I spent my Sunday doing, and
> > it would be a shame if nobody but me learned from this colossal waste of
> > time. :)
> >
> >
> > Anywho, that's that. Confluence is back online, and I'm probably not going
> > to touch it for at least a few days, lest I go crazy.
> >
> >
> > [1] https://github.com/jenkins-infra/jenkins-infra/commit/0107e79b0aa7b5bd9acd3d4d6b268c4178331beb
> > [2] https://github.com/jenkins-infra/jenkins-infra/commit/f95c0e67803e9129c54a3f7fe8fce2940f7ad874
> > [3] https://github.com/jenkins-infra/jenkins-infra/commit/675e4bdfc7bdd96b34046dc872f73f7f514e4e49
> >
> >
> > Cheers
> > - R. Tyler Croy
> >
> > ------------------------------------------------------
> >      Code: <https://github.com/rtyler>
> >   Chatter: <https://twitter.com/agentdero>
> >
> >   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
> > ------------------------------------------------------
> >
> > _______________________________________________
> > Jenkins-infra mailing list
> > Jenkins-infra at lists.jenkins-ci.org
> > http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
> >
> >

- R. Tyler Croy

------------------------------------------------------
     Code: <https://github.com/rtyler>
  Chatter: <https://twitter.com/agentdero>

  % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
------------------------------------------------------