[Jenkins-infra] Confluence, Nginx and 99 reasons Docker hates me: a report.

Arnaud Héritier aheritier at gmail.com
Tue Jan 19 10:04:08 UTC 2016


Was the bot fixed/restarted?

On Mon, Jan 18, 2016 at 6:43 PM, Larry Shatzer, Jr. <larrys at gmail.com>
wrote:

> We need to get that running on a box more people have access to so we can
> kick it when it does this. It is becoming more critical to keep spam away.
> I do not want to spend all my day off deleting spam.
>
> On Mon, Jan 18, 2016 at 8:44 AM, R. Tyler Croy <tyler at monkeypox.org>
> wrote:
>
>> (replies inline)
>>
>> On Mon, 18 Jan 2016, Larry Shatzer, Jr. wrote:
>>
>> > And now it appears the wiki spam bot is dead, since there are probably
>> > around a thousand spam pages, all ones that would normally be killed by
>> > it.
>>
>>
>> Due to access constraints, I didn't touch the backend spam bot at all for
>> what it's worth. It may have cratered due to the connectivity issues though.
>> I'll ping KK as soon as I see him online about it.
>>
>>
>> > On Sun, Jan 17, 2016 at 8:43 PM, R. Tyler Croy <tyler at monkeypox.org>
>> > wrote:
>> >
>> > >
>> > > We had some wiki availability issues today that were partially my fault
>> > > and partially related to trying to bring "build13" of the docker
>> > > confluence image into production.
>> > >
>> > > KK made the change earlier last week to disable LDAP caching but for
>> > > some reason Docker wasn't pulling the new container properly. This is
>> > > what I set out to fix about 6 hours ago.
>> > >
>> > >
>> > > First, I discovered that newer versions of Docker had no problem pulling
>> > > the docker container, and that we did not have consistent versions of
>> > > Docker installed across our machines (1.5.0, 1.7.0 and 1.9.1 by my
>> > > survey). With this commit[1] I ensured that we would have 1.9.1
>> > > consistently installed. This required some changes to the forked version
>> > > of the garethr-docker puppet module we use, since it's been changed
>> > > quite a bit to accommodate newer options in later Docker versions.
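
A rough sketch of the kind of version survey described above, assuming the
versions have already been gathered into "hostname version" lines. The
function name and the input format are made up for illustration, and how the
versions get collected from each machine is left out:

```shell
#!/bin/sh
# Print every host whose Docker version differs from the wanted one.
# Input on stdin: one "hostname version" pair per line (hypothetical format).
outdated_hosts() {
    wanted="$1"
    # Appending "" forces a string comparison, so "1.9" and "1.90" differ.
    awk -v wanted="$wanted" \
        '($2 "") != (wanted "") { print $1 " has " $2 ", want " wanted }'
}
```

For example, `printf 'alpha 1.5.0\nbeta 1.9.1\n' | outdated_hosts 1.9.1` would
report only the hypothetical host `alpha`.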
>> > >
>> > >
>> > > COOL, surely that must have been the end of my day.
>> > >
>> > >
>> > > Second, after rolling out the Docker changes the wiki became
>> > > unavailable. Investigation turned up two problems. The first is one I
>> > > have seen a few times already with Docker in our infrastructure: stale
>> > > IPTables routing rules. When Docker sets up its networking it installs
>> > > some rules into a couple of chains in the `filter` and `nat` tables;
>> > > periodically it has failed to clean up these rules, leading to requests
>> > > not being routed between the confluence-cache and confluence
>> > > containers. The second problem I identified was that an internal IP
>> > > address that no longer existed was hard-coded for the confluence-cache
>> > > container, so naturally it wasn't finding the right confluence
>> > > container. I addressed *that* with this[2] change.
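
One way to look at both problems by hand, sketched under assumptions: that
the chain of interest is the one Docker names `DOCKER`, that the commands run
as root on the Docker host, and that the helper names below are purely
illustrative. This is not what the commit above does; it only prints what is
currently in place so that stale rules or a stale address can be spotted:

```shell
#!/bin/sh
# Commands are overridable so the functions can be dry-run (e.g. DOCKER=echo).
DOCKER="${DOCKER:-docker}"
IPTABLES="${IPTABLES:-iptables}"

# Print the rules in Docker's own chain for the `filter` and `nat` tables.
docker_rules() {
    for table in filter nat; do
        echo "== $table =="
        "$IPTABLES" -t "$table" -S DOCKER
    done
}

# Print the IP address Docker currently assigns to a named container.
container_ip() {
    "$DOCKER" inspect --format '{{ .NetworkSettings.IPAddress }}' "$1"
}
```

Comparing the output of `container_ip confluence` with whatever address the
cache configuration points at would show whether a hard-coded value has gone
stale.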
>> > >
>> > >
>> > > While debugging this, I noticed another cute behavior of docker with its
>> > > named containers support. Since we name our containers (e.g.
>> > > `confluence`), the docker daemon will actually persist the tag and some
>> > > of the options passed into the `docker run` invocation. I.e.
>> > > `docker run -e SOME=foo --name bleepbloop rtyler/myimage` would persist
>> > > the environment variable options (SOME=foo) until I stopped and removed
>> > > the container (e.g. `docker rm bleepbloop`).
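
A minimal sketch of clearing out a named container before starting it again,
so that old options do not carry over. The function name is hypothetical, and
running detached with `-d` is an assumption rather than something stated
above:

```shell
#!/bin/sh
# Overridable so the function can be dry-run (e.g. DOCKER=echo).
DOCKER="${DOCKER:-docker}"

# Stop and remove a named container, then start it again with fresh options.
# Usage: recreate_container NAME [docker run options...] IMAGE
recreate_container() {
    name="$1"
    shift
    # Stopping or removing fails if the container is absent; carry on anyway.
    "$DOCKER" stop "$name" || true
    "$DOCKER" rm "$name" || true
    # -d (detached) is an assumption; drop it to run in the foreground.
    "$DOCKER" run -d --name "$name" "$@"
}
```

With the example above this would be called as
`recreate_container bleepbloop -e SOME=foo rtyler/myimage`.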
>> > >
>> > > To remedy this, I nuked all the previous incantations of named
>> > > containers from the host running confluence. That finished, I could
>> > > FINALLY run `build13` of the confluence container, which had the LDAP
>> > > cache setting change that KK made earlier. Bringing that up, I
>> > > discovered another issue...
>> > >
>> > >
>> > > Third, lots of spammers and bots are regularly hitting the wiki, which I
>> > > suspected was causing confluence not to come online and stay online, so
>> > > I made this commit[3] to deny those bots at the Apache proxy level
>> > > (refresher, requests go: Apache (ssl termination) -> Nginx (cache) ->
>> > > Confluence).
>> > >
>> > >
>> > > All that said and done, it still does not appear that the current
>> > > configuration of Confluence can sustain the traffic levels without LDAP
>> > > caching enabled, so I unfortunately have pinned things back down to
>> > > `build7`.
>> > >
>> > >
>> > >
>> > > You may be asking yourself at this point of the email: "why is he
>> > > writing all this out?" Welp, this is effectively what I spent my Sunday
>> > > doing, and it would be a shame if nobody but me learned from this
>> > > colossal waste of time. :)
>> > >
>> > >
>> > > Anywho, that's that. Confluence is back online, and I'm probably not
>> > > going to touch it for at least a few days, lest I go crazy.
>> > >
>> > >
>> > > [1] https://github.com/jenkins-infra/jenkins-infra/commit/0107e79b0aa7b5bd9acd3d4d6b268c4178331beb
>> > > [2] https://github.com/jenkins-infra/jenkins-infra/commit/f95c0e67803e9129c54a3f7fe8fce2940f7ad874
>> > > [3] https://github.com/jenkins-infra/jenkins-infra/commit/675e4bdfc7bdd96b34046dc872f73f7f514e4e49
>> > >
>> > >
>> > > Cheers
>> > > - R. Tyler Croy
>> > >
>> > > ------------------------------------------------------
>> > >      Code: <https://github.com/rtyler>
>> > >   Chatter: <https://twitter.com/agentdero>
>> > >
>> > >   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
>> > > ------------------------------------------------------
>> > >
>> > > _______________________________________________
>> > > Jenkins-infra mailing list
>> > > Jenkins-infra at lists.jenkins-ci.org
>> > > http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>> > >
>> > >
>>
>> - R. Tyler Croy
>>


-- 
-----
Arnaud Héritier
http://aheritier.net
Mail/GTalk: aheritier AT gmail DOT com
Twitter/Skype : aheritier