[Jenkins-infra] Confluence, Nginx and 99 reasons Docker hates me: a report.

Larry Shatzer, Jr. larrys at gmail.com
Tue Jan 19 15:32:28 UTC 2016


It appears to have been restarted. They are back on us with a vengeance,
and finding ways around this filter.

We need a better filter on the page creation side, as well as on the account
creation side.

On Tue, Jan 19, 2016 at 3:04 AM, Arnaud Héritier <aheritier at gmail.com>
wrote:

> Was the bot fixed/restarted?
>
> On Mon, Jan 18, 2016 at 6:43 PM, Larry Shatzer, Jr. <larrys at gmail.com>
> wrote:
>
>> We need to get that running on a box more people have access to so we can
>> kick it when it does this. It is becoming more critical to keep spam away.
>> I do not want to spend all my day off deleting spam.
>>
>> On Mon, Jan 18, 2016 at 8:44 AM, R. Tyler Croy <tyler at monkeypox.org>
>> wrote:
>>
>>> (replies inline)
>>>
>>> On Mon, 18 Jan 2016, Larry Shatzer, Jr. wrote:
>>>
>>> > And now it appears the wiki spam bot is dead, since there are probably
>>> > around a thousand spam pages, all ones that would normally be killed
>>> > by it.
>>>
>>>
>>> Due to access constraints, I didn't touch the backend spam bot at all, for
>>> what it's worth. It may have cratered due to the connectivity issues
>>> though. I'll ping KK about it as soon as I see him online.
>>>
>>>
>>> > On Sun, Jan 17, 2016 at 8:43 PM, R. Tyler Croy <tyler at monkeypox.org>
>>> > wrote:
>>> >
>>> > >
>>> > > We had some wiki availability issues today that were partially my
>>> > > fault and partially related to trying to bring "build13" of the
>>> > > docker confluence image into production.
>>> > >
>>> > > KK made the change earlier last week to disable LDAP caching, but for
>>> > > some reason Docker wasn't pulling the new container properly. This is
>>> > > what I set out to fix about 6 hours ago.
>>> > >
>>> > >
>>> > > First, I discovered that newer versions of Docker had no problem
>>> > > pulling the docker container, and that we did not have consistent
>>> > > versions of Docker installed across our machines (1.5.0, 1.7.0 and
>>> > > 1.9.1 by my survey). With this commit[1] I ensured that we would have
>>> > > 1.9.1 consistently installed. This required some changes to the
>>> > > forked version of the garethr-docker puppet module we use, since it's
>>> > > been changed quite a bit to accommodate newer options in later Docker
>>> > > versions.
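
A version survey like the one described above can be sketched in a few lines of Python. This is only an illustration, not what commit[1] does: the host names and the `find_version_drift` helper are hypothetical, and only the three version numbers come from the email.

```python
from collections import defaultdict


def find_version_drift(installed, wanted):
    """Group hosts by installed Docker version and list hosts not on `wanted`."""
    by_version = defaultdict(list)
    for host, version in sorted(installed.items()):
        by_version[version].append(host)
    # Any host whose version differs from the target needs an upgrade.
    drifted = sorted(h for h, v in installed.items() if v != wanted)
    return dict(by_version), drifted


# Hypothetical host names; the versions are the ones from the survey above.
survey = {"host-a": "1.5.0", "host-b": "1.7.0", "host-c": "1.9.1"}
by_version, drifted = find_version_drift(survey, wanted="1.9.1")
print(by_version)
print(drifted)
```

How the installed versions are collected from each machine is left out here; in this setup the pinning itself is handled by Puppet.
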
>>> > >
>>> > >
>>> > > COOL, surely that must have been the end of my day.
>>> > >
>>> > >
>>> > > Second, after rolling out the Docker changes the wiki became
>>> > > unavailable. Investigation led to two problems. The first is one I
>>> > > have seen a few times already with Docker in our infrastructure:
>>> > > stale iptables routing rules. When Docker sets up its networking it
>>> > > installs some rules into a couple of chains in the `filter` and `nat`
>>> > > tables; periodically it has failed to clean up these rules, leading
>>> > > to requests not being routed between the confluence-cache and
>>> > > confluence containers. The second problem was that an internal IP
>>> > > address, which no longer existed, was hard-coded for the
>>> > > confluence-cache container, so naturally it wasn't finding the right
>>> > > confluence container. I addressed *that* with this[2] change.
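
One way to spot leftover rules of the kind described above is to scan a rule dump for DNAT targets that no running container owns. The sketch below is an assumption-laden illustration: it assumes the dump is in `iptables-save` format, that Docker's port-forwarding rules live in a `DOCKER` chain and carry a `--to-destination` address, and that the set of live container IPs has been gathered separately (for example from `docker inspect`). The sample rules, the addresses and the `stale_dnat_rules` name are all made up.

```python
import re

# Matches the destination address of a DNAT rule in iptables-save output.
DNAT_RE = re.compile(r"--to-destination\s+(\d+\.\d+\.\d+\.\d+)(?::\d+)?")


def stale_dnat_rules(rules_text, live_ips):
    """Return DOCKER-chain DNAT rules whose target IP is not a live container."""
    stale = []
    for line in rules_text.splitlines():
        line = line.strip()
        if not line.startswith("-A DOCKER"):
            continue  # only look at rules appended to Docker's chain
        match = DNAT_RE.search(line)
        if match and match.group(1) not in live_ips:
            stale.append(line)
    return stale


# Made-up sample dump: the second rule points at an address nobody owns.
sample = """\
*nat
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.17.0.2:8080
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8081 -j DNAT --to-destination 172.17.0.9:8081
COMMIT
"""
stale = stale_dnat_rules(sample, live_ips={"172.17.0.2"})
for rule in stale:
    print(rule)
```

The function only reports candidates; deciding whether to delete a rule is deliberately left to a human.
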
>>> > >
>>> > >
>>> > > While debugging this, I noticed another cute behavior of docker with
>>> > > its named containers support. Since we name our containers (e.g.
>>> > > `confluence`), the docker daemon will actually persist the tag and
>>> > > some of the options passed into the `docker run` invocation. I.e.
>>> > > `docker run -e SOME=foo --name bleepbloop rtyler/myimage` would
>>> > > persist the environment variable options (SOME=foo) until I stopped
>>> > > and removed the container (e.g. `docker rm bleepbloop`).
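
The stop, remove, re-run sequence implied above can be written down as a small helper that only builds the command lines and does not execute anything. This is a sketch: `recreate_commands` is a hypothetical name, the container and image names are the placeholders from the `docker run` example above, and the new value `bar` is invented for illustration.

```python
def recreate_commands(name, image, env):
    """Build the docker commands that replace a named container with fresh options."""
    run = ["docker", "run"]
    for key, value in sorted(env.items()):
        run += ["-e", f"{key}={value}"]
    run += ["--name", name, image]
    return [
        ["docker", "stop", name],  # stop the old container
        ["docker", "rm", name],    # remove it, freeing the name and its stored options
        run,                       # start a new container with the new options
    ]


# Placeholder names from the example above; SOME=bar is an invented new value.
for argv in recreate_commands("bleepbloop", "rtyler/myimage", {"SOME": "bar"}):
    print(" ".join(argv))
```

Returning argument lists rather than running them keeps the sketch safe to try on any machine; the lists could be handed to `subprocess.run` on a real host.
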
>>> > >
>>> > > To remedy this, I nuked all the previous incantations of named
>>> > > containers from the host running confluence. That finished, I could
>>> > > FINALLY run `build13` of the confluence container, which had the LDAP
>>> > > cache setting change that KK made earlier. Bringing that up, I
>>> > > discovered another issue...
>>> > >
>>> > >
>>> > > Third, lots of spammers and bots are regularly hitting the wiki,
>>> > > which I suspected was causing confluence not to come online and stay
>>> > > online, so I made this commit[3] to deny those bots at the Apache
>>> > > proxy level (refresher, requests go: Apache (ssl termination) ->
>>> > > Nginx (cache) -> Confluence).
>>> > >
>>> > >
>>> > > All that said and done, it still does not appear that the current
>>> > > configuration of Confluence can sustain the traffic levels without
>>> > > LDAP caching enabled, so I unfortunately have pinned things back down
>>> > > to `build7`.
>>> > >
>>> > >
>>> > >
>>> > > You may be asking yourself at this point of the email: "why is he
>>> > > writing all this out?" Welp, this is effectively what I spent my
>>> > > Sunday doing, and it would be a shame if nobody but me learned from
>>> > > this colossal waste of time. :)
>>> > >
>>> > >
>>> > > Anywho, that's that. Confluence is back online, and I'm probably not
>>> > > going to touch it for at least a few days, lest I go crazy.
>>> > >
>>> > >
>>> > > [1] https://github.com/jenkins-infra/jenkins-infra/commit/0107e79b0aa7b5bd9acd3d4d6b268c4178331beb
>>> > > [2] https://github.com/jenkins-infra/jenkins-infra/commit/f95c0e67803e9129c54a3f7fe8fce2940f7ad874
>>> > > [3] https://github.com/jenkins-infra/jenkins-infra/commit/675e4bdfc7bdd96b34046dc872f73f7f514e4e49
>>> > >
>>> > >
>>> > > Cheers
>>> > > - R. Tyler Croy
>>> > >
>>> > > ------------------------------------------------------
>>> > >      Code: <https://github.com/rtyler>
>>> > >   Chatter: <https://twitter.com/agentdero>
>>> > >
>>> > >   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
>>> > > ------------------------------------------------------
>>> > >
>>> > > _______________________________________________
>>> > > Jenkins-infra mailing list
>>> > > Jenkins-infra at lists.jenkins-ci.org
>>> > > http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>>> > >
>>> > >
>>>
>>> - R. Tyler Croy
>>>
>>
>>
>
>
> --
> -----
> Arnaud Héritier
> http://aheritier.net
> Mail/GTalk: aheritier AT gmail DOT com
> Twitter/Skype : aheritier
>