[Jenkins-infra] Post-mortem analysis of website outage yesterday

Kohsuke Kawaguchi kk at kohsuke.org
Fri Sep 21 16:17:42 UTC 2012


-------- Original Message --------
Subject: Post-mortem analysis (Re: fuck this)
Date: Thu, 20 Sep 2012 19:26:01 -0700
From: Kohsuke Kawaguchi <kk at kohsuke.org>
To: R. Tyler Croy <tyler at monkeypox.org>

So here is what I think has happened.

I think I ran "sudo apt-get install apache2" on this system from the
console (there's no record of this AFAICT, but I'm pretty sure.)

What dpkg did in response can be seen in /var/log/dpkg.log.fiasco (I
copied the file to this name to save it.) You can see that
libapache2-mod-php5 is getting removed without a newer version being
installed. I don't know why it was removed instead of replaced, but my
best guess is that there was a change in the package dependencies
or something, and that confused apt-get.
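
A rough sketch of how one could pull this out of a saved dpkg log:
list the packages whose last recorded action is a removal. It assumes
each relevant line has the shape "date time action package
old-version new-version"; the function name and the sample lines in
the usage note are made up for illustration, not taken from the
actual log.

```python
def removed_without_reinstall(log_lines):
    """Return packages whose last install/upgrade/remove/purge action was a removal."""
    last_action = {}
    for line in log_lines:
        parts = line.split()
        # Assumed layout: date, time, action, package, versions...
        if len(parts) < 4:
            continue
        action, package = parts[2], parts[3]
        if action in ("install", "upgrade", "remove", "purge"):
            # Drop an architecture suffix such as ":amd64" if one is present.
            last_action[package.split(":")[0]] = action
    return sorted(p for p, a in last_action.items() if a in ("remove", "purge"))
```

Fed the lines of /var/log/dpkg.log.fiasco, this would be expected to
list libapache2-mod-php5 if the reading above is right. Lines such as
"status installed ..." are ignored because their third field is not
one of the four actions.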

This robbed Apache of PHP support, so at this point the top page
started serving /usr/share/drupal6/index.php as a static file. While
this file isn't exactly 1248 bytes (the magic number we've seen in
the transfer bytes field of the apache access log), I later
experimented and found that when you serve this file from
http://javadoc.jenkins-ci.org/, apache records the transfer bytes as
1235, which is close enough to 1248. I assume the minor fluctuation
comes from a difference in headers or something.
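
For reference, a minimal sketch of picking such requests out of an
access log by transfer size. It assumes the common/combined log
layout, where the quoted request line is followed by the status code
and then the byte count; the function name is hypothetical.

```python
import re

# Matches the end of the quoted request, then status and byte count.
_STATUS_AND_BYTES = re.compile(r'" (\d{3}) (\d+|-)')

def lines_with_transfer_size(log_lines, sizes):
    """Return the log lines whose byte-count field is one of `sizes`."""
    hits = []
    for line in log_lines:
        match = _STATUS_AND_BYTES.search(line)
        # A "-" byte count means no body was sent; skip those lines.
        if match and match.group(2) != "-" and int(match.group(2)) in sizes:
            hits.append(line)
    return hits
```

Calling it with sizes={1248, 1235} would narrow the log down to the
requests that most likely received the raw index.php.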

While I was doing this, puppet must have run and started apache. When
I tried to kill haproxy and bring up apache to verify this hypothesis,
I noticed that apache was already running and serving requests to the
public.

I did "curl https://jenkins-ci.org/" and verified that it was
index.php itself that was being downloaded. I then installed
libapache2-mod-php5, reloaded apache, and that brought back the
functionality.
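
The same check can be automated on a response body that has already
been fetched (for example, saved curl output). This is only a sketch:
it assumes the unexecuted script shows its "<?php" opening tag near
the top of the body, and the function name and probe size are
hypothetical.

```python
def looks_like_unexecuted_php(body: bytes, probe: int = 512) -> bool:
    """Guess whether a response body is raw PHP source rather than rendered output."""
    # Only the first `probe` bytes are inspected; rendered pages normally
    # start with markup, not a PHP opening tag.
    return b"<?php" in body[:probe]
```

A True result for the front page would point to the same failure:
apache serving the script as a static file.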

Given the timing of the apache update and the records in the apache
access log, I think it's safe to conclude that this is my mess-up, not
an intrusion. This is consistent with the report from the guy who
raised the issue, who saw the contents of index.php.

The only mystery is the FLV file. I eventually watched it in my
Windows VirtualBox, but it was just 3 or 4 seconds of completely black
movie. The file is so small, and has so little entropy in it, that
it's unlikely to be a trojan horse of some sort. I'm not sure where it
came from, but it sounded like you had some unusual connection setup,
so I'm willing to believe that jenkins-ci.org never served it. This is
also consistent with your not finding this file anywhere in the file
system or the database on cucumber.
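
One way to put a number on "not enough entropy" is the Shannon
entropy of the file's bytes. The sketch below is an illustration, not
necessarily how the file was actually examined. Values near 8 bits
per byte are typical of compressed or encrypted data; a low value
makes a packed payload less likely but does not by itself prove a
file is harmless.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

It would be applied to the contents of the file read in binary mode,
e.g. shannon_entropy(open(path, "rb").read()) for some path.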

So I take full responsibility. My apologies. Here's hoping that a
round of beer will make up for this.

-- 
Kohsuke Kawaguchi




