[Jenkins-infra] cucumber outage

R. Tyler Croy tyler at monkeypox.org
Thu Jan 7 18:32:56 UTC 2016


(replies inline)

On Wed, 06 Jan 2016, Arnaud Héritier wrote:

>   This morning I received some alerts about a full disk but there was no
> detail from which server.
>   This afternoon ldap crashed, the wiki was unavailable and I found that
> the full disk was / on cucumber

I'm sorry you had to deal with this, I suppose I don't wake up to 2am pages the
way I used to :-/


> aheritier at cucumber:~$ df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             895G  850G     0 100% /
> none                  3.9G  216K  3.9G   1% /dev
> none                  3.9G     0  3.9G   0% /dev/shm
> none                  3.9G   75M  3.8G   2% /var/run
> none                  3.9G     0  3.9G   0% /var/lock
> none                  3.9G     0  3.9G   0% /lib/init/rw
> 
> Strangely Used < Size
> 
> I stopped the jenkins service and removed 3GB of logs because it was
> recording many exceptions about no space left on device



Andrew, Daniel and I have been freeing up disk space on cucumber that was
previously being wasted.

We did get disk usage alerts from Datadog, but they only fire once a disk is
down to the 200MB threshold, which is clearly too close to the knife's edge.
I've updated the monitor in Datadog to alert at 1GB instead of 200MB.
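
For illustration only, here is a minimal Python sketch of the kind of check
such a monitor performs. It is not the Datadog monitor itself. The function
names are hypothetical, and treating the threshold as "free bytes remaining"
is an assumption based on the 200MB and 1GB figures above.

```python
import shutil

MIB = 1024 ** 2
GIB = 1024 ** 3


def below_threshold(free_bytes, threshold_bytes=GIB):
    """Return True when free space has dropped under the alert threshold."""
    return free_bytes < threshold_bytes


def check_free_space(path="/", threshold_bytes=GIB):
    """Return (should_alert, free_bytes) for the filesystem holding `path`.

    Hypothetical helper: the 1 GiB default mirrors the new threshold
    mentioned above, not any real monitor configuration.
    """
    usage = shutil.disk_usage(path)
    return below_threshold(usage.free, threshold_bytes), usage.free


if __name__ == "__main__":
    alert, free = check_free_space("/")
    print(f"free={free / GIB:.2f} GiB alert={alert}")
```

With the old 200MB threshold, a disk with 500MB free would not have alerted.
With the 1GB threshold it does. As far as I know, `shutil.disk_usage` reports
`free` as the space available to unprivileged users on Unix, which corresponds
to the Avail column of `df` rather than Size minus Used.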


Our log rotation needs to be updated so that logs are rotated and then deleted
after some time (https://issues.jenkins-ci.org/browse/INFRA-541).
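
As a rough sketch of the "delete after some time" half of that, assuming
rotated logs sit in a single directory and can be aged out by modification
time: the function name, the 14-day retention and the `*.log*` pattern are
all placeholders for illustration and are not taken from INFRA-541.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def prune_old_logs(log_dir, max_age_days=14, pattern="*.log*",
                   now=None, dry_run=True):
    """Find (and optionally delete) log files older than `max_age_days`.

    Returns the sorted list of matching paths. With dry_run=True (the
    default) nothing is deleted, so the result can be reviewed first.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for path in Path(log_dir).glob(pattern):
        # Only regular files whose last modification predates the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            stale.append(path)
    return sorted(stale)
```

Running it once with `dry_run=True` lists what would be removed. In practice a
tool such as logrotate would normally handle both rotation and expiry, so this
only illustrates the retention logic.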



I'm not sure there's much more to do at this time beyond the exploratory work
that abayer has already been doing to hunt down unnecessary disk usage on
cucumber.


- R. Tyler Croy

------------------------------------------------------
     Code: <https://github.com/rtyler>
  Chatter: <https://twitter.com/agentdero>

  % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
------------------------------------------------------