[Jenkins-infra] cucumber outage

Arnaud Héritier aheritier at gmail.com
Thu Jan 7 18:50:03 UTC 2016


Hi

 No problem.
 Is it possible to update the monitor and PD so that the alert shows which
server has a problem?

https://app.datadoghq.com/monitors#274883

  It's not critical now that Daniel has shown me this screen:
https://app.datadoghq.com/dash/68504/free-disk-space?live=true&page=0&is_auto=false&from_ts=1452189005157&to_ts=1452192605157&tile_size=m

  But yesterday I had some difficulty finding out that the outage was on
cucumber.

  It was quicker today to find that it was on spinach, with Daniel's help.

thx



On Thu, Jan 7, 2016 at 7:40 PM, Andrew Bayer <andrew.bayer at gmail.com> wrote:

> Log rotation is really the biggest thing - I pruned some space by making
> sure all the Jenkins jobs were set to discard old builds, cleared out a
> couple workspaces, etc. Well, that and /srv/nexus, which so far as I can
> tell is utterly pointless now.
>
> A.
>
> On Thu, Jan 7, 2016 at 10:32 AM, R. Tyler Croy <tyler at monkeypox.org>
> wrote:
>
>> (replies inline)
>>
>> On Wed, 06 Jan 2016, Arnaud Héritier wrote:
>>
>> >   This morning I received some alerts about a full disk, but there was no
>> > detail about which server was affected.
>> >   This afternoon ldap crashed, the wiki was unavailable, and I found that
>> > the full disk was / on cucumber.
>>
>> I'm sorry you had to deal with this, I suppose I don't wake up to 2am pages
>> the way I used to :-/
>>
>>
>> > aheritier at cucumber:~$ df -h
>> > Filesystem            Size  Used Avail Use% Mounted on
>> > /dev/sda1             895G  850G     0 100% /
>> > none                  3.9G  216K  3.9G   1% /dev
>> > none                  3.9G     0  3.9G   0% /dev/shm
>> > none                  3.9G   75M  3.8G   2% /var/run
>> > none                  3.9G     0  3.9G   0% /var/lock
>> > none                  3.9G     0  3.9G   0% /lib/init/rw
>> >
>> > Strangely Used < Size
>> >
>> > I stopped the jenkins service and removed 3GB of logs because it was
>> > recording many exceptions about "no space left on device".
>>
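On the "Used < Size" puzzle in the df output above: one likely explanation, assuming / is an ext2/3/4 filesystem with the default 5% root reserve (the thread does not confirm this), is that the missing 45G is space reserved for root, which df leaves out of Avail. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and the 5% figure is an assumption.

```python
import os

def reserved_bytes(path="/"):
    """Bytes that are free for root but not for ordinary users at `path`."""
    st = os.statvfs(path)
    # f_bfree counts all free blocks; f_bavail only those usable by non-root users.
    return (st.f_bfree - st.f_bavail) * st.f_frsize

def df_gap(size, used, avail):
    """Space that df reports as neither used nor available (same unit as inputs)."""
    return size - used - avail

# Figures for /dev/sda1 from the df -h output, in G.
print(df_gap(895, 850, 0))    # 45
print(round(895 * 0.05, 2))   # 44.75, the assumed default 5% reserve
```

The two numbers are close enough that the reserve would account for the gap, if the assumption about the filesystem holds.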
>>
>>
>> Andrew, Daniel and I have been cleaning up more disk space on cucumber
>> which was previously wasted.
>>
>> We did get disk usage alerts from Datadog, but they only fire when disks hit
>> the 200MB threshold, which is clearly too close to the knife's edge. I've
>> updated the monitor in Datadog to alert at 1GB instead of 200MB.
>>
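To illustrate the threshold change and the earlier point about alerts not naming the server, here is a rough local sketch of a free-space check that fires below 1GB and always includes the host and mount point in its message. This is not how the Datadog monitor is configured; the function names and message format are made up for the example.

```python
import shutil
import socket

GIB = 1024 ** 3
MIB = 1024 ** 2

def low_disk_alert(free_bytes, threshold_bytes=1 * GIB, host="unknown", mount="/"):
    """Return an alert string naming host and mount, or None if space is sufficient."""
    if free_bytes >= threshold_bytes:
        return None
    return f"LOW DISK on {host}:{mount}: {free_bytes // MIB} MiB free"

def check_local(mount="/"):
    """Check the local machine, tagging any alert with its own hostname."""
    free = shutil.disk_usage(mount).free
    return low_disk_alert(free, host=socket.gethostname(), mount=mount)

print(low_disk_alert(150 * MIB, host="cucumber"))
# LOW DISK on cucumber:/: 150 MiB free
```

With the old 200MB threshold the same call would also fire at 150 MiB free, but only after the disk was already nearly unusable; the 1GB value simply buys more warning time.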
>>
>> Our log rotation needs to be updated to rotate and delete after some time
>> (https://issues.jenkins-ci.org/browse/INFRA-541)
>>
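As a rough idea of what "delete after some time" could look like, here is a small Python sketch that removes files older than a retention period from a log directory. The real fix tracked in INFRA-541 would more likely be a log rotation configuration; the function name, the directory argument and the 30-day default are assumptions for illustration only.

```python
import os
import time

def prune_old_logs(log_dir, max_age_days=30, now=None):
    """Delete regular files in log_dir whose mtime is older than max_age_days.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in os.scandir(log_dir):
        # Skip directories and symlinks; only plain files are candidates.
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Something like this would be run periodically (for example from cron) against a directory that holds only rotated logs, never against a live log file.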
>>
>>
>> I'm not sure there's much more to do at this time beyond the exploratory
>> work that abayer has already been doing to hunt down unnecessary disk usage
>> on cucumber.
>>
>>
>> - R. Tyler Croy
>>
>> ------------------------------------------------------
>>      Code: <https://github.com/rtyler>
>>   Chatter: <https://twitter.com/agentdero>
>>
>>   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
>> ------------------------------------------------------
>>
>> _______________________________________________
>> Jenkins-infra mailing list
>> Jenkins-infra at lists.jenkins-ci.org
>> http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>>
>>
>


-- 
-----
Arnaud Héritier
http://aheritier.net
Mail/GTalk: aheritier AT gmail DOT com
Twitter/Skype : aheritier