[Jenkins-infra] cucumber outage

R. Tyler Croy tyler at monkeypox.org
Thu Jan 7 19:48:26 UTC 2016


(replies inline)

On Thu, 07 Jan 2016, Arnaud Héritier wrote:

> Hi
> 
>  no problem.
>  Is it possible to update the monitor and PD so that they show which server
> has a problem?
> 
> https://app.datadoghq.com/monitors#274883
> 
>   It's not critical now that Daniel showed me this screen:
> https://app.datadoghq.com/dash/68504/free-disk-space?live=true&page=0&is_auto=false&from_ts=1452189005157&to_ts=1452192605157&tile_size=m


I find this dashboard to be more useful:
    https://app.datadoghq.com/dash/48569/disk-usage

That said, Datadog allows users to create their own dashboards, so I recommend
poking around a bit. It's a pretty good tool! :)
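
For reference, here is a minimal Python sketch of the sort of free-space check
that the monitor change quoted further down (alerting at 1GB instead of 200MB)
amounts to. This is not how Datadog implements its monitors; the function names
and the exact byte threshold are assumptions made for illustration.

```python
import shutil

# Assumed threshold, mirroring the 1GB value mentioned in the thread.
THRESHOLD_BYTES = 1 * 1024**3


def is_low_on_space(free_bytes, threshold_bytes=THRESHOLD_BYTES):
    """Return True when free space is at or below the threshold."""
    return free_bytes <= threshold_bytes


def check_path(path="/"):
    """Return (low, free_bytes) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_low_on_space(usage.free), usage.free


if __name__ == "__main__":
    low, free = check_path("/")
    print(f"/ free: {free / 1024**3:.1f} GiB, low={low}")
```

On Unix, shutil.disk_usage reports "free" as the space available to
unprivileged users, which is the same figure df shows under Avail.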

> 
> On Thu, Jan 7, 2016 at 7:40 PM, Andrew Bayer <andrew.bayer at gmail.com> wrote:
> 
> > Log rotation is really the biggest thing - I pruned some space by making
> > sure all the Jenkins jobs were set to discard old builds, cleared out a
> > couple workspaces, etc. Well, that and /srv/nexus, which so far as I can
> > tell is utterly pointless now.
> >
> > A.
> >
> > On Thu, Jan 7, 2016 at 10:32 AM, R. Tyler Croy <tyler at monkeypox.org>
> > wrote:
> >
> >> (replies inline)
> >>
> >> On Wed, 06 Jan 2016, Arnaud Héritier wrote:
> >>
> >> >   This morning I received some alerts about a full disk but there was no
> >> > detail from which server.
> >> >   This afternoon ldap crashed, the wiki was unavailable and I found that
> >> > the full disk was / on cucumber
> >>
> >> I'm sorry you had to deal with this, I suppose I don't wake up to 2am pages the
> >> way I used to :-/
> >>
> >>
> >> > aheritier at cucumber:~$ df -h
> >> > Filesystem            Size  Used Avail Use% Mounted on
> >> > /dev/sda1             895G  850G     0 100% /
> >> > none                  3.9G  216K  3.9G   1% /dev
> >> > none                  3.9G     0  3.9G   0% /dev/shm
> >> > none                  3.9G   75M  3.8G   2% /var/run
> >> > none                  3.9G     0  3.9G   0% /var/lock
> >> > none                  3.9G     0  3.9G   0% /lib/init/rw
> >> >
> >> > Strangely Used < Size
> >> >
> >> > I stopped the Jenkins service and removed 3GB of logs because it was recording
> >> > many exceptions about no space left on device
> >>
> >>
> >>
> >> Andrew, Daniel and I have been cleaning up more disk space on cucumber which
> >> was previously wasted.
> >>
> >> We did get disk usage alerts from Datadog but they only fire when disks hit the
> >> 200MB threshold which is clearly too close to the knife's edge. I've updated
> >> the monitor in Datadog to alert at 1GB instead of 200MB.
> >>
> >>
> >> Our log rotation needs to be updated to rotate and delete after some time
> >> (https://issues.jenkins-ci.org/browse/INFRA-541)
> >>
> >>
> >>
> >> I'm not sure there's much more to do at this time past the exploratory work
> >> that abayer has already been doing trying to hunt unnecessary disk usages on
> >> cucumber
> >>
> >>
> >> - R. Tyler Croy
> >>
> >> ------------------------------------------------------
> >>      Code: <https://github.com/rtyler>
> >>   Chatter: <https://twitter.com/agentdero>
> >>
> >>   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
> >> ------------------------------------------------------
> >>
> >
> 
> 
> -- 
> -----
> Arnaud Héritier
> http://aheritier.net
> Mail/GTalk: aheritier AT gmail DOT com
> Twitter/Skype : aheritier
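
On the "Strangely Used < Size" observation quoted above: one common explanation,
assuming / is an ext2/3/4 filesystem with the default 5% of blocks reserved for
root (the thread does not say which filesystem it is), is that df leaves the
reserved blocks out of Avail. A quick arithmetic check on the quoted numbers,
with hypothetical function names:

```python
def unaccounted_gb(size_gb, used_gb, avail_gb):
    """Space that is neither used nor available to ordinary users."""
    return size_gb - used_gb - avail_gb


def percent_of_size(part_gb, size_gb):
    """Express part_gb as a percentage of size_gb."""
    return 100.0 * part_gb / size_gb


if __name__ == "__main__":
    # Values copied from the /dev/sda1 row of the quoted df -h output.
    gap = unaccounted_gb(895, 850, 0)
    print(f"{gap}G unaccounted for, {percent_of_size(gap, 895):.1f}% of Size")
```

The gap works out to 45G, about 5% of the disk, which is consistent with that
default but does not prove it.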

- R. Tyler Croy

------------------------------------------------------
     Code: <https://github.com/rtyler>
  Chatter: <https://twitter.com/agentdero>

  % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
------------------------------------------------------