[Jenkins-infra] Confluence instability post-mortem

Kohsuke Kawaguchi kk at kohsuke.org
Tue Mar 24 15:14:42 UTC 2015


We actually have a pagerduty service for getting alerts. It was triggered by
nagios, so it's currently inactive. Datadog has pagerduty integration, so
as part of my experiment I hooked the two back up: if Confluence or JIRA
goes down, people will get paged.

To me, the main motivation for pushing monitoring to an external service is
that it gives us one less thing to worry about, and lets the infra team
spend its precious little brain time on other, more important things. I
think time is more important than money.

Datadog is also a lot nicer than nagios when it comes to comparing against
historical data, etc.


2015-03-24 8:07 GMT-07:00 Oleg Nenashev <o.v.nenashev at gmail.com>:

> // re-sending the e-mail to the mail list. The first one was to R. Tyler
> only
>
> PagerDuty and other such tools provide much value for big mission-critical
> installations, but Jenkins Infra is quite small.
> Probably, an advanced Nagios configuration could handle most
> monitoring cases w/o external tools.
> BTW, somebody needs to monitor Nagios as well :(
>
> Do we have use-cases outside the alerting system and monitoring
> aggregations?
>
> Best regards,
> Oleg Nenashev
>
> 2015-03-24 18:01 GMT+03:00 Kohsuke Kawaguchi <kk at kohsuke.org>:
>
>> Great!!
>>
>> 2015-03-24 7:19 GMT-07:00 R. Tyler Croy <tyler at monkeypox.org>:
>>
>> (replies inline)
>>>
>>> On Mon, 23 Mar 2015, Kohsuke Kawaguchi wrote:
>>>
>>> > I've played a bit with datadog and now eggplant (jira&confluence) is
>>> > monitored through datadog with pagerduty integration, and I like it.
>>> >
>>> > There are really only just two servers that we want to monitor ---
>>> > cucumber & eggplant, and that just costs $30/month. I think it's a good
>>> > use of our money to help the infra work. Any thoughts?
>>> >
>>> > If anyone else wants to see the dashboard, I can add you to the
>>> > "Jenkins" org.
>>>
>>>
>>>
>>> I'll ping some folks at Datadog and see what they can do for us. We're
>>> using Datadog at Lookout with good success.
>>>
>>>
>>>
>>> - R. Tyler Croy
>>>
>>> ------------------------------------------------------
>>>      Code: <https://github.com/rtyler>
>>>   Chatter: <https://twitter.com/agentdero>
>>>
>>>   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
>>> ------------------------------------------------------
>>>
>>
>>
>>
>> --
>> Kohsuke Kawaguchi
>>
>> _______________________________________________
>> Jenkins-infra mailing list
>> Jenkins-infra at lists.jenkins-ci.org
>> http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>>
>>
>


-- 
Kohsuke Kawaguchi