[Jenkins-infra] INFRA-240 is fixed / post-mortem

R. Tyler Croy tyler at monkeypox.org
Mon Feb 16 19:02:29 UTC 2015

(replies inline)

On Mon, 16 Feb 2015, Kohsuke Kawaguchi wrote:

> In the last 6 months or so, we've handed out infra access rights to a few 
> more people (Daniel Beck and Oleg Nanoshev, IIRC), and that was good for 
> better time zone coverage and what not. But the problem still remains that 
> there is a leadership vacuum, that no one sufficiently "owns" the infra, 
> and that's difficult to solve by adding more hands alone.
> So here's what I'd like to propose:
>    - Formalize our ops team more by designating the lead that reports to 
>    the board. The lead shall be chosen in the discussion during the project 
>    meeting.
>    - Under the new lead, accept another round of ops team members to help 
>    spread the workload. I know for example Kostasya is interested in helping.
>    - Kohsuke (and Tyler if he can join) and the ops team will schedule a 
>    series of "transfer of information" sessions to bring the new ops lead and 
>    the team up to speed about how things are put together today.
>    - Identify and remove single points of failure in our infra. Off the top 
>    of my head:
>       - I think I'm currently the only one who has the private key to sign 
>       the update center root CA.
>       - jenkins-ci.org domain name still appears to be registered under 
>       Tyler's personal account.
> As the ops lead, I'd like the project to consider Adam Papai 
> <https://github.com/woohgit>. He's been a long time user of Jenkins and he 
> is a member of the CloudBees ops team. I'm sensitive to the fact that he 
> works for CloudBees and how that can come across, but OTOH this will be a 
> part of his day job, and I think that ensures that he can allocate 
> necessary time to the effort.

Since I've got a couple of real-world things consuming a boatload of my time, I
don't have any objections to Adam joining the infra team. I'm not sure I like
the term "ops lead" as I've never thought of there being a leadership structure
around our infrastructure so much as a steaming pile of JIRAs and not enough
people to tackle them :-P

I would suggest ramping Adam up in the following ways to mitigate some of our
current risk:

 * Documenting and migrating backend crawlers into the jenkins-infra GH
   organization. This is one of the places where I think we have a seriously
   low bus factor
 * Helping KostySha where I have failed, with feedback on this PR:
 * Driving the migration of JIRA and Confluence onto the newer hardware and
   newer versions, which we've not been able to complete due to lack of time

There's a long tail of other smaller projects, but in terms of our current
infra health and its effect on the project's continued growth and success, I
think those are the areas of most need.

See you chaps in #jenkins-infra

-R. Tyler Croy
