[Jenkins-infra] Revisiting access control for anonymized "census data"

Kohsuke Kawaguchi kk at kohsuke.org
Wed Jun 8 15:14:19 UTC 2016


Given all the +1s from people who've actually looked at and played with the
data, I feel much better.

On Tue, Jun 7, 2016 at 10:52 PM Vivek Pandey <vivek.pandey at gmail.com> wrote:

> To put the anonymized data in context, this is what I am referring to:
> https://wiki.jenkins-ci.org/display/JENKINS/Usage+Statistics
>
>
> On Jun 7, 2016, at 6:32 PM, Andrew Bayer <andrew.bayer at gmail.com> wrote:
>
> FWIW, the census data is just a subset of the full anonymized data: all
> we remove are instance reports with no jobs and instances that only report
> (with jobs) once in a month. The data for each record is the same.
>
> A.
>
> On Tuesday, June 7, 2016, Vivek Pandey <vivek.pandey at gmail.com> wrote:
>
>> The anonymized data looks pretty well anonymized, so I do not see anything
>> in there that needs any confidentiality as far as ACL is concerned, and I
>> am fine with the proposed licensing.
>>
>> On Tue, Jun 7, 2016 at 5:10 PM, R. Tyler Croy <tyler at monkeypox.org> wrote:
>>
>>> (replies inline)
>>>
>>> On Tue, 07 Jun 2016, Kohsuke Kawaguchi wrote:
>>>
>>> > I meant INFRA-682 for "anonymized data" and not for "monthly data"
>>> > that's currently in the /census. See
>>> > https://wiki.jenkins-ci.org/display/JENKINS/Usage+Statistics for what
>>> > those two terms mean. The former contains much richer data than the
>>> > latter. Do you still feel the same way about the anonymized data?
>>>
>>>
>>> Well, the title refers to "Write down process to request & grant access
>>> to census.jenkins.io"
>>>
>>> Only census.json.gz data is on census.jenkins.io.
>>>
>>> We do not currently have, nor have we ever had, the infrastructure for
>>> serving up the raw anonymized access logs, but fundamentally I don't see
>>> access controls as necessary for that data either. I would rather we not
>>> expand the scope of what's being discussed here, however.
>>>
>>> > I agree that at this point access control on the monthly data seems
>>> > largely unneeded.
>>> >
>>> > Also, just for the record, one of the motivations for putting it behind
>>> > the access control wall is so that we can know who is looking at the
>>> > data, so that we can encourage them to bring the results to the
>>> > community.
>>>
>>>
>>> I vaguely recall that; unfortunately, I don't think our hopes were ever
>>> realized there :/
>>>
>>> I hope/think the database licensing that I proposed would help encourage
>>> more open data mining and stats digging.
>>>
>>>
>>> - R. Tyler Croy
>>>
>>> ------------------------------------------------------
>>>      Code: <https://github.com/rtyler>
>>>   Chatter: <https://twitter.com/agentdero>
>>>
>>>   % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
>>> ------------------------------------------------------
>>>
>>> _______________________________________________
>>> Jenkins-infra mailing list
>>> Jenkins-infra at lists.jenkins-ci.org
>>> http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>>>
>>>
>

