[Jenkins-infra] INFRA-896

Kohsuke Kawaguchi kk at kohsuke.org
Thu Aug 18 17:02:50 UTC 2016


That and 201606.json.gz, yes. So far I have only gotten to "anonymized data
<https://wiki.jenkins-ci.org/display/JENKINS/Usage+Statistics>" in our
taxonomy. The stats are still untouched.

On Thu, Aug 18, 2016 at 10:01 AM Andrew Bayer <andrew.bayer at gmail.com>
wrote:

> Also note that I think we have to delete the 201607.json.gz file to get it
> to rebuild.
>
> A.
>
> On Thu, Aug 18, 2016 at 9:59 AM, Kohsuke Kawaguchi <kk at kohsuke.org> wrote:
>
>> Andrew was right. This saga continues.
>>
>> I let the reprocessing happen overnight, and this morning I noticed
>> that the new data size is much smaller. I dug into this further and
>> discovered that logs from the old usage.jenkins-ci.org (cucumber) and the
>> new usage.jenkins.io (which now also owns the CNAME usage.jenkins-ci.org)
>> are overwriting each other in interesting ways because they have the same
>> 'access.log.YYYYMMDD*.gz' file names.
>>
>> Looking at the record, I believe usage.jenkins.io was created on July
>> 2nd, and the Apache access logs from cucumber were moved over to the new
>> node at that time. The following two log files are from cucumber, and the
>> rest of the log files on usage.jenkins.io are new.
>>
>> root at usage:/srv/usage/INFRA-896# ls -la input/
>> -rw-r--r-- 1 root root  88611424 Aug 18 00:52 access.log.20160601000000.gz
>> -rw-r--r-- 1 root root  88365458 Aug 18 00:52 access.log.20160602000000.1.gz
>>
>> Cucumber continued to receive traffic until July 19th, at which point I
>> think the usage.jenkins-ci.org CNAME moved over from cucumber to
>> usage.jenkins.io.
>>
>> root at cucumber:/var/log/apache2/usage.jenkins-ci.org#
>> -rw-r--r--  1 root root 44119899 2016-07-18 19:59 access.log.20160718000000.gz
>> -rw-r--r--  1 root root 32642209 2016-07-19 19:30 access.log.20160719000000.gz
>> -rw-r--r--  1 root root    65562 2016-07-20 17:57 access.log.20160720000000.gz
>> -rw-r--r--  1 root root    23084 2016-07-21 17:19 access.log.20160721000000.gz
>> -rw-r--r--  1 root root    15763 2016-07-22 18:17 access.log.20160722000000.gz
>> -rw-r--r--  1 root root     1268 2016-07-23 10:11 access.log.20160723000000.gz
>> -rw-r--r--  1 root root     3280 2016-07-24 09:04 access.log.20160724000000.gz
>>
>> To avoid the overwriting, I'm going to take the cucumber logs from 6/3 to
>> 7/19, rename them to cucumber.log.*.gz, and put them on the processing
>> pipeline.
>>
>> On Wed, Aug 17, 2016 at 6:55 PM Kohsuke Kawaguchi <kk at kohsuke.org> wrote:
>>
>>> Fudging was done on usage.jenkins.io. I left the script and the record
>>> of it here <https://github.com/jenkins-infra/INFRA-896> for sanity
>>> checking and later re-processing.
>>>
>>> jenkins-infra/infra-statistics:Jenkinsfile did indeed require updates.
>>> I've fixed that as well. I'll re-run the pipeline against this new data set
>>> and we'll see what stats.jenkins.io says.
>>>
>>> Andrew seemed to think there are some additional problems. So this might
>>> not be the end of it.
>>>
>>>
>>>
>>> On Wed, Aug 17, 2016 at 4:11 PM R. Tyler Croy <tyler at monkeypox.org>
>>> wrote:
>>>
>>>> (replies inline)
>>>>
>>>> On Wed, 17 Aug 2016, Kohsuke Kawaguchi wrote:
>>>>
>>>> > Probably during the migration to usage.jenkins.io, the access log file
>>>> > split changed from daily to weekly, which broke the rest of the log
>>>> > processing pipelines.
>>>> >
>>>> > Andrew reported this problem independently yesterday in INFRA-896
>>>> > <https://issues.jenkins-ci.org/browse/INFRA-896>, so I'm fixing that here
>>>> > <https://github.com/jenkins-infra/jenkins-infra/pull/558>. I'm also going
>>>> > to retroactively fudge the log files from the past 2 months to get things
>>>> > back to normal.
>>>> >
>>>> > Tyler suggested that we should rather fix the log processing pipeline so
>>>> > that it doesn't make this assumption, but there are many and that's harder
>>>> > to do.
>>>> >
>>>> > I'll make sure to save the originals just in case I mess up fudging.
>>>>
>>>>
>>>> On what host are you intending to fudge the log files? The originals are on
>>>> usage.jenkins.io, but they are rsynced around to a Jenkins agent under
>>>> ci.jenkins.io which I *think* doesn't do an `rsync --delete` so we might
>>>> end up double-counting.
>>>>
>>>> Check the Jenkinsfile in jenkins-infra/infra-statistics for details on that.
>>>>
>>>>
>>>>
>>>>
>>>> - R. Tyler Croy
>>>>
>>>> ------------------------------------------------------
>>>>      Code: <https://github.com/rtyler>
>>>>   Chatter: <https://twitter.com/agentdero>
>>>>
>>>>   % gpg --keyserver keys.gnupg.net --recv-key 1426C7DC3F51E16F
>>>> ------------------------------------------------------
>>>>
>>>
>> _______________________________________________
>> Jenkins-infra mailing list
>> Jenkins-infra at lists.jenkins-ci.org
>> http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>>
>>
>
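
The rename described in the quoted message above (cucumber logs from 6/3 to
7/19 renamed to cucumber.log.*.gz so they no longer collide with the
access.log.* files from usage.jenkins.io) can be sketched roughly as below.
This is only an illustration, not the script that was actually used: the
function names cucumber_name and plan_renames are hypothetical, the year 2016
is assumed from the file listings, and the file name pattern is inferred from
the 'access.log.YYYYMMDD*.gz' names quoted in the thread.

```python
import re
from datetime import date

# Assumed pattern, inferred from the listings in the thread:
# access.log.YYYYMMDD<anything>.gz
ACCESS_LOG = re.compile(r"^access\.log\.(\d{4})(\d{2})(\d{2})(.*\.gz)$")


def cucumber_name(filename, start=date(2016, 6, 3), end=date(2016, 7, 19)):
    """Return the cucumber.log.* name for an access log dated within
    [start, end], or None if the file should be left alone."""
    m = ACCESS_LOG.match(filename)
    if m is None:
        return None
    year, month, day, rest = m.groups()
    try:
        stamp = date(int(year), int(month), int(day))
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    if not (start <= stamp <= end):
        return None
    return "cucumber.log." + year + month + day + rest


def plan_renames(filenames):
    """Map each in-range access log name to its new cucumber.log.* name."""
    plan = {}
    for name in filenames:
        new_name = cucumber_name(name)
        if new_name is not None:
            plan[name] = new_name
    return plan
```

Building the plan as a dictionary first makes it possible to review the
mapping (and keep the originals, as mentioned earlier in the thread) before
any file is actually renamed, for example with os.rename.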