[Jenkins-infra] INFRA-896

Andrew Bayer andrew.bayer at gmail.com
Thu Aug 18 17:03:37 UTC 2016


Okiedokie.

A.

On Thu, Aug 18, 2016 at 10:02 AM, Kohsuke Kawaguchi <kk at kohsuke.org> wrote:

> That and 201606.json.gz, yes. So far I have only gotten to "anonymized
> data <https://wiki.jenkins-ci.org/display/JENKINS/Usage+Statistics>" in
> our taxonomy. The stats are still untouched.
>
> On Thu, Aug 18, 2016 at 10:01 AM Andrew Bayer <andrew.bayer at gmail.com>
> wrote:
>
>> Also note that I think we have to delete the 201607.json.gz file to get
>> it to rebuild.
>>
>> A.
>>
>> On Thu, Aug 18, 2016 at 9:59 AM, Kohsuke Kawaguchi <kk at kohsuke.org>
>> wrote:
>>
>>> Andrew was right. This saga continues.
>>>
>>> I let the reprocessing happen overnight, and this morning I noticed
>>> that the new data size is much smaller. I dug into this further and
>>> discovered that logs from the old usage.jenkins-ci.org (cucumber) and the
>>> new usage.jenkins.io (which now also owns the CNAME usage.jenkins-ci.org)
>>> are overwriting each other in interesting ways because they have the same
>>> 'access.log.YYYYMMDD*.gz' file names.
>>>
>>> Looking at the record, I believe usage.jenkins.io was created on July
>>> 2nd, and the Apache access logs from cucumber were moved over to the new
>>> node at that time. The following two log files are from cucumber, and the
>>> rest of the log files on usage.jenkins.io are new.
>>>
>>> root at usage:/srv/usage/INFRA-896# ls -la input/
>>> -rw-r--r-- 1 root root  88611424 Aug 18 00:52 access.log.20160601000000.gz
>>> -rw-r--r-- 1 root root  88365458 Aug 18 00:52 access.log.20160602000000.1.gz
>>>
>>> Cucumber continued to receive traffic until July 19th, at which point I
>>> think the usage.jenkins-ci.org CNAME moved over from cucumber to
>>> usage.jenkins.io.
>>>
>>> root at cucumber:/var/log/apache2/usage.jenkins-ci.org#
>>> -rw-r--r--  1 root root 44119899 2016-07-18 19:59 access.log.20160718000000.gz
>>> -rw-r--r--  1 root root 32642209 2016-07-19 19:30 access.log.20160719000000.gz
>>> -rw-r--r--  1 root root    65562 2016-07-20 17:57 access.log.20160720000000.gz
>>> -rw-r--r--  1 root root    23084 2016-07-21 17:19 access.log.20160721000000.gz
>>> -rw-r--r--  1 root root    15763 2016-07-22 18:17 access.log.20160722000000.gz
>>> -rw-r--r--  1 root root     1268 2016-07-23 10:11 access.log.20160723000000.gz
>>> -rw-r--r--  1 root root     3280 2016-07-24 09:04 access.log.20160724000000.gz
>>>
>>> To avoid overwriting, I'm going to take the cucumber logs from 6/3 to
>>> 7/19, rename them to cucumber.log.*.gz, and put them on the processing
>>> pipeline.
>>>
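For reference, the renaming step described above could be sketched roughly as follows. This is only an illustration, not the script that was actually run: the function name `plan_renames` is hypothetical, and it assumes that "6/3 to 7/19" means 2016-06-03 through 2016-07-19 inclusive and that the date is the first eight digits after `access.log.`.

```python
import re
from datetime import date

# Matches names such as access.log.20160718000000.gz or
# access.log.20160602000000.1.gz and captures the YYYY, MM, DD parts.
LOG_NAME = re.compile(r"^access\.log\.(\d{4})(\d{2})(\d{2}).*\.gz$")


def plan_renames(names, start=date(2016, 6, 3), end=date(2016, 7, 19)):
    """Map each access.log.* name dated within [start, end] to cucumber.log.*.

    Hypothetical helper: it only computes the mapping and renames nothing,
    so the plan can be reviewed before touching any files.
    """
    plan = {}
    for name in names:
        m = LOG_NAME.match(name)
        if not m:
            continue  # not an access log archive, leave it alone
        year, month, day = (int(part) for part in m.groups())
        if start <= date(year, month, day) <= end:
            plan[name] = "cucumber.log." + name[len("access.log."):]
    return plan
```

Given the listing above, `plan_renames(["access.log.20160719000000.gz", "access.log.20160720000000.gz"])` would map only the 7/19 file, to `cucumber.log.20160719000000.gz`. The mapping could then be applied with `os.rename` once it looks right.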
>>> On Wed, Aug 17, 2016 at 6:55 PM Kohsuke Kawaguchi <kk at kohsuke.org>
>>> wrote:
>>>
>>>> Fudging was done on usage.jenkins.io. I left the script and the record
>>>> of it here <https://github.com/jenkins-infra/INFRA-896> for the sanity
>>>> checking and re-processing later.
>>>>
>>>> jenkins-infra/infra-statistics:Jenkinsfile did indeed require
>>>> updates. I've fixed that as well. I'll re-run the pipeline against this
>>>> new data set and let's see what stats.jenkins.io would say.
>>>>
>>>> Andrew seemed to think there are some additional problems. So this
>>>> might not be the end of it.
>>>>
>>>>
>>>>
>>>> On Wed, Aug 17, 2016 at 4:11 PM R. Tyler Croy <tyler at monkeypox.org>
>>>> wrote:
>>>>
>>>>> (replies inline)
>>>>>
>>>>> On Wed, 17 Aug 2016, Kohsuke Kawaguchi wrote:
>>>>>
>>>>> > Probably during the migration to usage.jenkins.io, the access log file
>>>>> > split has changed from daily to weekly, which broke the rest of the log
>>>>> > processing pipelines.
>>>>> >
>>>>> > Andrew reported this problem independently yesterday in INFRA-896
>>>>> > <https://issues.jenkins-ci.org/browse/INFRA-896>, so I'm fixing that here
>>>>> > <https://github.com/jenkins-infra/jenkins-infra/pull/558>. I'm also going
>>>>> > to retroactively fudge log files from the past 2 months to get things
>>>>> > back to normal.
>>>>> >
>>>>> > Tyler suggested that we should rather fix the log processing pipeline so
>>>>> > that it doesn't make this assumption, but there are many of them and
>>>>> > that's harder to do.
>>>>> >
>>>>> > I'll make sure to save the originals just in case I mess up fudging.
>>>>>
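As a rough illustration of what "fudging" a weekly log back into daily files involves, here is a minimal sketch. It is not the script recorded in the INFRA-896 repository. It assumes the standard Apache common/combined log format, where the request time appears as `[17/Aug/2016:16:11:00 +0000]`, and it buckets lines by the date as written in the log (no time zone conversion). The function name `split_by_day` is hypothetical.

```python
import re
from collections import defaultdict

# Request time in Apache common/combined log format: [DD/Mon/YYYY:HH:MM:SS zone]
TIMESTAMP = re.compile(r"\[(\d{2})/([A-Za-z]{3})/(\d{4}):")

# Explicit month table so parsing does not depend on the process locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}


def split_by_day(lines):
    """Group access-log lines under daily names like access.log.YYYYMMDD000000.

    Hypothetical helper: returns a dict of file name to list of lines and
    writes nothing to disk. Lines without a recognisable timestamp are skipped.
    """
    buckets = defaultdict(list)
    for line in lines:
        m = TIMESTAMP.search(line)
        if not m or m.group(2) not in MONTHS:
            continue
        day, month, year = m.group(1), MONTHS[m.group(2)], m.group(3)
        buckets[f"access.log.{year}{month:02d}{day}000000"].append(line)
    return dict(buckets)
```

Each resulting bucket could then be written out with `gzip.open(name + ".gz", "wt")`, keeping the original weekly file aside as mentioned above.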
>>>>>
>>>>> On what host are you intending to fudge the log files? The originals are
>>>>> on usage.jenkins.io, but they are rsynced around to a Jenkins agent under
>>>>> ci.jenkins.io which I *think* doesn't do an `rsync --delete`, so we might
>>>>> end up double-counting.
>>>>>
>>>>> Check the Jenkinsfile in jenkins-infra/infra-statistics for details on
>>>>> that.
>>>>>
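The double-counting concern above can be checked by comparing directory listings: if the copy on the agent is made without `rsync --delete`, files that were replaced or renamed on the source stay behind on the mirror. A minimal sketch, with a hypothetical function name and made-up sample file names:

```python
def stale_on_mirror(source_names, mirror_names):
    """Return names that exist on the mirror but no longer on the source.

    Hypothetical helper: both arguments are plain iterables of file names,
    e.g. the output of os.listdir() on each side.
    """
    return sorted(set(mirror_names) - set(source_names))


# Sample names are illustrative only: a weekly file was split into daily
# files on the source, but the old weekly copy is still on the mirror.
source = ["access.log.20160711000000.gz", "access.log.20160712000000.gz"]
mirror = source + ["access.log.20160710000000.1.gz"]
leftovers = stale_on_mirror(source, mirror)
```

Anything in `leftovers` would be processed alongside the fudged files unless it is removed from the mirror first.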
>>>>>
>>>>>
>>>>>
>>>>> - R. Tyler Croy
>>>>>
>>>>> ------------------------------------------------------
>>>>>      Code: <https://github.com/rtyler>
>>>>>   Chatter: <https://twitter.com/agentdero>
>>>>>
>>>>>   % gpg --keyserver keys.gnupg.net --recv-key 1426C7DC3F51E16F
>>>>> ------------------------------------------------------
>>>>>
>>>>
>>> _______________________________________________
>>> Jenkins-infra mailing list
>>> Jenkins-infra at lists.jenkins-ci.org
>>> http://lists.jenkins-ci.org/mailman/listinfo/jenkins-infra
>>>
>>>
>>