[Jenkins-infra] Cucumber bandwidth situation
Kohsuke Kawaguchi
kkawaguchi at cloudbees.com
Mon Jun 2 23:52:42 UTC 2014
More progress.
When the last overage happened we started using mod_logio to record
total bytes in/out per request [1]. So I looked at which virtual host is
eating bandwidth, and the result was quite surprising.
Looking at the # of bytes transferred outbound per virtual host per
week, I get this.
> ci        489,290,628,016
> main       12,203,528,309
> maven       7,378,839,415
> pkg         8,105,888,873
> svn           937,561,148
> updates    17,900,766,883
So ci.jenkins-ci.org has served 490GB/week (!) of data. That's clearly
too much.
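
A per-virtual-host total like the one above can be computed with a short script. The following is only a sketch, not necessarily how these numbers were produced: it assumes a hypothetical log format in which the first field is the virtual host (%v) and the last field is mod_logio's bytes-sent counter (%O). The function name, host names, and sample lines are made up for illustration.

```python
from collections import defaultdict

def bytes_out_per_vhost(lines):
    """Sum the bytes-sent field per virtual host.

    Assumes (hypothetically) that each line starts with the vhost (%v)
    and ends with mod_logio's %O value.
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or not fields[-1].isdigit():
            continue  # skip blank or malformed lines
        totals[fields[0]] += int(fields[-1])
    return dict(totals)

# Made-up sample lines, for illustration only.
sample = [
    'ci.example.org 192.0.2.1 "GET /a HTTP/1.1" 200 512 68000000',
    'ci.example.org 192.0.2.1 "GET /a HTTP/1.1" 200 512 68000000',
    'updates.example.org 192.0.2.2 "GET /b HTTP/1.1" 200 300 1500',
]

for vhost, total in sorted(bytes_out_per_vhost(sample).items()):
    print(f"{vhost} {total:,}")
```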
So I looked at its access log and noticed that 64.125.71.142 is
downloading
/job/jenkins_rc_branch/lastSuccessfulBuild/artifact/war/target/jenkins.war
over and over.
This file is about 68MB in size, and a download happens every minute.
That's about 95GB/day, or 2.8TB/month if extrapolated.
THIS GUY SINGLEHANDEDLY CONSUMED MORE THAN HALF THE BANDWIDTH!
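
The extrapolation above can be checked with a few lines of arithmetic. This sketch assumes the sizes are binary units (MiB, GiB, TiB) and a 30-day month; under those assumptions the figures match the ones quoted.

```python
FILE_MIB = 68                  # approximate size of jenkins.war, from the message
DOWNLOADS_PER_DAY = 24 * 60    # one download per minute

per_day_gib = FILE_MIB * DOWNLOADS_PER_DAY / 1024
per_month_tib = per_day_gib * 30 / 1024   # assumption: 30-day month

print(f"{per_day_gib:.1f} GiB/day, {per_month_tib:.2f} TiB/month")
# prints "95.6 GiB/day, 2.80 TiB/month"
```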
The actual download count per week is as follows:
Apr,03 10165
Apr,10 10171
Apr,17 7426
Apr,24 10180
May,1 11203
May,8 10226
May,15 8975
May,22 10140
There are occasional gaps (perhaps ci.jenkins-ci.org was down, or
there's some pause in what he does), but still clearly this amounts to a
ridiculous amount of network transfer.
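
Weekly counts like these can be pulled out of an access log with a small script. The sketch below is an illustration only: it assumes the common/combined log format, buckets weeks by their Monday (a choice made here, not taken from the table above), and uses a made-up IP, path, and log lines.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Start of a common/combined log format line (assumed format):
# client, ident, user, [timestamp], "METHOD path ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')

def weekly_hits(lines, ip, path):
    """Count requests from `ip` for `path`, keyed by the Monday of each week."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group(1) != ip or m.group(4) != path:
            continue
        # %b expects English month abbreviations (C/English locale).
        when = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        monday = when.date() - timedelta(days=when.weekday())
        counts[monday.isoformat()] += 1
    return dict(counts)

# Made-up log lines, for illustration only.
log = [
    '192.0.2.10 - - [03/Apr/2014:10:00:00 +0000] "GET /app.war HTTP/1.1" 200 100',
    '192.0.2.10 - - [04/Apr/2014:10:01:00 +0000] "GET /app.war HTTP/1.1" 200 100',
    '192.0.2.99 - - [04/Apr/2014:10:02:00 +0000] "GET /app.war HTTP/1.1" 200 100',
    '192.0.2.10 - - [10/Apr/2014:10:00:00 +0000] "GET /app.war HTTP/1.1" 200 100',
]
hits = weekly_hits(log, "192.0.2.10", "/app.war")
print(hits)
```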
I'm going to ban this IP in the Apache config to stop this immediately.
Going forward, we need a better approach than waiting for the
overage to slap us in the face before we notice the problem. For that I
think we need some monitoring in place to alert us to a sudden
traffic increase.
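
One simple form such an alert could take is comparing the latest day's outbound traffic against a trailing average. This is a hypothetical sketch, not an existing tool: the function name, the 7-day window, and the 1.3x threshold are arbitrary placeholders, and getting the daily numbers out of vnstat is left out.

```python
from statistics import mean

def traffic_alert(daily_tx_gib, window=7, threshold=1.3):
    """Return True if the latest day exceeds `threshold` times the mean
    of the preceding `window` days. Defaults are arbitrary placeholders."""
    if len(daily_tx_gib) < window + 1:
        return False  # not enough history to compare against
    baseline = mean(daily_tx_gib[-(window + 1):-1])
    return daily_tx_gib[-1] > threshold * baseline

# Made-up daily outbound totals in GiB: a steady week, then a jump.
history = [100, 100, 100, 100, 100, 100, 100, 150]
print(traffic_alert(history))
```

Note that a check like this only catches sudden jumps. A steady, long-running consumer like the one above would need a per-client or per-URL breakdown instead.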
Also, as you can see in the log above, this didn't start in May, so it
doesn't explain the sudden traffic increase in May. So we still have
more analysis to do.
[1] http://httpd.apache.org/docs/2.2/mod/mod_logio.html
On 06/02/2014 03:39 PM, Kohsuke Kawaguchi wrote:
>
> I looked at output from haproxy
> http://kohsuke.org/private/20140602/haproxy-stats.png, and this shows a
> large amount of activity under "maven", which is
> http://maven.jenkins-ci.org/ that acts as a reverse proxy to
> repo.jenkins-ci.org.
>
> If you look under "bytes out", of the 6.8TB that has been served over
> the haproxy uptime, 1TB is from maven.jenkins-ci.org.
>
> This is surprising because all the download traffic for Maven repository
> should be served through http://repo.jenkins-ci.org/
>
> I need to look into this a bit more.
>
>
> OTOH, note that this shows only the cumulative value. I recorded the
> value 2.5 hours later, and the delta for maven was 169MB (extrapolates
> to 50GB/month) whereas delta for overall is 9.6GB (extrapolates to
> 2.7TB/month.) So it's nowhere near big enough to explain the
> 1.45TB/month usage spike in May.
>
> In addition, the traffic to maven.jenkins-ci.org is reverse-proxied to
> repo.jenkins-ci.org, so if this accounted for the traffic increase,
> it should show up as a corresponding increase on the RX side. There's
> no such spike on the RX side.
>
> I'll continue digging...
>
>
> On 06/02/2014 11:15 AM, Kohsuke Kawaguchi wrote:
>>
>> Tyler got a surprising bill for the overage charge for cucumber, which
>> runs "jenkins-ci.org", "mirrors.jenkins-ci.org" and a number of other
>> virtual hosts.
>>
>> I think I set up vnstat the last time this happened to track
>> utilization. Here is the vnstat output.
>>
>>> kohsuke@cucumber:~$ vnstat -m
>>>
>>> eth0 / monthly
>>>
>>>     month        rx     |     tx      |    total    |   avg. rate
>>> ------------------------+-------------+-------------+---------------
>>>   Jul '13    338.35 GiB |    3.21 TiB |    3.54 TiB |   11.36 Mbit/s
>>>   Aug '13    338.36 GiB |    2.86 TiB |    3.19 TiB |   10.23 Mbit/s
>>>   Sep '13    354.39 GiB |    3.52 TiB |    3.87 TiB |   12.82 Mbit/s
>>>   Oct '13    395.09 GiB |    4.20 TiB |    4.59 TiB |   14.72 Mbit/s
>>>   Nov '13    449.73 GiB |    3.51 TiB |    3.94 TiB |   13.07 Mbit/s
>>>   Dec '13    562.26 GiB |    3.68 TiB |    4.23 TiB |   13.56 Mbit/s
>>>   Jan '14    672.19 GiB |    3.91 TiB |    4.56 TiB |   14.64 Mbit/s
>>>   Feb '14    370.69 GiB |    3.13 TiB |    3.49 TiB |   12.39 Mbit/s
>>>   Mar '14    351.83 GiB |    3.33 TiB |    3.67 TiB |   11.77 Mbit/s
>>>   Apr '14    362.76 GiB |    3.39 TiB |    3.74 TiB |   12.40 Mbit/s
>>>   May '14    401.56 GiB |    4.80 TiB |    5.19 TiB |   16.65 Mbit/s
>>>   Jun '14     20.11 GiB |  241.83 GiB |  261.94 GiB |   16.07 Mbit/s
>>> ------------------------+-------------+-------------+---------------
>>> estimated    381.25 GiB |    4.48 TiB |    4.85 TiB |
>>
>> As you see, the outbound traffic jumped in May.
>> Here's the daily output, which suggests that the new trend is
>> continuing in June, as far as I can tell.
>>
>> In other words, we need to act on it ASAP to avoid another overage for June.
>>
>>
>>> kohsuke@cucumber:~$ vnstat -d
>>>
>>> eth0 / daily
>>>
>>>      day         rx     |     tx      |    total    |   avg. rate
>>> ------------------------+-------------+-------------+---------------
>>>   05/04/14    11.61 GiB |  151.50 GiB |  163.11 GiB |   15.84 Mbit/s
>>>   05/05/14    18.37 GiB |  167.82 GiB |  186.18 GiB |   18.08 Mbit/s
>>>   05/06/14    12.54 GiB |  176.53 GiB |  189.07 GiB |   18.36 Mbit/s
>>>   05/07/14    13.07 GiB |  169.65 GiB |  182.73 GiB |   17.74 Mbit/s
>>>   05/08/14    12.49 GiB |  152.46 GiB |  164.95 GiB |   16.01 Mbit/s
>>>   05/09/14    13.54 GiB |  167.91 GiB |  181.45 GiB |   17.62 Mbit/s
>>>   05/10/14    10.72 GiB |  149.50 GiB |  160.22 GiB |   15.56 Mbit/s
>>>   05/11/14    15.25 GiB |  141.75 GiB |  157.01 GiB |   15.24 Mbit/s
>>>   05/12/14    16.06 GiB |  168.05 GiB |  184.11 GiB |   17.88 Mbit/s
>>>   05/13/14    13.25 GiB |  144.43 GiB |  157.68 GiB |   15.31 Mbit/s
>>>   05/14/14    18.24 GiB |  160.37 GiB |  178.61 GiB |   17.34 Mbit/s
>>>   05/15/14    14.30 GiB |  154.11 GiB |  168.41 GiB |   16.35 Mbit/s
>>>   05/16/14    12.99 GiB |  153.21 GiB |  166.20 GiB |   16.14 Mbit/s
>>>   05/17/14     9.87 GiB |  127.64 GiB |  137.51 GiB |   13.35 Mbit/s
>>>   05/18/14    11.43 GiB |  186.48 GiB |  197.91 GiB |   19.22 Mbit/s
>>>   05/19/14    14.08 GiB |  171.26 GiB |  185.35 GiB |   18.00 Mbit/s
>>>   05/20/14    13.47 GiB |  149.67 GiB |  163.14 GiB |   15.84 Mbit/s
>>>   05/21/14    12.67 GiB |  150.21 GiB |  162.89 GiB |   15.81 Mbit/s
>>>   05/22/14    13.43 GiB |  168.17 GiB |  181.61 GiB |   17.63 Mbit/s
>>>   05/23/14    14.06 GiB |  165.47 GiB |  179.53 GiB |   17.43 Mbit/s
>>>   05/24/14     9.75 GiB |  132.61 GiB |  142.36 GiB |   13.82 Mbit/s
>>>   05/25/14    10.10 GiB |  131.50 GiB |  141.60 GiB |   13.75 Mbit/s
>>>   05/26/14    13.81 GiB |  180.48 GiB |  194.28 GiB |   18.86 Mbit/s
>>>   05/27/14    14.51 GiB |  187.94 GiB |  202.45 GiB |   19.66 Mbit/s
>>>   05/28/14    12.76 GiB |  163.84 GiB |  176.60 GiB |   17.15 Mbit/s
>>>   05/29/14    11.95 GiB |  157.68 GiB |  169.63 GiB |   16.47 Mbit/s
>>>   05/30/14    11.62 GiB |  142.46 GiB |  154.08 GiB |   14.96 Mbit/s
>>>   05/31/14     9.97 GiB |  136.38 GiB |  146.35 GiB |   14.21 Mbit/s
>>>   06/01/14    11.49 GiB |  142.08 GiB |  153.56 GiB |   14.91 Mbit/s
>>>   06/02/14     8.62 GiB |   99.75 GiB |  108.37 GiB |   18.07 Mbit/s
>>> ------------------------+-------------+-------------+---------------
>>> estimated     14.82 GiB |  171.41 GiB |  186.22 GiB |
>>
>> I'm trying to get vnstat to print out more daily data going back to April
>> and March, but I notice that the infra overhaul was April 29-May 2, so I
>> suspect we changed something during that period that added more load.
>> In particular, I remember shrinking disk consumption on the OSUOSL
>> mirrors by removing old releases. I wonder if this somehow resulted in
>> the traffic increase.
>>
>>
>>
>
>
--
Kohsuke Kawaguchi | CloudBees, Inc. | http://cloudbees.com/
Try Jenkins Enterprise, our professional version of Jenkins