Rate limiting for some git-related requests

TLDR: If you run into 503 errors when trying to git pull, try pulling less often.

I’ve noticed an unusual number of git-related HTTP requests in the nginx logs, which caused a noticeable increase in overall traffic sent per month. Each request is only ~16K, but at a rate of one every 3-4 seconds per IP it adds up quickly: going by webalizer, this specific URL endpoint accounted for 88% of monthly hits and 36% of overall bytes sent in April. Some of this could be legitimate git traffic, at least partially. A typical request looks like this:

(ip redacted) - - [09/Apr/2017:08:46:00 +0300] "GET /gitlab/fox/tt-rss.git/info/refs?service=git-upload-pack HTTP/1.1" 200 13638 "-" "git/1.9.1"
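A quick way to see which clients hit this endpoint most often is to count matching requests per IP. This is a minimal sketch assuming nginx's default "combined" log format, where the client address is the first field; the function name and the log path in the example are hypothetical.

```shell
# Count requests to the git "info/refs" discovery endpoint per client IP.
# Assumes nginx's default "combined" log format: the client address is the
# first whitespace-separated field and the request line contains the URI.
count_git_refs_by_ip() {
    logfile=$1
    # -F: match the string literally, so "?" is not treated as a regex operator
    grep -F 'info/refs?service=git-upload-pack' "$logfile" \
        | awk '{ print $1 }' \
        | sort | uniq -c | sort -rn
}

# Example (hypothetical log path; adjust for your setup):
# count_git_refs_by_ip /var/log/nginx/access.log | head -n 3
```

Output is one line per IP, busiest first, with the request count in front.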

The vast majority of this traffic originates from only three IP addresses, one of them registered to DigitalOcean. Maybe they NAT all outbound traffic for their hosted sites under one address? Maybe some special person out there decided to git pull every second? Who knows, really.
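To put the polling rate in perspective, here is a rough estimate of what a single IP costs per month, using the response size from the sample log line (13638 bytes) and one request every 3 or 4 seconds. The 30-day month is an assumption, and request headers and TCP/TLS overhead are ignored.

```shell
# Back-of-the-envelope monthly traffic for one client polling at a fixed interval.
# Assumptions: 30-day month, response body size only (no headers/TLS overhead).
monthly_bytes() {
    interval_s=$1      # seconds between requests
    response_bytes=$2  # bytes sent per response
    requests_per_day=$(( 86400 / interval_s ))
    echo $(( requests_per_day * response_bytes * 30 ))
}

for interval in 3 4; do
    bytes=$(monthly_bytes "$interval" 13638)
    echo "one request every ${interval}s: $(( bytes / 1000000 )) MB/month per IP"
done
```

This works out to roughly 8.8 to 11.8 GB per month for each such IP, so three of them account for somewhere around 26 to 35 GB.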

Anyway, for the time being I’m implementing a rate limit that applies when both of the following conditions match: the request URI contains “service=git-upload-pack” and the user agent contains “git”.
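The selection rule is a simple AND of two substring checks. The shell function below only mirrors that matching logic for illustration; it is not the nginx configuration actually used, and `should_rate_limit` is a hypothetical name. Case-sensitive matching is an assumption. (For what it's worth, nginx's `limit_req` rejects excess requests with 503 by default, which matches the errors mentioned at the top.)

```shell
# Mirror of the rate-limit selection rule: a request is limited only if
# the URI contains "service=git-upload-pack" AND the user agent contains "git".
# Matching is case-sensitive here (an assumption).
should_rate_limit() {
    uri=$1
    user_agent=$2
    case $uri in
        *service=git-upload-pack*) ;;   # first condition holds, check the second
        *) return 1 ;;
    esac
    case $user_agent in
        *git*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: the request from the log line above matches both conditions.
if should_rate_limit "/gitlab/fox/tt-rss.git/info/refs?service=git-upload-pack" "git/1.9.1"; then
    echo "rate limited"
fi
```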

Results so far:

It seems that on DigitalOcean all downloads over IPv6 hit the limit. If you disable IPv6 connectivity, you can interact with the repository, though.

# Disable IPv6 (as root; does not persist across reboots)
echo 1 > /proc/sys/net/ipv6/conf/all/disable_ipv6
# Enable IPv6 (as root)
echo 0 > /proc/sys/net/ipv6/conf/all/disable_ipv6
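To check which state you are in before or after toggling, you can read the same file back; `sysctl -w net.ipv6.conf.all.disable_ipv6=1` is the equivalent of the write above. This is a sketch: `ipv6_state` is a hypothetical helper, and its optional path argument exists only so it can be pointed at a test file. If you would rather not change a system-wide setting, newer git versions (roughly 2.8 onward; check `git pull -h`) accept `--ipv4` on fetch, pull and clone.

```shell
# Report whether IPv6 is disabled system-wide by reading the same sysctl
# file the commands above write to. The optional argument overrides the path.
ipv6_state() {
    f=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
    case $(cat "$f") in
        1) echo disabled ;;
        0) echo enabled ;;
        *) echo unknown ;;
    esac
}

# Alternative that leaves the system setting alone (git with -4/--ipv4 support):
#   git pull --ipv4
```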

I don’t know if the rate limiting still applies, but I was getting 50 kbps download speeds while cloning, when my connection should support up to 50 Mbps.

This was my first checkout (I hadn’t updated since the last tarball), and it took several minutes to download the whole 75MB repo.

Consider running git gc --aggressive on the bare repo on the server. Locally it reduced the size from 75MB to 29MB, which would make initial checkouts faster even when rate limited.
Before:

tt-rss2 (master)
$ du -h
35K     ./.git/hooks
1.0K    ./.git/info
1.0K    ./.git/logs/refs/heads
1.0K    ./.git/logs/refs/remotes/origin
1.0K    ./.git/logs/refs/remotes
2.0K    ./.git/logs/refs
3.0K    ./.git/logs
0       ./.git/objects/info
75M     ./.git/objects/pack
75M     ./.git/objects
1.0K    ./.git/refs/heads
1.0K    ./.git/refs/remotes/origin
1.0K    ./.git/refs/remotes
0       ./.git/refs/tags
2.0K    ./.git/refs
75M     ./.git

After:

$ du -h
35K     ./.git/hooks
13K     ./.git/info
1.0K    ./.git/logs/refs/heads
1.0K    ./.git/logs/refs/remotes/origin
1.0K    ./.git/logs/refs/remotes
2.0K    ./.git/logs/refs
3.0K    ./.git/logs
4.0K    ./.git/objects/14
4.0K    ./.git/objects/4b
1.0K    ./.git/objects/6a
16K     ./.git/objects/6d
1.0K    ./.git/objects/8c
4.0K    ./.git/objects/b4
4.0K    ./.git/objects/be
4.0K    ./.git/objects/c9
4.0K    ./.git/objects/da
1.0K    ./.git/objects/dd
1.0K    ./.git/objects/f0
4.0K    ./.git/objects/f8
1.0K    ./.git/objects/info
29M     ./.git/objects/pack
29M     ./.git/objects
0       ./.git/refs/heads
1.0K    ./.git/refs/remotes/origin
1.0K    ./.git/refs/remotes
0       ./.git/refs/tags
1.0K    ./.git/refs
29M     ./.git
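A before/after comparison like the one above can also be read straight from git, which works the same on the server's bare repo. This is a minimal sketch: `pack_size_kib` and `gc_report` are hypothetical helper names, and the repository path in the example is a placeholder. Note that `--aggressive` can take a long time on larger repositories.

```shell
# Packed size in KiB, as reported by "git count-objects -v" (size-pack line).
pack_size_kib() {
    git -C "$1" count-objects -v | awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Run an aggressive gc and report the packed size before and after.
# Works on a bare repo or a regular clone; the path is the only argument.
gc_report() {
    repo=$1
    before=$(pack_size_kib "$repo")
    git -C "$repo" gc --aggressive --quiet
    after=$(pack_size_kib "$repo")
    echo "size-pack before: ${before} KiB, after: ${after} KiB"
}

# Example (hypothetical path to the bare repository on the server):
# gc_report /srv/git/tt-rss.git
```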

Connection speed is not limited, only the number of requests per IP per minute.
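Since the limit counts requests per IP per minute, the practical fix on the client side is to pull less often. One way to do that for scheduled updates is a small wrapper that skips a run when the previous successful one was too recent. This is a sketch under assumptions: `run_if_due`, the stamp file and the paths in the example are hypothetical, and `date +%s` is available in GNU and BSD date but is not strictly POSIX.

```shell
# Run a command only if at least INTERVAL seconds have passed since the last
# successful run, whose epoch time is stored in STAMPFILE.
# Usage: run_if_due STAMPFILE INTERVAL command [args...]
run_if_due() {
    stamp=$1
    interval=$2
    shift 2
    now=$(date +%s)
    last=0
    [ -f "$stamp" ] && last=$(cat "$stamp")
    if [ $(( now - last )) -lt "$interval" ]; then
        return 0   # too soon since the last run; skip quietly
    fi
    # Record the time only if the command succeeded.
    "$@" && echo "$now" > "$stamp"
}

# Example (hypothetical paths): pull at most once an hour from a cron job.
# run_if_due "$HOME/.tt-rss-pull.stamp" 3600 git -C /var/www/tt-rss pull
```

Only successful runs update the stamp, so a failed pull is retried on the next scheduled invocation rather than immediately.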

I don’t know which country you are posting from, but it looks like your overseas traffic is being shaped.

I’m sure you’re right. Comcast is a pain, but it’s my only option.