blog.dasrecht.net

With a min of max the opt!

Hacking for humanity: bag-cache.nrdy.ch


It’s been an interesting week so far. But I was saddened by the fact that the website of the Federal Office of Public Health in Switzerland was struggling and couldn’t withstand the incoming traffic. To be fair, I don’t have the slightest idea of how much traffic was coming in – but having run quite a few big websites in the past few years, I feel this issue could have been prevented. Needless to say, this is critical infrastructure and should be available. After a few tweets, Chregu set up a cache via CloudFront and later extended it with an Nginx/Varnish combination. His mirror is available at https://bag.rokka.io/

I didn’t have too much time to play around on Friday during the day, but I had some ideas and had always wanted to try out Nginx’s proxy caching.

Enter bag-cache.nrdy.ch

It’s pretty simple but needed a few yak-shavy moments. I used the nginx-cache from Paweł Mendelski as a starting point to get everything going. Currently, it caches every request for 10 minutes, tries to refresh content in the background, and serves stale content if the cache can’t be updated after those 10 minutes. The core config of the whole thing:

    location / {
      expires 30d;
      proxy_cache cache_zone;
      proxy_cache_lock on;
      proxy_cache_revalidate on;
      proxy_cache_background_update on;
      proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
      proxy_cache_valid 200 302 301 10m;
      proxy_cache_key $scheme://$host$request_uri;
      proxy_pass https://www.bag.admin.ch$request_uri;
      proxy_set_header User-Agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36 (never gonna give you up, never gonna let you down - bag-cache.nrdy.ch - operated by bastian@amazee.io)";
      proxy_set_header Host www.bag.admin.ch;
      proxy_set_header Accept-Encoding ""; # request uncompressed responses, otherwise sub_filter below won't work
      sub_filter "<!-- begin: container -->" "<p style='width: 100%; height: 100px; padding: 30px;'> ⚠️ This is a cached version and non-authoritative mirror of <a href='https://www.bag.admin.ch'>www.bag.admin.ch</a> for faster access during demanding times. It should be up to date around 10 minutes of delay.</p><!-- begin: container -->";
      sub_filter "https://www.bag.admin.ch/" "https://bag-cache.nrdy.ch/";
      add_header X-Cached $upstream_cache_status;
      add_header X-Cache-Server "amazeeio/nginx-cache";
      add_header X-Robots-Tag "noindex, nofollow";
      add_header X-LAGOON $hostname always;
      proxy_ignore_headers "Set-Cookie";
      proxy_ignore_headers "Expires";
      add_header set-cookie "";
      proxy_hide_header "x-content-type-options";
      proxy_hide_header "Set-Cookie";
    }
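
For completeness: proxy_cache points at a shared memory zone called cache_zone, which has to be defined at the http level via proxy_cache_path. The values below are placeholders to illustrate the idea rather than the mirror’s exact settings – and since proxy_pass uses a variable, nginx also wants a resolver to look up the upstream host at runtime:

    http {
      # "cache_zone" is the shared memory zone referenced by proxy_cache above.
      # Path, sizes and timings here are placeholder values.
      proxy_cache_path /var/cache/nginx/bag
                       levels=1:2
                       keys_zone=cache_zone:10m
                       max_size=1g
                       inactive=60m
                       use_temp_path=off;

      # proxy_pass contains a variable ($request_uri), so the upstream host is
      # resolved at runtime - any reachable resolver will do, this is just an example.
      resolver 127.0.0.11 valid=30s;

      server {
        listen 8080;
        # ... the location block from above goes here ...
      }
    }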

Way too much time was spent figuring out why Nginx’s sub_filter didn’t work. In the end, this was because sub_filter does not unpack compressed backend responses. Fixed that by setting proxy_set_header Accept-Encoding "";. I’m also trying to be a good citizen by stating in the User-Agent string whom to reach out to if they don’t like the traffic from my mirror. Plus, I switched off access logging on the mirror as I don’t care about who visits the site – I only see hits/misses and the URI that has been hit.
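
If you want to replicate the logging part, a stripped-down log format along these lines does the trick – the format name, fields and log path are just an illustration, not necessarily the mirror’s exact setup:

    http {
      # Hypothetical sketch: record only the cache status and the requested URI,
      # no client IPs and no user agents.
      log_format cache_only '$upstream_cache_status $request_uri';

      server {
        access_log off;  # no conventional access log

        location / {
          # hits/misses plus the URI, nothing else
          access_log /var/log/nginx/cache.log cache_only;
          # ... proxy and cache directives from above ...
        }
      }
    }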

The whole thing is built into a container and then spun up on amazee.io.
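
The container build itself boils down to very little – roughly the sketch below; base image, file names and paths are assumptions for illustration, the real Dockerfile lives in the repo:

    # Rough sketch of the container build - not the exact Dockerfile from the repo.
    FROM nginx:alpine

    # Drop in the caching configuration shown above
    COPY nginx.conf /etc/nginx/nginx.conf

    # Make sure the cache directory referenced by proxy_cache_path exists
    RUN mkdir -p /var/cache/nginx/bag

    EXPOSE 8080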

The code can be found on GitHub – after all, it’s not rocket science and far from perfect – just a little bit of elbow grease and an attempt to handle caching gracefully. I hope it’s of use to someone.