<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ján Regeš</title>
    <description>The latest articles on Forem by Ján Regeš (@janreges).</description>
    <link>https://forem.com/janreges</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F607877%2Fb1fae9bb-0950-4ede-892a-649ca39f3dbf.jpg</url>
      <title>Forem: Ján Regeš</title>
      <link>https://forem.com/janreges</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/janreges"/>
    <language>en</language>
    <item>
      <title>How to build a CDN (3/3): security, monitoring and practical tips</title>
      <dc:creator>Ján Regeš</dc:creator>
      <pubDate>Wed, 10 Jan 2024 12:46:02 +0000</pubDate>
      <link>https://forem.com/janreges/how-to-build-a-cdn-33-security-monitoring-and-practical-tips-11e0</link>
      <guid>https://forem.com/janreges/how-to-build-a-cdn-33-security-monitoring-and-practical-tips-11e0</guid>
      <description>&lt;p&gt;In the first two articles you learned &lt;a href="https://dev.to/janreges/how-to-build-a-cdn-1-3-introduction-and-basic-components-345o"&gt;what components&lt;/a&gt; you can build a CDN from and how to set up &lt;a href="https://dev.to/janreges/how-to-build-a-cdn-23-server-and-reverse-proxy-configuration-16md"&gt;servers and reverse proxies&lt;/a&gt; (CDN cache).&lt;/p&gt;

&lt;p&gt;In this third and final article of the series, we add tips and recommendations on how to secure your own CDN, protect it from attacks, monitor it, and develop it further.&lt;/p&gt;

&lt;p&gt;At the very end you will find various interesting facts and lessons that implementing our own CDN taught us, as well as some off-topic information. I also apologize for the very late publication of this third article. Some of the information in the conclusion is no longer completely up-to-date, but I believe it is still useful. I wish you pleasant reading :)&lt;/p&gt;

&lt;h2&gt;Security&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Since your reverse proxy will also forward attackers' requests to origin servers, deploy one of the available &lt;strong&gt;WAFs (Web Application Firewall)&lt;/strong&gt; to reject obviously malicious requests outright instead of sending them to the origin unnecessarily. In many situations this can also prevent a cache poisoning attack. If you choose Nginx, we recommend &lt;a href="https://github.com/SpiderLabs/ModSecurity-nginx" rel="noopener noreferrer"&gt;ModSecurity&lt;/a&gt; or &lt;a href="https://waf.nemesida-security.com/" rel="noopener noreferrer"&gt;Nemesida WAF&lt;/a&gt;. Even their basic OWASP Top 10 rule sets will serve you well. The downside of Nemesida is that it also needs RabbitMQ to run; the upside is a background process that continuously updates the rules from a maintained database of known vulnerabilities.&lt;/li&gt;
&lt;li&gt;If you also want to serve file types from the CDN that are subject to CORS (e.g. fonts), your CDN needs to return the correct &lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt; header for CORS requests that carry the &lt;code&gt;Origin&lt;/code&gt; request header. We have this configurable per origin and by default only allow loading from the origin domain. The value &lt;code&gt;*&lt;/code&gt; is not recommended. The correct approach is to keep a set of &lt;strong&gt;trusted origins&lt;/strong&gt; in your web server or application configuration and return the header only for trusted origins. It is also good to be aware of what caching of CORS headers can do, so consider sending the &lt;code&gt;Vary: Origin&lt;/code&gt; header as well.&lt;/li&gt;
&lt;li&gt;For CDNs, as for regular application servers, we recommend setting security headers. If your CDN serves mainly static content, set especially the &lt;code&gt;X-Content-Type-Options: nosniff&lt;/code&gt; header, and possibly also &lt;code&gt;X-Frame-Options&lt;/code&gt; or &lt;code&gt;X-XSS-Protection&lt;/code&gt;, which make sense mainly for HTML but possibly also for SVG or XML. Don't forget HSTS (the &lt;code&gt;Strict-Transport-Security&lt;/code&gt; header), so that the browser enforces HTTPS internally and does not allow downgrades to HTTP.&lt;/li&gt;
&lt;li&gt;To make sure your CDN is not vulnerable to &lt;a href="https://portswigger.net/research/practical-web-cache-poisoning" rel="noopener noreferrer"&gt;cache poisoning&lt;/a&gt;, we recommend setting the various buffers and limits much more strictly than origin servers usually do. If you can afford it, it is also better to ignore incoming HTTP headers and forward only a few relevant ones to the origins (e.g. &lt;code&gt;Accept&lt;/code&gt;, &lt;code&gt;Accept-Encoding&lt;/code&gt;, &lt;code&gt;Origin&lt;/code&gt;, &lt;code&gt;Referer&lt;/code&gt;, &lt;code&gt;User-Agent&lt;/code&gt;). It is also worth considering not caching HTTP code &lt;strong&gt;400 Bad Request&lt;/strong&gt;, and definitely not caching e.g. &lt;strong&gt;413 Request Entity Too Large&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;When deploying &lt;strong&gt;TLS v1.3&lt;/strong&gt; with &lt;strong&gt;0-RTT (early data)&lt;/strong&gt;, you need to consider the risk of a &lt;a href="https://tools.ietf.org/html/rfc8470" rel="noopener noreferrer"&gt;replay attack&lt;/a&gt;. Since our CDN is optimized and strict only for static content and blocks POST/PUT/PATCH/DELETE requests, the risk of real abuse is almost zero. Furthermore, data modification in an application should never be implemented as a GET request, but at least as a POST with a CSRF token, which should additionally have one-time validity (nonce).&lt;/li&gt;
&lt;li&gt;You can defend against &lt;a href="https://en.wikipedia.org/wiki/DNS_spoofing" rel="noopener noreferrer"&gt;DNS spoofing&lt;/a&gt; by pointing Nginx upstreams to origins by IP address, not hostname. We host projects for most clients on our own clustered solutions that allow sites to be reached through a primary as well as a secondary datacenter and through multiple different IP addresses, so even a CDN upstream for a single origin load-balances across 2-3 IP addresses at different ISPs. If you must use hostnames, we recommend at least running &lt;a href="http://www.thekelleys.org.uk/dnsmasq/doc.html" rel="noopener noreferrer"&gt;Dnsmasq&lt;/a&gt; as a local DNS cache.&lt;/li&gt;
&lt;li&gt;This will not protect you from a DDoS attack, but you can defend against a DoS attack from a single IP address by setting &lt;a href="https://www.nginx.com/blog/rate-limiting-nginx/" rel="noopener noreferrer"&gt;&lt;strong&gt;rate-limiting&lt;/strong&gt;&lt;/a&gt; (the maximum number of requests per second or minute from a single IP) and &lt;a href="https://docs.nginx.com/nginx/admin-guide/security-controls/controlling-access-proxied-http/" rel="noopener noreferrer"&gt;&lt;strong&gt;connection-limiting&lt;/strong&gt;&lt;/a&gt; (the maximum number of concurrently open connections from a single IP). We recommend studying and understanding the &lt;strong&gt;burst&lt;/strong&gt; and &lt;strong&gt;delay&lt;/strong&gt;/&lt;strong&gt;nodelay&lt;/strong&gt; parameters, which fundamentally affect the behavior once an IP address starts exceeding the limits. We typically use multiple levels of rate-limiting on application servers. For POST/PUT/PATCH/DELETE requests, we also limit the number of requests per minute as a matter of principle - this effectively prevents brute-force attacks.&lt;/li&gt;
&lt;li&gt;In addition to the HSTS header, force an immediate redirect from HTTP to HTTPS.&lt;/li&gt;
&lt;li&gt;If the request comes to a URL where the domain is an IP address or another unsupported domain, use &lt;code&gt;return 444;&lt;/code&gt; – Nginx immediately terminates such a connection.&lt;/li&gt;
&lt;li&gt;Be aware of the risk of request loops and implement at least basic protection against them - for example, refuse to process a URL whose path contains any of the domains the CDN "listens" on.&lt;/li&gt;
&lt;li&gt;If you don't want others to be able to embed content from a specific origin into foreign pages (and thus leech your bandwidth), you can use the &lt;code&gt;valid_referers&lt;/code&gt; directive, which sets the variable &lt;code&gt;$invalid_referer&lt;/code&gt; according to your rules.&lt;/li&gt;
&lt;li&gt;Test your HTTPS configuration at &lt;a href="https://www.ssllabs.com/" rel="noopener noreferrer"&gt;SSLLabs.com&lt;/a&gt; - you should easily achieve an A+ grade. You can also check your security headers at &lt;a href="https://securityheaders.com/" rel="noopener noreferrer"&gt;SecurityHeaders.com&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
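&lt;p&gt;Several of the points above (trusted CORS origins, security headers, rate/connection limiting, &lt;code&gt;return 444;&lt;/code&gt; and &lt;code&gt;valid_referers&lt;/code&gt;) can be sketched in a single Nginx fragment. This is only an illustration with made-up hostnames and limits, not our production configuration:&lt;/p&gt;

```nginx
# Illustrative sketch only - hostnames, zone sizes and limits are placeholders.

# Echo back the Origin header only for trusted origins (empty string otherwise).
map $http_origin $cors_origin {
    default                   "";
    "https://www.example.com" $http_origin;
    "https://example.com"     $http_origin;
}

# Per-IP rate and connection limiting (study burst/delay/nodelay semantics).
limit_req_zone  $binary_remote_addr zone=req_per_ip:10m rate=30r/s;
limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

# Requests for bare IP addresses or unsupported Host values: close immediately.
server {
    listen 80 default_server;
    return 444;
}

server {
    listen 443 ssl http2;
    server_name my.cdn.example.com;   # hypothetical CDN hostname

    limit_req  zone=req_per_ip burst=50 nodelay;
    limit_conn conn_per_ip 20;

    location / {
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Content-Type-Options "nosniff" always;
    }

    # Fonts are subject to CORS; note that add_header inside a location replaces
    # ALL headers added at the server level, hence the security headers repeat.
    location ~* \.(woff2?|ttf|otf|eot)$ {
        add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
        add_header X-Content-Type-Options "nosniff" always;
        add_header Access-Control-Allow-Origin $cors_origin;
        add_header Vary "Origin";

        # Anti-leeching: reject requests arriving with foreign Referer headers.
        valid_referers none blocked server_names;
        if ($invalid_referer) { return 403; }
    }
}
```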

&lt;p&gt;In case you don't have a router in front of the servers that forwards/NATs only selected ports, don't forget to set up an &lt;strong&gt;iptables/nftables firewall&lt;/strong&gt;. By default everything should be denied, with only TCP ports 80 and 443 explicitly allowed; additionally, you can allow IPsec, SSH, etc. from your own IP addresses. Security-wise, it has worked well for us for a long time to bind every service that allows it only to the loopback interface and to route only selected ports from the outside using DNAT in the firewall. You can apply high per-IP rate-limiting even with DNAT at the network level; it is nicely described in the article &lt;a href="https://making.pusher.com/per-ip-rate-limiting-with-iptables/#fix-2-rate-limiting-with-the-limit-module" rel="noopener noreferrer"&gt;Per-IP rate limiting with iptables&lt;/a&gt;. We recommend disabling ICMP almost completely, although you will probably at least allow echo-request because of the various online tools for measuring latency in different parts of the world, such as &lt;a href="https://www.cdnperf.com/tools/cdn-latency-benchmark" rel="noopener noreferrer"&gt;CDN Latency Benchmark&lt;/a&gt;.&lt;/p&gt;
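&lt;p&gt;A default-deny firewall of the kind described above might look roughly like this in nftables (a sketch with a placeholder management subnet, not a complete production ruleset):&lt;/p&gt;

```
# /etc/nftables.conf - minimal default-deny sketch; 192.0.2.0/24 is a placeholder
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iif lo accept
        # public CDN traffic only
        tcp dport { 80, 443 } accept
        # SSH/IPsec/monitoring only from our own ranges (placeholder subnet)
        ip saddr 192.0.2.0/24 tcp dport 22 accept
        # keep echo-request working for latency benchmarks, but rate-limit it
        icmp type echo-request limit rate 10/second accept
    }
}
```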

&lt;h2&gt;DDoS attack protection&lt;/h2&gt;

&lt;p&gt;The most expensive and most effective DDoS protection for your CDN would be to use anycast IP addresses (which even most commercial CDN providers don't have) together with robust DDoS protection from commercial providers, who run very powerful appliances on the backbone that "protect" your IP ranges and, when an attack is detected, activate mitigation and "scrubbing" of their traffic. Some of these solutions manage to clean even the largest DDoS attacks, up to hundreds of Gbps. However, they cost thousands of USD per month, and you definitely cannot afford them at all PoPs in the world. From our experience, we recommend &lt;a href="https://www.netscout.com/arbor" rel="noopener noreferrer"&gt;NetScout's Arbor&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Just as a point of interest: from February 25 to 27, 2018, one of our hosted Czech clients was the target of a 230 Gbps DDoS attack built on &lt;a href="https://blog.cloudflare.com/memcrashed-major-amplification-attacks-from-port-11211/" rel="noopener noreferrer"&gt;memcrashed&lt;/a&gt; (which enables amplification of a UDP attack by up to tens of thousands of times, not the tens/hundreds typical of DNS or NTP amplification attacks). The first big memcrashed attack hit Cloudflare, and just one day later we had to deal with the first one in the Czech Republic. If you don't pay for robust DDoS protection, expect that in the event of a massive attack your ISP will simply call you to say that, in order to protect the entire datacenter and all its clients, they must completely blackhole your IP subnets on the backbone network until the attack ends.&lt;/p&gt;

&lt;p&gt;So if you are serious about a CDN and its high availability, you need to arrange DDoS protection at least for the main PoPs; at worst, at least a few of them should be able to withstand even the biggest attack. If you use &lt;a href="https://dev.to/janreges/how-to-build-a-cdn-1-3-introduction-and-basic-components-345o#geodns-with-failover-support"&gt;GeoDNS with auto-failover&lt;/a&gt; as I described in the first article, and follow the rule of always returning at least 2 IP addresses of independent providers in each world location, CDN users may not even notice some DDoS attacks.&lt;/p&gt;

&lt;p&gt;What we based our DDoS protection design on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We have very rich statistics from all our PoPs, possibly even from routers for some. We therefore have a detailed overview and trends of legitimate traffic - the number of open connections, packets, unique IP addresses and GEO information about them. We also collect and monitor NetFlow data for some PoPs. Having detailed information about legitimate traffic and its peaks is key - only on the basis of it is it possible to make correct decisions and propose optimal limits for activating mitigation.&lt;/li&gt;
&lt;li&gt;From all the DDoS attacks we have weathered in the past, we know that more than 99% of the source IP addresses involved were outside the Czech Republic - that is, outside the country of the majority of our visitors.&lt;/li&gt;
&lt;li&gt;We certainly cannot afford anycast IP addresses for our PoPs. However, there are a few providers on the market that offer physical or virtual servers with anycast IP addresses.&lt;/li&gt;
&lt;li&gt;Anycast IP addresses and robust DDoS protection are provided by our DNS providers (Constellix, Cloudflare and ClouDNS). Their infrastructure is robust enough that their NS servers should withstand DDoS attacks.&lt;/li&gt;
&lt;li&gt;We have robust DDoS protection capable of handling hundreds of Gbps at some PoPs in the Czech Republic. Other PoPs have to make do with whatever network-wide DDoS protection their ISP offers (most offer it, at least for an additional fee).&lt;/li&gt;
&lt;li&gt;From the nature of a CDN (different IP addresses resolved in different parts of the world), higher resistance to DDoS attacks might seem to follow. That is true, but only partially: finding out all the IP addresses that your CDN hostname resolves to in different corners of the world is a matter of minutes. The attacker therefore has to direct the attack at multiple IP addresses (which are also not anycast), so attacking the entire CDN network is only several times more difficult (or requires more power), but not impossible. But if they only attack the domain, then attack sources from, for example, Asia will really only affect PoPs in Asia, so in our case the impact on legitimate primary visitors is almost zero.&lt;/li&gt;
&lt;li&gt;With a few exceptions that have 10 Gbps, we have at most a 1 Gbps or 2×1 Gbps (bonded) line everywhere. That is a pretty thin pipe; however, most of the world's DDoS attacks are statistically smaller, around 2-5 Gbps, so with the firewalls on our routers or in Linux set up optimally, we can withstand them quite decently.&lt;/li&gt;
&lt;li&gt;We have GeoDNS with minute-interval health checks and automatic failover, so in the event of a successful attack (unavailability of some IP addresses/ports) we can attach backup PoPs to the CDN network - PoPs the attacker did not know about until then (DNS resolution has never revealed them), or known PoPs but with robust DDoS protection.&lt;/li&gt;
&lt;li&gt;We know that 90% of legitimate traffic at some PoPs consists of traffic from IP addresses of a specific country/continent. We can take this into account for setting geo-based limiting.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A couple of tips on how to handle DDoS protection at the end server level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When you use a Linux firewall or Linux-based routers, drop all &lt;strong&gt;unwanted traffic directly in the RAW table&lt;/strong&gt; (UDP, ICMP, or TCP ports other than 80/443). For UDP, only allow responses from an IP whitelist of the DNS servers you use. In this way you protect the end devices (servers or routers) against UDP amplification attacks and ICMP floods as effectively as possible. If you only do it in the standard filter table, which comes after connection tracking, it is already too late: the processor has had to deal with each connection or packet (prerouting, connection tracking, mangle, nat, filter), and each open connection also allocates memory.&lt;/li&gt;
&lt;li&gt;A TCP SYN flood on ports 80/443 can be mitigated with rate limiting (&lt;strong&gt;limit&lt;/strong&gt; in iptables, or &lt;strong&gt;dst-limit&lt;/strong&gt;), where you specify how many new connections with the &lt;code&gt;SYN&lt;/code&gt; flag you accept per unit of time (typically a second or a minute), globally or per src/dst IP address or port. As with Nginx, the key here is to properly understand the meaning of the &lt;strong&gt;burst&lt;/strong&gt; setting and the &lt;a href="https://en.wikipedia.org/wiki/Leaky_bucket" rel="noopener noreferrer"&gt;leaky bucket&lt;/a&gt; algorithm. Be sure to activate &lt;strong&gt;SYN cookies&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;You can protect L7 itself (HTTP/HTTPS traffic) with rate and connection limiting on the firewall and, secondarily, in Nginx (which, however, will never be as efficient as the firewall).&lt;/li&gt;
&lt;li&gt;For PoPs where you know the vast majority of legitimate traffic is local, download the IP subnets of the relevant country/countries (e.g. from &lt;a href="http://ip2location.com/" rel="noopener noreferrer"&gt;ip2location.com&lt;/a&gt;). For example, on PoPs in the Czech Republic you can apply more lenient rate-limiting to Czech subnets and be significantly stricter with other countries. As long as the size of the attack does not exceed your pipe's capacity (connectivity), most visitors from the Czech Republic will most likely not even notice an outage while you filter out the foreign attacking IP addresses. With good routers you can ensure this easily, including dynamic creation of an IP blacklist (which you then drop directly in the RAW table). If you only use a firewall in Linux, you can manage these IP lists with &lt;strong&gt;ipset&lt;/strong&gt;. Whichever firewall you use, study how "chains" are defined, so that you minimize the number of rules a connection/packet must traverse before its final acceptance or rejection. Use DROP, not REJECT, for rejection. If your firewall allows it and you have plenty of memory, you can also use TARPIT in some TCP situations and slow the attacker down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extra tip (our non-standard but functional solution for medium-sized DDoS attacks on L7):&lt;/strong&gt; A DDoS attack on L7 is a situation where an attacker sends thousands of HTTP or HTTPS requests per second to your servers from thousands of unique IP addresses all around the world. Usually these L7 attacks are "only" hundreds of Mbps or single-digit Gbps, so you can handle them. To give you an idea: to generate 1 Gbps of inbound traffic with 500 B (byte) HTTP/HTTPS requests, an attacker needs to sustain 250,000 requests per second. The proposed solution is best implemented on the router, or in the software firewall of your server (iptables/nftables and ipset). It consists of defining several levels of connection-limit rules with different limits for differently sized IP subnets (e.g. &lt;code&gt;/3&lt;/code&gt;, &lt;code&gt;/8&lt;/code&gt;, &lt;code&gt;/16&lt;/code&gt;, &lt;code&gt;/24&lt;/code&gt;); when the number of open connections from a given subnet exceeds the limit, you add the IP address (or, in extreme cases, the entire subnet) to a temporary blacklist (technically, in the case of Linux, an &lt;code&gt;ipset&lt;/code&gt; with a timeout), which DROPs all traffic from that source directly at the input, in the RAW table. Usually, even during a DDoS attack, several requests come from each source IP address at the same time. A &lt;code&gt;/3&lt;/code&gt; rule can temporarily block an eighth of global IP addresses, or even all 8 &lt;code&gt;/3&lt;/code&gt; subnets, if it is a really extensive DDoS attack.
But if, in combination with the previous recommendation, you give traffic from Czech IP addresses (or the domestic IP addresses of the given PoP) higher limits and process those rules first, the majority of visitors still reach the reverse proxy (cache) and the CDN keeps working, although it will serve content more slowly due to saturated connectivity. Traffic from other corners of the world (high-traffic IP subnets) is temporarily dropped at the given PoPs, and the attacker believes he has brought your servers down (= a successful DDoS attack), because ports 80 and 443 are unavailable to him. Of course, you then need to whitelist your origin servers and the IP addresses through which monitoring, IPsec, DNS, etc. connect to the servers. This solution is a bit unusual and we invented it ourselves, but it works very well even during a real DDoS attack. However, the individual limit levels must be set with balance, based on the maximum number of open TCP connections during peaks, which your monitoring will show you. For example, you can set the limit on open TCP connections for the entire huge &lt;code&gt;/3&lt;/code&gt; subnet to 5-10 times the previous peak. This will not limit legitimate traffic, and you may be able to withstand a DDoS attack.&lt;/li&gt;
&lt;li&gt;If you have the means, test your DoS and DDoS protections, analyze the behavior and monitor the related load. There are online tools that, for a fee, can generate quite a lot of traffic from a large number of unique IP addresses - and it is nothing immoral from the dark net.&lt;/li&gt;
&lt;li&gt;Design some active mechanisms that will immediately notify you of an ongoing attack - for example, by monitoring the size of the blacklist queue.&lt;/li&gt;
&lt;li&gt;In any case, when designing these protections it is good to know how, why, and for how long TCP connections stay open, what governs this, and how it behaves at the TCP level with today's majority HTTP/2 traffic. When reacting to an active attack, you can automatically and temporarily reduce various timeouts in the TCP stack or on the web server, start sending the &lt;code&gt;Connection: close&lt;/code&gt; header, etc.&lt;/li&gt;
&lt;li&gt;It is also worth mentioning &lt;a href="https://www.fail2ban.org/" rel="noopener noreferrer"&gt;Fail2Ban&lt;/a&gt;; however, given the way its detection works, it is not much help during a large-scale DDoS attack, where tens or hundreds of thousands of log lines start appearing per second. Logging alone then easily writes 10 MB/s to disk, and if you did not have access-log buffering turned on, it causes extreme IOPS.&lt;/li&gt;
&lt;/ul&gt;
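&lt;p&gt;The firewall-level tips above (RAW-table drops, SYN rate limiting with a burst, and per-IP connection ceilings feeding a timed blacklist) can be sketched with iptables and ipset. All numbers and the sample DNS resolver address are illustrative placeholders, not our production values:&lt;/p&gt;

```
# 1) Timed blacklist, dropped before connection tracking (raw table).
ipset create blacklist hash:ip timeout 3600
iptables -t raw -A PREROUTING -m set --match-set blacklist src -j DROP

# 2) Drop unwanted UDP early; whitelist only our DNS resolver (placeholder IP).
iptables -t raw -A PREROUTING -p udp ! -s 203.0.113.53 -j DROP

# 3) SYN flood limiting (leaky bucket: understand limit vs. limit-burst) + SYN cookies.
sysctl -w net.ipv4.tcp_syncookies=1
iptables -A INPUT -p tcp --syn -m limit --limit 200/second --limit-burst 400 -j ACCEPT
iptables -A INPUT -p tcp --syn -j DROP

# 4) Per-IP connection ceiling on 80/443; offenders land in the timed blacklist.
iptables -A INPUT -p tcp -m multiport --dports 80,443 -m connlimit --connlimit-above 100 -j SET --add-set blacklist src
```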

&lt;h2&gt;Monitoring&lt;/h2&gt;

&lt;p&gt;Regardless of how many servers your CDN consists of, you need to actively and passively monitor them.&lt;/p&gt;

&lt;p&gt;We use Nagios for active monitoring and Munin for quick basic graphs of vital signs. In Munin we can also quickly view trend charts spanning several years. That is simply not possible with the &lt;strong&gt;Kibana&lt;/strong&gt; setup described below (part of the &lt;a href="https://www.elastic.co/elastic-stack/" rel="noopener noreferrer"&gt;&lt;strong&gt;Elastic stack&lt;/strong&gt;&lt;/a&gt;) due to the size of the indexes, unless you use transformations/rollups into archive indexes.&lt;/p&gt;

&lt;p&gt;For more live statistics we use 2 other tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We use &lt;strong&gt;collectd&lt;/strong&gt; to collect metrics of all vital functions (CPU, RAM, IOPS, storage, network, Nginx) - we send everything to Kibana.&lt;/li&gt;
&lt;li&gt;Using &lt;strong&gt;filebeat&lt;/strong&gt;, we send all access and error logs to another Kibana. From &lt;strong&gt;Ansible&lt;/strong&gt;, we generate Nginx vhosts so that each origin has its own access and error log.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In individual Kibanas, we have dashboards summarizing CDN traffic as a whole as well as breakdowns by individual servers (PoPs). Thanks to the evaluation of absolutely all metrics from access logs, we have detailed information about, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cache hit-ratio&lt;/li&gt;
&lt;li&gt;statistics of IP addresses and rendering of traffic to the GEO map of the world&lt;/li&gt;
&lt;li&gt;statistics of HTTP codes&lt;/li&gt;
&lt;li&gt;statistics of data transfers (we collect the sizes of requests and responses)&lt;/li&gt;
&lt;li&gt;response time statistics&lt;/li&gt;
&lt;li&gt;breakdown by servers (PoPs) or individual GEO locations&lt;/li&gt;
&lt;li&gt;breakdown by origin domains&lt;/li&gt;
&lt;li&gt;breakdown by content types (JS/CSS/images/fonts/audio/video)&lt;/li&gt;
&lt;li&gt;breakdown by specific URLs.&lt;/li&gt;
&lt;/ul&gt;
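&lt;p&gt;Statistics like cache hit-ratio, response times and transfer sizes presuppose that the access log actually carries those variables. A hypothetical Nginx log format of the kind that could feed filebeat and Kibana (the format name and field names are ours, not our exact production format):&lt;/p&gt;

```nginx
# $upstream_cache_status (HIT/MISS/EXPIRED/...) is what a hit-ratio is computed from.
log_format cdn_json escape=json
    '{"time":"$time_iso8601","remote_addr":"$remote_addr","host":"$host",'
    '"request":"$request","status":"$status","bytes_sent":"$bytes_sent",'
    '"request_length":"$request_length","request_time":"$request_time",'
    '"cache":"$upstream_cache_status","content_type":"$sent_http_content_type"}';

# One access log per origin vhost, with buffering to keep IOPS down.
access_log /var/log/nginx/origin-example.access.log cdn_json buffer=64k flush=5s;
```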

&lt;p&gt;We recommend monitoring the DNS resolving of your CDN domains as well, so that you can constantly check whether the GeoDNS providers always return the expected sets of IP addresses. We implemented this monitoring as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nagios runs the checks listed below every minute and immediately notifies us by e-mail and SMS of unexpected states or slow responses from the NS (name servers).&lt;/li&gt;
&lt;li&gt;We wrote a Nagios plugin which receives the NS server (e.g. &lt;code&gt;ns11.constellix.com&lt;/code&gt;, or perhaps &lt;code&gt;8.8.8.8&lt;/code&gt;), the tested domain (e.g. &lt;code&gt;my.cdn.com&lt;/code&gt;), a set of expected IP addresses, the minimum number of IP addresses from that set that must occur in the resolution and, of course, the maximum response time and timeout of the NS server. If the DNS resolution does not contain the expected minimum number of IP addresses from the set, or the domain resolves to other IP addresses, or resolution takes too long, notifications are sent.&lt;/li&gt;
&lt;li&gt;In this way, every minute we test absolutely all authoritative NS servers of our GeoDNS providers (6× NS Constellix and 4× NS ClouDNS).&lt;/li&gt;
&lt;li&gt;Every minute we also check the correct functionality of DNS resolving on the popular recursive cache NS servers of Google (8.8.8.8) and Cloudflare (1.1.1.1) to make sure that there is no hitch on the way between the authoritative and recursive DNS servers.&lt;/li&gt;
&lt;li&gt;We carry out this monitoring both from our servers in the Czech Republic and from other countries through NRPE agents; for example, the plugin running on a German server verifies that DNS resolved to the IP addresses of our German PoPs.&lt;/li&gt;
&lt;li&gt;We record the results of all these checks in daily-rotated logs, which, if necessary, serve as a basis for retroactive analysis of problems or anomalies.&lt;/li&gt;
&lt;/ul&gt;
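&lt;p&gt;The core of such a check boils down to "do at least N of the expected IP addresses appear in the answer?". A minimal shell sketch of that logic follows - the function name and IP addresses are made up, and our actual Nagios plugin additionally measures response time and timeouts, which are omitted here. In production, the answer would come from something like &lt;code&gt;dig +short my.cdn.com @ns11.constellix.com&lt;/code&gt;:&lt;/p&gt;

```shell
#!/bin/sh
# check_resolved EXPECTED ANSWER MIN - succeed when at least MIN of the
# space-separated EXPECTED IPs occur in the space-separated ANSWER.
check_resolved() {
  expected="$1"
  answer="$2"
  min="$3"
  matches=0
  for ip in $expected; do
    case " $answer " in
      *" $ip "*) matches=$((matches + 1)) ;;
    esac
  done
  [ "$matches" -ge "$min" ]
}

# Example: two of the three expected addresses resolved; we require at least 2.
if check_resolved "203.0.113.10 203.0.113.11 198.51.100.7" "203.0.113.11 198.51.100.7" 2; then
  echo "DNS OK"
else
  echo "DNS CRITICAL"
fi
```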

&lt;h2&gt;Other useful tools&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vnstat&lt;/strong&gt; is recommended for quick network traffic statistics on individual servers - commands such as &lt;code&gt;vnstat -l&lt;/code&gt; for live info, or the statistical &lt;code&gt;vnstat -h&lt;/code&gt;, &lt;code&gt;vnstat -d&lt;/code&gt; and &lt;code&gt;vnstat -m&lt;/code&gt;, are often useful. For a detailed analysis of current traffic, use &lt;strong&gt;iptraf-ng&lt;/strong&gt;. For an overview of TCP connections, use &lt;code&gt;ss -s&lt;/code&gt; or e.g. &lt;code&gt;ss -at&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For a quick live overview of what the server is currently doing in all important areas, we prefer &lt;strong&gt;dstat&lt;/strong&gt;, specifically with the &lt;code&gt;dstat -ta&lt;/code&gt; switches. And of course &lt;strong&gt;htop&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If you don't have experience with Kibana yet, take a look at &lt;a href="https://grafana.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Grafana&lt;/strong&gt;&lt;/a&gt; with InfluxDB. We've been using Kibana for years and have hundreds of custom visualizations and dashboards in it (that's why it was our first choice), but our latest experience is that Grafana with InfluxDB is overall faster, especially for long-term dashboards. However, the concept of working with data, creating visualizations and dashboards is quite different.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Tips and highlights from the implementation&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;When implementing some functionality you will definitely run into one inconvenience in Nginx: the &lt;code&gt;add_header&lt;/code&gt; directive is not inherited. If you set &lt;code&gt;add_header&lt;/code&gt; at the &lt;code&gt;server&lt;/code&gt; level and then also inside a &lt;code&gt;location&lt;/code&gt;, only the headers set in the &lt;code&gt;location&lt;/code&gt; are sent in the end, while those set one level higher at the &lt;code&gt;server&lt;/code&gt; level are ignored. For that reason it is better to use the &lt;a href="https://github.com/openresty/headers-more-nginx-module" rel="noopener noreferrer"&gt;headers-more&lt;/a&gt; module and its directives, which do inherit (&lt;code&gt;more_set_headers&lt;/code&gt;, &lt;code&gt;more_clear_headers&lt;/code&gt;, &lt;code&gt;more_set_input_headers&lt;/code&gt;, &lt;code&gt;more_clear_input_headers&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;If you use Debian, I recommend the repo from &lt;a href="https://deb.sury.org/" rel="noopener noreferrer"&gt;Ondřej Surý&lt;/a&gt; (&lt;a href="https://packages.sury.org/" rel="noopener noreferrer"&gt;packages.sury.org&lt;/a&gt;) - thank you, Ondřej, for maintaining it - which, in addition to the latest versions of Nginx, also contains compatible builds of the headers-more module.&lt;/li&gt;
&lt;li&gt;The standard of reliability for us has long been &lt;a href="http://www.haproxy.org/" rel="noopener noreferrer"&gt;HAProxy&lt;/a&gt;, which we have been using for many years for load balancing and various auto-failover scenarios. In addition, since version 2 it has completely redesigned, improved handling of HTTP requests and more robust HTTP/2 support. We first tried to use HAProxy instead of Nginx, but unfortunately it has only very limited caching capabilities, which is critical for a CDN. However, we would certainly use HAProxy as a load balancer if we had multiple servers behind one PoP.&lt;/li&gt;
&lt;li&gt;If you want maximum performance, we recommend trying &lt;strong&gt;H2O&lt;/strong&gt; instead of Nginx - &lt;a href="https://h2o.examp1e.net/" rel="noopener noreferrer"&gt;https://h2o.examp1e.net/&lt;/a&gt;. We have many years of experience with Nginx, so even our more complex scenarios are already fully automated in Ansible; porting everything to H2O would definitely be interesting, but also quite time-consuming. In addition, the ratio of 500 open to 650 closed tickets on GitHub suggests it is not yet completely production-ready.&lt;/li&gt;
&lt;li&gt;If you have even greater demands on cache behavior, we recommend &lt;a href="https://varnish-cache.org/" rel="noopener noreferrer"&gt;Varnish&lt;/a&gt; instead of Nginx. Nginx is great and, according to our measurements, a bit faster, but Varnish gives you, for example, &lt;strong&gt;cache tagging support&lt;/strong&gt; through HTTP headers, so you can selectively invalidate the cache of all URLs carrying a desired tag. This can be very useful, e.g. in combination with caching of POST requests (e.g. on a GraphQL API), where, after detecting a change of some entity on the backend, you could invalidate all relevant caches on the API layer. Today we cache and invalidate this way at the application layer, and our future goal is to cache at the data level in the CDN as well. For future web projects we want to stick to the &lt;a href="https://jamstack.org/" rel="noopener noreferrer"&gt;JAMStack&lt;/a&gt; philosophy, where such a CDN with smart options for selective cache invalidation plays a key role. Therefore we will definitely be using Varnish for our CDN in the future, probably in combination with Nginx.&lt;/li&gt;
&lt;li&gt;If you want to support &lt;strong&gt;HTTP/3 (QUIC)&lt;/strong&gt;, we recommend &lt;a href="https://github.com/cloudflare/quiche" rel="noopener noreferrer"&gt;quiche&lt;/a&gt; from Cloudflare, or &lt;a href="https://github.com/litespeedtech/lsquic" rel="noopener noreferrer"&gt;lsquic&lt;/a&gt;, which is part of the &lt;a href="https://openlitespeed.org/" rel="noopener noreferrer"&gt;OpenLiteSpeed&lt;/a&gt; web server. For now, we are only experimenting with HTTP/3. It requires BoringSSL instead of OpenSSL and a patched older version of Nginx (1.16).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UPDATE&lt;/strong&gt;: The point above was valid at the end of 2021. As of early 2024, HTTP/3 is supported directly in Nginx. We are still cautious about deploying HTTP/3, mainly because of the risk of DoS/DDoS attacks, for which we do not yet have sufficient protection mechanisms on UDP.&lt;/li&gt;
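&lt;li&gt;With native support, enabling HTTP/3 in Nginx 1.25+ is roughly the following (certificate paths are placeholders; verify the directives against your Nginx version):

```nginx
server {
    listen 443 quic reuseport;  # HTTP/3 over UDP
    listen 443 ssl;             # keep TCP for HTTP/1.1 and HTTP/2
    http2 on;

    ssl_certificate     /etc/ssl/certs/example.crt;
    ssl_certificate_key /etc/ssl/private/example.key;

    # Tell browsers that HTTP/3 is available on this port
    add_header Alt-Svc 'h3=":443"; ma=86400';
}
```
&lt;/li&gt;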
&lt;li&gt;If you use virtualized servers and have supported HW, use &lt;strong&gt;SR-IOV&lt;/strong&gt; and the &lt;strong&gt;ixgbevf&lt;/strong&gt; driver with the &lt;code&gt;InterruptThrottleRate=1&lt;/code&gt; setting. The queue of incoming requests will be processed more efficiently and CPU load will also drop.&lt;/li&gt;
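&lt;li&gt;The driver option can be persisted via a modprobe configuration file, e.g. (the file path is the usual convention; the parameter is available in Intel's ixgbevf driver builds):

```conf
# /etc/modprobe.d/ixgbevf.conf
options ixgbevf InterruptThrottleRate=1
```
&lt;/li&gt;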
&lt;li&gt;If you have many CPU cores and optimize for hundreds of thousands of requests per second, also look into &lt;strong&gt;RPS (Receive Packet Steering)&lt;/strong&gt;, because by default only one CPU core processes the incoming packet queue.&lt;/li&gt;
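&lt;li&gt;RPS is enabled per receive queue by writing a CPU bitmask to sysfs (the interface name and mask are illustrative; this requires root):

```shell
# Spread processing of eth0's first receive queue across CPUs 0-3
# (bitmask 0xf); the default mask 0 leaves it on a single core.
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
```
&lt;/li&gt;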
&lt;li&gt;For those interested in the network details of browser requests, HTTP/2 streams or DNS resolving, we recommend studying the tools built into Google Chrome. Specifically &lt;strong&gt;chrome://net-internals/&lt;/strong&gt;, &lt;strong&gt;chrome://net-export/&lt;/strong&gt; and the related tool &lt;a href="https://netlog-viewer.appspot.com/" rel="noopener noreferrer"&gt;https://netlog-viewer.appspot.com/&lt;/a&gt;. They help us understand how the behavior of HTTPS requests influences the rendering of the page itself, and reveal blind spots where something is waiting.&lt;/li&gt;
&lt;li&gt;If you really want to understand HTTP/2 and optimize the loading speed of your pages, install &lt;a href="https://nghttp2.org/" rel="noopener noreferrer"&gt;&lt;strong&gt;nghttp2&lt;/strong&gt;&lt;/a&gt; and observe how your website communicates over HTTP/2 first-hand. You can try, for example, the command &lt;code&gt;nghttp -nv https://dev.to/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The performance of the server and its connectivity can be easily tested, e.g. using the one-line &lt;a href="https://github.com/n-st/nench" rel="noopener noreferrer"&gt;nench benchmark&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;When hosting large files, realize that even if the client makes a byte-range request, your CDN must first load the entire file from the origin (and cache it) before it can return the requested chunk. It can therefore be better, if you have the option, to push videos and other large files to the CDN before visitors start accessing them. You can also help yourself with the Nginx &lt;a href="http://nginx.org/en/docs/http/ngx_http_slice_module.html" rel="noopener noreferrer"&gt;slice module&lt;/a&gt;, which downloads and caches only configurable "chunks" from the origin.&lt;/li&gt;
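&lt;li&gt;A sketch of the slice-module configuration, closely following the module's documentation (the location, cache-zone name and origin are placeholders):

```nginx
location /video/ {
    slice             1m;                          # fetch origin in 1 MB chunks
    proxy_set_header  Range $slice_range;          # forward the chunk range
    proxy_cache       cdn_cache;
    proxy_cache_key   $uri$is_args$args$slice_range;
    proxy_cache_valid 200 206 1h;                  # cache partial (206) responses
    proxy_pass        https://origin.example.com;
}
```
&lt;/li&gt;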
&lt;li&gt;Beware of the popular and sometimes slightly treacherous recursive caching DNS servers of Google (8.8.8.8, 8.8.4.4) or Cloudflare (1.1.1.1). It is not uncommon for them to occasionally resolve Czech visitors to foreign IP addresses. It happens only once every few hours or days and usually lasts just a few minutes.&lt;/li&gt;
&lt;li&gt;Although CDN PoPs as such are functionally fully autonomous and independent, you will still need a connection to some central location for management, monitoring or, for example, distributing cache-purge requests. Set up IPsec tunnels using &lt;strong&gt;strongSwan&lt;/strong&gt;, or use &lt;strong&gt;WireGuard&lt;/strong&gt;, whose configuration can be automated very nicely.&lt;/li&gt;
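&lt;li&gt;For illustration, a WireGuard tunnel from a PoP to the central location can be as small as this (addresses, hostnames and keys are placeholders):

```conf
# /etc/wireguard/wg0.conf on a PoP
[Interface]
Address    = 10.8.0.2/24
PrivateKey = REPLACE_WITH_POP_PRIVATE_KEY
ListenPort = 51820

[Peer]
# Central management location
PublicKey  = REPLACE_WITH_CENTRAL_PUBLIC_KEY
Endpoint   = central.example.com:51820
AllowedIPs = 10.8.0.0/24
PersistentKeepalive = 25
```
&lt;/li&gt;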
&lt;li&gt;When implementing cache deletion, you can use the script &lt;a href="https://github.com/perusio/nginx-cache-purge/blob/master/nginx-cache-purge" rel="noopener noreferrer"&gt;nginx-cache-purge&lt;/a&gt;, which shows how cache files can be effectively found by URL or mask. I also recommend the articles &lt;a href="https://scene-si.org/2016/11/02/purging-cached-items-from-nginx-with-lua/" rel="noopener noreferrer"&gt;Purging cached items from Nginx with Lua&lt;/a&gt; and &lt;a href="https://scene-si.org/2017/01/08/improving-nginx-lua-cache-purge/" rel="noopener noreferrer"&gt;Improving NGINX LUA cache purges&lt;/a&gt;. We decided to base it on &lt;a href="https://gist.github.com/nosun/0cfb58d3164f829e2f027fd37b338ede" rel="noopener noreferrer"&gt;this Lua script&lt;/a&gt;, we just added a few of our modifications. If you script it in Lua, we recommend making a vhost listening on a non-standard port, which you will only have available through an IPsec tunnel. If you also implement static brotli/gzip compression, don't forget to delete your &lt;code&gt;.br&lt;/code&gt;/&lt;code&gt;.gz&lt;/code&gt; files or &lt;code&gt;.webp&lt;/code&gt;/&lt;code&gt;.avif&lt;/code&gt; files as well.&lt;/li&gt;
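&lt;li&gt;The core trick those scripts rely on is that Nginx stores the cache key in plain text ("KEY: ...") inside each cache file, so entries can be located with grep. A self-contained demonstration on a throwaway directory (the URLs and file names are made up):

```shell
# Simulate two nginx cache files in a temporary directory
CACHE_DIR=$(mktemp -d)
printf 'KEY: https://example.com/app.js\n'    > "$CACHE_DIR/a1b2c3"
printf 'KEY: https://example.com/other.css\n' > "$CACHE_DIR/d4e5f6"

# Purge every cache entry whose stored key matches the URL
grep -rl 'KEY: https://example.com/app.js' "$CACHE_DIR" | xargs -r rm -f
```
&lt;/li&gt;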
&lt;li&gt;If you are deploying your own or a commercial CDN in front of your entire domain, be aware of one potential vulnerability that is easy to overlook. Respect the client IP address from the &lt;code&gt;X-Forwarded-For&lt;/code&gt; header only if the request reaches you from specific, known public IP addresses of the CDN servers. In Nginx, trusted sources are defined via the &lt;a href="http://nginx.org/en/docs/http/ngx_http_realip_module.html" rel="noopener noreferrer"&gt;realip module&lt;/a&gt; and the &lt;code&gt;set_real_ip_from&lt;/code&gt; directive. Never use something like &lt;code&gt;set_real_ip_from 0.0.0.0/0&lt;/code&gt;. If some part of your domain or application functionality is restricted to an IP whitelist, an attacker could otherwise impersonate a whitelisted IP address simply by sending that HTTP header.&lt;/li&gt;
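&lt;li&gt;On the origin, the trusted-proxy setup then looks roughly like this (the addresses are placeholders for your PoPs' public IPs):

```nginx
# Accept the client IP from X-Forwarded-For only when the request
# physically arrives from a known CDN PoP address.
set_real_ip_from  203.0.113.10;   # PoP 1
set_real_ip_from  203.0.113.20;   # PoP 2
real_ip_header    X-Forwarded-For;
real_ip_recursive on;
```
&lt;/li&gt;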
&lt;li&gt;If you decide to use a commercial CDN, we recommend the Czech &lt;a href="https://www.cdn77.com/" rel="noopener noreferrer"&gt;CDN77&lt;/a&gt;, because their support can ensure that all requests to your origin domain come only from a few fixed IP addresses in the Czech Republic (their CDN proxy servers), which you can then mark as trusted. CDN providers usually do not disclose the full list of possible IP addresses of their PoPs, and you cannot rely on a header such as &lt;code&gt;Via: cdn-provider&lt;/code&gt; that they send in requests. That is simply not safe - it can easily be forged - yet the support teams of CDN providers will often recommend such a dangerous solution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion and practical experience
&lt;/h2&gt;

&lt;p&gt;We hope this series of 3 articles helped you and showed how you can build a CDN yourself. We have described what it consists of and how you can lay out and set up the individual components on your own. But consider carefully whether it is really worth it for your needs. Also keep in mind that you will have several servers running around the world that must be paid for, cared for and patched.&lt;/p&gt;

&lt;p&gt;Our CDN, built as described in this series, works great. We keep it under close scrutiny and monitor traffic on all servers both actively and passively. Its performance and speed in browsers is even higher than that of commercial CDNs (thanks to static compression and the fact that most of the content fits in RAM, since we don't have thousands of clients). We gradually deploy it in the projects we develop for our clients. Thanks to this, we cover Europe in particular very well and under our own power. To cover remote corners of the world just as well, we use a commercial CDN in those secondary locations. We know that we provide our clients with a quality service at a good price. And technically, we have another interesting project in our portfolio that is alive and brings real value.&lt;/p&gt;

&lt;p&gt;Moreover, since the production deployment in 09/2019, not a single problem has appeared - all components work flawlessly. We tried not to underestimate anything - the production deployment was preceded by stress and penetration tests. We looked up post-mortems of various successful attacks on commercial CDNs and hardened our configurations accordingly. We first tested the functionality on various non-production environments of our client projects. Search engines detect the use of our CDN correctly - even though images are loaded from the CDN domain, they are indexed correctly under the origin domain.&lt;/p&gt;

&lt;p&gt;In the future, we will consider splitting the CDN into two parts - one optimized especially for many small, frequently loaded files (e.g. JS/CSS/icons/fonts) and the other for larger files (e.g. audio/video or large images). Such a solution has two advantages: the browser gains even more parallelism when rendering the page (assets are loaded from several different CDN domains/IP addresses depending on type), and each part can be fine-tuned more precisely to its traffic profile, cache usage and hardware.&lt;/p&gt;

&lt;p&gt;We are still considering using our CDN as a reverse proxy in front of the entire client domain, i.e. for all requests, including POST/PUT/DELETE. This would give us another level of DDoS protection for the origin servers, but we would lose other benefits - especially targeted optimization for static content and the higher parallelism browsers achieve when loading content from several different domains or IP addresses. At the same time, it would be very tempting for each PoP to use multiple servers for different types of content, with load balancing between those servers, e.g. according to the suffix in the URL. We have a drawer full of such possible improvements, and perhaps some of them will prove worthwhile in the coming years.&lt;/p&gt;

&lt;h2&gt;
  
  
  I'm asking everyone - let's report bugs
&lt;/h2&gt;

&lt;p&gt;CDN implementation and debugging also showed us that all technologies have flaws. The more super-features someone ships, the more bugs they make - regardless of whether it is developed and tested by one person or by thousands. That's why I have one personal &lt;strong&gt;off-topic request&lt;/strong&gt;: please don't be lax - when we encounter a problem, let's &lt;strong&gt;report it to the authors and not expect someone else to do it for us&lt;/strong&gt;. This way we solve the community's problem as well as our own, and we learn a lot more, because reporting often forces us to dig deeper. It also teaches us to communicate issues to the other party in an understandable form.&lt;/p&gt;

&lt;p&gt;I used to not do it myself, thinking "&lt;em&gt;they will surely quickly find and fix it themselves&lt;/em&gt;". A mistake and faulty reasoning, as I admitted to myself over time...&lt;/p&gt;

&lt;p&gt;In recent years, however, I have reported, or helped fix, various bugs and problems myself. For example in Firefox (bugs in behavior and headers around AVIF), Google Chrome (problems with CORS vs. cache vs. prefetching), the Nginx web server (HTTP/2), PHP (OPcache), the ELK Stack (UI/UX errors in Kibana and Grok in Logstash), Mikrotik RouterOS and GlusterFS. I also have 13 tickets open for MariaDB and the MaxScale proxy. Although I could not help with these technologies as a developer, I at least provided enough comprehensible information that the developers could quickly understand, reproduce and fix the problems. &lt;strong&gt;If you happen to be making resolutions for 2024, the willingness to open well-described tickets or send PRs could be one of them&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you are interested in any other CDN-related details, ask in the comments or on X/Twitter &lt;a href="https://twitter.com/janreges" rel="noopener noreferrer"&gt;@janreges&lt;/a&gt;. I will be happy to answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test your websites with my analyzer
&lt;/h2&gt;

&lt;p&gt;In conclusion, I would like to recommend one of my personal open-source projects, which I hope will help improve the quality of websites around the world. The tool is available as a &lt;a href="https://github.com/janreges/siteone-crawler-gui" rel="noopener noreferrer"&gt;desktop application&lt;/a&gt; and as a &lt;a href="https://github.com/janreges/siteone-crawler" rel="noopener noreferrer"&gt;command-line tool&lt;/a&gt; usable in CI/CD pipelines. For Windows, macOS and Linux.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz3qaz6upoowg5tptmfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz3qaz6upoowg5tptmfk.png" alt="SiteOne Crawler - Free Website Analyzer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I launched it at the end of 2023 and I believe it will help a lot of people increase the security, performance, SEO, accessibility and other important qualities of a website or web application. It's called &lt;a href="https://crawler.siteone.io/?utm_source=dev.to&amp;amp;utm_campaign=cdn-part-3"&gt;SiteOne Crawler - Free Website Analyzer&lt;/a&gt; and I also wrote an &lt;a href="https://dev.to/janreges/siteone-crawler-useful-tool-you-will-oe1"&gt;article&lt;/a&gt; about it. Below you will find 3 descriptive videos - the last one also shows the report it generates for your website.&lt;/p&gt;

&lt;p&gt;In addition to various analyses, it also offers, for example, exporting the entire website into an offline form that you can browse from a local disk without the internet, or generating sitemaps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharing this project&lt;/strong&gt; with your colleagues and friends will be the &lt;strong&gt;greatest reward for me&lt;/strong&gt; for writing these articles. &lt;strong&gt;Thank you and I wish you all the best in 2024&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Desktop Application&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/rFW8LNEVNdw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Command-line tool&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/25T_yx13naA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;HTML report - analysis results&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/PHIFSOmk0gk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SiteOne Crawler — website analyzer you will ♥</title>
      <dc:creator>Ján Regeš</dc:creator>
      <pubDate>Sat, 09 Dec 2023 22:12:29 +0000</pubDate>
      <link>https://forem.com/janreges/siteone-crawler-useful-tool-you-will-oe1</link>
      <guid>https://forem.com/janreges/siteone-crawler-useful-tool-you-will-oe1</guid>
      <description>&lt;p&gt;Greetings to all web developers, QA engineers, DevOps, website owners, IT students or consultants in the online environment.&lt;/p&gt;

&lt;p&gt;I would like to introduce you all to a &lt;strong&gt;very useful&lt;/strong&gt; and &lt;strong&gt;open-source&lt;/strong&gt; tool that I believe you will quickly come to love and will be a useful tool for you in the long run. The goal of the tool is to &lt;strong&gt;help improve the quality of websites worldwide&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It &lt;strong&gt;analyzes&lt;/strong&gt; your &lt;strong&gt;entire website&lt;/strong&gt;, every single file found, provides you with a &lt;strong&gt;clear report&lt;/strong&gt; and has additional features, such as complete &lt;strong&gt;export&lt;/strong&gt; of your website to &lt;strong&gt;offline version&lt;/strong&gt;, where you can view your website from a local disk or USB stick.&lt;/p&gt;

&lt;p&gt;This tool can be used as a &lt;strong&gt;desktop application&lt;/strong&gt; (for &lt;strong&gt;Win&lt;/strong&gt;/&lt;strong&gt;macOS&lt;/strong&gt;/&lt;strong&gt;Linux&lt;/strong&gt;) or as a &lt;strong&gt;command-line&lt;/strong&gt; tool with clear and detailed output in the console, also usable in CI/CD pipelines. &lt;em&gt;Note:&lt;/em&gt; In the next few days we will set up Apple and Microsoft developer accounts so that we can &lt;strong&gt;properly sign&lt;/strong&gt; the desktop apps and the installation will be trusted, and eventually get the applications into the official App Store and Microsoft Store.&lt;/p&gt;

&lt;p&gt;If you don't like reading, scroll to the end of the article with &lt;strong&gt;videos&lt;/strong&gt; where there are practical examples.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;SiteOne Crawler - &lt;a href="https://crawler.siteone.io/?utm_source=dev.to&amp;amp;utm_campaing=siteone-crawler-useful-tool-you-will-love"&gt;&lt;strong&gt;https://crawler.siteone.io/&lt;/strong&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Desktop Application - &lt;a href="https://github.com/janreges/siteone-crawler-gui"&gt;https://github.com/janreges/siteone-crawler-gui&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Command-line Interface - &lt;a href="https://github.com/janreges/siteone-crawler"&gt;https://github.com/janreges/siteone-crawler&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Main features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  For developers and QA engineers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;No one is perfect&lt;/strong&gt; and I don't know of a single developer or company that, even with multiple levels of testing and checklists, runs a truly flawless website. A website is usually not just an optimized homepage but a bunch of different pages. This makes it difficult to really check the entire website for SEO, security, performance, accessibility, semantics, content quality, etc. This tool &lt;strong&gt;will crawl every single page&lt;/strong&gt; and every URL contained anywhere in the content, &lt;strong&gt;including JS, CSS, images, fonts&lt;/strong&gt; or &lt;strong&gt;documents&lt;/strong&gt;. Depending on the type of content, it performs various analyses and reports imperfections.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Works well for development versions of websites on &lt;strong&gt;localhost&lt;/strong&gt; and &lt;strong&gt;specific ports&lt;/strong&gt;, or with &lt;strong&gt;HTTP proxy&lt;/strong&gt; or &lt;strong&gt;HTTP authentication&lt;/strong&gt; required.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It can also &lt;strong&gt;generate&lt;/strong&gt; a (usually) fully &lt;strong&gt;functional&lt;/strong&gt; and &lt;strong&gt;viewable offline static version of the website&lt;/strong&gt;, even when dynamic query parameters are used in the URL. The exception is some modern JS frameworks that use JS modules, which are unfortunately blocked by CORS on the local &lt;code&gt;file://&lt;/code&gt; protocol.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Can &lt;strong&gt;generate sitemap.xml&lt;/strong&gt; and &lt;strong&gt;sitemap.txt&lt;/strong&gt; with lists of all URLs of existing pages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It can also serve as a &lt;strong&gt;stress-test tool&lt;/strong&gt;, as it allows you to set the max number of parallel requests and the max number of requests per second. But please &lt;strong&gt;do not abuse the tool for DoS attacks&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is really thorough in finding and crawling URLs - it downloads e.g. all images listed in &lt;code&gt;srcset&lt;/code&gt; attributes and in &lt;code&gt;CSS url()&lt;/code&gt;, and for &lt;strong&gt;NextJS&lt;/strong&gt; websites it even detects the &lt;code&gt;build-manifest&lt;/code&gt; and derives from it the URLs of all JS chunks, which it then downloads.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A list of analyses that the crawler performs and reports imperfections:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;for each URL &lt;strong&gt;HTTP status code&lt;/strong&gt;, &lt;strong&gt;content type&lt;/strong&gt;, response &lt;strong&gt;time &lt;/strong&gt;and &lt;strong&gt;size&lt;/strong&gt;, &lt;strong&gt;title&lt;/strong&gt;, &lt;strong&gt;description&lt;/strong&gt;, &lt;strong&gt;DOM &lt;/strong&gt;elements count, etc.;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks inline SVGs and warns when there are &lt;strong&gt;large inline SVGs&lt;/strong&gt; in the HTML, or a lot of duplication, suggesting it would be better to reference them as separate &lt;code&gt;*.svg&lt;/code&gt; files that can be cached;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks the &lt;strong&gt;validity of the SVG&lt;/strong&gt; from an XML perspective (very often manual editing of SVGs will break the syntax and not all browsers can fix this with their autocorrect);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks for &lt;strong&gt;missing quotes&lt;/strong&gt; in HTML text attributes (can cause problems if values are not escaped correctly);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks the &lt;strong&gt;max depth of DOM elements&lt;/strong&gt; and warns if the depth exceeds a threshold;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks the &lt;strong&gt;semantic structure of headings&lt;/strong&gt;, the existence of &lt;strong&gt;just one &lt;code&gt;&amp;lt;h1&amp;gt;&lt;/code&gt;&lt;/strong&gt;, warns about details;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks that &lt;strong&gt;phone numbers&lt;/strong&gt; contained in the HTML are correctly wrapped in a link with &lt;code&gt;href="tel:"&lt;/code&gt;, so that they can be clicked on to make a phone call;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks the &lt;strong&gt;uniqueness of titles&lt;/strong&gt; and &lt;strong&gt;meta descriptions&lt;/strong&gt; - it will alert you very quickly if you don't add the page number to the title, or the name of a filtered category, etc.;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks the use of modern &lt;strong&gt;Brotli compression&lt;/strong&gt; for the most efficient data transfer;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks the use of modern &lt;strong&gt;WebP&lt;/strong&gt; and &lt;strong&gt;AVIF&lt;/strong&gt; image formats;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks for &lt;strong&gt;accessibility&lt;/strong&gt; and that important HTML elements have &lt;strong&gt;aria&lt;/strong&gt; attributes, images have &lt;strong&gt;alt&lt;/strong&gt; attributes, etc.;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks &lt;strong&gt;HTTP headers&lt;/strong&gt; of all responses and warns about the absence of important &lt;strong&gt;security headers&lt;/strong&gt; and generates statistics of all HTTP headers and their unique values;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks &lt;strong&gt;cookie settings&lt;/strong&gt; and warns about missing &lt;code&gt;Secure&lt;/code&gt; flags on HTTPS, &lt;code&gt;HttpOnly&lt;/code&gt; or &lt;code&gt;SameSite&lt;/code&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks &lt;strong&gt;OpenGraph metadata&lt;/strong&gt; on all pages and displays their values in the report;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks and reports on all &lt;strong&gt;404 pages&lt;/strong&gt; including URLs where non-existent URLs are located (also monitors links to external domains);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks and reports all &lt;strong&gt;301/302 redirects&lt;/strong&gt; including the URL where the redirected URL is located;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks and reports &lt;strong&gt;DNS settings&lt;/strong&gt; (IP address(es) to which the domain is resolved, including visualization of possible CNAME chain);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;checks and reports &lt;strong&gt;SSL/TLS settings&lt;/strong&gt; - reports the validity of the certificate from-to, warns about support of unsafe SSL/TLS protocols, or recommends the use of newer ones;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;if enabled, &lt;strong&gt;downloads all linked assets from other domains&lt;/strong&gt; (JS, CSS, images, fonts, documents, etc.);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;downloads &lt;strong&gt;robots.txt&lt;/strong&gt; on every domain it browses and respects the prohibition of crawling on pages forbidden in robots.txt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;shows &lt;strong&gt;all unique images&lt;/strong&gt; found on the website in the &lt;strong&gt;Image Gallery&lt;/strong&gt; report;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;shows statistics of the &lt;strong&gt;fastest&lt;/strong&gt; and &lt;strong&gt;slowest pages&lt;/strong&gt;, which are best to optimize, add cache, etc.;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;shows statistics on the number, size and speed of downloads of each &lt;strong&gt;content type &lt;/strong&gt;and then a larger &lt;strong&gt;breakdown by mime-type&lt;/strong&gt; (&lt;code&gt;Content-Type&lt;/code&gt; header);&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;shows statistics from which &lt;strong&gt;different and foreign domains&lt;/strong&gt; you are retrieving which type of content;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;shows a &lt;strong&gt;summary of all findings&lt;/strong&gt;, sorted &lt;strong&gt;by severity&lt;/strong&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;allows you to also set response HTTP headers to be included in the URL listing (in the console and HTML report) via the &lt;code&gt;--extra-columns&lt;/code&gt; setting - typically e.g. &lt;code&gt;X-Cache&lt;/code&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;has &lt;strong&gt;dozens of useful settings&lt;/strong&gt; that can be used to influence the behavior of crawling, parsing, caching, reporting, output, etc.;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;in the &lt;strong&gt;future &lt;/strong&gt;we want to implement a &lt;strong&gt;lot of other controls and analyses&lt;/strong&gt; that will make sense within the user community - &lt;strong&gt;the goal is to create a free tool that will be very useful and versatile.&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For DevOps
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Especially for Linux users, the &lt;strong&gt;command-line&lt;/strong&gt; part of SiteOne Crawler is very easy to use, &lt;strong&gt;without&lt;/strong&gt; having to install &lt;strong&gt;any dependencies&lt;/strong&gt;. Included are the runtime binaries for x64/arm64 and the crawler source code. Just &lt;em&gt;git clone&lt;/em&gt; it, or unpack the &lt;em&gt;tar.gz&lt;/em&gt; wherever you need it. By default, the crawler saves files into its 'tmp' folder, but all paths for caching or reports/exports can be set with CLI switches. In the coming weeks we will also prepare public &lt;strong&gt;Docker images&lt;/strong&gt; so the crawler can be used in &lt;strong&gt;CI/CD environments&lt;/strong&gt; with Docker or Kubernetes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is very useful to have the whole website re-crawled during a pre-release phase in CI/CD. Using CLI switches, you can have the resulting &lt;strong&gt;HTML report sent to one or more emails&lt;/strong&gt; via your SMTP server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Crawler allows you to configure the use of your &lt;strong&gt;HTTP proxy&lt;/strong&gt;, set up &lt;strong&gt;HTTP authentication&lt;/strong&gt; or crawl the website on a &lt;strong&gt;special port&lt;/strong&gt;, e.g. &lt;code&gt;http://localhost:3000&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;By setting the number of &lt;strong&gt;parallel workers&lt;/strong&gt; or &lt;strong&gt;max requests per second&lt;/strong&gt;, you can &lt;strong&gt;test your DoS protections&lt;/strong&gt; or perform a &lt;strong&gt;stress test&lt;/strong&gt; to see how the target server(s) cope with a given amount of traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can use CLI switches to &lt;strong&gt;turn off support&lt;/strong&gt; for JS, CSS, images, fonts or documents, and you can use the crawler to immediately &lt;strong&gt;warm up the cache&lt;/strong&gt; after a new release, which usually includes flushing the cache of the previous version.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In addition to the HTML and TXT report (output as in the console), the crawler also generates &lt;strong&gt;output to a JSON&lt;/strong&gt; file, which then contains all the findings and data, in a structured and programmable form. So you can &lt;strong&gt;integrate the output&lt;/strong&gt; from the crawler further, according to your needs.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For website owners and consultants
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;General &lt;strong&gt;quality audit&lt;/strong&gt; of website processing - website owners should be aware of what reserves their website has and where improvements could be made. Some improvements are not trivial and can be quite costly to implement. Some, however, take tens of minutes to implement and their &lt;strong&gt;impact on output quality&lt;/strong&gt; can be high.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Audit of &lt;strong&gt;on-page SEO factors&lt;/strong&gt; - checks all titles, descriptions and headings on all pages, pointing out lack of &lt;strong&gt;uniqueness&lt;/strong&gt;, &lt;strong&gt;missing &amp;lt;h1&amp;gt;&lt;/strong&gt; headings or an &lt;strong&gt;incorrect semantic structure&lt;/strong&gt;. Most of the findings can usually be corrected by the website owners themselves through the CMS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Link Functionality Audit&lt;/strong&gt; - goes through every single link in the content on all pages and alerts you to broken links or unnecessary redirects (typically due to missing slashes at the end of the URL).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Audit &lt;strong&gt;various UX details&lt;/strong&gt;, such as whether all &lt;strong&gt;phone numbers&lt;/strong&gt; found in the HTML are wrapped in an active link with &lt;code&gt;href="tel:"&lt;/code&gt; so that a visitor can click on them and dial the call without having to rewrite or copy the number.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overview of &lt;strong&gt;all images on the website&lt;/strong&gt; - the HTML output report contains a viewable gallery of absolutely all images found on the website. You may notice, for example, low-quality or unwanted images.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overview of &lt;strong&gt;page generation speed&lt;/strong&gt; - the website owner should strive to have every page generate ideally in tens, at most hundreds, of milliseconds, as slow sites discourage visitors and statistically have lower conversion rates. Often only the homepage is measured and optimized by the developers, while other pages are neglected. If the website is slow and optimization would be expensive, it is often enough to move it to more powerful hosting at a slightly higher price. SiteOne Crawler &lt;strong&gt;stores all reports on your hard drive&lt;/strong&gt;, so you can use it to measure and compare the website before/after optimizations or a move to faster hosting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You can tell the Crawler what &lt;strong&gt;other domains it can also fully crawl&lt;/strong&gt; - typically subdomains or domain extensions with other language mutations, such as &lt;code&gt;*.mysite.tld&lt;/code&gt; or &lt;code&gt;*.mysite.*&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Crawler offers the possibility to have the entire website, including all images and documents, &lt;strong&gt;exported to an offline form&lt;/strong&gt;. The site is then fully, or almost fully, browsable even from a &lt;strong&gt;local disk, without the need for the Internet&lt;/strong&gt;. This is great functionality for easy &lt;strong&gt;archiving&lt;/strong&gt; of web content at any given time. It can also help in a situation where some institution requires you to keep an archive of your website on particular days for &lt;strong&gt;legal purposes&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Feedback is welcome&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We would be very happy if you &lt;strong&gt;try our tool&lt;/strong&gt; and &lt;strong&gt;give us your feedback&lt;/strong&gt;. Any &lt;strong&gt;ideas for improvement&lt;/strong&gt; are also very welcome. The tool is certainly &lt;strong&gt;not perfect today&lt;/strong&gt;, but our goal is to make it perfect in the coming months.&lt;/p&gt;

&lt;p&gt;And, of course, we will also be happy if you &lt;strong&gt;share this article&lt;/strong&gt; or the website &lt;a href="https://crawler.siteone.io/?utm_source=dev.to&amp;amp;utm_campaign=article1-footer"&gt;&lt;strong&gt;crawler.siteone.io&lt;/strong&gt;&lt;/a&gt; with colleagues or friends whom the tool could help. On the homepage you will find sharing buttons.&lt;/p&gt;

&lt;p&gt;Thank you for your attention and we believe that our tool will &lt;strong&gt;help you in improving the quality of your website(s)&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Videos
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Desktop Application
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/rFW8LNEVNdw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Command-line Interface
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/25T_yx13naA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  HTML report
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/PHIFSOmk0gk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>performance</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How to build a CDN (2/3): server and reverse proxy configuration</title>
      <dc:creator>Ján Regeš</dc:creator>
      <pubDate>Sat, 08 Jan 2022 17:57:33 +0000</pubDate>
      <link>https://forem.com/janreges/how-to-build-a-cdn-23-server-and-reverse-proxy-configuration-16md</link>
      <guid>https://forem.com/janreges/how-to-build-a-cdn-23-server-and-reverse-proxy-configuration-16md</guid>
      <description>&lt;p&gt;In the previous article about &lt;a href="https://dev.to/janreges/how-to-build-a-cdn-1-3-introduction-and-basic-components-345o"&gt;basic CDN components&lt;/a&gt; we described what components you need to build a CDN, and today we will focus on the software configuration of the servers and the reverse proxy itself, which will cache the content to ensure that the data is always as close as possible to the end visitors.&lt;/p&gt;

&lt;p&gt;The primary goal of this article is not to give you specific values for each setting (although we will recommend some), but to tell you what to look for and what to watch out for. In fact, we also tune and optimize the specific values ourselves over time according to the traffic and the collected monitoring indications. It is therefore essential to understand the individual settings and adjust them with respect to your HW and expected traffic.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Operating system&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;At &lt;a href="https://www.siteone.cz/" rel="noopener noreferrer"&gt;SiteOne&lt;/a&gt; we have the vast majority of servers running on Linux — specifically Gentoo and Debian distributions. In the case of CDN, however, all our servers are running on Debian, so any detailed tips will include Debian paths/settings.&lt;/p&gt;

&lt;p&gt;In the area of OS and kernel, we recommend focusing on the following parameters, which will significantly affect how much traffic each server can handle without rejecting TCP connections or hitting other limits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configure &lt;em&gt;/etc/security/limits.conf&lt;/em&gt; — set significantly higher soft and hard limits especially for &lt;strong&gt;nproc&lt;/strong&gt; and &lt;strong&gt;nofile&lt;/strong&gt; for the &lt;strong&gt;nginx&lt;/strong&gt; process (tens to hundreds of thousands).&lt;/li&gt;
&lt;li&gt;Ideally, configure the &lt;strong&gt;kernel&lt;/strong&gt; via &lt;strong&gt;sysctl.conf&lt;/strong&gt; and focus on the parameters you see in the recommended configuration below. It’s a good idea to study each parameter, understand how it affects your operation, and set it accordingly.&lt;/li&gt;
&lt;li&gt;If you have kernel 4.9+ you can enable the &lt;a href="https://atoonk.medium.com/tcp-bbr-exploring-tcp-congestion-control-84c9c11dc3a9" rel="noopener noreferrer"&gt;TCP BBR algorithm&lt;/a&gt; to reduce RTT and increase the speed of content delivery. Parameters: &lt;em&gt;net.ipv4.tcp_congestion_control=bbr, net.core.default_qdisc=fq&lt;/em&gt; (more info in the article at &lt;a href="https://blog.cloudflare.com/http-2-prioritization-with-nginx/" rel="noopener noreferrer"&gt;Cloudflare&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Check the &lt;strong&gt;RX-DRP&lt;/strong&gt; value with &lt;em&gt;netstat -i&lt;/em&gt;, and if the value is already in the millions after a couple of days and still increasing, increase the RX/TX ring buffers on the network card. To find the current setting and max value, use &lt;em&gt;ethtool -g YOUR-IFACE&lt;/em&gt; and set the new value with &lt;em&gt;ethtool -G&lt;/em&gt;, for example &lt;em&gt;ethtool -G ens192 rx 2048 tx 2048&lt;/em&gt;. To make the setting survive a reboot, call the command in post-up scripts in &lt;em&gt;/etc/network/interfaces&lt;/em&gt; or &lt;em&gt;/etc/rc.local&lt;/em&gt;. If you are modifying the network interface that connects you to the server, be careful, because the change will restart the interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Txqueuelen&lt;/strong&gt; on network cards should be raised from the default 1000, depending on your connectivity and network card.&lt;/li&gt;
&lt;li&gt;Set the &lt;strong&gt;IO scheduler&lt;/strong&gt; on each disk/array depending on what storage you are using — &lt;em&gt;/sys/block/*/queue/scheduler&lt;/em&gt;. If you are using SSD or NVME, we recommend &lt;em&gt;none&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iptables&lt;/strong&gt; or &lt;strong&gt;router&lt;/strong&gt; — it is recommended to set some hard limits on the number of simultaneous connections from one IP address and the number of connections per certain time. In case of a DoS attack, you can filter out a large part of the traffic effectively already at the network level. However, you should also set limits with respect to possible visitors behind NAT (multiple legitimate visitors behind one IP address is a typical situation e.g. with mobile operators or smaller local ISPs).&lt;/li&gt;
&lt;/ul&gt;
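&lt;p&gt;The OS-level points above can be sketched as follows (a sketch only; the interface name &lt;em&gt;ens192&lt;/em&gt;, the device &lt;em&gt;nvme0n1&lt;/em&gt; and all values are illustrative and must be adapted to your HW and traffic):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# /etc/security/limits.conf: much higher limits for the nginx user
nginx  soft  nofile  200000
nginx  hard  nofile  200000
nginx  soft  nproc   65536
nginx  hard  nproc   65536

# RX/TX ring buffers: check current/max values, then raise them
ethtool -g ens192
ethtool -G ens192 rx 2048 tx 2048

# raise txqueuelen from the default 1000
ip link set dev ens192 txqueuelen 10000

# SSD/NVMe: use the "none" IO scheduler
echo none &gt; /sys/block/nvme0n1/queue/scheduler
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;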

&lt;p&gt;When setting individual parameters, consider what the typical traffic of a visitor who retrieves content from the CDN looks like. HTTP/2 is essential, as a visitor usually needs only one TCP connection to download all the content on a page. You can afford shorter TCP connection timeouts, shorter keepalives and smaller buffers. The metrics you collect, such as the number of TCP connections in each state, will tell you a lot in real traffic. If you want to handle tens of thousands of visitors within seconds or minutes, forget about default timeout values in minutes and test values in units to tens of seconds.&lt;/p&gt;
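&lt;p&gt;To get an idea of the TCP connection states mentioned above on a live server, a simple diagnostic sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# overall socket summary
ss -s

# number of TCP connections per state (ESTAB, TIME-WAIT, FIN-WAIT-1, ...)
ss -tan | awk 'NR &gt; 1 {print $1}' | sort | uniq -c | sort -rn
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;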

&lt;h1&gt;
  
  
  &lt;strong&gt;Recommended kernel configuration&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The values of each setting should be taken &lt;strong&gt;only as our recommendation&lt;/strong&gt;, which has proven to work well for a server with 4–8 GB RAM, 4–8 vCPUs and Intel X540-AT2 or Intel I350 network cards. Some directives have values an order of magnitude higher or lower than the distribution defaults. These are usually modifications to increase the ability to handle heavy traffic efficiently and to minimize the impact of a DoS or DDoS attack. It is also important to note that the configuration is for a server with IPv6 support disabled. If your situation allows it, use IPv6 too.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

fs.aio-max-nr = 524288  
fs.file-max = 611160  
kernel.msgmax = 131072  
kernel.msgmnb = 131072  
kernel.panic = 15  
kernel.pid_max = 65536  
kernel.printk = 4 4 1 7  
net.core.default_qdisc = fq  
net.core.netdev_max_backlog = 262144  
net.core.optmem_max = 16777216  
net.core.rmem_max = 16777216  
net.core.somaxconn = 65535  
net.core.wmem_max = 16777216  
net.ipv4.conf.all.accept_redirects = 0  
net.ipv4.conf.all.log_martians = 1  
net.ipv4.conf.all.rp_filter = 1  
net.ipv4.conf.all.secure_redirects = 0  
net.ipv4.conf.all.send_redirects = 0  
net.ipv4.conf.default.accept_redirects = 0  
net.ipv4.conf.default.accept_source_route = 0  
net.ipv4.conf.default.rp_filter = 1  
net.ipv4.conf.default.secure_redirects = 0  
net.ipv4.conf.default.send_redirects = 0  
net.ipv4.ip_forward = 0  
net.ipv4.ip_local_port_range = 1024 65535  
net.ipv4.tcp_congestion_control = bbr  
net.ipv4.tcp_fin_timeout = 10  
net.ipv4.tcp_keepalive_intvl = 10  
net.ipv4.tcp_keepalive_probes = 5  
net.ipv4.tcp_keepalive_time = 60  
net.ipv4.tcp_low_latency = 1  
net.ipv4.tcp_max_orphans = 10000  
net.ipv4.tcp_max_syn_backlog = 65000  
net.ipv4.tcp_max_tw_buckets = 1440000  
net.ipv4.tcp_moderate_rcvbuf = 1  
net.ipv4.tcp_no_metrics_save = 1  
net.ipv4.tcp_notsent_lowat = 16384  
net.ipv4.tcp_rfc1337 = 1  
net.ipv4.tcp_rmem = 4096 87380 16777216  
net.ipv4.tcp_sack = 0  
net.ipv4.tcp_slow_start_after_idle = 0  
net.ipv4.tcp_synack_retries = 2  
net.ipv4.tcp_syncookies = 1  
net.ipv4.tcp_syn_retries = 2  
net.ipv4.tcp_timestamps = 0  
net.ipv4.tcp_tw_reuse = 1  
net.ipv4.tcp_window_scaling = 0  
net.ipv4.tcp_wmem = 4096 65536 16777216  
net.ipv6.conf.all.disable_ipv6 = 1  
net.ipv6.conf.default.disable_ipv6 = 1  
net.ipv6.conf.lo.disable_ipv6 = 1  
vm.dirty_background_ratio = 2  
vm.dirty_ratio = 60  
vm.max_map_count = 262144  
vm.overcommit_memory = 1  
vm.swappiness = 1


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Reverse proxy and cache&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;On all PoP servers, you need a critical CDN component — a reverse proxy with robust caching support. The most popular are &lt;strong&gt;Varnish&lt;/strong&gt;, &lt;strong&gt;Squid&lt;/strong&gt;, &lt;strong&gt;Nginx&lt;/strong&gt;, &lt;strong&gt;Traefik&lt;/strong&gt;, &lt;strong&gt;H2O&lt;/strong&gt; and, with limited functionality, e.g. &lt;strong&gt;HAProxy&lt;/strong&gt;. &lt;strong&gt;Tengine&lt;/strong&gt;, which is built on Nginx and adds a lot of interesting functionality, is also worth considering.&lt;/p&gt;

&lt;p&gt;In the context of a CDN, the functionality of the reverse proxy is quite clear — based on the URL and request headers, find the content in the cache and if it is not there, or has expired, download it from the Origin server and store it in the cache so that the next visitor’s request is processed faster, from the cache on the PoP.&lt;/p&gt;

&lt;p&gt;We finally chose &lt;a href="https://www.nginx.org/" rel="noopener noreferrer"&gt;Nginx web server&lt;/a&gt; because we have been using it successfully on most of our servers for many years. We have all the configurations and different vhost variants as well as optimal functional, performance and security settings in Ansible. As for the specific version, we recommend the latest &lt;strong&gt;1.19.x&lt;/strong&gt;, which already includes the improved HTTP/2 implementation, along with OpenSSL 1.1.1 due to TLSv1.3.&lt;/p&gt;

&lt;p&gt;Compared to our normal default values for application servers, we have significantly reduced various buffers, timeouts, and thresholds for CDNs, as well as for the kernel. Our CDN is optimized for static content and for handling only GET/HEAD/OPTIONS requests. Since we don’t have to support POST or uploads anymore, we could tighten the parameters significantly, both on the client side and on the backend (requests to source origin servers).&lt;/p&gt;

&lt;p&gt;The following text assumes that you already have at least basic experience with Nginx — that’s why there are no specific configuration snippets, but rather various recommendations beyond basic usage that you won’t usually find in Nginx tutorials and have a significant impact on CDN operation.&lt;/p&gt;

&lt;p&gt;Cache is a key functionality of a CDN, so we recommend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check out &lt;a href="https://www.nginx.com/blog/nginx-high-performance-caching/" rel="noopener noreferrer"&gt;the High-Performance Caching guide&lt;/a&gt;. For the proxy cache, carefully study and understand all &lt;em&gt;proxy_cache_*&lt;/em&gt; directives and their parameters. Start with &lt;strong&gt;proxy_cache_path&lt;/strong&gt; and its &lt;em&gt;levels&lt;/em&gt;, &lt;em&gt;keys_zone&lt;/em&gt;, &lt;em&gt;inactive&lt;/em&gt; and &lt;em&gt;max_size&lt;/em&gt; parameters. For remote secondary PoPs, you can set &lt;em&gt;inactive&lt;/em&gt; to weeks or months, for example — the cache manager will then also keep content that hasn’t been accessed for longer, increasing the accelerating effect of the CDN and the cache hit-ratio even for PoPs from which the content of specific URLs is not downloaded as often.&lt;/li&gt;
&lt;li&gt;Optimally set the &lt;strong&gt;proxy_cache_valid&lt;/strong&gt; directive, which affects how long the HTTP codes are cached. If you decide to cache error codes, e.g. &lt;em&gt;400 Bad Request&lt;/em&gt;, then only cache them for a very short period of time to minimize the effects of possible “cache poisoning”.&lt;/li&gt;
&lt;li&gt;If you don’t want the cache to respect the origin’s “cache control” response headers, you can use &lt;strong&gt;proxy_ignore_headers&lt;/strong&gt; to ignore typically the &lt;em&gt;Cache-Control&lt;/em&gt;, &lt;em&gt;Expires&lt;/em&gt; or &lt;em&gt;Vary&lt;/em&gt; headers.&lt;/li&gt;
&lt;li&gt;Also pay attention to &lt;strong&gt;proxy_cache_use_stale&lt;/strong&gt;, which affects how the cache behaves if the origin is unavailable. We decided that if the origin happens to be down and the cache has expired, we will return the stale content to the visitor anyway, which improves availability. Also set the &lt;em&gt;updating&lt;/em&gt; parameter so that after expiration the visitor’s content is served immediately from the cache (without waiting for the origin), while the content is refreshed from the origin in the background for future visitors. This eliminates the occasional slowdown where once in a while one visitor would otherwise pay the price of refreshing the expired content of a given URL in the CDN.&lt;/li&gt;
&lt;li&gt;Decide what to put in the &lt;strong&gt;proxy_cache_key&lt;/strong&gt;. For example, do you want to include a possible query string in the cache key, which is often used to “version” files and bypass the cache of the previous version of the file?&lt;/li&gt;
&lt;li&gt;Activate &lt;strong&gt;proxy_cache_lock&lt;/strong&gt; to keep cache filling efficient even with high parallelization (only one request per key goes to the origin), and decide how to set &lt;strong&gt;proxy_cache_min_uses&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;
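&lt;p&gt;The cache-related directives above could be combined roughly as follows (an illustrative sketch: the zone name, paths and times are placeholders, not our production values):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
proxy_cache_path /var/lib/nginx/cache levels=1:2 keys_zone=my_zone:20m inactive=720h max_size=10g;

server {
  location / {
    proxy_cache my_zone;
    # decide whether $is_args$args belongs in the key (file "versioning" via query string)
    proxy_cache_key "$scheme$host$uri";
    proxy_cache_valid 200 301 302 1h;
    proxy_cache_valid 400 404 10s;  # cache error codes only very briefly
    proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
    proxy_cache_lock on;
    proxy_cache_min_uses 2;
    proxy_pass https://www.myorigin.com;
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;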

&lt;p&gt;In addition, consider the following tips and settings that affect Nginx performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your platform allows it, set &lt;strong&gt;use epoll&lt;/strong&gt;. If you have kernel 4.5+, it will use &lt;a href="https://sudonull.com/post/14030-The-whole-truth-about-linux-epoll#epollexclusive" rel="noopener noreferrer"&gt;EPOLLEXCLUSIVE&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For the &lt;strong&gt;listen&lt;/strong&gt; directive of the main vhost of your CDN (&lt;em&gt;cdn.company.com&lt;/em&gt;), use &lt;strong&gt;reuseport&lt;/strong&gt; so that requests are distributed to individual Nginx workers by the kernel, which is &lt;a href="https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/#Benchmarking-Performance-with-%3Ccode%3Ereuseport" rel="noopener noreferrer"&gt;many times more efficient&lt;/a&gt;. For the listen directive, also study the &lt;strong&gt;backlog&lt;/strong&gt; and &lt;strong&gt;fastopen&lt;/strong&gt; parameters. You can also activate &lt;strong&gt;deferred&lt;/strong&gt;, so that the request reaches Nginx only when the client actually sends the first data, which can better address some types of DDoS attacks.&lt;/li&gt;
&lt;li&gt;Activate &lt;strong&gt;http2&lt;/strong&gt; on the listen directive and always keep a secure set of &lt;strong&gt;ssl_ciphers&lt;/strong&gt; (with respect to the browser versions you want to support).&lt;/li&gt;
&lt;li&gt;If you can afford to do so given the browsers supported, only support &lt;strong&gt;TLSv1.2&lt;/strong&gt; and &lt;strong&gt;TLSv1.3&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The CDN server processor will be loaded mostly by gzip/brotli compression and SSL/TLS communication. Set &lt;strong&gt;ssl_session_cache&lt;/strong&gt; to minimize SSL/TLS handshakes. We recommend &lt;em&gt;shared&lt;/em&gt; so that the cache is shared between all workers; a cache size of 50 MB, for example, will fit about 200,000 sessions. To further minimize the number of SSL/TLS handshakes, you can increase &lt;strong&gt;ssl_session_timeout&lt;/strong&gt;. If you don’t want to use a session cache on the server, enable &lt;strong&gt;ssl_session_tickets&lt;/strong&gt; so that session resumption works at least on the browser side.&lt;/li&gt;
&lt;li&gt;For SSL settings, activate &lt;a href="https://blog.cloudflare.com/introducing-0-rtt/" rel="noopener noreferrer"&gt;0-RTT on TLSv1.3&lt;/a&gt; (&lt;em&gt;ssl_early_data on&lt;/em&gt;) to substantially reduce latency, but understand and consider &lt;a href="https://tools.ietf.org/html/rfc8470" rel="noopener noreferrer"&gt;Replay attack&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you want to achieve minimal TTFB (at the expense of higher load when transferring large files), study and set reasonably low &lt;strong&gt;ssl_buffer_size&lt;/strong&gt; and &lt;strong&gt;http2_chunk_size&lt;/strong&gt;. Alternatively, deploy the Cloudflare patch for Nginx, which supports dynamic settings — just google the &lt;strong&gt;ssl_dyn_rec_size_lo&lt;/strong&gt; directive.&lt;/li&gt;
&lt;li&gt;Also focus on understanding and setting up &lt;strong&gt;keepalive&lt;/strong&gt; both on the client side and in the upstreams — this will help streamline communication with the origin servers. For HTTP/2, keepalive is governed by the &lt;strong&gt;http2_idle_timeout&lt;/strong&gt; directive (default: 3 min); also look at &lt;strong&gt;http2_recv_timeout&lt;/strong&gt;. Keeping connections open unnecessarily long significantly reduces the number of visitors you are able to serve, and it also affects how large a DDoS attack you can withstand. It’s good to understand how connection tracking works (both on Linux and possibly on routers when the server is behind NAT), how it relates to the &lt;em&gt;limit_conn&lt;/em&gt; setting, and how it behaves as a whole when hundreds of thousands of clients access your servers or you are under an L7 DDoS attack.&lt;/li&gt;
&lt;li&gt;If you need to detect a change in the IP address of the origin and you don’t have the paid &lt;em&gt;Nginx Plus&lt;/em&gt; with the &lt;em&gt;resolve&lt;/em&gt; parameter on the upstream server, you can skip defining an upstream and pass the origin hostname to &lt;em&gt;proxy_pass&lt;/em&gt; via a variable, e.g. &lt;code&gt;set $origin https://www.myorigin.com; proxy_pass $origin;&lt;/code&gt;. In this mode, together with the &lt;em&gt;resolver&lt;/em&gt; directive, Nginx honors the TTL of the domain’s DNS record and updates the IP address(es) if necessary.&lt;/li&gt;
&lt;li&gt;Also study the &lt;strong&gt;lingering_close&lt;/strong&gt;, &lt;strong&gt;lingering_time&lt;/strong&gt;, and &lt;strong&gt;lingering_timeout&lt;/strong&gt; directives, which determine how quickly inactive connections should be closed. For better resistance to attacks, it makes sense to reduce the default times. For HTTP/2 connections, however, lingering_* directives have only been applied since Nginx 1.19.1.&lt;/li&gt;
&lt;li&gt;Increase &lt;strong&gt;ULIMIT&lt;/strong&gt; in &lt;em&gt;/etc/default/nginx&lt;/em&gt; and also set a higher &lt;strong&gt;LimitNOFILE&lt;/strong&gt; in &lt;em&gt;/etc/systemd/system/nginx.service.d/nginx.conf&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
The &lt;strong&gt;sendfile&lt;/strong&gt;, &lt;strong&gt;tcp_nopush&lt;/strong&gt; and &lt;strong&gt;tcp_nodelay&lt;/strong&gt; directives also help to handle files and requests quickly. To prevent clients with fast connections downloading large files from monopolizing an entire worker process, also set &lt;strong&gt;sendfile_max_chunk&lt;/strong&gt; sensibly.&lt;/li&gt;
&lt;li&gt;If you are handling very large files and see slowdowns in other requests, consider using &lt;strong&gt;aio&lt;/strong&gt;. Be sure to set the &lt;strong&gt;directio&lt;/strong&gt; directive appropriately: it defines the max size of file that will still be sent via sendfile, while larger files go through aio. We find 4 MB to be the optimal value, so all JS/CSS/fonts and most images are handled through sendfile, usually from the FS cache, generating almost no disk IO.&lt;/li&gt;
&lt;li&gt;Also look at the directives around &lt;strong&gt;open_file_cache&lt;/strong&gt;. With optimal settings and enough RAM you will have almost zero IOPS, even if you are pushing hundreds of Mbps.&lt;/li&gt;
&lt;li&gt;To handle high numbers of concurrent visitors and protect yourself from attacks, reduce &lt;strong&gt;client_max_body_size&lt;/strong&gt;, &lt;strong&gt;client_header_timeout&lt;/strong&gt;, &lt;strong&gt;client_body_timeout&lt;/strong&gt;, and &lt;strong&gt;send_timeout&lt;/strong&gt; as a matter of principle.&lt;/li&gt;
&lt;li&gt;For access log settings, study the &lt;strong&gt;buffer&lt;/strong&gt; and &lt;strong&gt;flush&lt;/strong&gt; parameters to minimize the IOPS associated with writing logs. Beware that this will also cause the logs to not be written 100% chronologically. Access logs should ideally be stored on a different disk than the cached data.&lt;/li&gt;
&lt;li&gt;For upstreams, you can play with load balancing (if the origin can be accessed via multiple IP addresses) and the &lt;strong&gt;backup&lt;/strong&gt; and &lt;strong&gt;weight&lt;/strong&gt; attributes. In the current version, the useful &lt;strong&gt;max_conns&lt;/strong&gt; attribute, which was for a long time only in the paid version, is freely available.&lt;/li&gt;
&lt;li&gt;If you also want some form of &lt;strong&gt;auto-retry&lt;/strong&gt; logic (for cases of short origin unavailability), you can solve it, for example, by pointing multiple upstream servers at the same origin, but putting between them a vhost with a short Lua script that sleeps between retry requests.&lt;/li&gt;
&lt;li&gt;Use a custom &lt;strong&gt;resolver&lt;/strong&gt; setup and consider using the local &lt;em&gt;dnsmasq&lt;/em&gt; as the primary resolver.&lt;/li&gt;
&lt;li&gt;Learn how the Cache Manager works in Nginx, which starts working especially when the cache gets full.&lt;/li&gt;
&lt;li&gt;Not everything can be covered here, but other directives also have an impact on proxy and cache behavior, and we recommend studying and setting them as well: &lt;em&gt;proxy_buffering, proxy_buffer_size, proxy_buffers, proxy_read_timeout, output_buffers, reset_timedout_connection&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;If you will be using dynamic modules with Nginx (in our case for brotli compression and WAF), with every Nginx upgrade you have to &lt;strong&gt;recompile all modules&lt;/strong&gt; against the new Nginx version. If you don’t, Nginx won’t start after the upgrade due to &lt;em&gt;signature&lt;/em&gt; conflicts with the *.so modules. It is therefore better to automate the whole Nginx upgrade process, because otherwise you will end up with a broken Nginx when upgrading packages e.g. via apt. Part of this automation should be the option to do an &lt;a href="https://www.nginx.com/resources/wiki/start/topics/tutorials/commandline/#upgrading-to-a-new-binary-on-the-fly" rel="noopener noreferrer"&gt;Nginx upgrade on-the-fly&lt;/a&gt;, where Nginx keeps the old instance running (from memory) and at the same time starts (or at least tries to start) a new instance from the current binary and modules. This ensures that you don’t lose a single request during the upgrade, even if the new Nginx fails to start for some reason. In most distributions this whole process is available in the init scripts under the &lt;em&gt;upgrade&lt;/em&gt; action, i.e. &lt;em&gt;service nginx upgrade&lt;/em&gt;. To prevent unwanted Nginx upgrades when upgrading packages globally, use &lt;em&gt;apt-mark hold/unhold nginx&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;
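&lt;p&gt;The upgrade workflow from the last point can be sketched like this (the paths and the brotli module are illustrative; adapt them to the modules you actually build):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# prevent apt from upgrading Nginx behind your back
apt-mark hold nginx

# during an upgrade: rebuild the dynamic modules against the new Nginx sources first
cd /usr/src/nginx-1.19.x
./configure --with-compat --add-dynamic-module=/usr/src/ngx_brotli
make modules

# then swap the binary on-the-fly without losing a single request
nginx -t &amp;&amp; service nginx upgrade
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;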

&lt;p&gt;Depending on what content and behavior of the originals you want to support, you will need to study and possibly debug the behavior of the CDN cache with respect to the &lt;em&gt;Cache-Control&lt;/em&gt; header or, perhaps quite fundamentally, the &lt;em&gt;Vary&lt;/em&gt; header. For example, if the origin says in the response &lt;em&gt;Vary: User-Agent&lt;/em&gt;, the cache key should include the user-agent of the client, otherwise it can easily happen that you return cached HTML for the mobile version to someone on the desktop. But that depends on what scenarios and content types you want/do not want to support. Supporting these scenarios often means a lot of work, and it also reduces the efficiency of the cache. Usually you won’t be able to get by with native Nginx directives and will have to handle some scenarios with Lua scripts.&lt;/p&gt;
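&lt;p&gt;One possible approach (a sketch, not the only option) is to collapse user-agents into a few device classes via &lt;em&gt;map&lt;/em&gt; and include the class in the cache key, instead of caching one variant per full user-agent string:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
map $http_user_agent $device_class {
  default                    desktop;
  ~*(android|iphone|mobile)  mobile;
}

# inside the location block:
proxy_cache_key "$scheme$host$request_uri$device_class";
# only if you deliberately take over Vary handling yourself:
proxy_ignore_headers Vary;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;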

&lt;p&gt;Finally, I’ll mention that for Nginx there is also a paid version, &lt;a href="https://www.nginx.com/products/nginx/" rel="noopener noreferrer"&gt;Nginx Plus&lt;/a&gt;, which offers various useful functionalities, a live dashboard and extra modules. Important, for example, is the &lt;em&gt;resolve&lt;/em&gt; parameter of the upstream server, which in conjunction with the &lt;em&gt;resolver&lt;/em&gt; directive can detect a change in the IP address of the origin. However, the cost per instance is in the thousands of dollars per year, so it only makes sense for a large commercial solution. If you don’t have thousands of dollars and would still like a realtime view of Nginx traffic, we recommend buying the $49 &lt;a href="https://luameter.com/" rel="noopener noreferrer"&gt;Luameter&lt;/a&gt; (&lt;a href="https://luameter.com/demo" rel="noopener noreferrer"&gt;demo&lt;/a&gt;). It works well, but if you’ll be handling hundreds of requests per second and many unique URLs, expect increased load and RAM requirements. We have it disabled by default and only activate it when debugging.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Sample Nginx configuration&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Below we have prepared a sample basic Nginx configuration, which in this model example does not act as a reverse proxy in front of the whole domain, but provides a CDN endpoint &lt;code&gt;https://cdn.company.com/myorigin.com/*.(css|js|jpg|jpeg|png|gif|ico)&lt;/code&gt; that retrieves content from the origin &lt;code&gt;https://www.myorigin.com/*&lt;/code&gt;. It is an averaged configuration, because we further modify some directives according to the HW of individual PoP servers, and it also doesn’t include some additional security mechanisms that we don’t want to expose. On the servers, this configuration is of course split into separate configuration files, which in our case we generate via Ansible.&lt;/p&gt;

&lt;p&gt;The settings differ especially at the level of definitions for individual locations/origins, because you may want differently composed cache keys, different cache validity, limits, cookie ignoring, WebP or AVIF support (or not), referer validation, active CORS-related settings, or maybe use of the slice module, where you have to cache the 206 code and the cache key must also contain &lt;em&gt;$slice_range&lt;/em&gt;. Similarly, for some origins you may want to ignore &lt;em&gt;Cache-Control&lt;/em&gt; headers entirely and cache everything for a fixed time, or other per-origin specialties.&lt;/p&gt;

&lt;p&gt;The configuration also contains various per-origin directories or files — these must of course be set up by your automation, which you are using to introduce the new origin into your CDN. &lt;strong&gt;So really just take this as a guide on how to grab and set up the various functionalities.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

worker_processes 4;
worker_rlimit_nofile 100000;
pcre_jit on;

events {
  use epoll;
  worker_connections 16000;
  multi_accept on;
}

http {

  # IP whitelist to which no conn/rate restrictions should be applied
  geo $ip_whitelist {
    default        0;
    127.0.0.1      1;
    10.225.1.0/24  1;
  }
  map $ip_whitelist $limited_ip {
    0  $binary_remote_addr;
    1  "";
  }

  limit_conn_zone $limited_ip zone=connsPerIP:20m;
  limit_conn connsPerIP 30;
  limit_conn_status 429;

  limit_req_zone $limited_ip zone=reqsPerMinutePerIP:50m rate=500r/m;
  limit_req zone=reqsPerMinutePerIP burst=700 nodelay;
  limit_req_status 429;

  client_max_body_size 64k;
  client_header_timeout 10s;
  client_body_timeout 10s;
  client_body_buffer_size 16k;
  client_header_buffer_size 4k;

  send_timeout 10s;
  connection_pool_size 512;
  large_client_header_buffers 8 16k;
  request_pool_size 4k;

  http2_idle_timeout 60s;
  http2_recv_timeout 10s;
  http2_chunk_size 16k;

  server_tokens off;
  more_set_headers "Server: My-CDN";

  include /etc/nginx/mime.types;
  variables_hash_bucket_size 128;
  map_hash_bucket_size 256;

  gzip on;
  gzip_static on; # searches for the *.gz file and returns it directly from disk (compression is provided by our extra process in the background)
  gzip_disable "msie6";
  gzip_min_length 4096;
  gzip_buffers 16 64k;
  gzip_vary on;
  gzip_proxied any;
  gzip_types image/svg+xml text/plain text/css application/json application/x-javascript application/javascript text/xml application/xml application/xml+rss text/javascript text/x-component font/truetype font/opentype image/x-icon;
  gzip_comp_level 4;

  brotli on;
  brotli_static on; # searches for the *.br file and returns it directly from the disk (compression is provided by our extra process in the background)
  brotli_types text/plain text/css application/javascript application/json image/svg+xml application/xml+rss;
  brotli_comp_level 6;

  output_buffers 1 32k;
  postpone_output 1460;

  sendfile on;
  sendfile_max_chunk 1m;
  tcp_nopush on;
  tcp_nodelay on;

  keepalive_timeout 10 10;
  ignore_invalid_headers on;
  reset_timedout_connection on;

  open_file_cache          max=50000 inactive=30s;
  open_file_cache_valid    10s;
  open_file_cache_min_uses 2;
  open_file_cache_errors   on;

  proxy_buffering           on;
  proxy_buffer_size         16k;
  proxy_buffers             64 16k;
  proxy_temp_path           /var/lib/nginx/proxy;
  proxy_cache_min_uses      2;

  proxy_ignore_client_abort on;
  proxy_intercept_errors    on;
  proxy_next_upstream       error timeout invalid_header http_500 http_502 http_503 http_504;
  proxy_redirect            off;
  proxy_connect_timeout     60;
  proxy_send_timeout        180;
  proxy_cache_lock          on;
  proxy_read_timeout        10s;

  # setting up trusted IP subnets to respect X-Forwarded-For header (for multi-level proxy setup)
  set_real_ip_from          127.0.0.1/32;
  set_real_ip_from          10.1.2.0/24;
  real_ip_header            X-Forwarded-For;
  real_ip_recursive         on;

  ############################################################################
  ## Example configuration for:                                             ##
  ## https://cdn.company.com/myorigin.com/* -&amp;gt; https://www.myorigin.com/* ##
  ############################################################################

  upstream up_www_myorigin_com {
    server www.myorigin.com:443 max_conns=50;

    keepalive 20;
    keepalive_requests 50;
    keepalive_timeout 5s;
  }

  proxy_cache_path /var/lib/nginx/tmp/proxy/www.myorigin.com levels=1:2 keys_zone=cache_www_myorigin_com:20m inactive=720h max_size=10g;

  server {

    server_name cdn.company.com;

    listen lan-ip:443 ssl default_server http2 reuseport deferred backlog=32768;
    ssl_prefer_server_ciphers on;
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
    ssl_certificate /etc/nginx/ssl/cdn.company.com.nginx-bundle.crt;
    ssl_certificate_key /etc/nginx/ssl/cdn.company.com.key;
    ssl_session_cache shared:SSL_cdn_company_com:50m;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_dhparam /etc/ssl/webserver_dhparams.pem;
    ssl_early_data on;

    lingering_close on;
    lingering_time 10s;
    lingering_timeout 5s;

    resolver 127.0.0.1; # dnsmasq with logging to get an idea of the DNS traffic that Nginx is doing

    ...

    location ~* ^/myorigin\.com/(.+\.(css|js|jpg|jpeg|png|gif|ico))$ {
      set $origin_uri "/$1$is_args$args";
      root /var/www/myorigin.com;
      access_log  /var/log/nginx/www.myorigin.com/ssl.access.log main buffer=4k flush=5m;
      error_log   /var/log/nginx/www.myorigin.com/ssl.error.log notice;

      if ($request_method !~ ^(GET|HEAD|OPTIONS)$ ) {
        more_set_headers "Content-Type: application/json";
        return 405 '{"code": 405, "message": "Method Not Allowed"}';
      }

      more_clear_headers "Strict-Transport-Security";
      more_set_headers "Strict-Transport-Security: max-age=31536000";
      more_set_headers "X-Content-Type-Options: nosniff";
      more_set_headers 'Link: &amp;lt;https://www.myorigin.com$origin_uri&amp;gt;; rel="canonical"';

      expires 1y; # enforce caching in browsers for 1 year (use only consciously, if you are sure that when you change the content of the file on the original, the URL will also change)

      modsecurity on;
      modsecurity_rules_file /etc/nginx/modsecurity/myorigin.com.conf;

      # for requests that fall under CORS (e.g. fonts) we allow to load content only from selected domains
      set $headerCorsAllowOrigin "";
      if ($http_origin ~ '^https?://(localhost|cdn\.company\.com|www\.myorigin\.com)') {
          set $headerCorsAllowOrigin "$http_origin";
      }
      if ($request_method = 'OPTIONS') {
          more_set_headers "Access-Control-Allow-Origin: $headerCorsAllowOrigin";
          more_set_headers "Access-Control-Allow-Methods: GET, HEAD, OPTIONS";
          more_set_headers "Access-Control-Max-Age: 3600";
          more_set_headers "Content-Length: 0";
          return 204;
      }

      # we allow to load content only from the original domain (e.g. it prevents displaying our images on foreign domains)
      valid_referers none blocked server_names *.myorigin.com;
      if ($invalid_referer) {
          more_set_headers "Content-Type: application/json";
          return 403 '{"code": 403, "message": "Forbidden Resource - invalid referer"}';
      }

      set $webp "";
      set $file_for_webp "";
      if ($http_accept ~* webp) {
          set $webp "A";
      }
      if ($request_filename ~ (.+\.(png|jpe?g))$) {
          set $file_for_webp $1;
      }
      if (-f $file_for_webp.webp) {
          set $webp "${webp}E";
      }
      if ($webp = AE) {
          rewrite ^/(.+)$ /webp/$1 last;
      }

      proxy_cache cache_www_myorigin_com;
      proxy_cache_key "$request_uri"; # we don't need a schema or a host, because we store in per-origin cache and support only HTTPS
      proxy_cache_use_stale error timeout invalid_header updating http_429 http_500 http_502 http_503 http_504;
      proxy_read_timeout 20s;
      proxy_cache_valid 200              720h;
      proxy_cache_valid 301              4h;
      proxy_cache_valid 302              1h;
      proxy_cache_valid 400 401 403 404  30s;
      proxy_cache_valid 500 501 502 503  30s;
      proxy_cache_valid 429              10s;


      # due to keep-alive on origins
      proxy_http_version 1.1;
      proxy_set_header Connection "";

      proxy_set_header "Via" "My-CDN";
      proxy_set_header "Early-Data" $ssl_early_data; # for the ability to detect Replay attack on the application level
      proxy_set_header Accept-Encoding ""; # we always want to receive and cache RAW content from the origin, because we have a process for preparing static *.gz and *.br versions

      proxy_set_header        Host                    www.myorigin.com;
      proxy_set_header        X-Forwarded-For         $remote_addr;
      proxy_set_header        X-Forwarded-Host        $host:$server_port;
      proxy_set_header        X-Forwarded-Server      $host;
      proxy_set_header        X-Forwarded-Proto       $scheme;

      if (-f $request_filename) {
          more_set_headers "X-Cache: HIT";
      }

      if (!-f $request_filename) {
          proxy_pass https://up_www_myorigin_com$origin_uri;
      }

    }

    # internal location for webp
    location ~* ^/webp(/myorigin\.com/(.*))$ {
      internal;
      root /var/www/myorigin.com;
      set $origin_uri "/$2$is_args$args"; # $2 = path without the /myorigin.com prefix, same shape as in the main location
      access_log /var/log/nginx/www.myorigin.com/ssl.access.webp.log main buffer=4k flush=5m;
      expires 366d;
      more_set_headers 'Link: &amp;lt;https://www.myorigin.com$origin_uri&amp;gt;; rel="canonical"';
      more_clear_headers 'Vary';
      more_set_headers "Vary: Accept";
      more_set_headers "X-Cache: HIT";
      try_files $1.webp $1 =404;
    }

  }

}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Static compression as an essential helper&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;We ran a spot test of two commercial CDNs that have servers in Prague, and neither provider apparently uses this great functionality. Commercial CDNs without it have to compress content with brotli or gzip on every request, which drastically drains their CPUs and multiplies the response time, and it is the visitor who pays for it.&lt;/p&gt;

&lt;p&gt;We tested how long it takes our CDN and a commercial CDN to transfer eight JavaScript files (from 1 to 500 kB) in an HTTP/2 stream: our CDN did it in 45 ms, the commercial CDN in 170 to 200 ms. Moreover, even though the commercial CDN also used brotli, its files were 14% larger, because we compress at the maximum level. We tested in an ordinary Chrome browser with about 1 ms latency to both CDNs, since we and their PoPs are all in Prague.&lt;/p&gt;

&lt;p&gt;So how do you solve compression? In Nginx, you can enable static compression for both gzip and brotli (&lt;strong&gt;gzip_static on;&lt;/strong&gt; &lt;strong&gt;brotli_static on;&lt;/strong&gt;). If understood and implemented correctly, this can &lt;strong&gt;reduce the CPU load&lt;/strong&gt; quite substantially and at the same time &lt;strong&gt;speed up the visitor’s loading time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It works like this: when static compression is active and the browser requests e.g. /js/file.js, Nginx checks the disk for an already pre-compressed file /js/file.js.gz or /js/file.js.br. If such a file exists, it sends it straight away (without bothering the CPU with compression). The type of compression the browser supports is sent in the &lt;em&gt;Accept-Encoding&lt;/em&gt; header (&lt;em&gt;br&lt;/em&gt; takes precedence over &lt;em&gt;gzip&lt;/em&gt; if the browser supports it).&lt;/p&gt;

&lt;p&gt;Nginx &lt;strong&gt;does not create&lt;/strong&gt; the &lt;em&gt;.br&lt;/em&gt; or &lt;em&gt;.gz&lt;/em&gt; files for you, nor does it try to download them from the origin. Frontend builds often create these &lt;em&gt;*.br&lt;/em&gt; or &lt;em&gt;*.gz&lt;/em&gt; files for their JS/CSS as part of the build, but in this setup they are simply not used. You have to provide the files yourself on your CDN. We’ve made a background process that continuously parses the access logs and extracts “200 OK” requests for text files that don’t have their &lt;em&gt;*.br&lt;/em&gt; or &lt;em&gt;*.gz&lt;/em&gt; versions yet.&lt;/p&gt;

&lt;p&gt;Because this is a background process, you can afford to choose the &lt;strong&gt;highest&lt;/strong&gt;, &lt;strong&gt;most efficient&lt;/strong&gt;, but therefore &lt;strong&gt;slowest compression level&lt;/strong&gt;. You’ll strain the CPU once, but the reward is an additional 5–15% reduction in transferred data, while the decompression speed in browsers is barely affected (you can find benchmarks on this). Don’t forget to decide how you will clean up the &lt;em&gt;*.br&lt;/em&gt; or &lt;em&gt;*.gz&lt;/em&gt; files once they expire, and also whether and how you will handle a query string such as &lt;em&gt;?v=1.0.5&lt;/em&gt; used to force the download of a new version of the file.&lt;/p&gt;

&lt;p&gt;However you implement static compression, make sure the compressed files are published atomically. In other words, write the compressed output to a temporary file first, and only when it is completely finished, rename it to the destination where Nginx expects it. That way nobody can download an invalid (partial) file if a visitor hits the URL at the moment you are compressing.&lt;/p&gt;
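&lt;p&gt;For inspiration, the background compression step can be sketched roughly as follows. This is only a hedged sketch: the &lt;em&gt;precompress&lt;/em&gt; helper name and the compression levels are illustrative, the log-parsing part that selects candidate files is omitted, and GNU gzip plus the optional brotli CLI are assumed:&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: pre-compress one text asset so gzip_static/brotli_static can serve it.
# Atomic publish: compress to a temporary name first, then rename it into place,
# so Nginx never picks up a partially written *.gz or *.br file.
precompress() {
  f="$1"
  # gzip: -k keeps the original, -S selects the temporary suffix (file.gz.tmp)
  if [ ! -e "$f.gz" ] || [ "$f" -nt "$f.gz" ]; then
    if gzip -9 -f -k -S .gz.tmp "$f"; then
      mv "$f.gz.tmp" "$f.gz"
    fi
  fi
  # brotli is optional here; -q 11 is the slowest but most efficient level
  if command -v brotli; then
    if [ ! -e "$f.br" ] || [ "$f" -nt "$f.br" ]; then
      if brotli -f -q 11 -o "$f.br.tmp" "$f"; then
        mv "$f.br.tmp" "$f.br"
      fi
    fi
  fi
}
```

&lt;p&gt;A cron job or daemon would call this for each eligible file found in the access logs, and a similar sweep would delete stale &lt;em&gt;*.gz&lt;/em&gt;/&lt;em&gt;*.br&lt;/em&gt; files.&lt;/p&gt;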

&lt;p&gt;Since we usually cache content in the browser for months, such a visitor would keep downloading e.g. broken JS/CSS until the cache was cleared, which is very annoying. We all know how unprofessional it is when developers tell a client to clear their browser cache.&lt;/p&gt;

&lt;p&gt;Hint: if you don’t have a background process that handles static compression for you, leave static compression disabled. Otherwise you will unnecessarily increase your IOPS, because Nginx will look for the &lt;em&gt;*.gz&lt;/em&gt; or &lt;em&gt;*.br&lt;/em&gt; variant of every requested file.&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;JPG/PNG to WebP/AVIF conversion&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;If you want to reduce the transferred size of images by 30% to 90% (depending on how well the source images are already optimized), you can arrange smart image conversion to the modern WebP or AVIF formats.&lt;/p&gt;

&lt;p&gt;Be careful with the AVIF format, though: while it is fully supported and works well in Google Chrome, support in Firefox is still experimental and exhibits various bugs described in &lt;a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1443863" rel="noopener noreferrer"&gt;this ticket&lt;/a&gt;, which manifest e.g. as some images not being displayed. However, this experimental support is disabled by default, so Firefox does not send &lt;em&gt;image/avif&lt;/em&gt; in the &lt;em&gt;Accept&lt;/em&gt; request header.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For inspiration, this is how we implemented WebP/AVIF support:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The background process analyzes the access logs and searches for the most frequently retrieved images with a defined minimum data size.&lt;/li&gt;
&lt;li&gt;Using the &lt;a href="https://developers.google.com/speed/webp/docs/cwebp" rel="noopener noreferrer"&gt;&lt;strong&gt;cwebp&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://github.com/kornelski/cavif-rs" rel="noopener noreferrer"&gt;&lt;strong&gt;cavif&lt;/strong&gt;&lt;/a&gt; converters, it converts the source image, e.g. /images/source.jpg, to /images/source.jpg.webp (atomically, as with static compression).&lt;/li&gt;
&lt;li&gt;In Nginx we have logic that, when &lt;em&gt;image/avif&lt;/em&gt; or &lt;em&gt;image/webp&lt;/em&gt; occurs in the &lt;em&gt;Accept&lt;/em&gt; header of the request, tries to send the requested file with the &lt;em&gt;.avif&lt;/em&gt; or &lt;em&gt;.webp&lt;/em&gt; extension, if it exists on the disk. The solution can be based on a combination of &lt;strong&gt;maps&lt;/strong&gt; and &lt;strong&gt;try_files&lt;/strong&gt;, or on composing the contents of a variable with IFs.&lt;/li&gt;
&lt;/ul&gt;
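
&lt;p&gt;For inspiration, the &lt;strong&gt;map&lt;/strong&gt; + &lt;strong&gt;try_files&lt;/strong&gt; variant can be sketched as follows. This is a simplified, hedged example: it picks a single preferred suffix from the &lt;em&gt;Accept&lt;/em&gt; header and falls back to the original file, and the &lt;em&gt;Vary&lt;/em&gt; handling must match your caching setup:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
# pick the preferred suffix the browser advertises in the Accept header
map $http_accept $img_suffix {
  default        "";
  "~image/avif"  ".avif";
  "~image/webp"  ".webp";
}

server {
  ...
  location ~* \.(png|jpe?g)$ {
    # e.g. /images/source.jpg is served as /images/source.jpg.avif if it exists
    try_files $uri$img_suffix $uri =404;
    more_set_headers "Vary: Accept";
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;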

&lt;p&gt;If a real need arises, we may eventually centralise this process. Instead of each server doing it separately, a central system would select suitable images for optimization from the central logs, keep statistics of the real data savings per transfer, and so on. This brings a certain flexibility and the possibility to perform some operations in bulk. On the other hand, we like that the decentralization of these processes and the maximum autonomy of the individual PoPs minimize the risk of some bug reaching the whole CDN. Another advantage is that each PoP optimizes the content its own visitors load most.&lt;/p&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Search engines&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;It’s important to note that if you deploy a CDN and the images in your HTML are suddenly loaded from another domain (unless you happen to use the CDN as a proxy for the entire site/domain), search engines will not index them as belonging to your domain, but to the CDN domain. Of course, you don’t want that.&lt;/p&gt;

&lt;p&gt;The solution is to provide canonicalization in Nginx using the HTTP &lt;strong&gt;Link&lt;/strong&gt; header, which tells the search engine where the actual source (origin) is. This way it will not index the image under the CDN domain, but under the source domain specified in the Link header. For optimal image indexing, we recommend that you also generate &lt;a href="https://support.google.com/webmasters/answer/178636" rel="noopener noreferrer"&gt;sitemap for images&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Example: the URL &lt;code&gt;https://cdn.company.com/myorigin.com/image.jpg&lt;/code&gt; should return the HTTP header:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

Link: &amp;lt;https://www.myorigin.com/image.jpg&amp;gt;; rel="canonical"


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  &lt;strong&gt;Using CDN in projects&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;The primary and preferred way of using our CDN is very simple and is also evident from the sample Nginx configuration.&lt;/p&gt;

&lt;p&gt;If we want to deploy the CDN for content on e.g. &lt;code&gt;www.myorigin.com&lt;/code&gt;, the web developers just need to ensure that instead of &lt;code&gt;/js/script.js&lt;/code&gt;, for example, the file is addressed as &lt;code&gt;https://cdn.company.com/myorigin.com/js/script.js&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The base URL is our GeoCDN domain, followed by the origin domain (without the “www”), ending with the path to the file on the origin.&lt;/p&gt;

&lt;p&gt;The CDN administrators control which origin domains our CDN supports through Ansible. In Ansible, administrators can also set some specific behavior for each origin. In addition, for each origin it is possible to specify what type of content is supported, restrict URL shapes, define custom WAF rules, etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tip&lt;/strong&gt;: if you want to deploy a CDN to your site without requiring a single intervention in the application code and you are using Nginx, you can very easily help yourself with the native Nginx &lt;a href="https://nginx.org/en/docs/http/ngx_http_sub_module.html" rel="noopener noreferrer"&gt;sub module&lt;/a&gt;. This allows you to easily replace the paths to selected files so that they are addressed from the CDN (typically in HTML or CSS).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

sub_filter '&amp;lt;link href="/' '&amp;lt;link href="https://cdn.company.com/myorigin.com/';
sub_filter '&amp;lt;script src="/' '&amp;lt;script src="https://cdn.company.com/myorigin.com/';
sub_filter '&amp;lt;img src="/' '&amp;lt;img src="https://cdn.company.com/myorigin.com/';

sub_filter_types 'text/css' 'application/json' 'application/javascript'; # text/html is included automatically, but we also want to replace content in JSON API or CSS styles and JavaScripts
sub_filter_once off; # we want to replace all found occurrences


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that the example requires href/src to be the first attribute of the HTML tag; unfortunately, sub_filter does not support regular expressions. If this is not sufficient for you, you can do the substitution in the application code. You’re probably using a templating system that already forces some form of base-path variable, so this should be a piece of cake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note 1&lt;/strong&gt;: for content substitution to work, you must also set &lt;em&gt;proxy_set_header Accept-Encoding "";&lt;/em&gt;, so that the origin returns the text content uncompressed and the strings can be substituted.&lt;/p&gt;
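
&lt;p&gt;Put together, the relevant directives might look like this hedged fragment (the upstream name and domains follow the earlier examples; everything else needs tuning to your setup):&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
location / {
  proxy_http_version 1.1;
  proxy_set_header Accept-Encoding "";  # origin must respond uncompressed, otherwise sub_filter cannot replace strings
  proxy_pass https://up_www_myorigin_com;

  sub_filter_once off;
  sub_filter '&amp;lt;img src="/' '&amp;lt;img src="https://cdn.company.com/myorigin.com/';
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;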

&lt;p&gt;&lt;strong&gt;Note 2&lt;/strong&gt;: since the CDN here is not deployed as a reverse proxy for the entire origin domain, the content loads faster in the browser. The browser can parallelize more (HTML and assets are loaded from different IP addresses), so the resulting page build and render time is shorter. In full reverse-proxy mode, HTTP/2 multiplexing and prioritization helps a lot, but letting the browser load content from multiple different IP addresses is still a bit more efficient.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Security, protection against DoS/DDoS attacks and monitoring&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;With the help of the previous article on &lt;a href="https://dev.to/janreges/how-to-build-a-cdn-1-3-introduction-and-basic-components-345o"&gt;CDN components&lt;/a&gt; and this article, you should be able to get your CDN up and running with all the basic functionality.&lt;/p&gt;

&lt;p&gt;I hope that this article has helped you and that someone may have found some ideas or settings that will help them to improve their web or application server.&lt;/p&gt;

&lt;p&gt;If anyone has additional tips after looking at the proposed settings, or sees any threats in our configuration, &lt;strong&gt;we would be happy to discuss them&lt;/strong&gt;. We’ve been tweaking the settings ourselves for years, reflecting the various needs and attacks our projects have faced, so it’s an ongoing and never-ending process. Moreover, simulating real traffic to verify the effect of some settings is very difficult, so every real-world experience is welcome and we will be grateful if you share yours.&lt;/p&gt;

&lt;p&gt;This was the last article of the &lt;em&gt;How to build a CDN&lt;/em&gt; series, focused on various operational aspects of running a CDN: how to protect the origins, how to defend against DoS/DDoS attacks and how to keep the whole CDN operation under control.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and if you like the article, I will be happy if you share it or leave a comment.&lt;/p&gt;

&lt;p&gt;If you are interested in any other CDN-related details, ask in the comments or ask on X/Twitter &lt;a href="https://twitter.com/janreges" rel="noopener noreferrer"&gt;@janreges&lt;/a&gt;. I will be happy to answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test your websites with my analyzer
&lt;/h2&gt;

&lt;p&gt;In conclusion, I would like to recommend one of my personal open-source projects, which I hope will help improve the quality of websites around the world. The tool is available as a &lt;a href="https://github.com/janreges/siteone-crawler-gui" rel="noopener noreferrer"&gt;desktop application&lt;/a&gt;, but also as a &lt;a href="https://github.com/janreges/siteone-crawler" rel="noopener noreferrer"&gt;command-line tool&lt;/a&gt; usable in CI/CD pipelines. For Windows, macOS and Linux.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz3qaz6upoowg5tptmfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz3qaz6upoowg5tptmfk.png" alt="SiteOne Crawler - Free Website Analyzer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I launched it at the end of 2023 and I believe that it will help a lot of people to increase security, performance, SEO, accessibility or other important aspects of a quality web presentation or application. It's called &lt;a href="https://crawler.siteone.io/?utm_source=dev.to&amp;amp;utm_campaign=cdn-part-2"&gt;SiteOne Crawler - Free Website Analyzer&lt;/a&gt; and I also wrote an &lt;a href="https://dev.to/janreges/siteone-crawler-useful-tool-you-will-oe1"&gt;article&lt;/a&gt; about it. Below you will find 3 descriptive videos - the last one also shows what report it will generate for your website.&lt;/p&gt;

&lt;p&gt;In addition to various analyses, it also offers, for example, the export of the entire website into an offline form, where you can view the entire website from a local disk without the internet, or the generation of sitemaps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharing this project&lt;/strong&gt; with your colleagues and friends will be the &lt;strong&gt;greatest reward for me&lt;/strong&gt; for writing these articles. &lt;strong&gt;Thank you and I wish you all the best in 2024&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Desktop Application&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/rFW8LNEVNdw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Command-line tool&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/25T_yx13naA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;HTML report - analysis results&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/PHIFSOmk0gk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>devops</category>
      <category>webdev</category>
      <category>performance</category>
      <category>nginx</category>
    </item>
    <item>
      <title>How to build a CDN (1/3): introduction and basic components</title>
      <dc:creator>Ján Regeš</dc:creator>
      <pubDate>Fri, 09 Apr 2021 09:37:07 +0000</pubDate>
      <link>https://forem.com/janreges/how-to-build-a-cdn-1-3-introduction-and-basic-components-345o</link>
      <guid>https://forem.com/janreges/how-to-build-a-cdn-1-3-introduction-and-basic-components-345o</guid>
      <description>&lt;p&gt;If your projects have high traffic and you need to deliver a lot of static files, there is nothing easier than getting a commercial CDN. But if you’re a technology enthusiast like us, you can build a real CDN yourself.&lt;/p&gt;

&lt;p&gt;This is the first in a series of three articles and aims to introduce you to the issue and describe the basic components that make up the CDN (&lt;em&gt;Content Delivery Network&lt;/em&gt;). The next two articles will describe the technologies used and their configurations, as well as various other tips regarding the operation, functionality and monitoring of the CDN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fip0t4e6ff3ebgjl1nmvs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fip0t4e6ff3ebgjl1nmvs.png" alt="SiteOne CDN - latency in Europe"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Motivation to use CDN
&lt;/h1&gt;

&lt;p&gt;The main motivation for using a CDN is clear: to ensure fast and reliable loading of the website and its content for all visitors around the world. And if you operate projects with a monthly traffic of millions of users, where the traffic just from JS/CSS files amounts to tens of TB, then sooner or later you will reach a point where your 1 Gbps internet connection simply ceases to be sufficient.&lt;/p&gt;

&lt;p&gt;Websites are usually composed of dynamic and static content. Dynamic content usually includes generated HTML code and data from various APIs (typically REST or GraphQL). Static content is made up of files such as javascripts, styles, images, fonts, or audio/video. A typical ratio for our projects is that dynamic content makes up 10% and static 90% of the total data transfer.&lt;/p&gt;

&lt;p&gt;If you have a really high number of visitors, a rule that static files are cached in the browser for one year will not help you much by itself. Changing the contents of a file then requires a new file name or some “version” in the query parameter to force the browser to download the new file. If you release every few days, even if you use JS/CSS chunks, at least some part of the JS/CSS gets recompiled and every visitor must download it again.&lt;/p&gt;
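&lt;p&gt;A common implementation of this versioning, sketched below as a hedged example (the &lt;em&gt;hashname&lt;/em&gt; helper and file names are illustrative), derives the file name from a content hash, so the URL changes exactly when the content does:&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: content-hashed file names for cache busting.
# A one-year browser cache is then safe, because any content change
# produces a new file name and therefore a new URL.
hashname() {
  f="$1"
  h=$(sha256sum "$f" | cut -c1-8)   # short content hash
  base="${f%.*}"
  ext="${f##*.}"
  out="$base.$h.$ext"               # e.g. app.js becomes app.3e2a1b9c.js
  cp "$f" "$out"
  printf '%s\n' "$out"
}
```

&lt;p&gt;Build tools (webpack, Vite and the like) do the same thing out of the box; the point is that the hashed name, not a manually bumped "version" parameter, is what gets referenced in the HTML.&lt;/p&gt;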

&lt;p&gt;Then, when you reach gigabit at the peak of traffic, you start to deal with what to do next and thus look for a CDN.&lt;/p&gt;

&lt;h1&gt;
  
  
  The main benefits of CDN from our point of view
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Speed for global visitors&lt;/strong&gt; — if you have a project hosted on servers in only one country, page loading time grows in proportion to the distance: the further around the world, the slower on the screen. The reason is high latency and low transfer rate. But be careful here — if the vast majority of your visitors come from your local country (for us it is the Czech Republic), make sure that your CDN provider has servers (PoPs) there as well. Otherwise, the load speed for your primary users may actually get slower after deploying the CDN. The Czech Republic is a small country, but it has top-quality data centers and connectivity providers, and loading content only from foreign PoPs would put Czech visitors at a disadvantage.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Speed for local visitors&lt;/strong&gt; — all browsers limit the maximum number of concurrent requests per server IP address. If the browser can load content from multiple different domains and IP addresses (&lt;em&gt;domain sharding&lt;/em&gt;), it can parallelize more and the content loads faster. This is especially important for the JS/CSS/images and fonts that are part of the initial rendering of the page. HTTP/2 with multiplexing solves this problem very well, but only to a certain extent. Based on real request/rendering tests, we conclude that even with HTTP/2 streams carrying dozens of files, the resulting page displays more slowly than with a CDN on a different domain than the site itself.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Reduce the load on primary servers&lt;/strong&gt; — if you don’t have a CDN, your primary servers and their connectivity must handle both dynamic content requests and relatively trivial static file requests. This is inefficient, because the optimal server configuration for dynamic content is quite different from the one for serving static files.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Content Optimization&lt;/strong&gt; — A good CDN also provides tools for data/binary optimization of static content. As a result, less data is transferred and pages load faster (brotli compression, WebP or AVIF images).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cost savings&lt;/strong&gt; — even though it has long been possible to get an almost unlimited “thick” line to your primary servers, the jumps are quite drastic — why pay 10 Gbit, when 1 Gbit is enough for us 90% of the time?&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Simplifying the life of DevOps&lt;/strong&gt; — if the configurations of file/web and application servers for maximum performance and security are fine-tuned, then it is necessary to have all possible metrics from real operation. If the traffic for dynamic and static content is strictly separated, then the statistics are cleaner. It is therefore possible to make better decisions and optimize performance and security parameters exactly tailored to the specific workload.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Why we decided to build our own CDN
&lt;/h1&gt;

&lt;p&gt;There are many commercial CDNs on the market, see for example &lt;a href="https://www.cdnperf.com/" rel="noopener noreferrer"&gt;CDNPerf&lt;/a&gt;. The best known include CloudFlare, Amazon CloudFront, Google, Fastly, Akamai, KeyCDN or our favorite and recommended &lt;a href="https://bunnycdn.com/?ref=ubyhm76592" rel="noopener noreferrer"&gt;&lt;strong&gt;BunnyCDN&lt;/strong&gt;&lt;/a&gt; or &lt;a href="https://www.cdn77.com/?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=jan-reges-siteone-cdn-1"&gt;&lt;strong&gt;CDN77&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our projects are most often visited by clients from the Czech and Slovak Republics. In such a case, &lt;a href="https://www.cdn77.com/?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=jan-reges-siteone-cdn-1"&gt;&lt;strong&gt;CDN77&lt;/strong&gt;&lt;/a&gt; and their &lt;a href="https://www.cdn77.com/network?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=jan-reges-siteone-cdn-1"&gt;awesome network&lt;/a&gt; are unrivaled in terms of function, price and immediate professional support. It is one of the best CDNs for covering traffic from around the world, and they are also very strong in video streaming for the world’s largest high-traffic projects.&lt;/p&gt;

&lt;p&gt;Because we don’t want to invent a wheel in &lt;a href="https://www.siteone.cz/?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=cdn-1a"&gt;&lt;strong&gt;SiteOne&lt;/strong&gt;&lt;/a&gt;, we first looked to see if any of the above-mentioned providers would suit us. Our requirements were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;100 TB&lt;/strong&gt; data transfer per month (majority from Europe).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Low latency&lt;/strong&gt; and fast transfers in the Czech Republic/Slovakia.&lt;/li&gt;
&lt;li&gt;  Very good &lt;strong&gt;coverage&lt;/strong&gt; of the whole of Europe, good coverage of North America and sufficient coverage of other continents.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;HTTP/2&lt;/strong&gt; (and fast deployment of HTTP/3 after it is more standardized).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Brotli&lt;/strong&gt; compression, which is 15–30% more efficient on text files than gzip (LZ77 + dictionary).&lt;/li&gt;
&lt;li&gt;  Automatic &lt;strong&gt;JPG/PNG conversion&lt;/strong&gt; → &lt;strong&gt;WebP&lt;/strong&gt;/&lt;strong&gt;AVIF&lt;/strong&gt;, if supported by the browser (reduces data transfer by 30% to 90% without noticeable loss of quality, depending on how much the source JPG/PNG has already been optimized).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;TLSv1.3 with 0-RTT&lt;/strong&gt; (zero round-trip), which significantly reduces the handshake time between browsers and servers.&lt;/li&gt;
&lt;li&gt;  API for selective &lt;strong&gt;cache invalidation using regular expressions&lt;/strong&gt;. Ideally with support for cache tagging by response HTTP header like &lt;em&gt;X-Cache-Tags&lt;/em&gt; or &lt;em&gt;X-Key&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;DDoS protection&lt;/strong&gt; &amp;amp; Web Application Firewall (&lt;strong&gt;WAF&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Access to logs&lt;/strong&gt; and statistics.&lt;/li&gt;
&lt;li&gt;  100 GB of storage (typically for videos and large image libraries).&lt;/li&gt;
&lt;li&gt;  Custom HTTPS certificates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finding a provider that meets most of the requirements was not a problem. The problem was the price. From progressive players (such as &lt;a href="https://bunnycdn.com/?ref=ubyhm76592" rel="noopener noreferrer"&gt;&lt;strong&gt;BunnyCDN&lt;/strong&gt;&lt;/a&gt; or &lt;a href="https://www.cdn77.com/?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=jan-reges-siteone-cdn-1"&gt;&lt;strong&gt;CDN77&lt;/strong&gt;&lt;/a&gt;) you can buy the service for about 1,000 EUR/month; with the other leaders of the CDN market, costs start at 3,000–4,000 EUR/month and grow in multiples. If you treat such amounts as the budget for building your own CDN, the return on investment (ROI) becomes more than interesting. Of course, there are other price-friendly providers on the global market, but their coverage in the Czech Republic/Slovakia is usually very weak, so they cannot be recommended for primarily local projects.&lt;/p&gt;

&lt;p&gt;Combining the above requirements with our enthusiasm for IT challenges, we came to the conclusion that we would build our own CDN. The resulting CDN is nowhere near as robust as those of commercial providers, yet it meets all our requirements. A big advantage is that we can scale it very quickly according to our real needs, at low cost.&lt;/p&gt;

&lt;p&gt;Another motivation for our own CDN is that in recent years we have used GraphQL for all our web projects. Unlike REST, GraphQL cannot simply be cached on a reverse proxy or CDN, because every query is a POST request to one single URL endpoint. There are attempts at solving this, but no commercial CDN offers sophisticated caching of POST requests. We have types of projects where clever selective caching of POST requests at the CDN level (probably written in &lt;em&gt;Lua&lt;/em&gt;) could greatly offload the application servers. For us, this is another useful benefit that commercial CDNs will not offer for a long time.&lt;/p&gt;

&lt;p&gt;At the end of this chapter, it should be noted that our CDN is designed primarily for serving static files, and deploying it on a website does not require any changes to the DNS of the origin domain. Our CDN therefore does not act as a proxy for absolutely all requests to the website (which is the usual way of deploying commercial CDNs), only for static files. To deploy it, file paths need to be prefixed with our main CDN domain, which can also be solved very easily without touching the application itself, e.g. using the output filter in &lt;em&gt;Nginx&lt;/em&gt; (&lt;em&gt;sub_filter&lt;/em&gt;).&lt;/p&gt;
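&lt;p&gt;A minimal sketch of the &lt;em&gt;sub_filter&lt;/em&gt; approach on the origin server. The upstream name, the “cdn.company.com” domain and the “/assets/” path are illustrative placeholders, not our real configuration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;server {
    listen 443 ssl;
    server_name www.company.com;

    location / {
        proxy_pass http://app_backend;

        # sub_filter can only rewrite uncompressed upstream responses
        proxy_set_header Accept-Encoding "";

        # prefix static file paths in the HTML output with the CDN domain
        sub_filter 'src="/assets/' 'src="https://cdn.company.com/assets/';
        sub_filter 'href="/assets/' 'href="https://cdn.company.com/assets/';
        sub_filter_once off;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The important detail is clearing &lt;em&gt;Accept-Encoding&lt;/em&gt; towards the upstream, otherwise &lt;em&gt;sub_filter&lt;/em&gt; receives gzipped HTML and rewrites nothing.&lt;/p&gt;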

&lt;h1&gt;
  
  
  CDN components
&lt;/h1&gt;

&lt;p&gt;For our CDN to meet all the required parameters, we first had to put in place all the components and processes needed to operate a quality CDN, and of course learn a few new areas. Because we manage more than 120 servers for our other projects, we already had everything needed to handle it technically and procedurally.&lt;/p&gt;

&lt;p&gt;The following chapters describe in more detail the individual components of the CDN that you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Domain&lt;/strong&gt; — used mainly for configuring GeoDNS rules and possible referencing of other domains via CNAME.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;GeoDNS&lt;/strong&gt; — a network service that will direct visitors to the nearest servers according to your settings and requirements.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Servers&lt;/strong&gt; — strategically located around the world, in order to minimize latency for visitors and maximize transfer speed.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Technologies&lt;/strong&gt; and their &lt;strong&gt;configurations&lt;/strong&gt; — fine-tuned operating system and reverse proxy with caching and content optimization (brotli compression, WebP, AVIF).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Operational tools&lt;/strong&gt; — you will have many servers and need to solve orchestration, backup, monitoring, metrics, logs and much more.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Auxiliary applications&lt;/strong&gt; — background processes that provide, for example, static brotli compression or conversion of images to WebP/AVIF.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Domain
&lt;/h1&gt;

&lt;p&gt;First, choose and buy the second-level domain on which you will run the CDN. Ideally, choose a domain on which you can achieve “cookie-less” requests. During heavy traffic, every saved byte counts. In the examples in this article we will use “company.com” and its subdomain “cdn.company.com”.&lt;/p&gt;

&lt;p&gt;You will manage the DNS zone file for this domain with the GeoDNS provider(s) of your choice.&lt;/p&gt;

&lt;p&gt;Get an SSL/TLS certificate for the domain, whether from Let’s Encrypt or a commercial certificate authority (CA). Consider a wildcard certificate, which will make your life easier if you use more than one subdomain. You can get trusted wildcard certificates from as little as 40 USD/year. I recommend, for example, &lt;a href="https://www.ssl2buy.com/" rel="noopener noreferrer"&gt;ssl2buy.com&lt;/a&gt;, and spend a few seconds googling for a discount code. You will often get an identical certificate from the same CA for 30–40% of the price charged elsewhere.&lt;/p&gt;

&lt;p&gt;To prevent attackers from spoofing other IP addresses for your domain, set up DNSSEC for it. Check the correctness of your DNS configuration yourself with the &lt;a href="https://zonemaster.labs.nic.cz/" rel="noopener noreferrer"&gt;Zonemaster&lt;/a&gt; tool from CZ.NIC. We had to temporarily deactivate DNSSEC on our CDN because we use two DNS providers in primary–primary mode (GeoDNS rules and failovers are defined differently at each of them). In this mode, setting up DNSSEC at both providers is difficult, because both would have to share the same private key, or some other workaround would be needed. For now this manual intervention is complicated for providers, but they have promised to support it in the future.&lt;/p&gt;

&lt;p&gt;Whether you use this domain directly in URLs, or only as a hostname to which the CDN subdomains of other domains point via CNAME, is up to you.&lt;/p&gt;

&lt;h1&gt;
  
  
  GeoDNS with failover support
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What you need GeoDNS for
&lt;/h2&gt;

&lt;p&gt;A critical component of a real CDN is an area of interest we will call &lt;em&gt;GeoDNS&lt;/em&gt;. You can also find it under the names &lt;em&gt;IP intelligence&lt;/em&gt;, &lt;em&gt;GeoIP&lt;/em&gt;, &lt;em&gt;Geo-based routing&lt;/em&gt;, &lt;em&gt;Latency-based routing&lt;/em&gt;, etc.&lt;/p&gt;

&lt;p&gt;GeoDNS is a network service that translates a domain name into IP address(es), taking into account the location/country the visitor comes from. If you are interested in the details, you can study them in &lt;a href="https://tools.ietf.org/html/rfc7871" rel="noopener noreferrer"&gt;RFC 7871 (Client Subnet in DNS Queries)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As administrators of the GeoDNS settings of our CDN domain, we can define rules specifying from which continents/countries traffic should be directed to which IP addresses (PoPs in specific countries). To be precise, a PoP (Point of Presence) can technically mean a single server or several servers behind a load balancer (typically e.g. HAProxy).&lt;/p&gt;
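&lt;p&gt;For illustration, a multi-server PoP of the kind mentioned above could be sketched with HAProxy in TCP passthrough mode; the backend names and private IP addresses are made up for the example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# haproxy.cfg fragment: one PoP address, two cache servers behind it
frontend cdn_https
    bind :443
    mode tcp
    default_backend cdn_cache_nodes

backend cdn_cache_nodes
    mode tcp
    balance roundrobin
    # health checks remove a dead cache node from rotation
    server cache1 10.0.0.11:443 check
    server cache2 10.0.0.12:443 check
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;TCP mode keeps TLS termination on the cache servers themselves, so the load balancer needs no certificates.&lt;/p&gt;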

&lt;p&gt;Because we rent servers abroad from various providers, and with some of them we do not have many years of experience, we had to guarantee high availability ourselves. A critical functionality of GeoDNS is therefore &lt;strong&gt;automatic failover&lt;/strong&gt;: the ability to monitor the availability of individual PoPs and immediately remove or replace unavailable or malfunctioning PoPs in the CDN topology.&lt;/p&gt;

&lt;p&gt;In practice, a status URL on each PoP is monitored every minute. When it starts failing from more than one monitoring location at once, the configured failover scenario is activated automatically. Depending on our per-PoP settings, it has 2 main forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Deactivation of the DNS record&lt;/strong&gt; — traffic is then directed only to the secondary PoP in the given location (if there is one), or visitors are directed to the default PoPs (in our case all in the Czech Republic).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Replacing the IP address with another&lt;/strong&gt; — with this setting you can say: “&lt;em&gt;If the PoP in Paris goes down, direct French traffic to the nearby PoP in the Netherlands instead, and if that one happens to be down as well, to the PoP in Germany&lt;/em&gt;”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to the one-minute TTL, a genuinely non-functional PoP is deactivated or replaced for all end visitors no later than 2–3 minutes after the outage. If you have at least two PoPs defined for each location (DNS resolves to 2 IP addresses), browsers themselves can cope with such an outage, and visitors may not even notice the critical 2–3 minute window, as we describe in the next chapter. If you have only one PoP defined for a location and no backup PoP for it, then in case of failure, visitors from that location are routed to the default PoPs set for “the rest of the world”.&lt;/p&gt;

&lt;p&gt;Even with a one-minute TTL, you need to think about the speed of DNS resolution, because it too has a significant effect on page load speed. We therefore recommend choosing a DNS provider that has anycast name servers (NS) worldwide. Cloudflare leads the speed rankings; see the benchmarks on &lt;a href="https://www.dnsperf.com/" rel="noopener noreferrer"&gt;DNSPerf.com&lt;/a&gt;. With a global DNS provider, you can be sure that your domain will be resolved within single-digit to low tens of milliseconds around the world.&lt;/p&gt;

&lt;h2&gt;
  
  
  Browsers also help with high availability
&lt;/h2&gt;

&lt;p&gt;Because high availability is essential for us, we rely on the native behavior of browsers, which can handle the fact that in all major locations our CDN domain resolves to multiple IP addresses from different providers. In practice, the browser randomly selects one of the IP addresses and tries to make requests to it. If that IP address is unavailable, the browser tries another one after a few seconds.&lt;/p&gt;

&lt;p&gt;Failure of one of the IP addresses/servers/providers will therefore not prevent the required content from loading; the page will just take a little longer. Today’s browsers are really smart and very helpful when it comes to outage detection, connection recovery and auto-retry logic. The driving force in this area is mainly mobile/portable devices, which experience frequent micro-outages when switching between base stations in mobile networks, or between mobile and WiFi networks, etc.&lt;/p&gt;

&lt;p&gt;Unfortunately, we have not yet found any publicly available information/specifications describing exactly how these auxiliary functionalities are implemented in individual browsers. We therefore rely only on our own tests and analyses of the behavior of current browser versions.&lt;/p&gt;

&lt;p&gt;If you have studied this unique issue, share the information in the discussion with us :-)&lt;/p&gt;

&lt;h2&gt;
  
  
  Which GeoDNS provider to choose?
&lt;/h2&gt;

&lt;p&gt;There are many GeoDNS providers to choose from — it is worth mentioning Amazon Route53, &lt;a href="https://www.cloudns.net/aff/id/295942/" rel="noopener noreferrer"&gt;&lt;strong&gt;ClouDNS&lt;/strong&gt;&lt;/a&gt;, NS1, &lt;a href="https://constellix.com/products/geodns?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=jan-reges-cdn-1"&gt;&lt;strong&gt;Constellix GeoDNS&lt;/strong&gt;&lt;/a&gt;, FastDNS from Akamai, EasyDNS, UltraDNS from Neustar or DNS Made Easy.&lt;/p&gt;

&lt;p&gt;For high availability, we do not recommend relying on only one DNS provider, even one with anycast NS servers worldwide. Distribution of changes is usually handled by one “central brain”, and once every few years a defect occurs that eventually affects several or all NS servers at once (a real experience from 2019). We therefore decided to go the route of a redundant primary–primary setup, running all GeoDNS settings at two completely independent providers.&lt;/p&gt;

&lt;p&gt;This is a bit annoying, because the AXFR zone-transfer protocol does not support synchronization of GeoDNS zones, so we have to manage everything manually at two independent providers. We tested six GeoDNS providers, and given how differently each of them approaches “GeoDNS rule modeling” and monitoring, we cannot imagine anyone proposing a uniform specification for GeoDNS that would allow such zones to be synchronized.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://www.siteone.cz/?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=cdn-1b"&gt;&lt;strong&gt;SiteOne&lt;/strong&gt;&lt;/a&gt; we chose &lt;a href="https://www.cloudns.net/aff/id/295942/" rel="noopener noreferrer"&gt;&lt;strong&gt;ClouDNS&lt;/strong&gt;&lt;/a&gt; as our first GeoDNS provider, because it offers excellent options for setting up the “geo rules” themselves and automatic failover with multiple behavior options. The provider has DDoS protection, anycast IP addresses and low latency from the Czech Republic/Slovakia. It also provides traffic statistics and has very decent limits and pricing with respect to the number of DNS queries (the basic GeoDNS package includes 100M queries per month).&lt;/p&gt;

&lt;p&gt;A big advantage is the non-stop 24/7 chat support, which can answer technical questions within minutes, or tailor a price plan even if you do not fit into any of the pre-prepared packages.&lt;/p&gt;

&lt;p&gt;As the second DNS provider, we chose &lt;a href="https://constellix.com/products/geodns?utm_source=dev.to&amp;amp;utm_medium=article&amp;amp;utm_campaign=jan-reges-cdn-1"&gt;Constellix&lt;/a&gt; (a sister company of DNS Made Easy), which offers similar options for setting up GeoDNS rules, monitoring and failover as &lt;a href="https://www.cloudns.net/aff/id/295942/" rel="noopener noreferrer"&gt;ClouDNS&lt;/a&gt;. The strength of Constellix is the ability to define weights (traffic distribution) in some situations.&lt;/p&gt;

&lt;p&gt;At first, we also liked Microsoft Azure and its Traffic Manager, but in the end we gave it up because it did not give us the ability to manage traffic in some countries the way we wanted. However, Azure pleasantly surprised us with its DNS pricing policy compared to other global cloud providers, such as Amazon or Google.&lt;/p&gt;

&lt;p&gt;Amazon’s Route 53 is also worth considering; it is more cost-effective if DNS resolves to IP addresses inside AWS. However, if you transfer tens of TB or more from AWS per month, expect monthly costs in the thousands of USD/EUR. At that point you pay the same or more than if you simply rented a commercial CDN.&lt;/p&gt;

&lt;p&gt;For all GeoDNS providers, the price depends mainly on the number of DNS queries and on the number and frequency of health checks. In other words, on how many PoPs your CDN has, from how many places around the world you monitor them (to eliminate false positives), and of course on the monitoring frequency, which can usually be set from 30 seconds to tens of minutes (our default is one minute). You can also cut the cost of DNS queries many times over by increasing the TTL of individual DNS records, but of course at the expense of the speed of a possible auto-failover, because recursive name servers will keep the translations in their caches longer.&lt;/p&gt;

&lt;p&gt;For the biggest pioneers, there is also the option of building your own GeoDNS service with your own name servers. For this to make sense and bring real value, you would need anycast IP addresses, plus a number of reliable servers around the world with DDoS protection, and then to understand, select and adapt e.g. &lt;a href="https://github.com/jedisct1/edgedns" rel="noopener noreferrer"&gt;EdgeDNS&lt;/a&gt; or the Czech &lt;a href="https://www.knot-dns.cz/" rel="noopener noreferrer"&gt;Knot DNS&lt;/a&gt; (which Cloudflare also uses). However, commercial GeoDNS services are relatively cheap and reliable, so we cannot imagine an ROI that would justify our own small, non-commercial DNS solution.&lt;/p&gt;

&lt;h1&gt;
  
  
  Servers
&lt;/h1&gt;

&lt;h2&gt;
  
  
  GEO server layout and provider selection
&lt;/h2&gt;

&lt;p&gt;If you are going to build your own CDN, take into account that a real CDN needs 8–10 servers around the world even in the smallest setup. We currently have twenty production servers and three test ones. We also have two development PoPs, available only on the internal network, so developers can deploy the CDN to internal development domains as well.&lt;/p&gt;

&lt;p&gt;The main goal of a CDN is to provide visitors around the world with the lowest possible latency and the highest transfer speed for the data the CDN caches locally.&lt;/p&gt;

&lt;p&gt;The ideal situation is if you can analyze the visitors of the projects for which you use the CDN. If you know from which continents/countries you handle what traffic and data transfers, you can &lt;strong&gt;strategically decide&lt;/strong&gt; on which continents and in which countries to place your PoPs.&lt;/p&gt;

&lt;p&gt;In the beginning, you won’t have servers in every country, and probably not on every continent, so consider “catchment areas”. However, based on real latency and traceroute measurements, you will often be surprised that latency between ISPs in different countries does not correspond to geographical proximity. Peering between countries and individual ISPs differs; very often “a neighbor is not a neighbor”. E.g. with some providers you may have significantly lower latency from Finland to the Czech Republic than to Poland. If you do not yet have any servers abroad to measure from, the &lt;a href="https://wondernetwork.com/pings/" rel="noopener noreferrer"&gt;WonderNetwork.com&lt;/a&gt; tool can also help you. It shows the latency between different cities of the world in both directions. Of course, the numbers are specific to the ISPs used by this tool, but they are sufficient for orientation.&lt;/p&gt;

&lt;p&gt;Do good market research when choosing a server and connectivity provider. Price is of course one factor, but it must not be the first. We focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Provider quality and reputation&lt;/strong&gt; — in each country, 2–3 robust providers usually stand out as the most reliable. Their robust infrastructure should better withstand potential DDoS attacks. We do not recommend small and unverified providers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Local and global connectivity of the provider&lt;/strong&gt; — take into account that the servers will handle large traffic, partly in their own country, and some PoPs also serve as catchment areas for other countries. Therefore, focus on studying and comparing the providers’ foreign connectivity. A quality provider describes its connectivity on its website, because it is usually proud of it. &lt;a href="https://www.sh.cz/sit-a-telehouse" rel="noopener noreferrer"&gt;SuperHosting&lt;/a&gt;, which has hosted part of our infrastructure for 15 years, does a great job for us.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Quality support&lt;/strong&gt; — sooner or later some problem will occur and you will need a fast reaction. As a first test, ask support what line the server will actually have available (usually 100 or 1,000 Mbps), what aggregation applies, and what they mean by “unlimited traffic”, i.e. whether it includes the estimated XY terabytes per month the server will need to handle. As a second question, ask about the capabilities and workings of their DDoS protection.&lt;/li&gt;
&lt;li&gt;  The expected &lt;strong&gt;data traffic&lt;/strong&gt; on a given server should &lt;strong&gt;ideally be included in the price&lt;/strong&gt;, or there should be a clear pricing policy in advance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our CDN currently counts 20 PoPs, each from a different provider. Our primary Czech/Slovak visitors are covered by six PoPs (4 × Prague, Brno and Bratislava), then Germany (two PoPs) and Poland (two PoPs) for part of Eastern and Northern Europe. We also have one PoP each in France, Italy, England and Hungary. Two PoPs cover North America. South America is covered by a single PoP in Sao Paulo, Africa by one PoP in Cairo, Australia by one PoP in Sydney, the Russian Federation by one PoP in Moscow and Asia by one PoP in Mumbai. These PoPs also serve selected neighboring countries, where it made sense to us according to the measured latencies.&lt;/p&gt;

&lt;p&gt;In the next chapter, you will also find how to cover various secondary locations very effectively with the help of a commercial CDN, where it makes functional and economic sense. For our CDN it does, so we have covered most of the non-redundant locations described above with commercial CDNs, and keep some of our own PoPs only as a backup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendation&lt;/strong&gt;: select at least two independent providers in each important location, ideally with different foreign connectivity, and ensure that DNS in each location resolves to at least two independent PoPs (IP addresses). In the event of a failure of one PoP, visitors will not have to wait 2–3 minutes for DNS failover, because browsers detect the failure and immediately switch traffic to the other IP address. In current browser versions, you will only see “&lt;em&gt;Connecting…&lt;/em&gt;” for 2–3 seconds and the content is then loaded immediately from the second IP address.&lt;/p&gt;
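&lt;p&gt;In classic BIND zone syntax, such a record set might look like the following sketch. The IP addresses come from documentation ranges and “cdn.company.com” is the placeholder domain from the examples above; real GeoDNS providers configure the per-location rules through their own UI/API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;; two independent PoPs per location, 60 s TTL for fast failover
cdn.company.com.  60  IN  A  203.0.113.10   ; PoP 1, provider A
cdn.company.com.  60  IN  A  198.51.100.20  ; PoP 2, provider B
&lt;/code&gt;&lt;/pre&gt;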

&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; You can test the quality of your CDN topology (especially the latencies from different parts of the world) with the online tool &lt;a href="https://maplatency.com/" rel="noopener noreferrer"&gt;MapLatency.com&lt;/a&gt;. It is great in that it measures latency from endpoints at different ISPs, so it measures the more realistic latency of actual visitors to your CDN, not just of servers/datacenters. For us, the coverage of Europe is key, and it is very good for our needs (see screenshot). The &lt;a href="https://www.cdnperf.com/tools/cdn-latency-benchmark" rel="noopener noreferrer"&gt;CDN Latency Test&lt;/a&gt; from CDNPerf serves the same purpose, but it measures latencies from data centers, not from end devices.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mvc3hy99xir5v14e5yd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mvc3hy99xir5v14e5yd.png" alt="SiteOne CDN - latency over the world"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Use of commercial CDN for better coverage
&lt;/h2&gt;

&lt;p&gt;At some point, you will regret (as we did) that you cannot give visitors in remote corners of the world (for us mainly Africa, Asia, Australia and South America) the same comfort (latency and transfer speed) as in Europe. But even that has an effective and simple solution.&lt;/p&gt;

&lt;p&gt;You can cover remote corners of the world with a commercial CDN provider that has robust infrastructure and strong coverage in those locations. Because these are low-traffic secondary locations (hundreds of GB to units of TB per month), you can use a pay-as-you-go CDN provider and pay only a few tens or hundreds of dollars a month. This may seem like parasitism, but when we examined the IP addresses of commercial CDNs in different countries, we found that some providers share infrastructure in various locations, so it is not unusual. We all want to deliver maximum value to our clients, but at the same time we have to think about the economics and operating costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to set it up?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The commercial CDN will provide you with a hostname&lt;/strong&gt;, usually under their third-level domain managed in their GeoDNS (e.g. “&lt;em&gt;mycompany.cdn-provider.com&lt;/em&gt;”), to which you can point your CDN domain via CNAME.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Set the commercial CDN&lt;/strong&gt; to “listen” on your “cdn.company.com” domain in addition to the hostname mentioned above. You will also need to set up an SSL/TLS certificate. The provider will probably offer Let’s Encrypt, but we recommend using your own SSL certificate purchased from a public CA, uniform across all PoPs. If you have different certificates in different locations, and with short validity at that, you will not be able to use SSL pinning, which you may need in some situations.&lt;/li&gt;
&lt;li&gt;  At your GeoDNS provider, &lt;strong&gt;route the CNAME&lt;/strong&gt; of your domain in all secondary locations to the hostname of the commercial CDN. Technically illustrated: “&lt;em&gt;(Africa) cdn.company.com → CNAME mycompany.cdn-provider.com&lt;/em&gt;”.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;You must avoid loops&lt;/strong&gt;. Do not tell the commercial CDN to listen on “&lt;em&gt;cdn.company.com&lt;/em&gt;” and at the same time set that domain as its origin; the African PoP would then resolve the origin’s DNS to itself. To prevent such looping, have a few main PoPs listen on a separate domain, e.g. “&lt;em&gt;cdn-src.company.com&lt;/em&gt;” (its A records point to, say, the three main PoPs in the EU). Then set “&lt;em&gt;cdn-src.company.com&lt;/em&gt;” as the origin, so if a PoP of the commercial CDN does not have a file in its cache, it downloads it from one of the main EU PoPs via “&lt;em&gt;cdn-src.company.com&lt;/em&gt;”.&lt;/li&gt;
&lt;li&gt;  If, over time, &lt;strong&gt;statistics and billing&lt;/strong&gt; show that increased traffic makes it more advantageous to cover a location with your own PoPs, you always have that option and can deploy it without an outage.&lt;/li&gt;
&lt;/ul&gt;
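<p></p>
&lt;p&gt;Sketched as DNS records, the loop-free setup described above might look like this (the provider hostname and IP addresses are placeholders):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;; main EU PoPs act as the origin for the commercial CDN
cdn-src.company.com.  300  IN  A      203.0.113.10
cdn-src.company.com.  300  IN  A      203.0.113.11
cdn-src.company.com.  300  IN  A      203.0.113.12

; GeoDNS rule for secondary locations (e.g. Africa): point the public
; CDN name at the commercial CDN hostname, never at itself
cdn.company.com.      60   IN  CNAME  mycompany.cdn-provider.com.
&lt;/code&gt;&lt;/pre&gt;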

&lt;p&gt;The disadvantage of secondary locations is that they are very far from the origin servers, so most first visitors would likely wait quite a long time before the cache warms up. It is therefore advisable to prepare a background process that, based on statistics of the top requests, regularly pushes the most queried files into the commercial CDN’s cache. Remote visitors then have a good chance of retrieving content from the local PoP immediately, even if it is being requested at that PoP for the first time.&lt;/p&gt;
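&lt;p&gt;A hedged sketch of the statistics half of such a process: extracting the most requested paths from an Nginx access log in the common/combined format. The log path, the warm-up host and the request loop are illustrative assumptions, not our production tooling:&lt;/p&gt;

```shell
#!/bin/sh
# Print the N most requested paths from an access log in the
# common/combined Nginx log format (the request path is field 7).
top_paths() {
  log_file=$1
  n=$2
  awk '{ print $7 }' "$log_file" | sort | uniq -c | sort -rn | head -n "$n" | awk '{ print $2 }'
}

# Illustrative warm-up loop (depends on your setup, hence commented out):
# warm_host="mycompany.cdn-provider.com"   # hypothetical commercial CDN edge
# top_paths /var/log/nginx/cdn-access.log 100 | while read -r path; do
#   curl -s -o /dev/null -H "Host: cdn.company.com" "https://$warm_host$path"
# done
```

&lt;p&gt;Run periodically (e.g. from cron), this keeps the hottest files warm at the remote PoPs before real visitors ask for them.&lt;/p&gt;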

&lt;h2&gt;
  
  
  Hardware
&lt;/h2&gt;

&lt;p&gt;If you have already selected providers, you still have to choose a specific physical or virtual server from their offer. This of course depends on your budget, but also on how important the location is to you and your visitors.&lt;/p&gt;

&lt;h2&gt;
  
  
  A few of our verified recommendations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Virtual vs. physical server&lt;/strong&gt; — this is a rather controversial topic and it is not wise to generalize. If the economics allow it, choose physical servers for critical PoPs, even entry-level ones. Redundant disks are a must, ideally with a redundant power supply. With a physical server, you usually get a 1 Gbps uplink and a direct physical connection to the ToR switch, and there is a much lower chance of struggling with shared CPU, IO or connectivity on a hypervisor running hundreds of virtual servers, or dozens at best. If you’re lucky, the hypervisors share an N × 10 Gbps “pipe”; if not, just 1 Gbps. With verified providers you don’t have to fear virtual servers either; just watch the aggregation and performance (e.g. with the &lt;a href="https://github.com/n-st/nench" rel="noopener noreferrer"&gt;nench&lt;/a&gt; benchmark). Over time, the collected metrics will also tell you a lot, especially for redundant PoPs that handle roughly the same traffic (DNS round-robin). This is how we very quickly detected very aggressive CPU throttling or volatile IO performance at some providers.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CPU&lt;/strong&gt; — if you do it smartly and tune static gzip and brotli compression correctly, you will be able to handle hundreds of Mbps even with 1–2 CPU cores. If, on the other hand, you compress every request ad hoc, you need at least 4–8 cores. It is good to choose a modern CPU with a high clock speed (turbo at 3 GHz+). By the way, according to our benchmarks, static compression is something commercial CDNs often lack, and as a result they serve textual content much more slowly than they could.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;RAM&lt;/strong&gt; — the minimum is 1 GB, but the more, the better, because the filesystem cache (page cache) lives in RAM. It will usually hold most of the small but most-downloaded files (typically JS/CSS/fonts). The more of them fit into RAM, the lower the IOPS requirements, so you can more safely afford a larger rotational HDD instead of an SSD. With enough RAM, you can have almost zero IOPS on storage even at hundreds of Mbps.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;SSD/NVMe vs. HDD&lt;/strong&gt; — of course we recommend SSD/NVMe for handling high IOPS, but the real need depends on actual traffic. We have preferred SSDs over high capacity everywhere; 100–200 GB per server is enough for us. Also account for the space that logs need: it is optimal to rotate them continuously, ship them to a collection point for further processing, and clean them up.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Connectivity&lt;/strong&gt; — it is advisable to have a realistic idea of how much traffic, and especially what peaks, you will handle. For a less important PoP, 100 Mbps will suffice. For a PoP in an important location, prefer 1 Gbps and distribute the load among multiple PoPs (round-robin DNS returning multiple A records). You will achieve higher overall throughput, lower load on individual ISPs, and higher availability of the CDN as a whole. Those with the budget and a real need can of course choose a 10 Gbps port, but expect a high price.&lt;/li&gt;
&lt;/ul&gt;
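<p></p>
&lt;p&gt;The static-compression point above can be sketched in Nginx configuration. This assumes pre-compressed “.gz”/“.br” files produced by a background job and the third-party &lt;em&gt;ngx_brotli&lt;/em&gt; module compiled in; the domain and paths are illustrative:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;server {
    listen 443 ssl http2;
    server_name cdn.company.com;
    root /var/www/cdn;

    location ~* \.(js|css|svg|json|xml|txt)$ {
        # serve file.js.br or file.js.gz if the client supports it,
        # instead of compressing the response on every request
        brotli_static on;
        gzip_static   on;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The CPU saving comes from compressing each file once in the background, at the highest compression level, rather than on every request.&lt;/p&gt;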

&lt;h2&gt;
  
  
  Orchestration
&lt;/h2&gt;

&lt;p&gt;Because you will manage many servers around the world with 99% identical configuration, you need to ensure automated installation, configuration and mass orchestration.&lt;/p&gt;

&lt;p&gt;We use and recommend &lt;a href="https://www.ansible.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Ansible&lt;/strong&gt;&lt;/a&gt;. Historically, we also used Puppet, Chef and SaltStack for a while, but only Ansible has met our needs for many years. Over the years of use we have built up over 80 roles of our own, so when preparing each additional server, the most time-consuming part is placing the order and waiting for the activation e-mail. Whether we have 10 or 50 servers makes no difference from the orchestration point of view.&lt;/p&gt;

&lt;p&gt;Whichever orchestration tool you manage the servers with, we recommend a few things that will help you eliminate a “global outage”:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  When &lt;strong&gt;deploying changes&lt;/strong&gt; to all servers &lt;strong&gt;in bulk&lt;/strong&gt;, be careful: deploys to individual servers should run in series rather than in parallel, or at most in small batches, e.g. three servers at a time (in an Ansible playbook this is controlled by the “&lt;em&gt;serial&lt;/em&gt;” directive). If the deploy fails on one of the servers, force the whole deploy to abort (in Ansible, the “&lt;em&gt;max_fail_percentage&lt;/em&gt;” directive).&lt;/li&gt;
&lt;li&gt;  Before &lt;strong&gt;restarting/reloading components,&lt;/strong&gt; first &lt;strong&gt;check the validity of the configuration&lt;/strong&gt; (&lt;em&gt;configtest&lt;/em&gt;). This eliminates outages caused by an invalid configuration; some distributions and their init scripts do not do this automatically. Ideally, run &lt;em&gt;configtest&lt;/em&gt; before restarting the service, so the service is never stopped and then unable to start.&lt;/li&gt;
&lt;li&gt;  At the &lt;strong&gt;end of the deployment&lt;/strong&gt; to an individual server, perform a &lt;strong&gt;set of CDN functionality tests&lt;/strong&gt; on that particular server. E.g. by calling the status URL, and ideally also one functional URL from one of the origins that will be served from the cache, plus one URL that, on the contrary, will not be in the cache and will have to be fetched from the origin. We also keep one “service” origin domain for these purposes. In conjunction with serial deployment, you can be sure that you will not cause an outage on more than one PoP at a time.&lt;/li&gt;
&lt;/ul&gt;
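&lt;p&gt;The three safeguards above can be sketched in a single Ansible play. The &lt;em&gt;serial&lt;/em&gt;, &lt;em&gt;max_fail_percentage&lt;/em&gt; and &lt;em&gt;validate&lt;/em&gt; directives are real Ansible features; the host group, file paths and status URL are illustrative:&lt;/p&gt;

```yaml
# Illustrative rolling deploy to CDN PoPs: small batches, abort on first
# failure, configtest before the new config is installed, smoke test at the end.
- hosts: cdn_pops                  # illustrative inventory group
  serial: 3                        # touch at most 3 PoPs simultaneously
  max_fail_percentage: 0           # any failed host aborts the whole deploy
  tasks:
    - name: Deploy nginx configuration
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        validate: nginx -t -c %s   # configtest runs before the file replaces the old one
      notify: reload nginx

    - name: Smoke-test this PoP via its status URL
      uri:
        url: "https://{{ inventory_hostname }}/cdn-status"
        status_code: 200

  handlers:
    - name: reload nginx
      service:
        name: nginx
        state: reloaded
```

&lt;p&gt;With &lt;em&gt;serial: 3&lt;/em&gt; and &lt;em&gt;max_fail_percentage: 0&lt;/em&gt;, a broken change can never reach more than one batch of PoPs before the run stops.&lt;/p&gt;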

&lt;h1&gt;
  
  
  Server configuration and reverse proxy (cache)
&lt;/h1&gt;

&lt;p&gt;If you already have your servers prepared, the really interesting and creative part awaits you — the preparation of the configurations of the individual SW components the CDN is composed of.&lt;/p&gt;

&lt;p&gt;In the next article (in 2–3 weeks), we will focus on operating system settings (with real settings for &lt;em&gt;Debian&lt;/em&gt; Linux), the reverse proxy (&lt;em&gt;Nginx&lt;/em&gt; as a cache) and other aspects related to CDN traffic — content optimization, security, protection against attacks, and settings that affect search engine behavior. And maybe also cache tagging and its invalidation based on &lt;em&gt;Varnish&lt;/em&gt; (we are working on it at the moment). This is a very useful feature that very few CDN providers offer, and only in their most expensive plans.&lt;/p&gt;

&lt;p&gt;Thanks for reading, and if you like the article, I will be happy if you share it or leave a comment. Have a nice day :-)&lt;/p&gt;

&lt;p&gt;If you are interested in any other CDN-related details, ask in the comments or on X/Twitter &lt;a href="https://twitter.com/janreges" rel="noopener noreferrer"&gt;@janreges&lt;/a&gt;. I will be happy to answer.&lt;/p&gt;
&lt;h2&gt;
  
  
  Test your websites with my analyzer
&lt;/h2&gt;

&lt;p&gt;In conclusion, I would like to recommend one of my personal open-source projects, which I hope will help improve the quality of websites around the world. The tool is available as a &lt;a href="https://github.com/janreges/siteone-crawler-gui" rel="noopener noreferrer"&gt;desktop application&lt;/a&gt;, but also as a &lt;a href="https://github.com/janreges/siteone-crawler" rel="noopener noreferrer"&gt;command-line tool&lt;/a&gt; usable in CI/CD pipelines. For Windows, macOS and Linux.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz3qaz6upoowg5tptmfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz3qaz6upoowg5tptmfk.png" alt="SiteOne Crawler - Free Website Analyzer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I launched it at the end of 2023 and I believe it will help a lot of people improve the security, performance, SEO, accessibility and other important aspects of a quality web presentation or application. It's called &lt;a href="https://crawler.siteone.io/?utm_source=dev.to&amp;amp;utm_campaign=cdn-part-1"&gt;SiteOne Crawler - Free Website Analyzer&lt;/a&gt; and I also wrote an &lt;a href="https://dev.to/janreges/siteone-crawler-useful-tool-you-will-oe1"&gt;article&lt;/a&gt; about it. Below you will find 3 descriptive videos - the last one also shows the report it will generate for your website.&lt;/p&gt;

&lt;p&gt;In addition to various analyses, it also offers, for example, the export of an entire website into an offline form, so you can browse the whole website from a local disk without an internet connection, or the generation of sitemaps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharing this project&lt;/strong&gt; with your colleagues and friends will be the &lt;strong&gt;greatest reward for me&lt;/strong&gt; for writing these articles. &lt;strong&gt;Thank you and I wish you all the best in 2024&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Desktop Application&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/rFW8LNEVNdw"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Command-line tool&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/25T_yx13naA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;HTML report - analysis results&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/PHIFSOmk0gk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: this article was written with the best intentions and without advertising purposes. However, it contains a few partner links in the text to specific providers with whom we have many years of excellent experience.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webperf</category>
      <category>devops</category>
      <category>webdev</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
