netpath optimization with split-horizon dns

In the age of high-speed networks, most users don't think - or care - about the path their traffic takes to reach its destination.

However, for high-traffic, low-latency workloads such as media transmission or high-performance data jobs, every hop and every millisecond counts.

For networks in which every VPC is directly peered with every other VPC, and every application is routed and accessible through a service mesh, systems can be just a few hops away over the network backbone.

But for enterprise-scale, inherited, cross-cloud, and on-prem data centers, the reality is that many systems will in fact be geographically disparate and routed through multiple subnets, adding hops and latency - sometimes critically - to every request and response.

While a more secure, modern, cloud-native microservice architecture would rely on a zero-trust network with multiple edge ingresses and security by default via mTLS, many legacy systems still rely on older - and sometimes nonexistent - network security mechanisms. This means that additional layers of security must be added around these systems, not only for internal traffic but also for external traffic.

This situation is made even more complex when the same systems handle both external and internal traffic. External traffic will generally traverse a very different route than internal traffic, passing through the additional WAF, firewall, ingress, auth middleware, and proxy layers used to distance and secure destination systems from untrusted public access.

To optimize the path that network requests take, applications can be run in multiple regions - but the route clients take can be optimized further by using multiple DNS servers.

Most networks have internal DNS servers that resolve additional hosts and systems for network-internal users - systems which are simply never exposed to the external internet.

By utilizing split-horizon DNS, operators can improve performance by resolving internal requests against their internal DNS servers, forwarding only unmatched queries upstream to the external DNS servers. External users only ever resolve against the external DNS servers.

The record on the external DNS servers may point to a WAF or proxy endpoint, while the record on the internal DNS servers points directly to the application endpoint. Not only does this reduce load on WAFs, proxies, etc. - it also improves performance, and arguably security, by keeping all traffic "inside the house" (on the security front, this assumes you already have proper monitoring and logging at the origin and don't rely solely on the WAF/proxy for threat analysis and prevention).
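The two views can be sketched in a few lines of Python - this is a toy model of the resolution logic, not a real DNS server, and the hostname and IPs below are hypothetical rather than any actual records:

```python
import ipaddress

# Hypothetical zone data: the internal view answers with the direct service
# endpoint, while the external view answers with the WAF / proxy front door.
INTERNAL_VIEW = {"app.example.com": "10.0.20.15"}   # direct service endpoint
EXTERNAL_VIEW = {"app.example.com": "203.0.113.7"}  # WAF / proxy endpoint

# RFC 1918 private ranges used to classify the querying client.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def resolve(name, client_ip):
    """Answer an A query from whichever view matches the client's source IP."""
    addr = ipaddress.ip_address(client_ip)
    internal = any(addr in net for net in RFC1918)
    view = INTERNAL_VIEW if internal else EXTERNAL_VIEW
    return view.get(name)
```

The same name resolves to a different address depending on where the query comes from - that source-based selection is the whole trick behind split-horizon DNS.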

This same design is used, on a much smaller scale, for this domain.

This site - along with the majority of my other internet resources - is hosted on a bare-metal Kubernetes cluster in my garage.

Of course I do not want to expose the cluster directly to the public internet, but I do need to make select services available to the public - otherwise you wouldn't be reading this right now!

For internal DNS, I use CoreDNS, a highly performant and easily configurable DNS server written in Go. All internal devices resolve against these servers, while external DNS is hosted in AWS Route53 (wary of the DNS bootstrapping problem - which recently contributed to a very public, hours-long Facebook outage - I still rely on a cloud provider for external DNS).
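CoreDNS is configured through a Corefile; a minimal sketch of this pattern looks roughly like the following - the zone name, host entries, and upstream resolvers here are hypothetical, not my actual config:

```
# Serve a hypothetical internal zone authoritatively from static host entries.
lab.example.com:53 {
    hosts {
        192.168.1.240 blog.lab.example.com
        fallthrough
    }
    log
}

# Forward everything else to public resolvers, with a short cache.
.:53 {
    forward . 1.1.1.1 9.9.9.9
    cache 30
}
```

Queries for the internal zone are answered locally, and all other names fall through to the public internet - split horizon in a dozen lines.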

Inside my network, resolving points directly to the Istio Ingress Gateway load balancer IPs.
        ~ dig +short
However, when resolving against a public DNS server, you are pointed to the public ingress IP, which routes through a reverse proxy before ultimately ending up at the same Istio Ingress Gateway.
        ~ dig +short @
To make things even easier operationally, my proxy servers also resolve against the internal DNS servers - so as my internal network grows and changes, I simply update records on the internal DNS server and the proxies pick up the new IPs automatically.
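If the reverse proxy were, say, nginx, this pattern looks roughly like the sketch below - the resolver IP and backend hostname are illustrative placeholders, and the post doesn't specify which proxy is actually in use:

```
# Point nginx at the internal DNS server and re-check records every 30s.
resolver 192.168.1.53 valid=30s;

server {
    listen 443 ssl;
    location / {
        # Using a variable in proxy_pass forces nginx to re-resolve the name
        # at request time instead of caching the IP once at startup.
        set $upstream "blog.lab.example.com";
        proxy_pass https://$upstream;
    }
}
```

The key detail is the variable in proxy_pass: with a literal hostname, nginx resolves it once at startup, which would defeat the point of updating the internal DNS records.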

For more detail on my homelab, check out my recent post To the Cloud and Back.

last updated 2022-01-02T12:12:49-0800