What is the setting?

Well, I want to access my internal services while I am not at home, simple as that. The available options are basically:

  1. Set up an ordinary VPN like WireGuard or OpenVPN
  2. Set up Cloudflare Tunnels or similar
  3. Use Tailscale or similar

So why did I choose Tailscale?

  1. It is easy to maintain
  2. It is easy to deploy on devices (compared to some VPN solutions)
  3. It is supported by many devices
  4. I don’t need to open any ports
  5. It is secure, as the client is open source and WireGuard is the underlying protocol

Generally speaking, I don’t want to maintain more than I already do without sacrificing security.

What does my setup look like?

{{ TODO: Add diagram }}

So, I have OPNsense as my firewall, router & DNS server. All *.internal domains resolve to my Traefik reverse proxy running on Kubernetes. Traefik uses IngressRoutes to forward the requests to my services.

Tailscale describes four different approaches to integrating it with Kubernetes:

  1. Run it as a sidecar to directly expose the pod of my service to the tailnet. I don’t like exposing the whole pod if it is not necessary.
  2. Run a Tailscale proxy to access the service directly. This has the downside that I need to run all services on port 80/443 and cannot use the certificates of my CA.
  3. Use the Tailscale Operator. This is quite straightforward and also allows seamless integration with my Traefik IngressRoutes.
  4. Run a subnet router. I don’t like this, as I want to expose only some of my services, not all of them.

Furthermore, the Tailscale pods need to run as privileged containers - and if possible, I try to run containers with as few privileges as possible. Sure, I could go that route, set up AppArmor, reduce the capabilities to the bare minimum etc., but I was sure there was a better way of handling it. So I was looking for another solution.

Ah well, and I was not too happy that I could not manage external users with my firewall rules. Sure, I could go with Tailscale ACLs, but having one common platform makes maintenance way easier and helps in terms of observability.

My approach

{{ diagram }}

Basically, I have moved Tailscale onto a dedicated VM. All (relevant) requests from my tailnet are sent to this node. Requests for internal services are then forwarded to my Traefik instance, and the response is returned to the requester.

So, how does it work? First, we need to make sure that all nodes resolve *.internal domains to the relay node. This is quite easy: you just need to set up an additional nameserver in the Tailscale admin console and restrict it to the internal domain (split DNS).

{{ dns img }}

Now DNS queries for *.internal are sent to our relay node, but we still need to answer them in a way that makes the actual HTTP/S requests arrive there as well. To do so, we set up dnsmasq and return our own Tailscale IP for all queries to *.internal.

{{ code }}
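
Since the relay runs NixOS (see the digression below), a minimal sketch of such a dnsmasq setup could look like this - the tailnet IP 100.64.0.1 is a placeholder, substitute your relay’s actual Tailscale address:

```nix
{
  services.dnsmasq = {
    enable = true;
    settings = {
      # Answer every *.internal query with the relay's own tailnet IP,
      # so HTTP/S traffic for those domains ends up on this node.
      # 100.64.0.1 is a placeholder - use your relay's Tailscale IP.
      address = "/internal/100.64.0.1";
      # Only serve DNS on the tailnet interface.
      interface = "tailscale0";
    };
  };
}
```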

Okay, we can now see that our requests to test.internal are sent to our relay node. This can be checked with tcpdump, for instance: tcpdump -i tailscale0 tcp port 443 (on the Tailscale interface - on eth0 the tunneled traffic only shows up as encrypted UDP). You should be able to see incoming packets.

Finally, we need to setup a reverse proxy that receives the requests from other Tailscale nodes, forwards the request to Traefik and returns the response. Now there are basically two ways of implementing this:

  1. The relay terminates TLS: it receives the request, establishes a separate TLS connection to Traefik, forwards the request, and re-encrypts the response over the requester’s TLS connection.
  2. We do no TLS termination on the relay and simply pass the encrypted stream through to Traefik.

As I don’t plan to do any packet inspection between the Tailscale traffic and Traefik, I have chosen the second approach - it’s also way easier to set up & maintain.

My nginx config is as follows:

{{ nginx config }}
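
For reference, a TLS passthrough with nginx’s stream module might look roughly like this on NixOS - the Traefik address 192.168.1.10 is a placeholder:

```nix
{
  services.nginx = {
    enable = true;
    # The stream module forwards the raw TCP stream, so TLS stays
    # intact end-to-end between the client and Traefik.
    streamConfig = ''
      server {
        listen 443;
        # 192.168.1.10 is a placeholder - substitute your Traefik endpoint.
        proxy_pass 192.168.1.10:443;
      }
    '';
  };
}
```

Since nothing is terminated on the relay, clients still see the certificates issued by my internal CA via Traefik.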

Digression: Running & deploying Tailscale on NixOS
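
On NixOS, getting Tailscale onto the relay VM is only a few lines of configuration. A minimal sketch:

```nix
{ config, ... }:
{
  services.tailscale.enable = true;

  networking.firewall = {
    # Trust traffic arriving over the tailnet.
    trustedInterfaces = [ "tailscale0" ];
    # Allow direct WireGuard connections.
    allowedUDPPorts = [ config.services.tailscale.port ];
  };
}
```

After a nixos-rebuild switch, the node still needs to be authenticated once via tailscale up; alternatively, services.tailscale.authKeyFile can point to a pre-generated auth key for unattended deployments.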

Conclusion

That’s basically it. If you are interested in the whole code, you can see it here.