Running Traefik as a systemd service


Intro

According to its purveyors, "Traefik is a leading modern open source reverse proxy and ingress controller that makes deploying services and APIs easy." At a previous job, we were migrating our HashiCorp Nomad environment from Fabio to Traefik when I left. I didn't get to spend much time with it, but I remember it being fast and simple, with great Nomad and Consul support (always a plus for this HashiCorp fanboy).

Fast forward a couple of years. As I attempt to build up my feeble SRE skills, I'm learning more about the Four Golden Signals of monitoring. But if I want to create fancy Grafana dashboards from fancy Prometheus metrics, nginx isn't going to cut it (the open-source version doesn't provide site-specific metrics). By contrast, Traefik provides the Prometheus metrics that plants crave.

But we'll get to the monitoring specifics in a future post. For now, let's talk installation!

Installing on bare metal

  • Download the binary from Traefik's GitHub releases page and install it (I typically use /usr/local/bin/ as my installation directory).

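  • OS-level configuration:

      • Create a system group and user for Traefik to run as (the unit file below expects both to be named traefik).
      • Set up a systemd tmpfiles entry. Place the following into /etc/tmpfiles.d/traefik.conf: d /run/traefik 0770, then run systemd-tmpfiles --create. Note that tmpfiles.d lines accept an owner and group after the mode; without them the directory is owned by root, so use d /run/traefik 0770 traefik traefik if the service account needs to write there.
      • Create the systemd unit file and place it in /etc/systemd/system/traefik.service. Below is what I use; it may need slight tweaking for your use case.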
[Unit]
Description=Traefik (pronounced traffic) is a modern HTTP reverse proxy and load balancer that makes deploying microservices easy.
After=network.target

[Service]
User=traefik
Group=traefik

ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/traefik --configFile=/etc/traefik/traefik.yaml
PIDFile=/run/traefik/traefik.pid

ProtectHome=true
ProtectSystem=full
PrivateTmp=true

CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=True

WorkingDirectory=/var/lib/traefik
ReadWritePaths=/var/lib/traefik/plugins-storage

[Install]
WantedBy=default.target
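
The OS-level steps above can be strung together in a small script. What follows is only a sketch, not a tested installer: the exact groupadd/useradd flags, the /usr/sbin/nologin shell path (some distros use /sbin/nologin), and the decision to create /var/lib/traefik up front are my assumptions, and it expects /etc/tmpfiles.d/traefik.conf and the unit file to be in place already. It prints the commands by default and only runs them when passed --apply.

```python
#!/usr/bin/env python3
"""Dry-run-by-default sketch of the OS-level setup steps for Traefik."""

import shlex
import subprocess
import sys

# Assumed account name; it must match User=/Group= in the unit file.
ACCOUNT = "traefik"

# Assumed commands and flags; adjust paths and options for your distro.
SETUP_COMMANDS = [
    ["groupadd", "--system", ACCOUNT],
    ["useradd", "--system", "--gid", ACCOUNT,
     "--home-dir", "/var/lib/traefik", "--shell", "/usr/sbin/nologin", ACCOUNT],
    # Directories referenced by WorkingDirectory= and the plugins-storage path.
    ["install", "-d", "-o", ACCOUNT, "-g", ACCOUNT,
     "/var/lib/traefik", "/var/lib/traefik/plugins-storage"],
    # Creates /run/traefik from /etc/tmpfiles.d/traefik.conf.
    ["systemd-tmpfiles", "--create"],
    # Make systemd pick up the new unit file, then start it now and at boot.
    ["systemctl", "daemon-reload"],
    ["systemctl", "enable", "--now", "traefik.service"],
]


def run_setup(commands, apply=False):
    """Print each command; execute it only when apply is True."""
    for command in commands:
        print("+", shlex.join(command))
        if apply:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_setup(SETUP_COMMANDS, apply="--apply" in sys.argv[1:])
```

Run it as root with --apply once the printed commands look right. It is not idempotent: groupadd and useradd fail if the account already exists, and check=True stops the script at the first failure.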

Traefik File Provider (Configuration Discovery)

Traefik has a number of Configuration Discovery providers. I use HashiCorp Consul in my lab, but I'm going to cover the file provider here, since it doesn't require deploying another app or running on a public cloud.

Enabling

Put this into your static config:

providers:
  file:
    # This is my personal config; feel free to
    # use whichever directory you want
    directory: /etc/traefik/dynamic
    watch: true

Once that configuration is active, you can place Traefik config files into that directory and Traefik will apply them immediately, no restart needed. Sounds convenient, right? Of course, but there are a few footguns as well.

Perilous Permissions

Traefik needs write permissions on your dynamic config folder and every file in it. It wants to create inotify watchers on 'em (makes sense for dynamic stuff, right?).

If your permissions are bad, Traefik will log a permissions error, but it also logs a ton of tls: bad certificate errors immediately afterwards.

If your Traefik-fronted sites all start serving up the default Traefik certificate, you might have this problem. Make sure you scroll all the way up in the logs and check for permissions errors.
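
Rather than scrolling through logs, you can check the permissions directly. Here is a minimal sketch that lists anything under the dynamic config directory the current user can't read or write. It assumes, as described above, that Traefik wants both read and write access; the function name and the default directory are just my choices.

```python
#!/usr/bin/env python3
"""List entries in a Traefik dynamic config directory that the current user cannot read or write."""

import os
import sys
from pathlib import Path


def inaccessible_paths(directory, mode=os.R_OK | os.W_OK):
    """Return the directory and any entries below it that fail the access check.

    A missing directory is reported as a single problem entry.
    """
    root = Path(directory)
    if not root.is_dir():
        return [root]
    candidates = [root, *sorted(root.rglob("*"))]
    return [path for path in candidates if not os.access(path, mode)]


if __name__ == "__main__":
    # Assumed default; this is the directory used in the static config above.
    target = sys.argv[1] if len(sys.argv) > 1 else "/etc/traefik/dynamic"
    for path in inaccessible_paths(target):
        print(f"missing or not readable/writable: {path}")
```

The check only means something when it runs as the service account, for example sudo -u traefik python3 check-perms.py /etc/traefik/dynamic (the script name is hypothetical). Run as root, it will report nothing, since root passes every access check.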

Validation? Nah.

Traefik does not validate its file-based discovery config. If you put invalid config in your directory, Traefik will silently ignore it (at least, that was my experience). It seems like a pretty bad user experience, but there is a silver lining...

JSON Schema-based validation

As the Traefik docs mention, the Traefik file provider has a JSON schema. This means you can validate your config and even get helpful error messages about what might be wrong. All you need to do is convert the Traefik config to JSON and validate it against the JSON schema.

Traefik's YAML config only uses plain maps, lists, and scalars, all of which have direct JSON equivalents, so converting it to JSON is easy. I use yq for this:

yq -p yaml -o json grafana.yaml > grafana.json
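
If you have more than one config file, the conversion is easy to wrap. This is a rough sketch that shells out to the same yq invocation for each file passed on the command line and writes a .json file next to it. It assumes yq is on your PATH and accepts the -p/-o flags used above; the function names are mine.

```python
#!/usr/bin/env python3
"""Convert Traefik YAML config files to JSON by shelling out to yq."""

import subprocess
import sys
from pathlib import Path


def yq_command(source):
    """Build the yq invocation that prints one YAML file as JSON."""
    return ["yq", "-p", "yaml", "-o", "json", str(source)]


def convert(source):
    """Write <name>.json next to <name>.yaml and return the new path."""
    source = Path(source)
    target = source.with_suffix(".json")
    # check=True raises CalledProcessError if yq rejects the file.
    result = subprocess.run(
        yq_command(source), check=True, capture_output=True, text=True
    )
    target.write_text(result.stdout)
    return target


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(convert(name))
```

For example, ./yaml-to-json.py /etc/traefik/dynamic/*.yaml (script name hypothetical) would leave a JSON copy beside each YAML file, ready for validation.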

Here's a sample Traefik dynamic config file. Can you spot the mistake?

http:
  routers:
    grafana:
      entryPoints:
        - https
      rule: Host(`grafana.example.com`)
      service: grafana
      tls: true
  services:
    grafana:
      loadBalancer:
        servers:
          - url: http://127.0.0.1:3000
tls:
  bad: example
  certificates:
    - certFile: /etc/ssl/certs/grafana.example.com.crt
      keyFile: /etc/ssl/private/grafana.example.com.key

Validation with a Python script

Here's a simple validation script courtesy of ChatGPT:

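#!/usr/bin/env python3
"""Validate a JSON file against a JSON schema."""

import json
import sys

from jsonschema import validate, ValidationError


def main():
    # Expect exactly two arguments: the schema first, then the data file
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} SCHEMA_FILE JSON_FILE")
    schema_file, json_file = sys.argv[1], sys.argv[2]

    with open(schema_file, 'r') as f:
        schema = json.load(f)

    with open(json_file, 'r') as f:
        data = json.load(f)

    try:
        # Raises ValidationError describing the most relevant violation
        validate(instance=data, schema=schema)
        print("JSON data is valid against the schema")
    except ValidationError as e:
        print("JSON data is invalid:", e)
        sys.exit(1)


if __name__ == "__main__":
    main()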

After that, you can run the script to validate the data:

./validate-json.py traefik-v2-file-provider.json grafana.json

If your syntax is wrong, you will probably get an error such as (edited for length):

JSON data is invalid: Additional properties are not allowed ('bad' was unexpected)
[...]
Failed validating 'additionalProperties' in schema['properties']['tls']:
On instance['tls']:
    {'bad': 'example',
     'certificates': [{'certFile': '/etc/ssl/certs/grafana.example.com.crt',
                       'keyFile': '/etc/ssl/private/grafana.example.com.key'}]}
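
One limitation of the script is that jsonschema's validate() raises only the single violation it considers most relevant, so a file with several problems has to be fixed one error at a time. Here's a sketch of a variant that lists every violation along with where it occurs. It uses jsonschema's validator_for and iter_errors; the output format and the function name are my own invention.

```python
#!/usr/bin/env python3
"""List every schema violation in a JSON document, one per line."""

import json
import sys

from jsonschema.validators import validator_for


def list_errors(schema, data):
    """Return 'location: message' strings for all violations, sorted by location."""
    # Pick the validator class matching the schema's $schema (or the default).
    validator = validator_for(schema)(schema)
    errors = sorted(
        validator.iter_errors(data),
        key=lambda e: [str(part) for part in e.absolute_path],
    )
    return [
        f"{'.'.join(str(part) for part in e.absolute_path) or '<root>'}: {e.message}"
        for e in errors
    ]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f:
            schema = json.load(f)
        with open(sys.argv[2]) as f:
            data = json.load(f)
        problems = list_errors(schema, data)
        print("\n".join(problems) or "JSON data is valid against the schema")
        sys.exit(1 if problems else 0)
    else:
        print("usage: list-errors.py SCHEMA_FILE JSON_FILE")
```

It takes the same two arguments as the script above (schema first, then the config), and exits non-zero when anything is wrong, so it can gate a deploy step.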

You can also validate the JSON against the schema with Ansible's ansible.utils.validate module:

- name: validate grafana config v2
  tags: ['validate']
  delegate_to: localhost
  become: no
  run_once: yes
  block:
    - name: validate grafana json
      ansible.utils.validate:
        data: "{{ lookup('ansible.builtin.file', '../files/grafana.json') | from_json }}"
        criteria: "{{ lookup('ansible.builtin.file', '../files/traefik-v2-file-provider.json')  | from_json }}"
        engine: ansible.utils.jsonschema
      register: result

Unfortunately, I haven't figured out how to get the helpful error messages that the Python script provides.

In Conclusion

Now that Traefik is running, it's time to take a look at the docs and see what cool stuff is available. Stay tuned for my next post about using Traefik's Prometheus metrics to create a "Four Golden Signals" Grafana dashboard!