Ansible Automation

Overview

In Homelab-Network-Architecture I alluded to using Ansible as my configuration management software of choice.

I have only recently started using Ansible for my homelab. However, I have been using it professionally in various roles for several years.

Before integrating Ansible, my normal approach for new homelab systems went something like this:

  1. Install OS.
  2. SSH into the box.
  3. Fiddle with the configuration until I am satisfied.
  4. Leave box alone for a few months.
  5. Update something or try something new.
  6. Break box.
  7. Repeat step 1.

This approach is great for learning and experimenting with new services. However, once you start depending on those services or machines, it becomes painful — especially if, like me, you have the memory of a fish 🐟.

Home Router Woes

At one point I was learning about DHCP and DNS running inside Podman, NATed behind a Linux bridge, and thought to myself: hang on, I could technically run this at home as my main router on something like a Raspberry Pi.

And yes, I did. This became my router for a while. The biggest issues I faced were:

  1. My wife complaining when the internet went down mid-show.
  2. Updating packages or the OS occasionally broke the router — and with my “fish 🐟 memory”, there was rarely a quick fix, which of course led straight back to point 1 😂.
Warning

There were other, more serious issues — such as running without properly configured firewalls or sensible default settings.

Tread carefully.

I switched over to using OpenWrt on the Raspberry Pi — and yes, even that is configured via Ansible.

Ansible

You might be asking: why Ansible 🤔?

From their site

Ansible provides open-source automation that reduces complexity and runs everywhere. Using Ansible lets you automate virtually any task.

Here are some common use cases for Ansible:

  1. Eliminate repetition and simplify workflows
  2. Manage and maintain system configuration
  3. Continuously deploy complex software
  4. Perform zero-downtime rolling updates

Ansible uses simple, human-readable scripts called playbooks to automate your tasks. You declare the desired state of a local or remote system in your playbook. Ansible ensures that the system remains in that state.
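As a taste of what that looks like, here is a minimal, hypothetical playbook (the package name and hosts pattern are just for illustration) that declares a machine's time synchronisation should be installed and running:

```yaml
---
# Hypothetical example: declare that chrony should be installed,
# enabled and running. Re-running this changes nothing if the
# machine is already in that state.
- name: Keep time sync configured
  hosts: all
  become: true
  tasks:
    - name: Install chrony
      ansible.builtin.apt:
        name: chrony
        state: present

    - name: Enable and start chrony
      ansible.builtin.systemd:
        name: chrony
        enabled: true
        state: started
```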

It is not all sunshine and rainbows. There are some drawbacks:

  1. Runs can take a long time depending on what you are configuring - Ansible essentially SSHes into each machine and copies every configuration over. In a homelab situation this is a non-issue.
  2. It does not save state the way Terraform does, so in some cases you need to check whether a resource already exists before creating it - see - name: "network | Check if exists - {{ lan_bridge_name }}" below.

What does Ansible look like?

This is a direct copy from my Cloud Server, which is my main machine running all my services - hardware.

---

- name: network | Get physical interface name
  ansible.builtin.set_fact:
    fresh_os_eth_name: "{{ ansible_facts.interfaces | select('match', '^' ~ fresh_os_eth_prefix ~ '[^.]*$') | list | first }}"

# Again, it seems like Debian needs this
- name: network | Persist IPv6 disable
  become: true
  ansible.builtin.copy:
    dest: /etc/sysctl.d/99-disable-ipv6.conf
    mode: u=rw,g=rw,o=r
    content: |
      net.ipv6.conf.all.disable_ipv6=1
      net.ipv6.conf.default.disable_ipv6=1

# Disable IPv6
- name: network | Disable IPv6 on host
  ansible.posix.sysctl:
    name: "{{ item }}"
    value: '1'
    state: present
    reload: true
  loop:
    - net.ipv6.conf.all.disable_ipv6
    - net.ipv6.conf.default.disable_ipv6
    - net.ipv6.conf.lo.disable_ipv6

- name: network | Get stats of netplan file
  ansible.builtin.stat:
    path: /etc/netplan/50-cloud-init.yaml
  register: fresh_os_netplan

- name: network | Make Backup Of Existing file
  ansible.builtin.copy:
    src: /etc/netplan/50-cloud-init.yaml
    dest: /etc/netplan/50-cloud-init.yaml.bak
    owner: root
    group: root
    mode: u=rw,g=rw
    remote_src: true
  when: fresh_os_netplan.stat.exists

- name: network | Remove Network Setup
  ansible.builtin.file:
    path: /etc/netplan/50-cloud-init.yaml
    state: absent
  when: fresh_os_netplan.stat.exists

- name: "network | Check if exists - {{ lan_bridge_name }}"
  ansible.builtin.set_fact:
    fresh_os_lan_bridge_exist: true
  when: "lan_bridge_name in ansible_facts.interfaces"

- name: "network | Set default if not exist - {{ lan_bridge_name }}"
  ansible.builtin.set_fact:
    fresh_os_lan_bridge_exist: false
  when: "lan_bridge_name not in ansible_facts.interfaces"

- name: network | Display interface name
  ansible.builtin.debug:
    msg: "eth_name: {{ fresh_os_eth_name }}. Does {{ lan_bridge_name }} exist? {{ fresh_os_lan_bridge_exist }}"

- name: network | Check if exists - {{ dmz_bridge_name }}
  ansible.builtin.set_fact:
    fresh_os_dmz_bridge_exist: true
  when: "dmz_bridge_name in ansible_facts.interfaces"

- name: "network | Set default if does not exist - {{ dmz_bridge_name }}"
  ansible.builtin.set_fact:
    fresh_os_dmz_bridge_exist: false
  when: "dmz_bridge_name not in ansible_facts.interfaces"

- name: network | Display interface name
  ansible.builtin.debug:
    msg: "eth_name: {{ fresh_os_eth_name }}. Does {{ dmz_bridge_name }} exist? {{ fresh_os_dmz_bridge_exist }}"

- name: network | Ensure systemd-resolved is installed
  become: true
  ansible.builtin.apt:
    name: systemd-resolved
    state: present

- name: network | Enable and start systemd-resolved
  become: true
  ansible.builtin.systemd:
    name: systemd-resolved
    enabled: true
    state: started

- name: network | Only Run if either bridge does not exist
  when: not fresh_os_lan_bridge_exist or not fresh_os_dmz_bridge_exist
  notify:
    - Reboot
  block:
    - name: "network | Create Bridge {{ lan_bridge_name }}"
      ansible.builtin.template:
        src: lan_bridge.yaml.j2
        dest: /etc/netplan/01-lan_bridge.yaml
        owner: root
        group: root
        mode: u=rw,g=rw

    - name: "network | Create DMZ Bridge {{ dmz_bridge_name }}"
      ansible.builtin.template:
        src: dmz_bridge.yaml.j2
        dest: /etc/netplan/02-dmz_bridge.yaml
        owner: root
        group: root
        mode: u=rw,g=rw

This configures the networking and does the following:

  1. Find the interface name - on this box I only have a single interface - yes, this is consumer kit 😞.
  2. Disable IPv6 - there seem to be some differences in how this works between Ubuntu and Debian.
  3. Make backups of the existing netplan configs - I originally wrote this for Ubuntu. I am now moving to Debian, which handles this like a champ and still uses systemd-networkd on the backend (see renderer: networkd below), though you need to install the netplan-to-systemd-networkd renderer.
  4. Check whether the bridges exist and, if not, create them.
  5. Copy over the netplan templates.
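One thing worth calling out: the bridge-creation block above notifies a Reboot handler that is not shown in this post. A sketch of what such a handler might look like (the timeout value is an assumption, not my actual config):

```yaml
# Hypothetical sketch of the Reboot handler notified by the
# bridge-creation block - the real handler lives in the role's
# handlers file and may differ
- name: Reboot
  become: true
  ansible.builtin.reboot:
    reboot_timeout: 600  # assumed value; tune to your hardware
```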

A netplan network template looks like this:

#GENERATED BY ANSIBLE

network:
  version: 2
  renderer: networkd

  # This config is purely so that the machine is still accessible
  # even if we send untagged data
  ethernets:
    {{ fresh_os_eth_name }}:
      dhcp4: no

  vlans:
    {{ fresh_os_eth_name }}.{{ fresh_os_lan_vlan }}:
      id: {{ fresh_os_lan_vlan }}
      link: {{ fresh_os_eth_name }}

  # This config: {{ fresh_os_eth_name }} allows the machine to be accessible
  # even if we send untagged data via the switch
  bridges:
    {{ lan_bridge_name }}:
      dhcp4: yes
      macaddress: {{ lan_bridge_mac }}
      interfaces:
        - {{ fresh_os_eth_name }} # untagged

Here we do the following:

  1. Create a VLAN interface on top of the physical interface.
  2. Create a bridge and attach an interface. NOTE: I originally had both interfaces added, and man, did that mess up my routing ☹️.

Molecule

If you have ever done any software development, you might wonder: how do we test all these somewhat complicated Ansible scripts?

From their site

Molecule is an Ansible testing framework designed for developing and testing Ansible collections, playbooks, and roles.

Molecule leverages standard Ansible features including inventory, playbooks, and collections to provide flexible testing workflows.

Test scenarios can target any system or service reachable from Ansible, from containers and virtual machines to cloud infrastructure, hyperscaler services, APIs, databases, and network devices.

I must come clean — I was first exposed to Molecule at my current employer, and have adopted it enthusiastically for my own projects, as it's an awesome piece of kit 👏.

I use Molecule in the following projects:

  1. Cloud Server - yes, I spin up a QEMU VM with a Debian image and run my Ansible scripts against it; a great way to test before running against the actual machine.
  2. OpenWrt modem - again, I spin up an OpenWrt image in QEMU and run my scripts against it.
  3. DMZ Server - same here: I spin up the DMZ image (Debian in this case) and test the scripts against it.

What does Molecule look like?

My configuration is somewhat unconventional, as I am currently on macOS and use devcontainers per project, then spin up QEMU inside the container. Perhaps a future post 🤔.

Molecule runs through several phases:

  1. create – Stand up infrastructure used to test the role.
  2. prepare – Prepare prerequisites (e.g., certificates).
  3. converge – Apply your role against the infrastructure.
  4. idempotence – Ensure a second run produces no changes.
  5. verify – Confirm the role did what it was intended to do.
  6. destroy – Tear down the infrastructure created in create.
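These phases are wired together in a scenario's molecule.yml. My real config is not shown here, but a sketch using the delegated ("default") driver, which is what you would typically reach for when managing QEMU yourself, might look like this (the platform name and sequence are assumptions):

```yaml
# Hypothetical molecule.yml sketch - not my actual config
driver:
  name: default  # delegated driver: create.yml/destroy.yml do the work
platforms:
  - name: cloud-server-test
provisioner:
  name: ansible
scenario:
  test_sequence:
    - create
    - prepare
    - converge
    - idempotence
    - verify
    - destroy
```

Running `molecule test` executes the whole sequence; `molecule converge` on its own is handy while iterating.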

The following shows the create phase config for my Cloud Server:

---

# https://cloud.debian.org/images/cloud/
- name: Create
  hosts: localhost
  gather_facts: false
  vars:
    vm_image: debian.img
    qemu_tmp_path: /tmp/qemu
  tasks:
    - name: Create tap0 device
      ansible.builtin.shell: |
        ip tuntap add dev tap0 mode tap
        ip addr add 192.168.100.1/24 dev tap0
        ip link set dev tap0 up
      changed_when: true

    - name: Allow vm access to the internet
      ansible.builtin.shell: |
        sysctl -w net.ipv4.ip_forward=1

        # eth0 is the net in the dev container
        iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
      changed_when: true

    - name: Copy dnsmasq config
      ansible.builtin.copy:
        src: files/dnsmasq-tap0.conf
        dest: /etc/dnsmasq.d/qemu-tap.conf
        owner: root
        group: root
        mode: u=rw,g=r,o=r

    - name: Start dnsmasq
      ansible.builtin.shell: |
        dnsmasq --conf-file=/etc/dnsmasq.d/qemu-tap.conf
      changed_when: true

    - name: Wait for dnsmasq to be ready on port 53
      ansible.builtin.wait_for:
        port: 53
        host: 127.0.0.1
        delay: 10
        timeout: 300
        state: started

    - name: Forcefully set DNS in /etc/resolv.conf
      ansible.builtin.shell: |
        # Backup resolv.conf
        cp /etc/resolv.conf /etc/resolv.conf.bak
        echo "nameserver 192.168.100.1" > /etc/resolv.conf
      args:
        executable: /bin/bash
      changed_when: true

    - name: Create directory {{ qemu_tmp_path }}
      ansible.builtin.file:
        path: "{{ qemu_tmp_path }}"
        state: directory
        mode: u=wrx,g=rwx,o=r

    - name: Check existence of file {{ vm_image }}
      ansible.builtin.stat:
        path: "/tmp/{{ vm_image }}"
      register: image_check

    - name: Download image {{ distro.code_name }}
      ansible.builtin.uri:
        url: "https://cloud.debian.org/images/cloud/{{ distro.code_name }}/latest/debian-{{ distro.version }}-genericcloud-amd64.qcow2"
        method: GET
        return_content: false
        dest: "/tmp/{{ vm_image }}"
      when: not image_check.stat.exists

    - name: Copy image to not overwrite original
      ansible.builtin.copy:
        src: "/tmp/{{ vm_image }}"
        dest: "{{ qemu_tmp_path }}/{{ vm_image }}"
        mode: u=wrx,g=rwx,o=r

    - name: Read SSH Public Key
      ansible.builtin.shell: |
        cat ~/.ssh/id_rsa.pub
      register: ssh_publickey
      changed_when: false

    - name: SSH Public Key
      ansible.builtin.set_fact:
        ssh_public_key: "{{ ssh_publickey.stdout }}"

    - name: Render user data
      ansible.builtin.template:
        src: templates/user-data.yaml.j2
        dest: "{{ qemu_tmp_path }}/user-data.yaml"
        mode: u=rwx,g=rwx,o=r

    - name: Create seed image
      ansible.builtin.shell: >
        cloud-localds {{ qemu_tmp_path }}/my-seed.img {{ qemu_tmp_path }}/user-data.yaml
      changed_when: true

    - name: Create overlay image
      ansible.builtin.shell: |
        qemu-img create -f qcow2 -b {{ qemu_tmp_path }}/{{ vm_image }} -F qcow2 {{ qemu_tmp_path }}/overlay.qcow2 10G
        qemu-img create -f qcow2 {{ qemu_tmp_path }}/disk1.qcow2 2G
        qemu-img create -f qcow2 {{ qemu_tmp_path }}/disk2.qcow2 2G
        qemu-img create -f qcow2 {{ qemu_tmp_path }}/disk3.qcow2 4G
        qemu-img create -f qcow2 {{ qemu_tmp_path }}/disk4.qcow2 1G
      args:
        creates: "{{ qemu_tmp_path }}/overlay.qcow2"
      changed_when: true

    - name: Start QEMU VM
      ansible.builtin.shell: >
        qemu-system-x86_64 \
          -name ubuntu \
          -serial file:boot.log \
          -smp cores=4 \
          -cpu qemu64 \
          -machine type=q35 \
          -m 4096 \
          -netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
          -device virtio-net-pci,netdev=net0 \
          -drive if=virtio,format=qcow2,file={{ qemu_tmp_path }}/overlay.qcow2 \
          -drive if=virtio,format=raw,file={{ qemu_tmp_path }}/my-seed.img \
          -drive if=virtio,format=qcow2,file={{ qemu_tmp_path }}/disk1.qcow2 \
          -drive if=virtio,format=qcow2,file={{ qemu_tmp_path }}/disk2.qcow2 \
          -drive if=virtio,format=qcow2,file={{ qemu_tmp_path }}/disk3.qcow2 \
          -drive if=virtio,format=qcow2,file={{ qemu_tmp_path }}/disk4.qcow2 \
          -boot order=c \
          -display none \
          -daemonize
      changed_when: true

    - name: Wait for SSH to be available
      ansible.builtin.wait_for:
        host: "{{ hostname }}"
        port: 22
        delay: 10
        timeout: 300
        state: started

Here we do the following:

  1. Create the virtual interfaces we can attach to the QEMU VM.
  2. Configure dnsmasq as a DHCP server.
  3. Download a cloud-init build of Debian.
  4. Add my SSH public key to the cloud-init config.
  5. Create all the various disks I have on the actual machine.
  6. Start up the QEMU VM.
  7. Wait for SSH to become available.
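
The templates/user-data.yaml.j2 file itself is not shown above; a minimal sketch of what such a cloud-init template might contain (the user name and sudo policy are assumptions, ssh_public_key is the fact set in the playbook):

```yaml
#cloud-config
# Hypothetical user-data.yaml.j2 sketch - the real template is
# not shown in this post and may differ
users:
  - name: debian          # assumed login user
    ssh_authorized_keys:
      - {{ ssh_public_key }}
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
```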

The converge phase for all the projects looks like this:

---

- name: Converge
  ansible.builtin.import_playbook: ../../../main.yml

This simply runs the Ansible role — the component we want to test against the QEMU-based infrastructure.

The issues with Molecule

There are a few gotchas to be aware of when using Molecule — some of which I only discovered recently while moving my Cloud Server from Ubuntu 24.04 to Debian 13.

One of the biggest issues I ran into was not using the exact same image in QEMU for testing as the one running on the actual server.

For testing, I used the Debian cloud-init image in QEMU. It’s fantastic for quickly configuring SSH access and spinning up test environments. However, because it is designed for cloud infrastructure, it comes preloaded with packages that the minimal Debian image does not include.

This led to a frustrating debugging session where things worked perfectly in Molecule, but failed on the real machine. After some digging, I discovered that I needed systemd-resolved installed and running on the actual server — something the cloud image handled implicitly.

The lesson here is simple: try to keep your testing and production environments as close as possible. I know this isn’t always feasible in a limited homelab setup, and there will be trade-offs — but the closer they match, the fewer surprises you’ll encounter.

Recommendation

If, like me, you ❤️ fiddling with your computers and routers, this is a game changer for getting things back to baseline quickly and consistently.

You also get the added benefit of being able to run everything again with minimal changes when you inevitably need to update or patch your systems.

It also keeps the wife happy, seeing that things are up and running again in no time 😂.

Resources