VPP

Introduction

In a previous post (VPP Linux CP - Virtual Machine Playground), I wrote a bit about building a QEMU image so that folks can play with the Vector Packet Processor and the Linux Control Plane code. Judging by our access logs, this image has definitely been downloaded a bunch, and I myself use it regularly when I want to tinker a little bit, without wanting to impact the production routers at AS8298.

The topology of my tests has become a bit more complicated over time, and often just one router would not be enough. Yet, repeatability is quite important, and I found myself constantly reinstalling / recheckpointing the vpp-proto virtual machine I was using. I got my hands on some LAB hardware, so it’s time for an upgrade!

IPng Networks LAB - Physical

Physical

First, I spec’d out a few machines that will serve as hypervisors. From top to bottom in the picture here: two FS.com S5680-20SQ switches – I reviewed these earlier [ref], and I really like them, as they come with 20x10G, 4x25G and 2x40G ports, an OOB management port and a serial console to configure them. Under them sits their larger brother, the FS.com S5860-48SC, with 48x10G and 8x100G ports. It’s a bit more expensive, but also necessary, because I often test VPP at higher bandwidths, and being able to build Ethernet topologies mixing 10, 25, 40 and 100G ports is super useful for me. This switch is fsw0.lab.ipng.ch and is dedicated to lab experiments.

Connected to the switch are my trusty Rhino and Hippo machines. If you remember the game Hungry Hungry Hippos, that’s where the names come from. They are both Ryzen 5950X machines on an ASUS B550 motherboard, each with 2x1G Intel i350 copper NICs (pictured here not connected) and a 2x100G Intel E810 QSFP network card (properly slotted into the motherboard’s PCIe v4.0 x16 slot).

Finally, three Dell R720XD machines serve as the VPP testbed to be built. They each come with 128GB of RAM, 2x500G SSDs, two Intel 82599ES dual 10G NICs (four ports in total), and four Broadcom BCM5720 1G NICs. The first 1G port is connected to a management switch and doubles up as the IPMI port, so I can turn the hypervisors on and off remotely. All four 10G ports are connected with DACs to fsw0-lab, as are two of the 1G copper ports (the blue UTP cables). Everything can be powered on and off remotely, which is good for noise, heat and the environment overall 🍀.

IPng Networks LAB - Logical

Logical

I have three of these Dell R720XD machines in the lab, and each one of them will run one complete lab environment, consisting of four VPP virtual machines, network plumbing, and an uplink. That way, I can turn on one hypervisor, say hvn0.lab.ipng.ch, prepare and boot the VMs, mess around with them, and when I’m done, return the VMs to a pristine state and turn the hypervisor off. And, because I have three of these machines, I can run three separate LABs at the same time, or one really big one spanning all of them. Pictured on the right is a logical sketch of one of the LABs (LAB id=0), with a bunch of VPP virtual machines, each with four NICs, daisy-chained together, and a few NICs left over for experimenting.

Headend

At the top of the logical environment, I am going to use one of our production machines (hvn0.chbtl0.ipng.ch), which will run a permanent LAB headend: a Debian VM called lab.ipng.ch. This allows me to hermetically seal the LAB environments, letting me run them entirely in RFC1918 space, and by forcing the LABs to connect through this machine, I can ensure that no unwanted traffic enters or exits the network [imagine a loadtest at 100Gbit accidentally leaking; this may or totally may not have happened to me once before …].

Disk images

On this production hypervisor (hvn0.chbtl0.ipng.ch), I’ll also prepare and maintain a prototype vpp-proto disk image, which will serve as a consistent image to boot the LAB virtual machines. This main image will be replicated over the network to all three hypervisor machines, hvn0 - hvn2. This way, I can do periodic maintenance on the main vpp-proto image, snapshot it, and publish it as a QCOW2 for downloading (see my [VPP Linux CP - Virtual Machine Playground] post for details on how it’s built and what you can do with it yourself!). The snapshots will then also be sync’d to all hypervisors, and from there I can use simple ZFS cloning and snapshotting to maintain the LAB virtual machines.

Networking

Each hypervisor will get an install of Open vSwitch: a production quality, multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols. This takes lots of the guesswork and tinkering out of Linux bridges in KVM/QEMU, and it’s a perfect fit due to its tight integration with libvirt (the thing most of us use on Debian/Ubuntu hypervisors). If need be, I can also add one or more of the 1G or 10G ports to the OVS fabric, to build more complicated topologies. And, because both the OVS infrastructure and libvirt can be configured over the network, I can control all aspects of the runtime directly from the lab.ipng.ch headend, without having to log in to the hypervisor machines at all. Slick!
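
What that remote control looks like in practice is something like the following sketch, assuming root SSH access from the headend to the hypervisors (the VM and bridge names used here are introduced further down in this post):

pim@lab:~$ virsh -c qemu+ssh://root@hvn0.lab.ipng.ch/system list --all
pim@lab:~$ virsh -c qemu+ssh://root@hvn0.lab.ipng.ch/system start vpp0-0
pim@lab:~$ ssh root@hvn0.lab.ipng.ch 'ovs-vsctl show'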

Implementation Details

I start with image management. On the production hypervisor, I create a 6GB ZFS dataset that will serve as the disk of my vpp-proto machine, and install it using the exact same method as the playground [ref]. Once I have it the way I like it, I power off the VM, and see to it that this image gets replicated to all hypervisors.
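
For reference, a minimal sketch of carving out that dataset as a ZFS volume, assuming the pool is called ssd-vol0 as elsewhere in this post:

pim@hvn0-chbtl0:~$ sudo zfs create -V 6G ssd-vol0/vpp-proto-disk0
## Attach /dev/zvol/ssd-vol0/vpp-proto-disk0 to the vpp-proto VM as its (virtio) disk,
## then install it exactly as described in the playground post.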

ZFS Replication

Enter zrepl, a one-stop, integrated solution for ZFS replication. This tool is incredibly powerful: it can do snapshot management and source/sink replication, of course using incremental snapshots as they are native to ZFS. Because this is a LAB article, not a zrepl tutorial, I’ll just cut to the chase and show the configuration I came up with.

pim@hvn0-chbtl0:~$ cat << EOF | sudo tee /etc/zrepl/zrepl.yml 
global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn

jobs:
- name: snap-vpp-proto
  type: snap
  filesystems:
    'ssd-vol0/vpp-proto-disk0<': true
  snapshotting:
    type: manual
  pruning:
    keep:
      - type: last_n
        count: 10

- name: source-vpp-proto
  type: source
  serve:
    type: stdinserver
    client_identities:
    - "hvn0-lab"
    - "hvn1-lab"
    - "hvn2-lab"
  filesystems:
    'ssd-vol0/vpp-proto-disk0<': true # all filesystems
  snapshotting:
    type: manual
EOF

pim@hvn0-chbtl0:~$ cat << EOF | sudo tee -a /root/.ssh/authorized_keys
# ZFS Replication Clients for IPng Networks LAB
command="zrepl stdinserver hvn0-lab",restrict ecdsa-sha2-nistp256 <omitted> root@hvn0.lab.ipng.ch
command="zrepl stdinserver hvn1-lab",restrict ecdsa-sha2-nistp256 <omitted> root@hvn1.lab.ipng.ch
command="zrepl stdinserver hvn2-lab",restrict ecdsa-sha2-nistp256 <omitted> root@hvn2.lab.ipng.ch
EOF

To unpack this, there are two jobs configured in zrepl:

  • snap-vpp-proto - the purpose of this job is to track snapshots as they are created. Normally, zrepl is configured to automatically make snapshots every hour and copy them out, but in my case, I only want to take snapshots when I have changed and released the vpp-proto image, not periodically. So, I set the snapshotting type to manual, and let the system keep the last ten images.
  • source-vpp-proto - this is a source job that uses a lazy (albeit perfectly fine for this lab environment) method of serving the snapshots to clients: SSH. The SSH keys are added to root’s authorized_keys file, but restricted so that they can only execute the zrepl stdinserver command and nothing else (ie. these keys cannot log in to the machine). When a server presents one of these keys, it is mapped to a zrepl client identity (for example, hvn0-lab for the SSH key presented by hvn0.lab.ipng.ch). The source job then knows to serve the listed filesystems (and their dataset children, denoted by the < suffix) to those clients.

On the client side, each of the hypervisors gets only one job, a so-called pull job, which will periodically wake up (every minute) and ensure that any pending snapshots and their incrementals from the remote source are slurped in and replicated under a root_fs dataset; in this case I called it ssd-vol0/hvn0.chbtl0.ipng.ch so I can track where the datasets come from.

pim@hvn0-lab:~$ sudo ssh-keygen -t ecdsa -f /etc/zrepl/ssh/identity -C "root@$(hostname -f)"
pim@hvn0-lab:~$ cat << EOF | sudo tee /etc/zrepl/zrepl.yml 
global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn

jobs:
- name: vpp-proto
  type: pull
  connect:
    type: ssh+stdinserver
    host: hvn0.chbtl0.ipng.ch
    user: root
    port: 22
    identity_file: /etc/zrepl/ssh/identity
  root_fs: ssd-vol0/hvn0.chbtl0.ipng.ch
  interval: 1m
  pruning:
    keep_sender:
      - type: regex
        regex: '.*'
    keep_receiver:
      - type: last_n
        count: 10
  recv:
    placeholder:
      encryption: off

After restarting zrepl for each of the machines (the source machine and the three pull machines), I can now do the following cool hat trick:

pim@hvn0-chbtl0:~$ virsh start --console vpp-proto
## Do whatever maintenance, and then poweroff the VM
pim@hvn0-chbtl0:~$ sudo zfs snapshot ssd-vol0/vpp-proto-disk0@20221019-release
pim@hvn0-chbtl0:~$ sudo zrepl signal wakeup source-vpp-proto

This signals the zrepl daemon to re-read the snapshots, which will pick up the newest one, and then without me doing much of anything else:

pim@hvn0-lab:~$ sudo zfs list -t all | grep vpp-proto
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0                   6.60G   367G     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221013-release   499M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221018-release  24.1M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221019-release     0B      -     6.04G  -

That last image was automatically replicated to all hypervisors! If they’re turned off, no worries: as soon as they start up, their local zrepl will make its next minutely poll and pull in all snapshots, bringing the machine up to date. So even when the hypervisors are normally turned off, this is zero-touch and maintenance free.
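
By the way, zrepl also ships with a status command that shows what each job is up to; I find it handy to run on both sides after signaling a wakeup (a sketch):

pim@hvn0-chbtl0:~$ sudo zrepl status   ## shows the snap-vpp-proto and source-vpp-proto jobs
pim@hvn0-lab:~$ sudo zrepl status      ## shows the vpp-proto pull job and its latest replication run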

VM image maintenance

Now that I have a stable image to work off of, all I have to do is zfs clone this image into new per-VM datasets, after which I can mess around on the VMs all I want, and when I’m done, I can zfs destroy the clone and bring it back to normal. However, I clearly don’t want one and the same clone for each of the VMs, as they do have lots of config files that are specific to that one instance. For example, the mgmt IPv4/IPv6 addresses are unique, and the VPP and Bird/FRR configs are unique as well. But how unique are they, really?

Enter Jinja2 (known mostly from Ansible). I decided to generate per-VM config files based on a set of templates. That way, I can clone the base ZFS dataset, copy in the deltas, and boot that instead. And to be extra efficient, I can also make a per-VM ZFS snapshot of the cloned+updated filesystem before tinkering with the VMs, which I’ll call the pristine snapshot. Still with me?

  1. First, clone the base dataset into a per-VM dataset, say ssd-vol0/vpp0-0
  2. Then, generate a bunch of override files, copying them into the per-VM dataset ssd-vol0/vpp0-0
  3. Finally, create a snapshot of that, called ssd-vol0/vpp0-0@pristine and boot off of that.

Now, returning the VM to a pristine state is simply a matter of shutting down the VM, performing a zfs rollback to the pristine snapshot, and starting the VM again. Ready? Let’s go!
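
On the hypervisor, this boils down to a handful of ZFS and libvirt commands per VM. Here’s a minimal sketch for vpp0-0, assuming the generated per-VM override files are already present on the hypervisor (the full create / destroy / pristine scripts follow further down):

pim@hvn0-lab:~$ sudo zfs clone ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221019-release ssd-vol0/vpp0-0
pim@hvn0-lab:~$ sudo mount /dev/zvol/ssd-vol0/vpp0-0-part1 /mnt
pim@hvn0-lab:~$ sudo cp -R staging/vpp0-0/. /mnt/    ## copy in the generated per-VM overrides
pim@hvn0-lab:~$ sudo umount /mnt
pim@hvn0-lab:~$ sudo zfs snapshot ssd-vol0/vpp0-0@pristine
pim@hvn0-lab:~$ sudo virsh start vpp0-0
## ... tinker away, and to return the VM to factory defaults:
pim@hvn0-lab:~$ sudo virsh destroy vpp0-0
pim@hvn0-lab:~$ sudo zfs rollback ssd-vol0/vpp0-0@pristine
pim@hvn0-lab:~$ sudo virsh start vpp0-0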

Generator

So off I go, writing a small Python generator that reads a bunch of YAML files, merging them along the way, and then traverses a set of directories with Jinja2 template files and per-VM overrides, assembling a build output directory with a fully formed set of files that I can copy into the per-VM dataset.

Take a look at this as a minimally viable configuration:

pim@lab:~/src/lab$ cat config/common/generic.yaml 
overlays:
  default:
    path: overlays/bird/
    build: build/default/
lab:
  mgmt:
    ipv4: 192.168.1.80/24
    ipv6: 2001:678:d78:101::80/64
    gw4: 192.168.1.252
    gw6: 2001:678:d78:101::1
  nameserver: 
    search: [ "lab.ipng.ch", "ipng.ch", "rfc1918.ipng.nl", "ipng.nl" ]
  nodes: 4

pim@lab:~/src/lab$ cat config/hvn0.lab.ipng.ch.yaml 
lab:
  id: 0
  ipv4: 192.168.10.0/24
  ipv6: 2001:678:d78:200::/60
  nameserver: 
    addresses: [ 192.168.10.4, 2001:678:d78:201::ffff ]
  hypervisor: hvn0.lab.ipng.ch

Here I define a common config file with fields and attributes which apply to all LAB environments: things such as the mgmt network, nameserver search paths, and how many VPP virtual machine nodes I want to build. Then, for hvn0.lab.ipng.ch, I specify the IPv4 and IPv6 prefix assigned to it, and some specific nameserver endpoints which point at an unbound resolver running on lab.ipng.ch itself.

I can now create any file I’d like, using variable substitution and other Jinja2-style templating. Take for example these two files:

pim@lab:~/src/lab$ cat overlays/bird/common/etc/netplan/01-netcfg.yaml.j2 
network:
  version: 2
  renderer: networkd
  ethernets:
    enp1s0:
      optional: true
      accept-ra: false
      dhcp4: false
      addresses: [ {{node.mgmt.ipv4}}, {{node.mgmt.ipv6}} ]
      gateway4: {{lab.mgmt.gw4}}
      gateway6: {{lab.mgmt.gw6}}

pim@lab:~/src/lab$ cat overlays/bird/common/etc/netns/dataplane/resolv.conf.j2 
domain lab.ipng.ch
search{% for domain in lab.nameserver.search %} {{domain}}{%endfor %}

{% for resolver in lab.nameserver.addresses %}
nameserver {{resolver}}
{%endfor%}

The first file is a [NetPlan.io] configuration that substitutes the correct management IPv4 and IPv6 addresses and gateways. The second one enumerates a set of search domains and nameservers, so that each LAB can have its own unique resolvers. I point these at the lab.ipng.ch uplink interface: in the case of the LAB on hvn0.lab.ipng.ch this will be 192.168.10.4 and 2001:678:d78:201::ffff, but on hvn1.lab.ipng.ch I can override that to become 192.168.11.4 and 2001:678:d78:211::ffff.
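
For LAB 0, that second template renders to something like this (a sketch, using the values from the YAML snippets above):

pim@lab:~/src/lab$ cat build/default/hvn0.lab.ipng.ch/vpp0-0/etc/netns/dataplane/resolv.conf
domain lab.ipng.ch
search lab.ipng.ch ipng.ch rfc1918.ipng.nl ipng.nl
nameserver 192.168.10.4
nameserver 2001:678:d78:201::ffff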

There’s one subdirectory for each overlay type (imagine that I want a lab that runs Bird2, but I may also want one which runs FRR, or another thing still). Within the overlay directory, there’s one common tree, with files that apply to every machine in the LAB, and a hostname tree, with files that apply only to specific nodes (VMs) in the LAB:

pim@lab:~/src/lab$ tree overlays/default/
overlays/default/
├── common
│   ├── etc
│   │   ├── bird
│   │   │   ├── bfd.conf.j2
│   │   │   ├── bird.conf.j2
│   │   │   ├── ibgp.conf.j2
│   │   │   ├── ospf.conf.j2
│   │   │   └── static.conf.j2
│   │   ├── hostname.j2
│   │   ├── hosts.j2
│   │   ├── netns
│   │   │   └── dataplane
│   │   │       └── resolv.conf.j2
│   │   ├── netplan
│   │   │   └── 01-netcfg.yaml.j2
│   │   ├── resolv.conf.j2
│   │   └── vpp
│   │       ├── bootstrap.vpp.j2
│   │       └── config
│   │           ├── defaults.vpp
│   │           ├── flowprobe.vpp.j2
│   │           ├── interface.vpp.j2
│   │           ├── lcp.vpp
│   │           ├── loopback.vpp.j2
│   │           └── manual.vpp.j2
│   ├── home
│   │   └── ipng
│   └── root
└── hostname
    ├── vpp0-0
    │   └── etc
    │       └── vpp
    │           └── config
    │               └── interface.vpp
    (etc)

Now all that’s left to do is generate this hierarchy, and of course I can check this in to git and track changes to the templates and their resulting generated filesystem overrides over time:

pim@lab:~/src/lab$ ./generate -q --host hvn0.lab.ipng.ch
pim@lab:~/src/lab$ find build/default/hvn0.lab.ipng.ch/vpp0-0/ -type f
build/default/hvn0.lab.ipng.ch/vpp0-0/home/ipng/.ssh/authorized_keys
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/hosts
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/resolv.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/static.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/bfd.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/bird.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/ibgp.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/bird/ospf.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/loopback.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/flowprobe.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/interface.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/defaults.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/lcp.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/config/manual.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/vpp/bootstrap.vpp
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/netplan/01-netcfg.yaml
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/netns/dataplane/resolv.conf
build/default/hvn0.lab.ipng.ch/vpp0-0/etc/hostname
build/default/hvn0.lab.ipng.ch/vpp0-0/root/.ssh/authorized_keys

Open vSwitch maintenance

The OVS install on each Debian hypervisor in the lab is the same. I install the required Debian packages, create a switch fabric, and add one physical network port (the one that will serve as the uplink for the LAB, VLAN 10 in the sketch above) as well as all the virtio ports from KVM.

pim@hvn0-lab:~$ sudo vi /etc/netplan/01-netcfg.yaml
network:
  vlans:
    uplink:
      optional: true
      accept-ra: false
      dhcp4: false
      link: eno1
      id: 200
pim@hvn0-lab:~$ sudo netplan apply
pim@hvn0-lab:~$ sudo apt install openvswitch-switch python3-openvswitch
pim@hvn0-lab:~$ sudo ovs-vsctl add-br vpplan
pim@hvn0-lab:~$ sudo ovs-vsctl add-port vpplan uplink tag=10

The vpplan switch fabric and its uplink port will persist across reboots. Then I make a small change to each of the libvirt-defined virtual machines:

pim@hvn0-lab:~$ virsh edit vpp0-0
...
    <interface type='bridge'>
      <mac address='52:54:00:00:10:00'/>
      <source bridge='vpplan'/>
      <virtualport type='openvswitch' />
      <target dev='vpp0-0-0'/>
      <model type='virtio'/>
      <mtu size='9216'/>
      <address type='pci' domain='0x0000' bus='0x10' slot='0x00' function='0x0' multifunction='on'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:00:10:01'/>
      <source bridge='vpplan'/>
      <virtualport type='openvswitch' />
      <target dev='vpp0-0-1'/>
      <model type='virtio'/>
      <mtu size='9216'/>
      <address type='pci' domain='0x0000' bus='0x10' slot='0x00' function='0x1'/>
    </interface>
... etc

The only two things I need to do are to ensure that the source bridge has the same name as the OVS fabric (in my case vpplan), and that the virtualport type is openvswitch, and that’s it! Once all four vpp0-* virtual machines have each of their four network cards updated, the hypervisor will add them as new untagged ports in the OVS fabric when they boot.
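
A quick way to convince myself that this worked is to list the ports on the fabric once the VMs are running (a sketch; the vpp0-* ports come and go with their VMs):

pim@hvn0-lab:~$ sudo ovs-vsctl list-ports vpplan
uplink
vpp0-0-0
vpp0-0-1
vpp0-0-2
vpp0-0-3
vpp0-1-0
...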

To then build the topology that I have in mind for the LAB, where each VPP machine is daisy-chained to its sibling, all we have to do is program that into the OVS configuration:

pim@hvn0-lab:~$ cat << 'EOF' > ovs-config.sh
#!/bin/sh
#
# OVS configuration for the `default` overlay

LAB=${LAB:=0}
for node in 0 1 2 3; do
  for int in 0 1 2 3; do
    ovs-vsctl set port vpp${LAB}-${node}-${int} vlan_mode=native-untagged
  done
done

# Uplink is VLAN 10
ovs-vsctl add port vpp${LAB}-0-0 tag 10
ovs-vsctl add port uplink tag 10

# Link vpp${LAB}-0 <-> vpp${LAB}-1 in VLAN 20
ovs-vsctl add port vpp${LAB}-0-1 tag 20
ovs-vsctl add port vpp${LAB}-1-0 tag 20

# Link vpp${LAB}-1 <-> vpp${LAB}-2 in VLAN 21
ovs-vsctl add port vpp${LAB}-1-1 tag 21
ovs-vsctl add port vpp${LAB}-2-0 tag 21

# Link vpp${LAB}-2 <-> vpp${LAB}-3 in VLAN 22
ovs-vsctl add port vpp${LAB}-2-1 tag 22
ovs-vsctl add port vpp${LAB}-3-0 tag 22
EOF

pim@hvn0-lab:~$ chmod 755 ovs-config.sh
pim@hvn0-lab:~$ sudo ./ovs-config.sh

The first block here loops over all nodes and, for each of their ports, sets the VLAN mode to what OVS calls ‘native-untagged’. In this mode, the tag becomes the (untagged) VLAN in which the port operates, and additional dot1q tagged VLANs can be added with the syntax add port ... trunks 10,20,30.

To see the configuration, ovs-vsctl list port vpp0-0-0 will show the switch port configuration, while ovs-vsctl list interface vpp0-0-0 will show the virtual machine’s NIC configuration (think of the difference here as the switch port on the one hand, and the NIC (interface) plugged into it on the other).
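
For example, to pull out just a few interesting columns (a sketch; both tables have many more columns than shown here):

pim@hvn0-lab:~$ sudo ovs-vsctl --columns=name,tag,trunks,vlan_mode list port vpp0-0-0
pim@hvn0-lab:~$ sudo ovs-vsctl --columns=name,mac_in_use,link_state,mtu list interface vpp0-0-0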

Deployment

There are three main operations to consider when deploying these lab VMs:

  1. Create the VMs and their ZFS datasets
  2. Destroy the VMs and their ZFS datasets
  3. Bring the VMs into a pristine state

Create

If the hypervisor doesn’t yet have a LAB running, we need to create it:

BASE=${BASE:=ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221019-release}
BUILD=${BUILD:=default}
LAB=${LAB:=0}

## Do not touch below this line
LABDIR=/var/lab
STAGING=$LABDIR/staging
HVN="hvn${LAB}.lab.ipng.ch"

echo "*  Cloning base"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  mkdir -p $STAGING/\$VM; zfs clone $BASE ssd-vol0/\$VM; done"
sleep 1

echo "*  Mounting in staging"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  mount /dev/zvol/ssd-vol0/\$VM-part1 $STAGING/\$VM; done"

echo "*  Rsyncing build"
rsync -avugP build/$BUILD/$HVN/ root@hvn${LAB}.lab.ipng.ch:$STAGING

echo "*  Setting permissions"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  chown -R root. $STAGING/\$VM/root; done"

echo "*  Unmounting and snapshotting pristine state"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  umount $STAGING/\$VM; zfs snapshot ssd-vol0/\${VM}@pristine; done"

echo "*  Starting VMs"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  virsh start \$VM; done"

echo "*  Committing OVS config"
scp overlays/$BUILD/ovs-config.sh root@$HVN:$LABDIR
ssh root@$HVN "set -x; LAB=$LAB $LABDIR/ovs-config.sh"

After running this, the hypervisor will have 4 clones, and 4 snapshots (one for each virtual machine):

root@hvn0-lab:~# zfs list -t all
NAME                                                                     USED  AVAIL     REFER  MOUNTPOINT
ssd-vol0                                                                6.80G   367G       24K  /ssd-vol0
ssd-vol0/hvn0.chbtl0.ipng.ch                                            6.60G   367G       24K  none
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0                                   6.60G   367G       24K  none
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0                   6.60G   367G     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221013-release   499M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221018-release  24.1M      -     6.04G  -
ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221019-release     0B      -     6.04G  -
ssd-vol0/vpp0-0                                                         43.6M   367G     6.04G  -
ssd-vol0/vpp0-0@pristine                                                1.13M      -     6.04G  -
ssd-vol0/vpp0-1                                                         25.0M   367G     6.04G  -
ssd-vol0/vpp0-1@pristine                                                1.14M      -     6.04G  -
ssd-vol0/vpp0-2                                                         42.2M   367G     6.04G  -
ssd-vol0/vpp0-2@pristine                                                1.13M      -     6.04G  -
ssd-vol0/vpp0-3                                                         79.1M   367G     6.04G  -
ssd-vol0/vpp0-3@pristine                                                1.13M      -     6.04G  -

The last thing the create script does is commit the OVS configuration, because when the VMs are newly created (or shut down and started again), KVM will add their ports to the switching fabric as untagged/unconfigured ports.

But would you look at that! The delta between the base image and the pristine snapshots is about 1MB of configuration files: the ones that I generated and rsync’d in above. And once the machine boots, it will have a read/write mounted filesystem as per normal, except that it’s a delta on top of the snapshotted, cloned dataset.

Destroy

I love destroying things! In this case though, I’m removing what are essentially ephemeral disk images, as I still have the base image to clone from. The destroy operation is conceptually very simple:

BASE=${BASE:=ssd-vol0/hvn0.chbtl0.ipng.ch/ssd-vol0/vpp-proto-disk0@20221018-release}
LAB=${LAB:=0}

## Do not touch below this line
HVN="hvn${LAB}.lab.ipng.ch"

echo "*  Destroying VMs"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  virsh destroy \$VM; done"

echo "*  Destroying ZFS datasets"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  zfs destroy -r ssd-vol0/\$VM; done"

After running this, the VMs will be shut down and their cloned filesystems (including any snapshots they may have) wiped. To get back into a working state, all I have to do is run ./create again!

Pristine

Sometimes though, I don’t need to completely destroy the VMs, but rather want to put them back into the state they were in just after creating the LAB. Luckily, the create script made a snapshot (called pristine) for each VM before booting it, so bringing the LAB back to factory default settings is really easy:

BUILD=${BUILD:=default}
LAB=${LAB:=0}

## Do not touch below this line
LABDIR=/var/lab
STAGING=$LABDIR/staging
HVN="hvn${LAB}.lab.ipng.ch"

## Bring back into pristine state
echo "*  Restarting VMs from pristine snapshot"
ssh root@$HVN "set -x; for node in 0 1 2 3; do VM=vpp${LAB}-\${node}; \
  virsh destroy \$VM;
  zfs rollback ssd-vol0/\${VM}@pristine;
  virsh start \$VM; done"

echo "*  Committing OVS config"
scp overlays/$BUILD/ovs-config.sh root@$HVN:$LABDIR
ssh root@$HVN "set -x; $LABDIR/ovs-config.sh"

Results

After completing this project, I have a completely hands-off, automated, autogenerated and very manageable set of three LABs, each booting up into a running OSPF/OSPFv3 enabled topology for IPv4 and IPv6:

pim@lab:~/src/lab$ traceroute -q1 vpp0-3
traceroute to vpp0-3 (192.168.10.3), 30 hops max, 60 byte packets
 1  e0.vpp0-0.lab.ipng.ch (192.168.10.5)  1.752 ms
 2  e0.vpp0-1.lab.ipng.ch (192.168.10.7)  4.064 ms
 3  e0.vpp0-2.lab.ipng.ch (192.168.10.9)  5.178 ms
 4  vpp0-3.lab.ipng.ch (192.168.10.3)  7.469 ms
pim@lab:~/src/lab$ ssh ipng@vpp0-3

ipng@vpp0-3:~$ traceroute6 -q1  vpp2-3
traceroute to vpp2-3 (2001:678:d78:220::3), 30 hops max, 80 byte packets
 1  e1.vpp0-2.lab.ipng.ch (2001:678:d78:201::3:2)  2.088 ms
 2  e1.vpp0-1.lab.ipng.ch (2001:678:d78:201::2:1)  6.958 ms
 3  e1.vpp0-0.lab.ipng.ch (2001:678:d78:201::1:0)  8.841 ms
 4  lab0.lab.ipng.ch (2001:678:d78:201::ffff)  7.381 ms
 5  e0.vpp2-0.lab.ipng.ch (2001:678:d78:221::fffe)  8.304 ms
 6  e0.vpp2-1.lab.ipng.ch (2001:678:d78:221::1:21)  11.633 ms
 7  e0.vpp2-2.lab.ipng.ch (2001:678:d78:221::2:22)  13.704 ms
 8  vpp2-3.lab.ipng.ch (2001:678:d78:220::3)  15.597 ms

If you read this far, thanks! Each of these three LABs comes with 4x10Gbit DPDK based packet generators (Cisco T-Rex), four VPP machines running either Bird2 or FRR, and together they are connected to a 100G capable switch.

These LABs are for rent, and we offer hands-on training on them. Please contact us for daily/weekly rates, and custom training sessions.

I checked the generator and deploy scripts into a git repository, which I’m happy to share if there’s interest. But because it contains a few implementation details and doesn’t do a lot of fool-proofing, and because most of this can easily be recreated by interested parties from this blogpost, I decided not to publish the LAB project on GitHub, but to keep it on our private git.ipng.ch server instead. Mail us if you’d like to take a closer look; I’m happy to share the code.