I did this test back in February, but can now finally publish the results! This little SBC is definitely going to be a hit in the ISP industry. See more information about it here.

PC Engines develops and sells small single board computers for networking to a worldwide customer base. This article discusses a new/unreleased product which PC Engines has developed, which has specific significance in the network operator community: an SBC which comes with three RJ45/UTP based network ports, and one SFP optical port.

Executive Summary

Due to the use of Intel i210-IS on the SFP port and i211-AT on the three copper ports, and due to it having no moving parts (fans, hard disks, etc), this SBC is an excellent choice for network appliances such as out-of-band or serial consoles in a datacenter, or routers in a small business or home office.

Detailed findings

APU6

The APU series boards typically ship with 2GB or 4GB of DRAM, 2, 3 or 4 Intel i211-AT network interfaces, and a four core AMD GX-412TC (running at 1GHz). This review is about the following APU6 unit, which comes with 4GB of DRAM (this preproduction unit came with 2GB, but that will be fixed in the production version), 3x i211-AT for the RJ45 network interfaces, and one i210-IS with an SFP cage.

One other significant difference is visible – the trusty rusty DB9 connector that exposes the first serial RS232 port is replaced with a modern CP2104 (USB vendor 10c4:ea60) from Silicon Labs which exposes the serial port as TTL/serial on a micro USB connector rather than RS232, neat!

Transceiver Compatibility

Optics

The small form-factor pluggable (SFP) is a compact, hot-pluggable network interface module used for both telecommunication and data communications applications. An SFP interface on networking hardware is a modular slot for a media-specific transceiver in order to connect a fiber-optic cable or sometimes a copper cable. Such a slot is typically called a cage.

The SFP port accepts most, if not all, optics brands and configurations: copper, regular 850nm/1310nm/1550nm based optics, BiDi as commonly used in FTTH deployments, and CWDM for use behind an OADM.

I tried 6 different SFP modules from several vendors, and all of them worked. See the links in the list below for the output of an optical diagnostics tool (using the SFF-8472 standard for SFP/SFP+ management).

Each module provided link and passed traffic. The loadtest below was done with the BiDi optics in one interface and a boring RJ45 copper cable in another. It’s going to be fantastic to be able to use these APU6s in a datacenter setting as remote / out-of-band serial devices, especially nowadays, when UTP is becoming a scarcity and everybody has fiber infrastructure in their racks.

| Vendor | Type | Description | Details |
|---|---|---|---|
| Finisar | FTLF8519P2BNL-RB | 850nm duplex | sfp0.txt |
| Generic | Unknown (no DOM) | 850nm duplex | sfp1.txt |
| Cisco | GLC-LH-SMD | 1310nm duplex | sfp2.txt |
| Cisco | SFP-GE-BX-D | 1490nm Bidirectional (FTTH CPE) | sfp3.txt |
| Cisco | SFP-GE-BX-U | 1310nm Bidirectional (FTTH COR) | sfp3.txt |
| Cisco | BT-OC24-20A | 1550nm OC24 SDH | sfp4.txt |
| Finisar | FTRJ1319P1BTL-C7 | 1310nm 20km (w/ 6dB attenuator) | sfp5.txt |
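
For readers who have not looked at SFF-8472 diagnostics before, here is a minimal sketch of how the real-time values in such a dump are encoded. It is not the tool I used. It assumes you already have the raw 256 byte A2h page from the module, and that the module is internally calibrated (externally calibrated modules need extra scaling that is left out here). The function name `decode_dom` is made up for this example.

```python
import math
import struct


def decode_dom(a2_page: bytes) -> dict:
    """Decode the real-time diagnostics at bytes 96-105 of the SFF-8472 A2h page.

    Assumes an internally calibrated module, so raw values map directly
    to physical units without applying calibration constants.
    """
    # Big-endian: signed temperature, then four unsigned 16-bit values.
    temp, vcc, bias, tx, rx = struct.unpack_from(">hHHHH", a2_page, 96)

    def to_dbm(raw: int) -> float:
        milliwatt = raw * 0.0001  # raw unit is 0.1 uW
        return 10 * math.log10(milliwatt) if milliwatt > 0 else float("-inf")

    return {
        "temperature_c": temp / 256.0,  # 1/256 degree Celsius per count
        "vcc_v": vcc * 100e-6,          # 100 uV per count
        "tx_bias_ma": bias * 2e-3,      # 2 uA per count
        "tx_power_dbm": to_dbm(tx),
        "rx_power_dbm": to_dbm(rx),
    }


# Synthetic page for illustration only, not a reading from any module above.
page = bytearray(256)
struct.pack_into(">hHHHH", page, 96, 0x1A00, 33000, 3000, 5000, 1000)
print(decode_dom(bytes(page)))
```

A module without DOM support, like the generic one in the table, will not provide meaningful values in these bytes.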

Network Loadtest

The choice of Intel i210/i211 network controller on this board allows operators to use Intel’s DPDK with relatively high performance, compared to regular (kernel) based routing. I loadtested Linux (Ubuntu 20.04), OpenBSD (6.8), and two lesser known but way cooler DPDK open source appliances called Danos (ref) and VPP (ref) respectively.

It is specifically worth calling out that while Linux and OpenBSD struggled, both DPDK appliances had absolutely no problems filling a bidirectional gigabit stream of “regular internet traffic” (referred to as imix), and came close to line rate with “64b UDP packets”. The line rate of gigabit ethernet with 64 byte packets is 1.48Mpps in one direction, and my loadtests stressed both directions simultaneously.
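
The 1.48Mpps figure follows from the size of a minimum ethernet frame on the wire. As a quick sketch of the arithmetic (the function name is mine): every frame carries 20 bytes of overhead on the wire in addition to the frame itself (preamble, start-of-frame delimiter and inter-frame gap), so a 64 byte frame occupies 84 bytes, or 672 bits.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second on an ethernet link for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


one_way = line_rate_pps(1e9, 64)
print(f"64b, one direction:   {one_way / 1e6:.3f} Mpps")
print(f"64b, both directions: {2 * one_way / 1e6:.3f} Mpps")
```

With both directions loaded, a DUT that keeps up with 64 byte packets at line rate has to forward roughly twice the one-directional figure.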

Methodology

For the loadtests, I used Cisco’s T-Rex (ref) in stateless mode, with a custom Python controller that ramps traffic up and down from the loadtester to the device under test (DUT). It sends traffic out of port0 to the DUT and expects that traffic to be presented back from the DUT on port1, and vice versa (out from port1 -> DUT -> back in on port0). The loadtester first sends a few seconds of warmup, to ensure the DUT is passing traffic and to offer the ability to inspect the traffic before the actual rampup. Then the loadtester ramps up linearly from zero to 100% of line rate (in our case, line rate is one gigabit in both directions), and finally it holds the traffic at full line rate for a certain duration. If at any time the loadtester fails to see the traffic it’s emitting return on its second port, it flags the DUT as saturated, and this is noted as the maximum bits/second and/or packets/second.

usage: trex-loadtest.bin [-h] [-s SERVER] [-p PROFILE_FILE] [-o OUTPUT_FILE] [-wm WARMUP_MULT]
                         [-wd WARMUP_DURATION] [-rt RAMPUP_TARGET]
                         [-rd RAMPUP_DURATION] [-hd HOLD_DURATION]

T-Rex Stateless Loadtester -- pim@ipng.nl

optional arguments:
  -h, --help            show this help message and exit
  -s SERVER, --server SERVER
                        Remote trex address (default: 127.0.0.1)
  -p PROFILE_FILE, --profile PROFILE_FILE
                        STL profile file to replay (default: imix.py)
  -o OUTPUT_FILE, --output OUTPUT_FILE
                        File to write results into, use "-" for stdout (default: -)
  -wm WARMUP_MULT, --warmup_mult WARMUP_MULT
                        During warmup, send this "mult" (default: 1kpps)
  -wd WARMUP_DURATION, --warmup_duration WARMUP_DURATION
                        Duration of warmup, in seconds (default: 30)
  -rt RAMPUP_TARGET, --rampup_target RAMPUP_TARGET
                        Target percentage of line rate to ramp up to (default: 100)
  -rd RAMPUP_DURATION, --rampup_duration RAMPUP_DURATION
                        Time to take to ramp up to target percentage of line rate, in seconds (default: 600)
  -hd HOLD_DURATION, --hold_duration HOLD_DURATION
                        Time to hold the loadtest at target percentage, in seconds (default: 30)
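
To make the three phases concrete, here is a rough sketch of the schedule in plain Python. This is not the code of the loadtester and it does not talk to T-Rex at all. The function names are invented for this example, the defaults mirror the usage output above, and the loss `tolerance` in the saturation check is purely an assumption of mine.

```python
from typing import Optional, Tuple


def phase_at(t: float, warmup_s: float = 30.0, rampup_s: float = 600.0,
             hold_s: float = 30.0,
             target_pct: float = 100.0) -> Tuple[str, Optional[float]]:
    """Return (phase, percent of line rate offered) at t seconds into the test."""
    if t < warmup_s:
        # Warmup sends a fixed low rate (e.g. 1kpps), not a percentage.
        return ("warmup", None)
    t -= warmup_s
    if t < rampup_s:
        # Linear ramp from zero to the target percentage.
        return ("rampup", target_pct * t / rampup_s)
    t -= rampup_s
    if t < hold_s:
        return ("hold", target_pct)
    return ("done", None)


def is_saturated(tx_pps: float, rx_pps: float, tolerance: float = 0.001) -> bool:
    """Flag the DUT as saturated when sent packets stop coming back.

    The tolerance (fraction of lost packets allowed) is a hypothetical knob.
    """
    return tx_pps > 0 and (tx_pps - rx_pps) / tx_pps > tolerance


for second in (0, 330, 640, 660):
    print(second, phase_at(second))
```

With the default durations, a complete run takes 660 seconds, and the halfway point of the ramp (50% of line rate) is reached 330 seconds in.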

It’s worth pointing out that almost all systems are pps-bound, not bps-bound. A typical rant I have is that network vendors are imprecise when they specify their throughput: when they say “up to 40Gbit”, they more often than not mean “under carefully crafted conditions”. Those conditions include using jumboframes (9216 bytes rather than the “usual” 1500 byte MTU found on ethernet), which are easier on the router than a typical internet mixture (closer to 1100 bytes), and much easier yet than forwarding 64 byte packets, for instance in a DDoS attack. They also include sending traffic in only one direction, and using exactly one source/destination IP address/port, which is a little bit easier than looking up a destination in a forwarding table containing 1M destinations. For context, a current internet backbone router carries ~845K IPv4 destinations and ~105K IPv6 destinations.
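
To illustrate why packet size matters so much for a headline bits/second number, here is a small sketch. It assumes an idealized device that can forward a fixed number of packets per second regardless of their size, which is a simplification, and the 1 Mpps budget is a hypothetical round number rather than a measurement.

```python
def throughput_bps(pps: float, packet_bytes: int) -> float:
    """Bits per second carried when forwarding `pps` packets of `packet_bytes` each."""
    return pps * packet_bytes * 8


PPS_BUDGET = 1_000_000  # hypothetical device limit, for illustration only

for size in (64, 1500, 9216):
    gbps = throughput_bps(PPS_BUDGET, size) / 1e9
    print(f"{size:>5} byte packets: {gbps:6.2f} Gbps")
```

The same packets/second budget gives a bits/second figure that is 144 times higher with 9216 byte jumboframes than with 64 byte packets, which is why I prefer to look at pps.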

Results

| Product | Loadtest | Throughput (pps) | Throughput (bps) | % of linerate | Details |
|---|---|---|---|---|---|
| Linux | imix | 150.21 Kpps | 452.81 Mbps | 45.28% | apu6-linux-imix.json |
| OpenBSD | imix | 145.52 Kpps | 444.51 Mbps | 44.45% | apu6-openbsd-imix.json |
| VPP | imix | 654.40 Kpps | 2.00 Gbps | 199.90% | apu6-vpp-imix.json |
| Danos | imix | 655.53 Kpps | 2.00 Gbps | 200.24% | apu6-danos-imix.json |
| Linux | 64b | 96.93 Kpps | 65.14 Mbps | 6.51% | apu6-linux-64b.json |
| OpenBSD | 64b | 152.09 Kpps | 102.20 Mbps | 10.22% | apu6-openbsd-64b.json |
| VPP | 64b | 1.78 Mpps | 1.19 Gbps | 119.49% | apu6-vpp-64b.json |
| Danos | 64b | 2.30 Mpps | 1.55 Gbps | 154.62% | apu6-danos-64b.json |

For more information on the methodology and the scripts that drew these graphs, take a look at my buddy Michal’s GitHub Page, which, given time, will probably turn into its own subsection of this website (I can only imagine the value of a corpus of loadtests of popular equipment in the consumer arena).

Caveats

The unit was shipped to me free of charge by PC Engines for the purposes of load- and systems integration testing. Other than that, this is not a paid endorsement, and the views in this review are my own.

Open Questions

SFP I2C

Considering the target audience, I wonder if there is a possibility to break out the I2C pins from the SFP cage into a header on the board, so that users can connect them through to the CPU’s I2C controller (or bitbang directly on GPIO pins), and use the APU6 as an SFP flasher. I think that would come in incredibly handy in a datacenter setting.

CPU bound

The DPDK based router implementations are CPU bound, and could benefit from a little bit more power. I am duly impressed by the throughput seen in terms of packets/sec/watt, but considering a typical router has a (forwarding) dataplane and also needs a (configuration) controlplane, we are short about 30% CPU cycles. If a controlplane (like Bird or FRR (ref)) is dedicated one core, that leaves us three cores for forwarding, with which we obtain roughly 154% of linerate. To obtain line rate in both directions (200%), we’d need 200/154 == 1.298 times that forwarding capacity. That said, the APU6 has absolutely no problems saturating a gigabit in both directions under normal (==imix) circumstances.
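
The back-of-the-envelope math can be written down as a short sketch. The function names are made up for this example, and the conversion to a number of cores assumes forwarding throughput scales linearly with the number of forwarding cores, which I did not measure.

```python
def capacity_factor(achieved_pct: float, target_pct: float = 200.0) -> float:
    """Factor by which forwarding capacity must grow to reach the target."""
    return target_pct / achieved_pct


def cores_needed(achieved_pct: float, forwarding_cores: int,
                 target_pct: float = 200.0) -> float:
    """Forwarding cores needed for the target, assuming linear scaling per core."""
    return forwarding_cores * capacity_factor(achieved_pct, target_pct)


# 154% of linerate with three forwarding cores, 200% wanted (64b, both directions).
print(f"capacity factor: {capacity_factor(154.0):.2f}x")
print(f"forwarding cores needed: {cores_needed(154.0, 3):.1f}")
```

Under that linear scaling assumption, the three forwarding cores would have to become almost four, on top of the one core reserved for the controlplane.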

Appendix

Appendix 1 - Terminology

| Term | Description |
|---|---|
| OADM | optical add drop multiplexer - a device used in wavelength-division multiplexing systems for multiplexing and routing different channels of light into or out of a single mode fiber (SMF) |
| ONT | optical network terminal - the ONT converts fiber-optic light signals to copper based electric signals, usually Ethernet. |
| OTO | optical telecommunication outlet - the OTO is a fiber optic outlet that allows easy termination of cables in an office and home environment. Installed OTOs are referred to by their OTO-ID. |
| CARP | common address redundancy protocol - its purpose is to allow multiple hosts on the same network segment to share an IP address. CARP is a secure, free alternative to the Virtual Router Redundancy Protocol (VRRP) and the Hot Standby Router Protocol (HSRP). |
| SIT | simple internet transition - its purpose is to interconnect isolated IPv6 networks, located in the global IPv4 Internet, via tunnels. |
| STB | set top box - a device that enables a television set to become a user interface to the Internet and also enables a television set to receive and decode digital television (DTV) broadcasts. |
| GRE | generic routing encapsulation - a tunneling protocol developed by Cisco Systems that can encapsulate a wide variety of network layer protocols inside virtual point-to-point links over an Internet Protocol network. |
| L2VPN | layer2 virtual private network - a service that emulates a switched Ethernet (V)LAN across a pseudo-wire (typically an IP tunnel) |
| DHCP | dynamic host configuration protocol - an IPv4 network protocol that enables a server to automatically assign an IP address to a computer from a defined range of numbers. |
| DHCP6-PD | dynamic host configuration protocol: prefix delegation - an IPv6 network protocol that enables a server to automatically assign network prefixes to a customer from a defined range of numbers. |
| NDP NS/NA | neighbor discovery protocol: neighbor solicitation / advertisement - an IPv6 specific protocol to discover and judge reachability of other nodes on a shared link. |
| NDP RS/RA | neighbor discovery protocol: router solicitation / advertisement - an IPv6 specific protocol to discover and install local address and gateway information. |
| SBC | single board computer - a complete computer with all peripherals and components directly attached to the board. |