AWS popularized Arm-based CPUs with their Graviton line. Compared with x86, AWS reports up to 40% better price/performance for Graviton; this makes Arm particularly appealing for compute-intensive workloads such as HPC, video encoding, and data science. Arm-based CPUs also consume up to 60% less power, making them attractive for use in large data centers.
However, Arm adoption is hindered by software compatibility. For example, thanks to Microsoft’s and Intel / AMD’s focus on backwards compatibility, many apps built a decade ago can still run on Windows today. Additionally, many proprietary apps and libraries were traditionally built only for x86.
Running on arm64 requires support from the underlying OS and its toolchain, and you need source code access to build an app for arm64. Fortunately, Ubicloud runs on an open source software stack. Our virtualization stack has had arm64 support from the beginning; this includes Linux, our hypervisor, firmware, machine images, and SPDK (block storage).
Still, we expected that enabling a new CPU architecture on a cloud platform would be tricky. We were pleasantly surprised that all it took was 151 lines of code. So, we wanted to share with you what we learned in enabling arm64 and the challenges we ran into in the process.
migrate/20231014_arm64.rb | 15 +++++++++++
prog/learn_arch.rb | 11 ++++++++
prog/vm/host_nexus.rb | 3 +++
prog/vm/nexus.rb | 7 ++---
rhizome/common/bin/arch | 6 +++++
rhizome/common/lib/arch.rb | 27 +++++++++++++++++++
rhizome/host/bin/prep_host.rb | 10 +++----
rhizome/host/bin/setup-spdk | 10 ++++++-
rhizome/host/lib/cloud_hypervisor.rb | 51 +++++++++++++++++++++++++++++++++---
rhizome/host/lib/vm_setup.rb | 24 ++++++++++++-----
spec/prog/learn_arch_spec.rb | 23 ++++++++++++++++
spec/prog/vm/host_nexus_spec.rb | 3 +++
spec/prog/vm/nexus_spec.rb | 5 ++--
13 files changed, 173 insertions(+), 22 deletions(-)
We were able to enable arm64 instances with the diff above. With these changes, we standardized naming for CPU architectures, removed the need to manually enter CPU architecture names, updated our VM placement logic, and dropped in the new binaries required for the Arm architecture. Here’s a bit more on each of these changes.
Normalize naming to x64 and arm64: The first problem was that nobody agreed on CPU architecture names. Example architecture names that we saw were arm64, AArch64, a64, x64, x86, x86-64, amd64, x86_64, and Intel 64.
There were many valid choices here. We chose a pair of names that were somewhat common, different from one another, short, and pronounceable. For example, Debian nomenclature uses amd64, but we found this name to be unfortunately close to arm64. Similarly, we weren’t quite sure how to pronounce AArch64 and removed that as an option.
In the end, we made some arbitrary choices and standardized on x64 and arm64. We then wrote 27 lines of code to map every target CPU architecture name to either x64 or arm64.
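To make this concrete, here is a minimal sketch of that kind of normalization. The module, constant, and method names are illustrative; they do not mirror the actual rhizome/common/lib/arch.rb.

```ruby
# Minimal sketch of architecture-name normalization; illustrative only.
module ArchName
  # Map the aliases we ran into onto the two names we standardized on.
  ALIASES = {
    "x86_64" => "x64", "x86-64" => "x64", "amd64" => "x64", "x64" => "x64",
    "aarch64" => "arm64", "arm64" => "arm64", "a64" => "arm64"
  }.freeze

  def self.normalize(name)
    ALIASES.fetch(name.downcase) { fail "unexpected CPU architecture: #{name}" }
  end
end

ArchName.normalize("AArch64")  # => "arm64"
ArchName.normalize("x86_64")   # => "x64"
```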
“Learn” and store normalized CPU architecture names: Next, we didn’t want an operator to have to type in the CPU architecture name manually when adding new machines to the system. Typing it in was also time consuming during development. So instead, Ubicloud now asks the machine for its architecture, normalizes the name using the previous step, and saves the result in a column in our control plane database.
We learn other machine attributes this way, such as the number of CPU cores or the total memory; handling the architecture the same way keeps our system consistent. This change took 35 lines of code.
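Conceptually, the “learn” step amounts to asking the machine for its architecture and persisting the normalized answer. Here is a hedged sketch that reuses the ArchName module above; the method, model, and column names are assumptions, not the actual prog/learn_arch.rb.

```ruby
# Hedged sketch of the "learn architecture" step; not the actual prog/learn_arch.rb.
def learn_arch
  raw = `uname -m`.strip    # reports "x86_64" or "aarch64" on Linux
  ArchName.normalize(raw)   # reuse the normalization sketched above
end

# The control plane then records the result on the host record, for example
# (illustrative model and column names):
#   vm_host.update(arch: learn_arch)
```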
VM allocation takes CPU architectures into account: We have a simple “allocator” that assigns virtual machine (VM) allocation requests to the underlying hardware. Today, this allocator is a simple PostgreSQL query that assigns the request to the most lightly loaded machine that meets the allocation constraints. Example allocation constraints include enough cores, huge pages, available disk, and the right cloud region. This change introduced the CPU architecture as an additional constraint in the SQL query’s WHERE clause.
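To illustrate, the change amounts to one more predicate in the query that picks a candidate host. Below is a simplified Sequel-style sketch; the table and column names are assumptions, not our actual schema or allocator query.

```ruby
# Simplified sketch; table and column names are assumptions, not our schema.
require "sequel"
DB = Sequel.connect(ENV.fetch("DATABASE_URL"))

def pick_host(arch:, cores:, hugepages:, disk_gib:, location:)
  DB[:vm_host]
    .where(arch: arch, location: location)                      # new: match CPU architecture
    .where { (total_cores - used_cores) >= cores }               # enough free cores
    .where { (total_hugepages - used_hugepages) >= hugepages }   # enough huge pages
    .where { available_disk_gib >= disk_gib }                    # enough disk
    .order { used_cores / total_cores.cast(:float) }             # most lightly loaded first
    .first
end
```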
As a side note, we’ve now grown to the point where we need to further update our allocator logic. These updates will give us the flexibility to apply more sophisticated allocation algorithms and to support more types of hardware constraints.
New arm64 binaries: Finally, our virtualization stack uses four open source projects that need separate binaries for arm64. These projects are the Cloud Hypervisor (our virtual machine monitor), firmware, guest OS images, and SPDK to handle our disk I/O.
When our control plane cloudifies an existing bare metal server, it now checks the server’s CPU architecture. We then download x64 or arm64 packages according to the underlying CPU architecture.
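In practice, this mostly means picking a different download URL (and checksum) for each component per architecture. Here is a hedged sketch for one component; the version and release asset names are illustrative, not the ones our control plane actually pins.

```ruby
# Hedged sketch; the version and asset names are placeholders, and checksum
# verification is omitted.
CLOUD_HYPERVISOR_VERSION = "35.1"

def cloud_hypervisor_url(arch)
  base = "https://github.com/cloud-hypervisor/cloud-hypervisor/releases/download"
  case arch
  when "x64"   then "#{base}/v#{CLOUD_HYPERVISOR_VERSION}/cloud-hypervisor-static"
  when "arm64" then "#{base}/v#{CLOUD_HYPERVISOR_VERSION}/cloud-hypervisor-static-aarch64"
  else fail "unexpected architecture: #{arch}"
  end
end
```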
As we worked on enabling arm64 VMs, we also ran into two high level challenges, where there’s still room for improvement.
(1) Inflexible hardware configuration: Today, we use Hetzner as our bare metal provider. Hetzner allows you to customize x64 machines; and we add RAM, storage capacity, and network bandwidth to our x64 instances.
Comparatively, Hetzner’s RX series comes with two options. The RX170 has 80 cores, 128 GB of RAM, and 2 TB of storage. The RX220 comes with 80 cores, 256 GB of RAM, and 8 TB of storage. The only upgrade option on these instance types is a faster network uplink, which we take. This inflexibility presents us with three problems.
(1)(i) Expensive development environment: On x64, our team uses inexpensive Hetzner "auction" machines to create a production-like host/guest OS cloud environment on bare metal servers. When you need to test small changes, 80 cores is overkill. Also, setting up arm64 machines and releasing them sometimes takes a few days, so the process isn’t very elastic.
Solutions? One option is nested virtualization. With this approach, we could lease just one machine and assign 4-16 cores to each development environment. It would also speed up our development cycle: today, when we need to reimage the host OS, we have to reimage the machine and then power cycle it, which adds at least a few minutes of overhead to our development process. If we could nest VMs, we’d both pay less and develop faster. However, nested virtualization brings additional complexity; and if we do something this complicated only for development, the development environment may no longer model our production setup.
Another option is to lease arm64 instances from Equinix. However, Equinix has a different network arrangement, which we’d then need to support. Also, Equinix charges roughly 10x Hetzner’s hourly rates, so we’d need to remember to start / stop these bare metal servers or incur meaningful cost. The simplest decision here is to keep a few RX220s around for development and pay the $250/month in development costs.
(1)(ii) Compute to memory ratio is high: Since we can configure x64 instances, we can ensure a ratio of 1 physical core (2 vCPUs) to 8 GB of RAM and define that as our standard instance. RX220s come with 80 physical cores and 256 GB of RAM. This makes it impractical to maintain the same compute to memory ratio between our x64 and arm64 instances.
For core counts, we follow AWS’s approach. 1 core means 2 vCPUs on x64 on account of Simultaneous Multithreading (SMT); and 1 core means 1 vCPU on arm64, which does not support SMT. We also reduce the amount of memory allocated per core from 8 GB to 6 GB to ensure every core can be used for GitHub Actions in parallel. You can see how the ratios compare for our GitHub Actions runners in our documentation.
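As a small illustration of the core counting, here is how a "core" translates to vCPUs on each architecture; the function is illustrative, not code from our repository.

```ruby
# Illustrative only: how a "core" translates to vCPUs on each architecture.
def vcpus_per_core(arch)
  case arch
  when "x64"   then 2  # SMT: each physical core exposes two hardware threads
  when "arm64" then 1  # our arm64 hosts do not support SMT
  else fail "unexpected architecture: #{arch}"
  end
end

vcpus_per_core("x64")    # => 2
vcpus_per_core("arm64")  # => 1
```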
(1)(iii) Wasteful NVMe usage: We use the Storage Performance Development Kit (SPDK) to handle our disk I/O. SPDK is an open source project for writing performant, scalable, and user-mode storage applications. We like SPDK because it sits in user land in the host OS; and not being in the kernel helps us iterate faster.
SPDK’s ideal use case is to give it an entire NVMe card to control directly from user space; this optimizes away context switches. With two cards, the host OS can boot from and use one card for its own activities, such as logging, while we dedicate the other card’s hardware channel to Ubicloud VMs. With the inflexible RX220 configuration, we lose that option.
(2) Binary building method and versioning: The second challenge we faced with arm64 was how we build binaries for our virtualization stack.
(2)(i) Binary building method: Ubicloud’s control plane uses four open source projects when cloudifying an existing bare metal server. These are Cloud Hypervisor, firmware, guest OS images, and SPDK (to handle our disk I/O). Prior to introducing arm64, we didn’t have a formal way to build these binaries and their new versions. For example, this very early commit message alludes to how we built the binaries for the edk2 firmware.
When we introduced arm64 VMs, we doubled the number of binaries and versions we needed to build, so we needed a more formal way to build x64 and arm64 packages. At Ubicloud, we rely on GitHub Actions as our official way to build packages. However, GitHub Actions doesn’t natively support building arm64 packages. We wanted to enable building arm64 packages on GitHub Actions through the Ubicloud integration, which is why we started working on arm64 VMs in the first place. So, we had a circular dependency.
We broke this dependency by releasing the first arm64 binaries with manual builds. Once we enabled Ubicloud’s arm64 integration, we could create release pipelines for the various packages and their versions. For example, we started building and releasing arm64 binaries for the edk2 firmware through this open source repository, licensed under Apache 2.0.
https://github.com/ubicloud/build-edk2-firmware
(2)(ii) Version skew: For x64 bare metal instances, our control plane used to download the latest stable versions of the four binaries. We also wanted to group some of these binary versions together. For example, Cloud Hypervisor and the firmware work together, so it makes sense to track their deployed versions together.
With arm64, the versioning problem became more pronounced. So, we started versioning and installing Cloud Hypervisor, our firmware, and SPDK releases. Our release pipeline then tests new versions using regression and end-to-end tests. For SPDK, we also track which SPDK versions are installed on bare metal instances and the ones used by the virtual machines. We plan to do something similar for the Cloud Hypervisor and our firmware in the future.
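A hedged sketch of what that pinning can look like: the component names match the projects above, but the versions and the tracking shown in the comments are placeholders, not our actual release metadata.

```ruby
# Placeholder versions; the point is that each component is pinned per release
# instead of downloading "latest stable".
PINNED_VERSIONS = {
  "cloud-hypervisor" => "35.1",
  "edk2-firmware"    => "2023.05",
  "spdk"             => "23.09"
}.freeze

# For SPDK, the control plane also records which version each host has installed
# and which version each VM's storage volume uses (illustrative names):
#   spdk_installation.update(version: PINNED_VERSIONS["spdk"])
#   vm_storage_volume.update(spdk_version: PINNED_VERSIONS["spdk"])
```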
We introduced arm64 VMs on Ubicloud earlier this year. The changes were pedestrian and took 151 lines of code. In the process, we also ran into two challenges associated with introducing a new CPU architecture. First, we used Hetzner’s arm64 bare metal machines and couldn’t customize the underlying hardware, which left us somewhat wasteful with compute and disk usage on these machines. Second, we doubled the number of binaries to build, version, and deploy to Ubicloud’s data plane, which required us to improve our build and release pipeline.
As always, if arm64 VMs are interesting to you or you have comments about this blog post, you can reach us at [email protected]. We'd then be happy to continue the conversation.