Today, we are excited to announce transparent caching for GitHub Actions. This feature speeds up your GitHub Actions cache and Docker layer caching by up to 4x. You can enable it with a single click in the Ubicloud console.
Transparent caching gives you immense flexibility when using Ubicloud’s runners. You can switch in and out of Ubicloud’s runners without making any changes to your workflow files. You can run your CI/CD pipelines in a matrix configuration across GitHub’s default runners and Ubicloud to evaluate their price / performance.
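As a rough illustration of such a matrix, the sketch below runs the same job on a GitHub-hosted runner and on a Ubicloud runner. The label `ubicloud-standard-2`, the workflow and job names, and the `make test` step are assumptions for illustration; substitute the runner labels and commands that your account and repository actually use.

```yaml
# Hypothetical workflow: run the same job on both runner types to compare them.
name: compare-runners
on: [push]

jobs:
  build:
    strategy:
      matrix:
        # Assumed labels: one GitHub-hosted runner and one Ubicloud runner.
        runner: [ubuntu-22.04, ubicloud-standard-2]
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      # Placeholder build step; replace with your own commands.
      - run: make test
```

Because the steps are identical in both matrix entries, comparing the two job durations gives a first approximation of the difference between the runners and their cache backends.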
To make evaluating Ubicloud easier, we also provide 30 GB of free cache size per week with the transparent caching feature. At 3x GitHub’s default cache size, 30 GB is, to our knowledge, the largest free cache size in the industry.
This blog explains how the GitHub Actions cache works, how we cracked its internals, and how Ubicloud introduced transparent caching. We’ll also describe the performance benefits of using Ubicloud’s caching infrastructure.
GitHub Actions uses Azure Blob Storage to save cached files. Azure’s object store works well for default GitHub runners. Ubicloud’s runners, however, operate outside of Azure, and fetching files from Azure may increase job latencies.
To solve this challenge, we created our own cache infrastructure using Cloudflare’s R2. Since we had control over region selection, we could place the cache closer to Ubicloud. We also encrypted the cached data and stored each repository’s cache in an isolated R2 bucket.
To help you use this infrastructure, we then introduced a drop-in replacement for actions/cache and named it ubicloud/cache. We also introduced support for caching with package managers. For example, GitHub provides the actions/setup-python action, which sets up Python and caches its packages. You could simply switch to ubicloud/setup-python to benefit from Ubicloud’s cache.
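A minimal sketch of what that switch looked like, assuming ubicloud/cache accepts the same inputs as actions/cache (which is what "drop-in replacement" implies). The version tags, cache path, and key below are illustrative assumptions, not values taken from Ubicloud's documentation.

```yaml
steps:
  # Before (GitHub's cache action):
  # - uses: actions/cache@v4
  # After (only the action name changes; the version tag is an assumption):
  - uses: ubicloud/cache@v4
    with:
      # Hypothetical path and key for a pip cache.
      path: ~/.cache/pip
      key: pip-${{ runner.os }}-${{ hashFiles('**/requirements.txt') }}
```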
This approach had two drawbacks, however:
To overcome these issues, we decided to introduce “transparent caching”. For this, we first needed to reverse engineer the GitHub Actions cache.
Since each job runs on an ephemeral VM, caching files between runs requires storing them in remote storage. GitHub Actions saves these files in Azure Blob Storage. There are three main ways to cache files between runs:
Direct File/Folder Caching: The simplest method to cache files between runs is to use the actions/cache action, which lets you specify files or folders to cache.
Package Manager Caching with setup-* Actions: For package managers, GitHub offers language-specific actions such as actions/setup-go and actions/setup-python. For some languages, the open source community also offers third-party actions, such as Swatinem/rust-cache.
Docker Layer Caching: For Docker builds, caching unchanged Docker layers speeds up image builds. docker/build-push-action provides caching for Docker layers. You can pass gha as the cache type (type=gha) to store these layers on GitHub’s cache infrastructure.
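The three methods above can be sketched in a single workflow. This is an illustrative example rather than a recommended setup: the action version tags, cache paths, cache keys, and Python version are assumptions, and the Docker step assumes a Dockerfile at the repository root.

```yaml
# Illustrative workflow showing the three caching methods side by side.
name: cache-examples
on: [push]

jobs:
  build:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4

      # 1. Direct file/folder caching with actions/cache.
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            npm-${{ runner.os }}-

      # 2. Package manager caching through a setup-* action.
      #    Expects a requirements.txt or pyproject.toml in the repository.
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip

      # 3. Docker layer caching on the GitHub Actions cache backend.
      #    Buildx is set up first because the gha cache type needs it.
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

In the Docker step, `cache-from` reads previously stored layers and `cache-to` writes them back; `mode=max` also exports intermediate layers, which uses more cache space in exchange for more cache hits.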
Behind the scenes, GitHub’s actions/toolkit handles cache save and restore operations via the GitHub Cache Server and Azure Blob Storage. actions/cache, actions/setup-*, and Docker layer caching with docker/build-push-action are all built on top of this infrastructure.
The above diagram shows how GitHub Actions caches files. On the VM, the runner application is responsible for running GitHub Actions jobs. When the application needs to save or restore files, it sends requests to the GitHub Cache server. The cache server in turn returns the Azure Blob Storage bucket URLs to use in caching.
Ubicloud’s transparent caching works by introducing another layer on the runner VM. This layer, a cache proxy server, redirects requests related to caching to Ubicloud’s cache server. The cache server in turn returns URLs that resolve to Ubicloud’s cache infrastructure.
To do this, we reviewed GitHub’s runner application code and discovered that it uses an environment variable to specify the cache server address. We forked that code so that the environment variable points to a local cache proxy server running on each VM. This way, each cache request hits our local cache proxy instead of GitHub’s cache server.
The local cache proxy server then redirects cache-related requests to Ubicloud’s cache server. The cache server in turn provides URLs that resolve to Ubicloud’s cache infrastructure on Cloudflare R2. The cached files are encrypted and isolated into separate buckets in accordance with industry best practices.
The nice thing about this change is that it’s compatible with all GitHub cache actions. You can use actions/cache, actions/setup-*, or docker/build-push-action, and they should all see performance improvements of up to 4x.
To better quantify those performance benefits, we ran a series of benchmarks. Those benchmarks compared download and upload performance across three environments: (i) using the GitHub Actions cache on Ubicloud’s runners, (ii) today’s transparent caching feature, and (iii) the original ubicloud/* cache actions (Ubicloud cache v1).
In those benchmarks, transparent caching improves download speeds by 4x and upload speeds by 3x, compared to the baseline of using the GitHub Actions cache. The original ubicloud/* cache actions perform similarly in download benchmarks, but outperform the transparent cache by about 65% on upload speeds. This is because the ubicloud/* actions have an optimization that increases concurrency during file uploads.
Metric | GitHub Cache | Transparent Cache | Using ubicloud/* actions |
---|---|---|---|
Download | 20-25 MB/s | 80-100 MB/s | 80-100 MB/s |
Upload | 20-25 MB/s | 60-80 MB/s | 100-120 MB/s |
Cache Size | 10 GB | 30 GB | 30 GB |
The second benefit of Ubicloud caching is its free cache space. Starting today, Ubicloud offers 3x the cache space of the GitHub cache (30 GB versus 10 GB). To our knowledge, 30 GB is also the largest free cache space offered in the industry.
We also won’t stop there. We’re going to work on making three improvements to transparent caching. First, GitHub Actions is working on the second version of their cache. We’ll integrate these changes into transparent caching. Second, we’re going to port the concurrency changes in the ubicloud/* cache to the transparent cache. Third, we’re looking into swapping Cloudflare’s R2 with Ubicloud’s own object storage solution that runs within our data centers.
With these upcoming changes, we expect Ubicloud caching to become the clear leader in price / performance.
Today, we announced Ubicloud’s transparent caching feature for GitHub Actions. To our knowledge, this is the first solution that improves cache performance for file, package, and Docker layer caching together.
In this blog post, we talked about the internals of GitHub Actions cache. We then described transparent caching that brings the following benefits:
If you’re already using Ubicloud runners, you can enable this feature by clicking a button. Simply log into the Ubicloud console, go to the GitHub Runners tab, and click “Enable Ubicloud Cache” in the Settings section. Your next run will upload your cache files, and subsequent runs will download them from the Ubicloud cache.
As always, we’d also be happy to hear from you. If you have any questions or comments, please drop us an email anytime at [email protected].