13 Years of Building Infrastructure Control Planes in Ruby

August 16, 2024 · 9 min read
Daniel Farina
Founder/CTO

Ubicloud has the ambition of writing an open source alternative to AWS, Azure, or GCP. Its control plane is written in Ruby, a fact that surprises some people.

Our cofounder, Daniel, has been building control planes in Ruby for 13 years - first at Heroku and now at Ubicloud. He recently presented at the YC Ruby Meetup and SF Bay Area Ruby Meetup. We received solid feedback about Daniel’s talk, so we edited his talk into a blog post.

This blog post has two high-level parts. The first part sets the context by describing Daniel’s unusual journey with Ruby and how he (re-)interprets Ruby’s strengths. The second part expands on that and describes five features that make Ruby particularly powerful for building infrastructure code.

Why Ruby?

Personally, I have never engaged with Ruby as a community member. I've never given any talks like this. I viewed myself simply as someone who used Ruby, found it practical for a certain class of problems, and did not find alternatives that looked as practical, though I review them on occasion.

Nowadays, people who don't use Ruby are somewhat surprised when I mention that Ubicloud’s control plane is written in Ruby. They usually find this fact interesting and probe me a bit to see whether I have detailed justifications for it (which I do) or have just never closely examined what I do.

More apparent, when I talk to people who do use Ruby presently -- and admittedly, these are few nowadays -- is an undercurrent of anxiety about the language's relevance or popularity. This is also reflected on, say, the Ruby Reddit or similar gatherings. Relevant to this subject, I wanted to contribute my unusual experience with Ruby.

Perspective outside Rails

The main thing that makes my experience of Ruby weird is that it has never involved Rails, except peripherally. 

If you look at this slide, I have a legend of "normalness." This can be roughly defined as how frequently people ask me about a dependency. Some dependencies at certain times need no further justification, others are seen as unusual.

Postgres has become more normal. It's hard to convey the magnitude of the change now, thirteen years later, but to set the historical scene, in 2011, people would ask "Why is Heroku shipping this somewhat known but weird database when Rails is a MySQL project? Occasionally Rails doesn't work right with it, and you are a Rails platform. Why are you rolling this boulder uphill?"

Conversely, Ruby has become less normal, and people ask about it. So the two have swapped places. Has this had a negative, or even noticeable impact on my engineering life? No, not really.

Reinterpreting Ruby

If life basically got better, why did Ruby become weird? Did something bad happen?

I don't know for sure. I never gave much thought to being popular, since various important parts of my dependency chain were unpopular for my entire career. In my world, nothing bad happened. Rather the opposite: things got somewhat better, at least by the engineering priorities I had.

One important thing to understand is that the engineering priorities I hold aren’t unusual, but they do clash with people's conception of Ruby. This holds for both the median Ruby programmer and the median programmer who doesn’t use Ruby.

So, I want to talk about Reinterpreting Ruby.

Common Interpretations

Seen here are what I think are some common ways people interpret Ruby. Some are positives, some are negatives, and some can be seen as the positive and negative manifestations of the same phenomenon.

Now, perhaps it's important to understand this about me personally: I was hired at Heroku for my knowledge of and affinity for Postgres. Having worked on it before, I decided I enjoyed Postgres. Reminder: Postgres was weird at this time. I had no Ruby experience, and frankly, the sense I got from Ruby's "vibe" of 2011 is that the values of the community were very different than the ones I found attractive in Postgres and identified with.

That said, I think some of these sublime or romantic qualities were important in developing parts of Ruby and libraries in Ruby that I relied on to execute my projects.

Reinterpretation

Ultimately, I kept my values. I also found Ruby to have the strongest virtues of any programming environment, from a set of priorities that are more closely identified with mature database or operating system software.

The rest of the talk will get a bit more specific, again, trying to ground these interpretations in tangible things. You might find that you appreciate and interpret these tangibles differently, and that's the point of reinterpretation.

The following are five tangible reasons to use Ruby in the way that I reinterpret the language.

(1) Language Stability and Restraint

In 2011, I started writing Ruby reluctantly. The chaos of 1.8 to 1.9 was still fresh. Rails, and thus Ruby in North America, was MySQL-dominant. This wasn’t the scene that I found most natural. 

In 2013, when Ruby 2.0 introduced keyword arguments, I basically became satisfied with the language. I also noticed that the amount of fluctuation decreased, which was pleasing. Better still, this low fluctuation has continued.

In 2016, the Ruby 3x3 initiative was announced, beginning a long period of polishing the interpreter with difficult features that most can agree are good things. These included just-in-time code generation and wrestling with multiple processors.

In 2021, YJIT went upstream. As an aside, I've been rather impressed with Shopify's system work in Ruby. Their RBI generator, Tapioca, made Sorbet usable for me. I'm not sure how Shopify does it, but they manage to produce good system software that works better for people outside the company than that of most organizations I've seen.

With the benefit of hindsight, I would classify 2013 as important for Ruby: it was when the language first began to stabilize. Though 2.0 was not as painful an upgrade from 1.9 as 1.9 was from 1.8, it added a feature I thought well worthwhile, one that would change the look of Ruby programs that came after.

My Ruby upgrade costs after this point were minimal, and my code reading cost would also decrease as keyword arguments became more common.
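To make the code-reading point concrete, here is a minimal sketch contrasting the positional style with the keyword style. The method names, parameters, and values (`provision`, `size`, `region`, and so on) are invented for illustration and are not taken from any real control plane.

```ruby
# Positional style: the call site gives no hint what each value means.
def provision_positional(name, size, region, encrypted)
  { name: name, size: size, region: region, encrypted: encrypted }
end

# Keyword style. Optional keywords with defaults arrived in Ruby 2.0;
# required keywords such as `name:` followed in Ruby 2.1.
def provision(name:, size: "standard-2", region: "eu-central", encrypted: true)
  { name: name, size: size, region: region, encrypted: encrypted }
end

a = provision_positional("db-1", "standard-4", "eu-central", true)
b = provision(name: "db-1", size: "standard-4")

# A misspelled keyword fails loudly instead of being silently ignored.
begin
  provision(name: "db-2", sise: "standard-4")
rescue ArgumentError => e
  error_message = e.message
end
```

Both calls build the same hash, but only the second one can be read without looking up the method definition, and the typo in the last call surfaces as an `ArgumentError` rather than a wrong default.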

(2) Stability in Key Libraries

Libraries are often mentioned as a strong point for Ruby, and it's sometimes accepted, but I find the vagueness of this statement unsatisfying.

For a very conventional type of problem, that is, applications that use a relational database and HTTP, Ruby libraries stand out. These include "key" libraries where the interface with my application is large. This makes stable library design important, to ease upgrades and not cause distortions in my own code.

Sequel, in particular, I've used for over a decade now. Sequel is feature rich: it adds new Postgres features while the next release is in beta. It is an existence proof that, with the right methodology, most of us could stand to have more functionality, less code, and fewer bugs all at once. Basically, be better in every dimension: most of us aren’t operating at the efficient boundary of software, where one virtue must be exchanged for another.

The same author is responsible for Roda and Rodauth, and the methodology by which he runs these projects is uniform. His methodology puzzles me more than that of any other programmer.

- Jeremy Evans

Those who have heard of at least one of Roda, Sequel, or Rodauth might have heard of their author, Jeremy Evans.

Sequel was his first well-known work, which we used with Sinatra. Within a year of using Sequel, we were so puzzled by the level of software quality that we invited Jeremy to give a talk at Heroku in 2012 about his development methodology. This talk is among his indexed presentations. I would suggest reading that one. Carefully.

Thoughts on building Sequel


He has good talks on each of the projects in detail. Since Jeremy maintains all three projects, it makes sense why you’d use these dependencies together and expect them to work.

- Check out roda-sequel-stack

A stand-alone HTTP routing library, like Sinatra or Roda, doesn’t by itself offer an integrated stack. But Jeremy Evans maintains a repository, roda-sequel-stack, that prescribes how to use Roda and Sequel together. Rodauth is a bit more constrained in how you integrate it into Roda, so for that, you read the manual.

Normally, such attempts to combine software are fragile as each project moves in its own direction, but there's a special consideration here given the unified maintenance and design theory.

(3) Rigorous Testing

Okay, we have to discuss Jeremy Evans just a bit more. Another practice he follows is one hundred percent line coverage. So, Sequel, Roda, and Rodauth are under this regime.

This is a practice of his that I started to understand. I adopted it into my projects starting at Citus Data in 2016, having been curious about it for a few years. Later, when the coverage tooling gained branch coverage, Jeremy upgraded to 100% branch coverage, and so did I.

I could...and have...given a whole talk about the theory by which this works. But practically, you should know that Jeremy does it with Minitest. I use RSpec, in what I see as an adequate but unexamined choice of mine. I think something better than both is possible to write, but hasn't been written yet.

- Are tests a crutch for lack of static typing?

A common question I get when I mention the extremely high coverage levels is "Is this a coping mechanism for lack of static types?" The answer is no.

I say this, having both spent a lot of time reading and writing programs with static types, and using statically typed Ruby via Sorbet in a previous system.

The reason is this: my domain is in the systems software world. Things work "most" of the time, but I have to be most interested in the small fraction of time things don't work, often themselves operating outside their design envelope. The downside risks tend to be asymmetrically large.

If you have 100% branch coverage, it doesn't mean you've covered all the cases. But it does mean that, whenever an obscure fault is understood in production, or even merely observed in development, there is an incremental path to add it to the base of knowledge in the tests: there are no spans of code with no test model. You can always find a pre-existing test that lands you very close to the test you want to write. There is no code without a theory of testing.
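The difference between line and branch coverage can be sketched with Ruby's standard `coverage` library. This is only an illustration under stated assumptions: the `classify` method and the `coverage_gaps` helper are hypothetical, and real projects would normally get these numbers from their test runner's coverage tooling rather than from hand-written code like this.

```ruby
require "coverage"
require "tempfile"

# Hypothetical code under test: a single line that contains two branches.
SOURCE = <<~RUBY
  def classify(n)
    n.negative? ? :negative : :non_negative
  end
RUBY

# Loads the source under coverage, runs the block as a stand-in "test
# suite", and returns [lines never executed, branches never taken].
def coverage_gaps(source)
  Tempfile.create(["classify", ".rb"]) do |file|
    file.write(source)
    file.flush
    Coverage.start(lines: true, branches: true)
    load file.path
    yield
    name = File.basename(file.path)
    _path, data = Coverage.result.find { |path, _| File.basename(path) == name }
    missed_lines = data[:lines].count { |hits| hits == 0 }
    missed_branches = data[:branches].values.flat_map(&:values).count(0)
    [missed_lines, missed_branches]
  end
end

one_sided = coverage_gaps(SOURCE) { classify(5) }
both_sides = coverage_gaps(SOURCE) { classify(5); classify(-5) }
```

With only `classify(5)` exercised, every executable line has run, yet the `:negative` arm of the ternary has not, so `one_sided` reports a missed branch while showing no missed lines. Adding the second call closes the gap. Line coverage alone would have declared the first suite complete.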

- Is static typing interesting?

Though I stopped using Sorbet in the latest incarnation of the Ruby control plane I have been writing (this is Ubicloud’s control plane), I don't want to write off static typing for Ruby. But I think the benefits have more to do with code navigation and perhaps speed of testing for a certain class of bugs that are tedious, rather than truly obscure.

I just thought Sorbet had too many problems to make it a high priority to add again. Maybe I'll do it later, though I am watching the alternate Ruby type checkers on occasion with interest.

Having used Sorbet, I am now reasonably confident that I could statically type Ubicloud's control plane, which I was not before.

(4) Rigor & Economy in Operations

Next, I wanted to discuss rigor in operations.

In short, the REPL, the Read-Eval-Print-Loop, is essential for teamwork and notetaking.

Instead of abstracting how an assessment was made, paste the transcript. Instead of wondering precisely how a determination was made, read the contemporary notes with the code and output reflected from the history. Refer to contemporary notes from previous, related problems, and make your own assessment of relevance and danger.

Finally, when working with others on urgent issues, be able to print the exact methodology of what you want to do and get a review before doing it. Then, be able to refer to what you did.

Along with the difficulty of capturing knowledge in tests, this is why I levy a massive penalty against programming environments without interpreters. Sometimes, it's efficient and economical to have fairly lengthy disposable programs that can combine ORM queries, API queries, and some imperative code in the REPL.

I will contend this: Ruby is not only better at this than other languages, but significantly better. Given similar languages with capable REPLs, it’s a fair question to ask: why?

(5) Blocks

The answer is simply blocks. "What is this, 2009?" When I first encountered Ruby, I thought blocks were an interesting variant of lambda expressions and higher order programming.

Ruby also supports lambda expressions, and I thought blocks were just strange lambda literals. Ick, Ick, Ick! Duplicate near-substitute features! What is this garbage?!

Blocks violated the common-case design theory that there should be few language features and they should be orthogonal. In spite of this, I've since come around to the view that blocks are fundamental, and the fact that you don't have blocks in TypeScript or Python amounts to a double-digit percentage difference in efficiency that can’t be ignored. It shows up in how concise and unified flow control is in Ruby.

The icing on the cake is that blocks are adequately whitespace insensitive, making them easier to use in the REPL. Though we can be more generous with whitespace in files, the differences from Python become more invisible thanks to blocks in the REPL.
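As a rough illustration of what unified flow control means here, the sketch below shows `next`, `return`, and `break` acting through a block, plus a small user-defined control structure built on `yield`. The host data and the helper names `first_healthy` and `with_retries` are made up for this example.

```ruby
# Hypothetical fleet data.
HOSTS = [
  { name: "vm-a", up: true,  draining: true },
  { name: "vm-b", up: false, draining: false },
  { name: "vm-c", up: true,  draining: false }
].freeze

def first_healthy(hosts)
  hosts.each do |host|
    next if host[:draining]          # skip to the next element
    return host[:name] if host[:up]  # returns from first_healthy itself
  end
  nil
end

# A user-defined control structure: re-run the block on IOError,
# up to `limit` attempts, then let the error propagate.
def with_retries(limit)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue IOError
    retry if attempts < limit
    raise
  end
end

calls = []
result = with_retries(3) do |attempt|
  calls << attempt
  raise IOError, "transient" if attempt < 3
  :ok
end

# `break` stops the iteration and hands a value back to the caller.
# Written on one line, this is easy to paste into a REPL.
first_down = HOSTS.each { |h| break h[:name] unless h[:up] }
```

Inside a block, `return` leaves the enclosing method and `break` leaves the iterating call, which is behavior a lambda passed to a function in Python or TypeScript does not provide. The same keywords work in built-in iterators and in helpers like `with_retries`, so the hand-written control structure reads like a native one.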

Conclusion

I think Ruby fits my problem of choice very well, and it has gotten better over the years. Its concision via blocks, the capability for thorough tests without obscure programming, the rigorous documentation and information exchange via REPL transcripts, and the exceptionally high quality of key libraries reduce costs.

Those reduced costs translate into reduced personnel. Accomplishing what you need to accomplish with fewer people not only saves you money, but also gives you the most cohesive and effective program.

These are not the standard ways to analyze Ruby: as stable, careful, and rigorous. But I think it can be used in a way that surpasses other programming environments on these grounds. Thank you.