The "Billion Dollar Mistake" Lives On In Rust
Tony Hoare famously called his invention of the NULL pointer, a kind of universal empty or invalid value, his “billion dollar mistake.” Many modern languages have since reflected on this unfortunate choice and introduced various kinds of optionals to their type systems, annotating when values may be absent and encouraging programmers to handle that case gracefully. Some compilers even provide niche optimizations to make these zero-cost!
So, problem solved, right? Well, not so fast. I’ll argue here that the same mindset that infected so many early programming languages lives on, even in communities as focused on type safety as the Rustaceans are. To do that, let’s first talk about what exactly made NULL so insidious.
What’s Wrong With NULL?
You see, it’s not just that dereferencing *NULL produced undefined behavior in C, or that calling methods on null threw a NullPointerException in Java; after all, the ability for code to fail when it requires valid data is alive and well in Rust’s unwrap, which panics upon encountering None. No, NULL’s true problem is much more sinister.
The issue with NULL is that it corrupts the type system: it is a valid value for every pointer type, no matter what sort of data that type is supposed to represent. In doing so, it makes it impossible to express a value that is guaranteed not to be in this problematic exceptional state. A type theorist would say that a type is the set of possible values it can take, and NULL forced its way into every such set. By granting it this universal power, we condemned ourselves to constantly worry: “could that be NULL?”
But gosh was it convenient: for any type, no matter how big or small, we had a way to express an empty, invalid, or uninitialized state. That’s a super common thing to want to express, and it’s convenient for it not to require a wrapper that changes the type. Who hasn’t reached a point in Rust code where they knew their value would be Some, and wouldn’t simply using the value be more concise, and no less safe, than unwrap? Why would we have the function if those cases weren’t common?
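To make that concrete, here is a small sketch of the situation described above: the programmer can see the lookup must succeed, but the type system cannot, so unwrap becomes the escape hatch. The names here are illustrative, not from any particular codebase.

```rust
use std::collections::HashMap;

fn main() {
    let mut scores: HashMap<&str, u32> = HashMap::new();
    scores.insert("alice", 3);

    // We just inserted "alice", so this lookup cannot fail...
    // ...but the type system doesn't know that, so we must unwrap.
    let alice = scores.get("alice").unwrap();
    assert_eq!(*alice, 3);

    // With a key we never inserted, the same unwrap would panic
    // at runtime rather than fail to compile:
    // scores.get("bob").unwrap();
}
```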
Introducing Default
Well, some folks seem to have found a way to have their cake and eat it too. Allow me to introduce you to the Default trait:
pub trait Default: Sized {
    fn default() -> Self;
}
Seemingly innocuous enough, this trait consists of a single function which gives a sane default value for your type. E.g., Vec::default() will give us an empty vector. Don’t want to deal with the possibility your key isn’t present? Try map.entry().or_default().1 Bored when initializing your long struct? Try
let x = Large { field1: a, field2: b, ..Default::default() };
My coworkers are even in the habit of writing checks like if x == Default::default(). Note how Default::default(), like NULL, is a sort of universal value that works as a placeholder for any type that implements Default, which most types are encouraged to do: Clippy even has a lint (new_without_default) against providing a new method without a corresponding Default impl.
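The patterns above can be sketched together in a few lines. This is a minimal illustration with a hypothetical Config type; all field names are assumptions for the example.

```rust
// Deriving Default works when every field implements Default.
#[derive(Default, PartialEq, Debug)]
struct Config {
    retries: u32,      // defaults to 0
    verbose: bool,     // defaults to false
    tags: Vec<String>, // defaults to an empty Vec
}

fn main() {
    let c = Config::default();
    assert_eq!(
        c,
        Config { retries: 0, verbose: false, tags: Vec::new() }
    );

    // The "is this still the default?" check mentioned above:
    assert!(c == Default::default());
}
```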
Is Default Better?
Default seems to have recovered some of NULL’s utility as an empty value, but without our big pain points:
- It always gives a proper value of its type, and so can generally be used without simply propagating failure like NullPointerException.
- It introduces no additional value to the types that implement it, so those types retain their expected set of possible values and we need not constantly check for a special case.
But if not a special value that’s a member of every type, then what value, pray tell, will be the default? Let’s start with the simple case of integers: their default value is zero. Is that sensible?
Well, that depends on the use. Integers are commonly used as counters or to sum values, and zero is the additive identity, so in those cases it probably works well. But if you’re instead multiplying numbers, you’d want one as your identity, and if you’re not doing arithmetic at all, but recording some other metric with interesting meaning, perhaps neither is safe to use in general. However, since a trait is implemented once per type, Default can never be specific to the situation; it must attempt to capture all possible uses at once.
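The identity mismatch above is easy to demonstrate: seeding a fold with the type’s default is fine for sums, but silently wrong for products.

```rust
fn main() {
    let xs = [2_u64, 3, 4];

    // Summing: the default (zero) is the additive identity,
    // so starting the fold there gives the right answer.
    let sum: u64 = xs.iter().fold(u64::default(), |acc, &x| acc + x);
    assert_eq!(sum, 9);

    // Multiplying: the default (zero) annihilates everything;
    // the identity we actually want here is one.
    let bad: u64 = xs.iter().fold(u64::default(), |acc, &x| acc * x);
    let good: u64 = xs.iter().fold(1, |acc, &x| acc * x);
    assert_eq!(bad, 0);
    assert_eq!(good, 24);
}
```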
Is Default Worse?
It cannot really do that, of course, and so often the most you can say about a default value is that it is a legal member of its type; turning it into something meaningful is left to you. This is all to say that Default can often take the place of uninitialized data: data the programmer is required to overwrite later with something meaningful. And now, just as with NULL, we have escaped the type system, which cannot hope to tell us whether we appropriately overwrote our default value, or whether it may still be lurking as garbage in any of our values.
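Here is a sketch of that failure mode, with an entirely hypothetical Invoice type: a field the programmer meant to overwrite stays at its default, and nothing in the type system objects.

```rust
#[derive(Default)]
struct Invoice {
    customer_id: u64, // 0 is a perfectly "valid" u64, but no real customer
    amount_cents: u64,
}

fn main() {
    let mut invoice = Invoice::default();
    invoice.amount_cents = 1999;
    // Oops: we forgot to set `invoice.customer_id`. This compiles
    // and runs without a panic; the bug only surfaces downstream,
    // as an invoice billed to customer 0.
    assert_eq!(invoice.customer_id, 0);
    assert_eq!(invoice.amount_cents, 1999);
}
```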
In fact, I’d almost say this can be more dangerous. While NULLs were prone to fail fast and loudly, uninitialized values that are guaranteed to be valid are much sneakier: they can hide out in your codebase and function mostly adequately, until one day you discover your application logic’s output is subtly wrong. This runs explicitly counter to Rust’s usual philosophy of surfacing errors at compile time where possible, and otherwise at least tending to panic loudly rather than permit undefined behavior.
Takeaways
Using Default in Rust can genuinely make your code more concise and easier to write, which I do not mean to belittle; the same was true of NULL pointers in other languages. Nonetheless, outside of quick prototypes, most code is read many more times than it is written, and so the reader should usually be prioritized.
As a reader, when I see Default::default(), the main takeaway is that the writer didn’t care about the value it produces. If it mattered to the writer that the integer was zero, or that the set was empty, to say nothing of less common types, they should have told me what that value is, if only because I might not have memorized what its default was.
Furthermore, I would generally argue that carrying data you don’t care about is an anti-pattern. If it’s in your code, it should mean something! So next time you’re initializing, consider being explicit: spell out the right default for your particular use case.
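As a closing sketch, here is the explicit alternative the paragraph above argues for, using a hypothetical Counter type: each field is initialized with a value whose meaning the reader can check, rather than with `..Default::default()`.

```rust
use std::collections::HashMap;

struct Counter {
    total: u64,
    by_key: HashMap<String, u64>,
}

fn main() {
    // A reader now sees *why* these values are right for this use:
    // we're counting, so the total starts at the additive identity
    // and the map starts with no observations recorded.
    let c = Counter { total: 0, by_key: HashMap::new() };
    assert_eq!(c.total, 0);
    assert!(c.by_key.is_empty());
}
```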
Footnotes
- I’ve got a separate gripe that it’s not sufficiently explicit that this is a mutating operation, which inserts the result of Default::default() into map. ↩