I like this. It very much falls into the "make bad states unrepresentable" camp.
The issue I see with this approach is when developers stop at this first level of type implementation. Everything is a type and nothing works well together, tons of types seem to be subtle permutations of each other, things get hard to reason about, etc.
In systems like that I would actually rather be writing a weakly typed dynamic language like JS or a strongly typed dynamic language like Elixir. However, if the developers continue pushing logic into type-controlled flows, e.g. move conditional logic into union types with pattern matching, leverage delegation, etc., the experience becomes pleasant again. Just as an example (probably not the actual best solution), the "DewPoint" function could just take either type and just work.
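To make that concrete, here's a hedged sketch in Rust (names and the Magnus-formula constants are mine, purely for illustration): the unit distinction lives in a union type, and the dew point function pattern-matches instead of demanding one particular wrapper.

    // A union type over both units; conditional logic lives in the match.
    enum Temperature {
        Celsius(f64),
        Fahrenheit(f64),
    }

    impl Temperature {
        fn to_celsius(&self) -> f64 {
            match self {
                Temperature::Celsius(c) => *c,
                Temperature::Fahrenheit(f) => (f - 32.0) * 5.0 / 9.0,
            }
        }
    }

    // Magnus approximation; constants are the usual ones for roughly 0-60 °C.
    // `relative_humidity` is a percentage in (0, 100].
    fn dew_point(t: &Temperature, relative_humidity: f64) -> f64 {
        let tc = t.to_celsius();
        let gamma = (17.62 * tc) / (243.12 + tc) + (relative_humidity / 100.0).ln();
        (243.12 * gamma) / (17.62 - gamma)
    }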
Yep. For this reason, I wish more languages supported bounded integers. Eg, rather than saying x: u32, I want to be able to use the type system to constrain x to the range [0, 10).
This would allow for some nice properties. It would also enable a bunch of small optimisations in our languages that we can't have today. Eg, I could make an integer that must fall within my array bounds. Then I don't need to do bounds checking when I index into my array. It would also allow a lot more peephole optimisations to be made with Option.
Weirdly, rust already kinda supports this within a function thanks to LLVM magic. But it doesn't support it for variables passed between functions.
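Until then, the closest approximation tends to be a newtype whose constructor checks the range at runtime. A hedged Rust sketch (all names invented), with the caveat that this is exactly what the parent is complaining about: the bound is checked once at runtime, and the compiler still can't propagate it between functions.

    #[derive(Clone, Copy)]
    struct BoundedU32<const N: u32>(u32); // invariant: value < N

    impl<const N: u32> BoundedU32<N> {
        fn new(x: u32) -> Option<Self> {
            (x < N).then_some(Self(x)) // the one place the check happens
        }
        fn get(self) -> u32 {
            self.0
        }
    }

    fn demo(arr: &[i32; 10]) -> i32 {
        let i = BoundedU32::<10>::new(7).unwrap(); // checked once, up front
        arr[i.get() as usize] // rustc still emits its own bounds check here
    }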
Academic language designers do! But it takes a while for academic features to trickle down to practical languages—especially because expressive-enough refinement typing on even the integers leads to an undecidable theory.
Aren't most type systems in widely used languages Turing complete and (consequently) undecidable? TypeScript and Python are two examples that come to mind
But yeah, maybe expressive-enough refinement typing leads to hard-to-write and slow type inference engines
I think the reasons are predominantly social, not theoretical.
For every engineer out there that gets excited when I say the words "refinement types" there are twenty that either give me a blank stare or scoff at the thought, since they a priori consider any idea that isn't already in their favorite (primitivistic) language either too complicated or too useless.
Then they go and reinvent it as a static analysis layer on top of the language and give it their own name and pat themselves on the back for "inventing" such a great check. They don't read computer science papers.
Hello! I was curious if you would happen to have any advice or particular comp sci papers you would point an aspiring compiler developer towards.
I think I'm sort of who you're talking about. I have no formal education and I am excited to have my compiler up to the point I can run a basic web server. I think it's a fairly traditional approach with a lexer, recursive descent parser, static analysis, then codegen. I'm going for a balance between languages like Ruby and Rust to get the best of both worlds.
You'll probably find it funny that I don't know the name for the technique I'm using for dynamic dispatch. The idea is that as long as a collection doesn't have mixed types then the compiler statically knows the type even in loops and such. Only for mixed type collections, or maybe trait functions, will the compiler be forced to fall back to runtime dynamic dispatch. I find this cool because experts can write fast static code, but beginners won't be blocked by the compiler complaining about things they shouldn't have to care about yet. But, syntax highlighting or something may hint there are improvements to be made. If there is a name for this, or if it's too small a piece to deserve one, I would be very curious to know!
On Refinement Types, I'm not sure they are a good idea for general purpose languages and would love to be challenged on this. Succinctly, I think they're a leaky abstraction. To elaborate, having something like a `OneThroughTen` type seems helpful at first, but in reality it's spreading behaviour potentially all over the app as opposed to having a single function with the desired behaviour. If a developer has multiple spots where they're generating a number and one spot is missing a check and causes a bug, then hopefully a lesson was learned not to do that and instead have a single spot for that logic. The heavy-handed complexity of Refinement Types is not worth it to solve this situation.
If there are any thoughts out there they would be greatly appreciated!
Range checks in Ada are basically assignment guards with some cute arithmetic attached. Ada still does most of the useful checking at runtime, so you're really just introducing more "index out of bounds". Consider this example:
    procedure Sum_Demo is
       subtype Index is Integer range 0 .. 10;
       subtype Small is Integer range 0 .. 10;
       Arr : array(Index) of Integer := (others => 0);
       X : Small := 0;
       I : Integer := Integer'Value(Integer'Image(X)); -- runtime evaluation
    begin
       for J in 1 .. 11 loop
          I := I + 1;
       end loop;
       Arr(I) := 42; -- possible out-of-bounds access if I = 11
    end Sum_Demo;
This compiles, and the compiler will tell you: "warning: Constraint_Error will be raised at run time".
It's a stupid example for sure. Here's a more complex one:
    procedure Sum_Demo is
       subtype Index is Integer range 0 .. 10;
       subtype Small is Integer range 0 .. 10;
       Arr : array(Index) of Integer := (others => 0);
       X : Small := 0;
       I : Integer := Integer'Value(Integer'Image(X)); -- runtime evaluation
    begin
       for J in 1 .. 11 loop
          I := I + 1;
       end loop;
       Arr(I) := 42; -- Let's crash it
    end Sum_Demo;
This again compiles, but if you run it: raised CONSTRAINT_ERROR : sum_demo.adb:13 index check failed
It's a cute feature, but it's useless for anything complex.
I proposed a primitive for this in TypeScript a couple of years ago [1].
While I'm not entirely convinced myself whether it is worth the effort, it offers the ability to express "a number greater than 0". Using type narrowing and intersection types, open/closed intervals emerge naturally from that. Just check `if (a > 0 && a < 1)` and its type becomes `(>0)&(<1)`, so the interval (0, 1).
My specific use case is pattern matching http status codes to an expected response type, and today I'm able to work around it with this kind of construct https://github.com/mnahkies/openapi-code-generator/blob/main... - but it's esoteric, and feels likely to be less efficient to check than what you propose / a range type.
There's runtime checking as well in my implementation, but it's a priority for me to provide good errors at build time
    type
      Foo = range[1 .. 10]
      Bar = range[0.0 .. 1.0] # float works too

    var f: Foo = 42     # Error: cannot convert 42 to Foo = range 1..10(int)
    var p = Positive 22 # Positive and Natural types are pre-defined
yes … I find it as useful as I used to find perl (spent 4 years as a perl coder back in the day) and use for ad hoc scripts … data migration, PDF/CSV scraping, some LLM prompt engineering in a business setting, wrote a module to manage/install Wordpress on EC2 for example
But wouldn't that also require code execution? For example, even though the compiler already knows the size of an array and could do a bounds check on direct assignment (arr[1] = 1), in some wildly nested loop you could exceed the bounds in ways the compiler can't see.
Otherwise you could have type-level asserts more generally. Why stop at a range check when you could check a regex too? This makes the difficulty more clear.
For the simplest range case (pure assignment) you could just use an enum?
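A quick hedged sketch of that suggestion (Rust, names invented): for pure assignment, a fieldless enum gives you exactly ten inhabitants and nothing else.

    #[derive(Clone, Copy)]
    enum Digit { D0, D1, D2, D3, D4, D5, D6, D7, D8, D9 }

    fn demo() {
        let d = Digit::D3;
        let n = d as u32; // back to a plain integer when needed
        assert!(n < 10);  // holds by construction
    }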
You can do this quite easily in Rust, but you have to overload operators to make your type make sense. That's also possible; you just need to define what type you get after dividing your type by a regular number (and vice versa, a regular number by your type), or what should happen if, when adding two of your types, the sum is higher than the maximum value. This is quite verbose, though it can be done with generics or macros.
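For illustration, a hedged sketch of one such operator impl (everything here is invented); multiply this by every operator and mixed-type combination you want to support and the verbosity becomes clear.

    use std::ops::Add;

    #[derive(Clone, Copy, Debug)]
    struct ZeroToTen(u32); // invariant: value <= 10

    impl Add<u32> for ZeroToTen {
        // You must decide what an out-of-range sum means; here it's None.
        type Output = Option<ZeroToTen>;

        fn add(self, rhs: u32) -> Option<ZeroToTen> {
            let sum = self.0.checked_add(rhs)?;
            (sum <= 10).then_some(ZeroToTen(sum))
        }
    }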
You can do it at runtime quite easily in rust. But the rust compiler doesn’t understand what you’re doing - so it can’t make use of that information for peephole optimisations or to elide array bounds checks when using your custom type. And you get runtime errors instead of compile time errors if you try to assign the wrong value into your type.
ATS does this. Works quite well since multiplication by known factors and addition of type variables + inequalities is decidable (and in fact quadratic).
What the GP described could be achieved with dependent types, but could also be achieved with a less powerful type system, and the reduced power can sometimes lead to enormous benefits in terms of how pleasant it actually is to use. Check out "refinement types" (implemented in Liquid Haskell for example). Many constraints can be encoded in the type system, and an SMT solver runs at compile time to check if these constrains are guaranteed to be satisfied by your code. The result is that you can start with a number that's known to be in [0..10), then double it and add five, and then you can pass that to a function that expects a number in [10..20). Dependent types would typically require some annoying boilerplate to prove that your argument to the function would fall within that range, but an SMT solver can chew through that without any problem.
The full-blown version that guarantees no bounds-check errors at runtime requires dependent types (and consequently requires programmers to work with a proof assistant, which is why it's not very popular). You could have a more lightweight version that instead just crashes the program at runtime if an out-of-range assignment is attempted, and optionally requires such fallible assignments to be marked as such in the code. Rust can do this today with const generics, though it's rather clunky as there's very little syntactic sugar and no implicit widening.
AIUI WUFFS doesn't need a full blown proof assistant because instead of attempting the difficult problem "Can we prove this code is safe?" it has the programmer provide elements of such a proof as they write their program so it can merely ask "Is this a proof that the program is safe?" instead.
This is also approximately true of Idris. The thing that really helps Wuffs is that it's a pretty simple language without a lot of language features (e.g., no memory allocation and only very limited pointers) that complicate proofs. Also, nobody is particularly tempted to use it and then finds it unexpectedly forbidding, because most programmers don't ever have to write high-performance codecs; Wuffs's audience is people who are already experts.
Also, Wuffs doesn't let you prove arbitrary correctness properties, it aims only to prove the absence of memory corruption. That reduces how expressive the proof system has to be.
There's refinement types, which are less general than dependant types, but sufficient to provide ranges, and simpler to implement because the type only needs to be associated with a predicate.
Spec and Malli in Clojure feel like effing magic - so expressive, so simple - the "data as code" philosophy means validation schemas are first-class citizens that can be manipulated, stored, transmitted, and reasoned about like any other data, and that's just beautiful. Some type checks you can express in them are nearly impossible to achieve in TS, Rust or even Haskell. Clojure makes the complex feel simple because the language itself is designed around data transformation.
That power of course does come with a price: there does not exist a static analyzer that automatically checks things, even though you can pretty much generate beautiful tests based on specs. I think e.g. Rust teams can have more junior devs safely contribute because the compiler enforces discipline, leaving less variability in code quality. Clojure teams need higher baseline discipline but can move incredibly fast when everyone's aligned.
It's saddening to see when Clojure gets outright dismissed for being "untyped", even though it absolutely can change one's perspective about type systems.
This can be done in typescript. It’s not super well known because of typescript's association with frontend and JavaScript. But typescript is a language with one of the most powerful type systems ever.
Among the popular languages like golang, rust or python, typescript has the most powerful type system.
How about a type with a number constrained between 0 and 10? You can already do this in typescript.
You can even programmatically define functions at the type level. So you can create a function that outputs a type between 0 to N.
    type Range<N extends number, A extends number[] = []> =
      A['length'] extends N ? A[number] : Range<N, [...A, A['length']]>;
The issue here is that it’s a bit awkward: you want these types to compose, right? If I add two constrained numbers, say one with max value of 3 and another with max value of 2, the result should be max value of 5. Typescript doesn’t support this with the built-in addition. But you can create a function that does this.
    // Build a tuple of length L
    type BuildTuple<L extends number, T extends unknown[] = []> =
      T['length'] extends L ? T : BuildTuple<L, [...T, unknown]>;

    // Add two numbers by concatenating their tuples
    type Add<A extends number, B extends number> =
      [...BuildTuple<A>, ...BuildTuple<B>]['length'];

    // Create a union: 0 | 1 | 2 | ... | N-1
    type Range<N extends number, A extends number[] = []> =
      A['length'] extends N ? A[number] : Range<N, [...A, A['length']]>;

    function addRanges<
      A extends number,
      B extends number
    >(
      a: Range<A>,
      b: Range<B>
    ): Range<Add<A, B>> {
      return (a + b) as Range<Add<A, B>>;
    }
The issue is to create these functions you have to use tuples to do addition at the type level and you need to use recursion as well. Typescript recursion stops at 100 so there’s limits.
Additionally it’s not intrinsic to the type system. Like, you need Peano numbers built into the number system, and built in by default into the entire language, for this to work perfectly. That means the code in the function is not type checked, but if you assume that code is correct, then this function type checks when composed with other primitives of your program.
Complexity is bad in software. I think this kind of thing does more harm than good.
I get an error that I can't assign something that seems to me assignable, and to figure out why I need to study functions at type level using tuples and recursion. The cure is worse than the disease.
It can work. It depends on context. Like let's say these types are from a well renowned library or one that's been used by the codebase for a long time.
If you trust the type, then it's fine. The code is safer. In the world of the code itself, things are easier.
Of course like what you're complaining about, this opens up the possibility of more bugs in the world of types, and debugging that can be a pain. Trade offs.
In practice people usually don't go crazy with type level functions. They can do small stuff, but usually nothing super crazy. So typescript by design sort of fits the complexity dynamic you're looking for. Yes you can do type level functions that are super complex, but the language is not designed around it and it doesn't promote that style either. But you CAN go a little deeper with types than, say, a language with less power in the type system like Rust.
Typescript's type system is turing complete, so you can do basically anything with it if this sort of thing is fun to you. Which is pretty much my problem with it: this sort of thing can be fun, feels intellectually stimulating. But the added power doesn't make coding easier or make the code more sound. I've heard this sort of thing called the "type puzzle trap" and I agree with that.
I'll take a modern hindley milner variant any day. Sophisticated enough to model nearly any type information you'll have need of, without blurring the lines or admitting the temptation of encoding complex logic in it.
>Which is pretty much my problem with it: this sort of thing can be fun, feels intellectually stimulating. But the added power doesn't make coding easier or make the code more sound.
In practice nobody goes too crazy with it. You have a problem with a feature almost nobody uses. It's there and Range<N> is like the upper bound of complexity I've seen in production but that is literally extremely rare as well.
There is no "temptation" of coding complex logic in it at all as the language doesn't promote these features at all. It's just available if needed. It's not well known but typescript types can be easily used to be 1 to 1 with any hindley milner variant. It's the reputational baggage of JS and frontend that keeps this fact from being well known.
In short: Typescript is more powerful than hindley milner, a subset of it has one-to-one parity with it, and the parts that are more powerful than hindley milner aren't popular or widely used, nor does the flow of the language itself promote their usage. The feature is just there if you need it.
If you want a language where you do this stuff in practice take a look at Idris. That language has these features built into the language AND it's an ML style language like haskell.
I have definitely worked in TS code bases with overly gnarly types, seen more experienced devs spend an entire workday "refactoring" a set of interrelated types and producing an even gnarlier one that more closely modeled some real world system but was in no way easier to reason about or work with in code. The advantage of HM is the inference means there is no incentive to do this, it feels foolish from the beginning.
> 1 + "1"
(irb):1:in 'Integer#+': String can't be coerced into Integer (TypeError)
from (irb):1:in '<main>'
from <internal:kernel>:168:in 'Kernel#loop'
from /Users/george/.rvm/rubies/ruby-3.4.2/lib/ruby/gems/3.4.0/gems/irb-1.14.3/exe/irb:9:in '<top (required)>'
from /Users/george/.rvm/rubies/ruby-3.4.2/bin/irb:25:in 'Kernel#load'
from /Users/george/.rvm/rubies/ruby-3.4.2/bin/irb:25:in '<main>'
Some people mistakenly call dynamic typing "weak typing" because they don't know what those words mean. PSA:
Static typing / dynamic typing refers to whether types are checked at compile time or runtime. "Static" = compile time (eg C, C++, Rust). "Dynamic" = runtime (eg Javascript, Ruby, Excel)
Strong / weak typing refers to how "wibbly wobbly" the type system is. x86 assembly language is "weakly typed" because registers don't have types. You can do (more or less) any operation with the value in any register. Like, you can treat a register value as a float in one instruction and then as a pointer during the next instruction.
Ruby is strongly typed because all values in the system have types. Types affect what you can do. If you treat a number like it's an array in ruby, you get an error. (But the error happens at runtime because ruby is dynamically typed - thus typechecking only happens at runtime!)
It's strongly typed, but it's also duck typed. Also, in ruby everything is an object, even the class itself, so type checking there is weird.
Sure it stops you from running into "'1' + 2" issues, but won't stop you from yeeting VeryRawUnvalidatedResponseThatMightNotBeAuthorized to a function that takes TotalValidatedRequestCanUseDownstream. You won't even notice an issue until:
- you manually validate
- you call a method that is unavailable on the wrong object.
I recall a type theorist once defined the terms as follows (can't find the source): "A strongly typed language is one whose type system the speaker likes. A weakly typed language is one whose type system the speaker dislikes."
So yeah I think we should just give up these terms as a bad job. If people mean "static" or "dynamic" then they can say that, those terms have basically agreed-upon meanings, and if they mean things like "the type system prohibits [specific runtime behavior]" or "the type system allows [specific kind of coercion]" then it's best to say those things explicitly with the details filled in.
It would be weak if that was actually mutating the first “a”. That second declaration creates a new variable using the existing name “a”. Rust lets you do the same[1].
Rust lets you do the same because the static typing keeps you safe. In Rust, treating the second 'a' like a number would be an error. In ruby, it would crash.
These are two entirely different a's; you're just storing a reference to them in the same variable. You can do the same in rust (we agree it's statically and strongly typed, right?):
    let a = 1;
    let a = '1';
Strong typing means I can't do 1 + '1'. Variable names and rebinding have nothing to do with it being strongly typed.
In the dynamic world being able to redefine variables is a feature not a bug (unfortunately JS has broken this), even if they are strongly typed. The point of strong typing is that the language doesn't do implicit conversions and other shenanigans.
Well yeah, because variables in what you consider to be a strongly typed language are allocating the storage for those variables. When you say int x you're asking the compiler to give you an int-shaped box. When you say x = 1 in Ruby, all you're doing is saying that in this scope the name x now refers to the box holding a 1. You can't actually store a string in the int box; you can only say that from now on the name x refers to the string box.
The “Stop at first level of type implementation” is where I see codebases fail at this. The example of “I’ll wrap this int as a struct and call it a UUID” is a really good start and pretty much always start there, but inevitably someone will circumvent the safety. They’ll see a function that takes a UUID and they have an int; so they blindly wrap their int in UUID and move on. There’s nothing stopping that UUID from not being actually universally unique so suddenly code which relies on that assumption breaks.
This is where the concept of “Correct by construction” comes in. If any of your code has a precondition that a UUID is actually unique then it should be as hard as possible to make one that isn’t. Be it by constructors throwing exceptions, inits returning Err or whatever the idiom is in your language of choice, the only way someone should be able to get a UUID without that invariant being proven is if they really *really* know what they’re doing.
(Sub UUID and the uniqueness invariant for whatever type/invariants you want, it still holds)
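As a concrete sketch of that idiom (Rust-flavoured, invented names; substitute your own invariant for the trivial one checked here): the field is private, so the only way to obtain the type is through the constructor that proves the invariant.

    mod account {
        pub struct AccountId(u64); // private field: no literal construction outside

        #[derive(Debug)]
        pub struct InvalidId;

        impl AccountId {
            // The single gate through which every AccountId must pass.
            pub fn new(raw: u64) -> Result<AccountId, InvalidId> {
                if raw != 0 {
                    Ok(AccountId(raw))
                } else {
                    Err(InvalidId)
                }
            }
        }
    }

    fn demo() {
        let id = account::AccountId::new(42); // Ok(...)
        // account::AccountId(42) would not compile: the field is private.
        let _ = id;
    }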
> This is where the concept of “Correct by construction” comes in.
This is one of the basic features of object-oriented programming that a lot of people tend to overlook these days in their repetitive rants about how horrible OOP is.
One of the key things OO gives you is constructors. You can't get an instance of a class without having gone through a constructor that the class itself defines. That gives you a way to bundle up some data and wrap it in a layer of validation that can't be circumvented. If you have an instance of Foo, you have a firm guarantee that the author of Foo was able to ensure the Foo you have is a meaningful one.
Of course, writing good constructors is hard because data validation is hard. And there are plenty of classes out there with shitty constructors that let you get your hands on broken objects.
But the language itself gives you direct mechanism to do a good job here if you care to take advantage of it.
Functional languages can do this too, of course, using some combination of abstract types, the module system, and factory functions as convention. But it's a pattern in those languages where it's a language feature in OO languages. (And as any functional programmer will happily tell you, a design pattern is just a sign of a missing language feature.)
I find regular OOP language constructors too restrictive. You can't return something like Result<CorrectObject,ConstructorError> to handle the error gracefully or return a specific subtype; you need a static factory method to do something more than guaranteed successful construction w/o exception.
Does this count as a missing language feature by requiring a "factory pattern" to achieve that?
The natural solution for this is a private constructor with public static factory methods, so that the user can only obtain an instance (or the error result) by calling the factory methods. Constructors need to be constrained to return an instance of the class, otherwise they would just be normal methods.
Convention in OOP languages is (un?)fortunately to just throw an exception though.
In languages with generic types such as C++, you generally need free factory functions rather than static member functions so that type deduction can work.
> You can't return something like Result<CorrectObject,ConstructorError> to handle the error gracefully
Throwing an error is doing exactly that though, it's exactly the same thing in theory.
What you are asking for is just more syntactic sugar around error handling, otherwise all of that already exists in most languages. If you are talking about performance that can easily be optimized at compile time for those short throw catch syntactic sugar blocks.
Java even forces you to handle those errors in code, so don't say that these are silent; there is no reason they need to be.
This is why constructors are dumb IMO and the Rust way is the right way.
Nothing stops you from returning Result<CorrectObject, ConstructorError> from a CorrectObject::new(..) function, because it's just a regular function; struct field visibility takes care of you not being able to construct an incorrect CorrectObject.
I don't see this having much to do with OOP vs FP but maybe with the ease with which a language lets you create nominal types and functions that can nicely fail.
What sucks about OOP is that it also holds your hand into antipatterns you don't necessarily want, like adding behavior to what you really just wanted to be a simple data type because a class is an obvious junk drawer to put things.
And, like your example of a problem in FP, you have to be eternally vigilant with your own patterns to avoid antipatterns like when you accidentally create a system where you have to instantiate and collaborate multiple classes to do what would otherwise be a simple `transform(a: ThingA, b: ThingB, c: ThingC): ThingZ`.
Finally, as "correct by construction" goes, doesn't it all boil down to `createUUID(string): Maybe<UUID>`? Even in an OOP language you probably want `UUID.from(string): Maybe<UUID>`, not `new UUID(string)` that throws.
> Even in an OOP language you probably want `UUID.from(string): Maybe<UUID>`, not `new UUID(string)` that throws.
One way to think about exceptions is that they are a pattern matching feature that privileges one arm of the sum type with regards to control flow and the type system (with both pros and cons to that choice). In that sense, every constructor is `UUID.from(string): MaybeWithThrownNone<UUID>`.
The best way to think about exceptions is to consider the term literally (as in: unusual; not typical) while remembering that programmers have an incredibly overinflated sense of ability.
In other words, exceptions are for cases where the programmer screwed up. While programmers screwing up isn't unusual at all, programmers like to think that they don't make mistakes, and thus in their eye it is unusual. That is what sets it apart from environmental failures, which are par for the course.
To put it another way, it is for signalling at runtime what would have been a compiler error if you had a more advanced compiler.
Unfortunately many languages treat exceptions as a primary control flow mechanism. That's part of why Rust calls its exceptions "panics" and provides the "panic=abort" compile-time option which aborts the program instead of unwinding the stack with the possibility of catching the unwind. As a library author you can never guarantee that `catch_unwind` will ever get used, so its main purpose of preventing unwinding across an FFI boundary is all it tends to get used for.
Just Java (and Javascript by extension, as it was trying to copy Java at the time), really. You do have a point that Java programmers have infected other languages with their bad habits. For example, Ruby was staunchly in the "return errors as values and leave exception handling for exceptions" before Rails started attracting Java developers, but these days all bets are off. But the "purists" don't advocate for it.
Indeed, OOP and FP both allow and encourage attaching invariants to data structures.
In my book, that's the most important difference with C, Zig or Go-style languages, that consider that data structures are mostly descriptions of memory layout.
> Functional languages can do this too, of course, using some combination of abstract types, the module system, and factory functions as convention
In Haskell:
1. Create a module with some datatype
2. Don't export the datatype's constructors
3. Export factory functions that guarantee invariants
How is that more complicated than creating a class and adding a custom constructor? Especially if you have multiple datatypes in the same module (which in e.g. Java would force you to add multiple files, and if there's any shared logic, well, that will have to go into another extra file - thankfully some more modern OOP languages are more pragmatic here).
(Most) OOP languages treat a module (an importable, namespaced subunit of a program) and a type as the same thing, but why is this necessary? Languages like Haskell break this correspondence.
Now, what I'm missing from Haskell-type languages is parameterised modules. In OOP, we can instantiate classes with dependencies (via dependency injection) and then call methods on that instance without passing all the dependencies around, which is very practical. In Haskell, you can simulate that with currying, I guess, but it's just not as nice.
I've recently been following red-green-refactor but instead of with a failing test, I tighten the screws on the type system to make a production-reported bug cause the type checker to fail before making it green by fixing the bug.
I still follow TDD-with-a-test for all new features, all edge cases and all bugs for which I can't trigger failure by changing the type system.
However, red-green-refactor-with-the-type-system is usually quick and can be used to provide hard guarantees against entire classes of bug.
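A minimal illustration of the cycle, with invented names: suppose the reported bug was passing minutes where seconds were expected. Tightening the signature makes the bug a type error (red), and fixing the call site makes it green.

    struct Seconds(u64); // the "tightened screw": was a bare u64 before

    fn schedule_retry(delay: Seconds) {
        let _ = delay; // ... actual scheduling elided ...
    }

    fn caller() {
        let minutes = 5;
        // schedule_retry(minutes);            // red: expected Seconds, found integer
        schedule_retry(Seconds(minutes * 60)); // green: the fix is now explicit
    }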
I like this approach, there are often calls for increased testing on big systems and what they really mean is increased rigor. Don't waste time testing what you can move into the compiler.
It is always great when something is so elegantly typed that I struggle to think of how to write a failing test.
What drives me nuts is when there are tests left around that basically test the compiler and never were “red” then “greened”. It makes me wonder if there is some subtle edge case I am missing.
As you move more testing responsibilities to the compiler, it can be valuable to test the compiler’s responsibilities for those invariants though. Otherwise it can be very hard to notice when something previously guaranteed statically ceases to be.
I found myself following a similar trajectory, without realizing that’s what I was doing. For a while it felt like I was bypassing the discipline of TDD that I’d previously found really valuable, until I realized that I was getting a lot of the test-first benefits before writing or running any code at all.
Now I just think of types as the test suite’s first line of defense. Other commenters who mention the power of types for documentation and refactoring aren’t wrong, but I think that’s because types are tests… and good tests, at almost any level, enable those same powers.
I don't think tests and types are the same "thing" per se - they work vastly better in conjunction with each other than alone, and are weirdly symmetrical in the way that they're bad substitutes for each other.
However, I'm convinced that they're both part of the same class of thing, and that "TDD" or red/green/refactor or whatever you call it works on that class, not specifically just on tests.
Documentation is a funny one too - I use my types to generate API and other sorts of reference docs and tests to generate how-to docs. There is a seemingly inextricable connection between types and reference docs, tests and how-to docs.
Types are a kind of test. Specifically they’re a way to assert certain characteristics about the interactions between different parts of the code. They’re frequently assertions you’d want to make another way, if you didn’t have the benefit of a compiler to run that set of assertions for you. And like all tests, they’re a means to gain or reinforce confidence in claims you could make about the code’s behavior. (Which is their symmetry with documentation.)
Union types!! If everything’s a type and nothing works together, start wrapping them in interfaces and define an über type that unions everything everywhere all at once.
Welcome to typescript. Where generics are at the heart of our generic generics that throw generics of some generic generic geriatric generic that Bob wrote 8 years ago.
Because they can’t reason with the architecture they built, they throw it at the type system to keep them in line. It works most of the time. Rust’s is beautiful at barking at you that you’re wrong. Ultimately it’s us failing to design flexibility amongst ever increasing complexity.
Remember when “Components” were “Controls” and you only had like a dozen of them?
Remember when a NN was only a few hundred thousand parameters?
As complexity increases with computing power, so must our understanding of it in our mental model.
However you need to keep that mental model in check, use it. If it’s typing, do it. If it’s rigorous testing, write your tests. If it’s simulation, run it my friend. Ultimately, we all want better quality software that doesn’t break in unexpected ways.
Union types are great. But alone they are not sufficient for many cases. For example, try to define a data structure that captures a classical evaluation tree.
You might go with:
    type Expression = Value | Plus | Minus | Multiply | Divide;

    interface Value    { type: "value";    value: number; }
    interface Plus     { type: "plus";     left: Expression; right: Expression; }
    interface Minus    { type: "minus";    left: Expression; right: Expression; }
    interface Multiply { type: "multiply"; left: Expression; right: Expression; }
    interface Divide   { type: "divide";   left: Expression; right: Expression; }
And so on.
That looks nice, but when you try to pattern match on it and have your pattern matching return the types that are associated with the specific operation, it won't work. The reason is that Typescript does not natively support GADTs. Libs like ts-pattern use some tricks to get closish at least.
And while this might not be very important for most application developers, it is very important for library authors, especially to make libraries interoperable with each other and extend them safely and typesafe.
As you mentioned correctly: if you go for strongly typed types in a library you should go all the way.
And that means your strong type should provide clear functions for its conversion to certain other types. Some of which you nearly always need like conversion to a string or representation as a float/int.
The danger of that is of course that you provide a ladder over the wall you just built: instead of using the type's conversion functions, they now go the shortcut route via the numeric representation and may forget the conversion factor. In that case I'd argue it is best to always represent temperature as one unit (Kelvin or Celsius, depending on the math you need to do with it) and then just add a .display(Unit::Fahrenheit) method that returns a string. If you really want to convert to TemperatureF for a calculation you would have to use a dedicated method that converts from one type to another.
One thing to consider as well is that you can mix up absolute values ("it is 28°C outside") and temperature deltas ("this is 2°C warmer than the last measurement"). If you're controlling high energy heaters mixing those up can ruin your day, which is why you could use different types for absolutes and deltas (or a flag within one type). Datetime libraries often do that as well (in python for example you have datetime for absolute and timedelta for relative time)
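A hedged Rust sketch of that absolute-vs-delta split (invented names, mirroring datetime/timedelta): the operators only exist where they make physical sense, so adding two absolutes doesn't compile.

    use std::ops::{Add, Sub};

    #[derive(Clone, Copy, Debug)]
    struct Celsius(f64);      // absolute: "it is 28 °C outside"
    #[derive(Clone, Copy, Debug)]
    struct CelsiusDelta(f64); // relative: "2 °C warmer than last time"

    impl Add<CelsiusDelta> for Celsius {
        type Output = Celsius;
        fn add(self, d: CelsiusDelta) -> Celsius { Celsius(self.0 + d.0) }
    }

    impl Sub for Celsius {
        type Output = CelsiusDelta; // absolute - absolute = delta
        fn sub(self, other: Celsius) -> CelsiusDelta { CelsiusDelta(self.0 - other.0) }
    }

    // There is deliberately no `impl Add for Celsius`, so
    // `Celsius(28.0) + Celsius(30.0)` is a compile-time error.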
An adjacent point is to use checked exceptions and to handle them appropriately to their type. I don't get why Java checked exceptions were so maligned. They saved me so many headaches on a project where I forced their use as I was the tech lead for it.
Everyone hated me for a while because it forced them to deal with more than just the happy path, but they loved it once they got in the rhythm of thinking about all the exceptional cases in the code flow. And the project was extremely robust even though we were not particularly disciplined about unit testing.
I think most complaints about checked exceptions in Java ultimately boil down to how verbose handling exceptions in Java is. Everytime the language forces you to handle an exception when you don't really need to makes you hate it a bit more.
First, the library author cannot reasonably define what is and isn't a checked exception in their public API. That really is up to the decision of the client. This wouldn't be such a big deal if it weren't so verbose to handle exceptions though: if you could trivially convert an exception to another type, or even declare it as runtime, maybe at the module or application level, you wouldn't be forced to handle them in these ways.
Second, as to signature brittleness, the standard advice is to create domain-specific exceptions anyway. Your code probably shouldn't be throwing IOExceptions. But Java makes converting exceptions unnecessarily verbose... see above.
Ultimately, I love checked exceptions. I just hate the ergonomics around exceptions in Java. I wish designers focused more on fixing that than throwing the baby out with the bathwater.
If only Java also provided Either<L,R>-like in the standard library...
Personally I use checked exceptions whenever I can't use Either<> and avoid unchecked like the plague.
Yeah, it's pretty sad Java language designers just completely deserted exception handling. I don't think there's any kind of improvement related to exceptions between Java 8 and 24.
That's what I thought at first too. At first glance they look equivalent, telling API users what the expected result of a method call is. In that sense, both are equivalent.
But after experimenting a bit with checked exceptions, I realized how neglected exceptions are in Java.
- There's no way to handle checked exceptions other than a try-catch block
- They play very badly with APIs that use functional interfaces. Many APIs don't provide a checked throws variant
- catch blocks can't use generic / parameterized types; you need to catch Exception or Throwable then operate on it at runtime
After rolling my own Either<L,R>, it felt like a customizable typesafe macro for exception handling. It addresses all the annoyances I had with checked exception handling, and it plays nicely with exhaustive pattern matching using `sealed`.
Granted, it has the drawback that sometimes I have to explicitly spell out types due to local type inference failing to do so. But so far it has been a pleasant experience of handling error gracefully.
I would say one we are allowed to bash upon, forgetting the history of key programming languages with checked exceptions predating Java (CLU, Modula-3 and C++), whereas the other is the cool FP programming concept that everyone coding in coffee shops is supposed to find cool.
Semantically, from a CS point of view in language semantics and type system modelling, they are equivalent in purpose, which is exactly what you are asking about.
I point it out because I think the distinction is interesting.
Can we build tools that help us work with the boundary between isosemantic and isomorphic? Like any two things that are isosemantic should be translatable between each other. And so it represents an opportunity to make the things isomorphic.
> Your code probably shouldn't be throwing IOExceptions. But Java makes converting exceptions unnecessarily verbose
The problem just compounds too. People start checking things that they can’t handle from the functions they’re calling. The callers upstream can’t possibly handle an error from the code you’re calling, they have no idea why it’s being called.
I also hate IOException. It’s so extremely unspecific. It’s the worst way to do exceptions. Did the entire disk die, or was the file just not found, or do I not have permissions to write to it? IOException has no meaning.
Part of me secretly hopes Swift takes over because I really like its error handling.
There usually are more specific exceptions, at least when it's easy enough to distinguish the root cause from OS APIs. But it often isn't. A more practical concern is that it is not always easy to find out which type it is. The identity of the specific types might not be part of the public API interface, perhaps intentionally so.
I think checked exceptions were maligned because they were overused. I like that Java supports both checked and unchecked exceptions. But IMO checked exceptions should only be used for what Eric Lippert calls "exogenous" exceptions [1]; and even then most of them should probably be converted to an unchecked exception once they leave the library code that throws them. For example, it's always possible that your DB could go offline at any time, but you probably don't want "throws SQLException" polluting the type signature all the way up the call stack. You'd rather have code assuming all SQL statements are going to succeed, and if they don't your top-level catch-all can log it and return HTTP 500.
Put another way: errors tend to either be handled "close by" or "far away", but rarely "in the middle".
So Java's checked exceptions force you to write verbose and pointless code in all the wrong places (the "in the middle" code that can't handle and doesn't care about the exception).
> So Java's checked exceptions force you to write verbose and pointless code in all the wrong places (the "in the middle" code that can't handle and doesn't care about the exception).
It doesn't, you can just declare that the function throws these as well, you don't have to handle it directly.
It pollutes type signatures. If some method deep down the call stack changes its implementation details from throwing exception A you don't care about to throwing exception B you also don't care about, you also have to change the type of `throws` annotation on your method.
This is annoying enough to deal with in concrete code, but interfaces make it a nightmare.
Yes, the exact same problem is present in languages where "errors are just values".
To solve this, Rust does allow you to just Box<dyn Error> (or equivalents like anyhow). And Go has the Error interface. People who list out all concrete error types are just masochists.
It's fine to let exceptions percolate to the top of the call stack but even then you likely want to inform the user or at least log it in your backend why the request was unsuccessful. Checked exceptions force both the handling of exceptions and the type checking if they are used as intended. It's not a problem if somewhere along the call chain an SQLException gets converted to "user not permitted to insert this data" exception. This is how it was always meant to work. What I don't recommend is defaulting to RuntimeException and derivatives for those business level exceptions. They should still be checked and have their own types which at least encourages some discipline when handling and logging them up the call stack.
In my experience, the top level exception handler will catch all incl Throwable, and then inspect the exception class and any nested exception classes for things like SQL error or MyPermissionsException etc and return the politically correct error to the end user. And if the exception isn’t in a whitelist of ones we don’t need to log, we log it to our application log.
Sometimes I feel like I actually wouldn't mind having any function touching the database tagged as such. But checked exceptions are such a pita to deal with that I tend to not bother.
Setting aside the objections some have to exceptions generally: Checked exceptions, in contrast to unchecked, mean that if a function/method deep in your call stack is changed to throw an exception, you may have to change many functions (to at least denote that they will throw that exception or some exception) between the handler and the thrower. It's an objection to the ergonomics around modifying systems.
Think of the complaints around function coloring with async, how it's "contagious". Checked exceptions have the same function color problem. You either call the potential thrower from inside a try/catch or you declare that the caller will throw an exception.
> Setting aside the objections some have to exceptions generally: Checked exceptions, in contrast to unchecked, mean that if a function/method deep in your call stack is changed to throw an exception, you may have to change many functions (to at least denote that they will throw that exception or some exception) between the handler and the thrower. It's an objection to the ergonomics around modifying systems.
And if you change a function deep in the call stack to return a different type on the happy path? Same thing. Yet, people don't complain about that and give up on statically type checking return values.
I honestly think the main reason that some people will simultaneously enjoy using Result/Try/Either types in languages like Rust while also maligning checked exceptions is because of the mental model and semantics around the terminology. I.e., "checked exception" and "unchecked exception" are both "exceptions", so our brains lump those two concepts together; whereas returning a union type that has a success variant and a failure variant means that our brains are more willing to lump the failure return and the successful return together.
To be fair, I do think it's a genuine design flaw to have checked and unchecked exceptions both named and syntactically handled similarly. The return type approach is a better semantic model for modelling expected business logic "failure" modes.
And as with async, the issue is the lack of a) the ability to write generic code that can abstract over the async-ness or throw signature of a function and b) the ability to type-erase asyncness (by wrapping with stackful coroutines) or the throw signature (by converting to unchecked exceptions).
Incidentally, for exceptions, Java had (b), but for a long time didn't have (a) (although I think this changed?), leading to (b) being abused.
That's a valid point but it's somewhere on a spectrum of "quick to write/change" vs "safe and validated" debate of strictly vs loosely typed systems. Strictly typed systems are almost by definition much more "brittle" when it comes to code editing. But the strictness also ensures that refactoring is usually less perilous than in loosely typed code.
> Checked exceptions, in contrast to unchecked, means that if a function/method deep in your call stack is changed to throw an exception, you may have to change many function (to at least denote that they will throw that exception or some exception) between the handler and the thrower.
That's the point! The whole reason for checked exceptions is to gain the benefit of knowing if a function starts throwing an exception that it didn't before, so you can decide how to handle it. It's a good thing, not a bad thing! It's no different from having a type system which can tell you if the arguments to a function change, or if its return type does.
Why are you screaming? All those wasted exclamation marks, you could have written something I didn't know. I didn't say it wasn't the point or that it was a bad thing.
C# went with properly typed but unchecked exceptions. IMO it gives you a clean error stacks without too much of an issue.
I also think it's a bit cleaner to have nicely pattern-matched handler blocks than bespoke handling at every level. That said, if unwrapped error results have a robust layout then it's probably pretty equivalent.
For the customer it hardly makes any difference that the server keeps running if a critical workflow, especially with a payment in flight, crashes and burns.
Or if they are unable to work, because they keep getting a maintenance page, as the load balancer redirects them after several HTTP 500 responses.
For the customer the difference hardly matters, they cannot fulfill what they intended to do, because someone somewhere forgot to catch an exception, going all the way out of the MVC controller, providing a very bad UI/UX, and from security point of view, a possible DOS attack vector.
I prefer a happy customer, and not having to deal with support calls.
With Java, there are a lot of usability issues with checked exceptions. For example, streams to process data really don't play nicely if your map or filter function throws a checked exception. Also, if you are calling a number of different services that each have their own checked exception, either you resort to just catching generic Exception or you end up with a comically large list of exceptions.
That is why I am happy that rich errors (https://xuanlocle.medium.com/kotlin-2-4-introduces-rich-erro...) are coming to Kotlin. This expresses the possible error states very well, while programming for the happy path and with some syntactic sugar for destructuring the errors.
I rarely have more than handful of try..catch blocks in any application. These either wrap around an operation that can be retried in the case of temporary failure or abort the current operation with a logged error message.
Checked exceptions feel like a bad mix of error returns and colored functions to me.
For anyone who dislikes checked exceptions due to how clunky they feel: modern Java allows you to construct custom Result-like types using sealed interfaces.
Hard not to agree with the general idea. But also hard to ignore all of the terrible experiences I've had with systems where everything was a unique type.
In general, I think this largely falls down when you have code that wants to just move bytes around intermixed with code that wants to do some fairly domain-specific calculations. I don't have a better way of phrasing that, at the moment. :(
There are cases where you have the data in hand but now you have to look for how to create or instantiate the types before you can do anything with it, and it can feel like a scavenger hunt in the docs unless there's a cookbook/cheatsheet section.
One example is where you might have to use createVector(x, y, z): Vector when you already have { x, y, z }. And only then can you createFace(vertices: Vector[]): Face even though Face is just { vertices }. And all that because Face has a method to flip the normal or something.
Another example is a library like Java's BouncyCastle where you have the byte arrays you need, but you have to instantiate like 8 different types and use their methods on each other just to create the type that lets you do what you wish was just `hash(data, "sha256")`.
This stuff gets unbearable very fast. We have custom types for geometries at my work. We also use a bunch of JS libraries for e.g. coordinate conversions. They output as [number, number, number], whereas our internal type is number[].
The article is written in Go, in which - iirc - it's fairly easy and cheap to convert a type alias back to its original type (e.g. an AccountID to an int).
Using the right architecture, you could make it so your core domain type and logic uses the strictly typed aliases, and so that a library that doesn't care about domain specific stuff converts them to their higher (lower?) type and works with that. Clean architecture style.
Unfortunately, that involves a lot of conversion code.
"Phantom types" are useful for what you describe: that's where we add a parameter to a type (i.e. making it generic), but we don't actually use that parameter anywhere. I used this when dealing with cryptography in Scala, where everything is just an array of bytes, but phantom types prevented me getting them mixed up. https://news.ycombinator.com/item?id=28059019
Ideally though, the compiler lowers all domain specific logic into simple byte-moving, just after having checked that types add up. Or maybe I misunderstood what you meant?
Type systems, like any other tool in the toolbox, have an 80/20 rule associated with them. It is quite easy to overdo types and make working with a library extremely burdensome for little to no to negative benefit.
I know what a UUID (or a String) is. I don't know what an AccountID, UserID, etc. is. Now I need to know what those are (and how to make them, etc. as well) to use your software.
Maybe an elaborate type system is worth it, but maybe not (especially if there are good tests.)
> I don't know what an AccountID, UserID, etc. is. Now I need to know what those are (and how to make them, etc. as well) to use your software.
Presumably you need to know what an Account and a User are to use that software in the first place. I can't imagine a reasonable person easily understanding a getAccountById function which takes one argument of type UUID, but having trouble understanding a getAccountById function which takes one argument of type AccountId.
UserID and AccountID could just as well be integers.
What he means is that by introducing a layer of indirection via a new type you hide the physical reality of the implementation (int vs. string).
The physical type matters if you want to log it, save to a file etc.
So now for every such type you add a burden of having to undo that indirection.
At which point "is it worth it?" is a valid question.
You made some (but not all) mistakes impossible but you've also introduced that indirection that hides things and needs to be undone by the programmer.
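Concretely, the undoing usually looks something like this hedged Rust sketch (invented names): one explicit exit point per physical use, which is exactly the burden in question.

    use std::fmt;

    struct AccountId(u64);

    impl fmt::Display for AccountId {
        // One-time cost: after this, logging "just works" again.
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "{}", self.0)
        }
    }

    impl AccountId {
        fn as_u64(&self) -> u64 {
            self.0 // explicit exit point back to the physical type
        }
    }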
> There is a UI for memorialising users, but I assured her that the pros simply ran a bit of code in the PHP debugger. There’s a function that takes two parameters: one the ID of the person being memorialised, the other the ID of the person doing the memorialising. I gave her a demo to show her how easy it was....And that’s when I entered Clowntown....I first realised something was wrong when I went back to farting around on Facebook and got prompted to login....So in case you haven’t guessed what I got wrong yet, I managed to get the arguments the wrong way round.
Instead of me memorialising my test user, my test user memorialised me.
> I know what a UUID (or a String) is. I don't know what an AccountID, UserID, etc. is.
It's literally the opposite. A string is just a bag of bytes you know nothing about. An AccountID is probably... wait for it... an ID of an Account. If you have the need to actually know the underlying representation you are free to check the definition of the type, but you shouldn't need to know that in 99% of contexts you'll want to use an AccountID in.
> Now I need to know what those are (and how to make them, etc. as well) to use your software.
You need to know what all the types are no matter what. It's just easier when they're named something specific instead of "a bag of bytes".
Linking to that masterpiece is borderline insulting. Such a basic and easy to understand usage of the type system is precisely what the grug brain would advocate for.
I'd much rather deal with the 2nd version than the first. It's self-documenting and prevents errors like calling "foo(userId, accountId)" letting the compiler test for those cases. It also helps with more complex data structures without needing to create another type.
I now know I never know whether "a UUID" is stored or represented as a GUIDv1 or a UUIDv4/UUIDv7.
I know it's supposed to be "just 128 bits", but somehow, I had a bunch of issues running old Java servlets+old Java persistence+old MS SQL stack that insisted, when "converting" between java.util.UUID to MS SQL Transact-SQL uniqueidentifier, every now and then, that it would be "smart" if it flipped the endianness of said UUID/GUID to "help me". It got to a point where the endpoints had to manually "fix" the endianness and insert/select/update/delete for both the "original" and the "fixed" versions of the identifiers to get the expected results back.
(My educated guess it's somewhat similar to those problems that happens when your persistence stack is "too smart" and tries to "fix timezones" of timestamps you're storing in a database for you, but does that wrong, some of the time.)
They are generated with different algorithms; if you find these distinctions to be semantically useful to operations, carry that distinction into the type.
The point is that you might pass a semantically invalid user ID. Not that you might pass an invalid UUID.
I generally agree that it's easy to over-do, but can be great if you have a terse, dense, clear language/framework/docs, so you can instantly learn about UserID.
More specifically, if all entities have a GUID, it's not impossible to accidentally map entity A's ID to entity B's ID, especially when working with relationships. Moving the issue to the compiler is nicer than the query returning 0 results and the developer searching endlessly for the subtle issue.
I think the example is just not very useful, because it illustrates a domain separation instead of a computational one, which is almost always the wrong approach.
It is however useful to return a UUID type instead of a [16]byte, or an HTMLNode instead of a string, etc. These discriminate real, computational differences. For example, the method that gives you a string representation of a UUID doesn't care about the surrounding domain it is used in.
Distinguishing a UUID from an AccountID or UserID is contextual, so I'd rather communicate that in the aggregate. Same for Celsius and Fahrenheit. We also wouldn't use a specialized type for date-times in every time zone.
There are a few languages where this is not too tedious (although other things tend to be a bit more tedious than needed in those)
The main problem with these is how do you actually get the verification needed when data comes in from outside the system. Check with the database every time you want to turn a string/uuid into an ID type? It can get prohibitively expensive.
> I know what a UUID (or a String) is. I don't know what an AccountID, UserID, etc. is. Now I need to know what those are (and how to make them, etc. as well) to use your software.
Yes, that’s exactly the point. If you don’t know how to acquire an AccountID you shouldn’t just be passing a random string or UUID into a function that accepts an AccountID hoping it’ll work, you should have acquired it from a source that gives out AccountIDs!
If your system is full of stringly typed network interfaces then yes there is no point in trying to make it good. You can make things a bit better by using a structured RPC protocol like gRPC, but the only real solution is to not do that.
My team recently did this to some C++ code that was using mixed numeric values. It started off as finding a bug. The bug was fixed but the fixer wanted to add safer types to avoid future bugs. They added them, found 3 more bugs where the wrong values were being used unintentionally.
This reminds me of the mp-units [1] library, which aims to solve this problem for physical quantities.
The use of strong quantities means that you can have both safety and complex conversion logic handled automatically, while keeping generic code not tied to a single set of units.
I have tried to bring that to the prolog world [2] but I don't think my fellow prolog programmers are very receptive to the idea ^^.
I remember a long, long time ago, working on a project that handled lots of different types of physical quantities: distance, speed, temperature, pressure, area, volume, and so on. But they were all just passed around as "float" so you'd every so often run into bugs where a distance was passed where a speed was expected, and it would compile fine but have subtle or obvious runtime defects. Or the API required speed in km/h, but you passed it miles/h, with the same result. I always wanted to harden it up with distinct types so we could catch these problems during development rather than testing, but I was a junior guy and could never articulate it well and justify the engineering effort, and nobody wanted to go through the effort of explicitly converting to/from primitive types to operate on the numbers.
I had kind of written off using types because of the complexity of physical units, so I will be having a look at that!
My biggest problem has been people not specifying their units. On our own code end I'm constantly getting people to suffix variables with the units. But there's still data from clients, standard library functions, etc. where the units aren't specified!
readonly struct Id32<M> {
    public int Value { get; }
    public Id32(int value) => Value = value;
}
Then you can do:
public sealed class MFoo { }
public sealed class MBar { }
And:
Id32<MFoo> x;
Id32<MBar> y;
This gives you integer ids that can’t be confused with each other. It can be extended to IdGuid and IdString and supports new unique use cases simply by creating new M-prefixed “marker” types which is done in a single line.
I’ve also done variations of this in TypeScript and Rust.
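For what it's worth, Go's generics (1.18+) can sketch roughly the same marker-type trick; a minimal, illustrative translation of the C# above, reusing MFoo/MBar as markers:

package main

// Marker types, one line each, as in the C# version above.
type MFoo struct{}
type MBar struct{}

// Id32 wraps an int32 tagged by a marker type M.
type Id32[M any] struct{ Value int32 }

func main() {
    x := Id32[MFoo]{Value: 1}
    var y Id32[MBar]
    // y = x // compile error: Id32[MFoo] and Id32[MBar] are distinct types
    _, _ = x, y
}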
I've done something like that too. I also noticed that enums are even lower-friction (or were, back in 2014) if your IDs are integers, but I never put this pattern into real code because I figured it might be too confusing: https://softwareengineering.stackexchange.com/questions/3090...
Is it "overkill" if it's already written and tested?
Once you have several of these types, and they have validation and other concerns then the cost-benefit might flip.
FYI, in modern C#, you could try using "readonly record struct" in order to get lots of equality and other concerns generated for you. It's like a "whole library" but it's a compiler feature.
Yes: more code to compile, more stuff to learn, more complexity. I gave like a 5-line-of-code example, I don’t understand why I’d want to replace that with a library.
I completely agree that libraries do have to prove their worth, and that you should not add them as though they are all zero-cost and zero weight - that is not true.
However I disagree in this case - if you have the problem that the library solves and it is ergonomic, then why not use it? Your "5-line-of-code example" does not cover validation, serialisation, and casting concerns. As another commenter put it: "a properly constructed ID type has a non-trivial amount of code".
If you don't need more lines of code than that, then do your thing. But in the example that I looked at, I definitely would. As I said elsewhere in the thread, it is where all customer ids are strings, but only very specific strings are customer ids.
The larger point is that people who write C# and are reading this thread should know that these toolkits exist - that url links to other similar libraries and further reading. So they can then make their own informed choices.
Have you used this in production? It seems appealing but seems so antithetical to the common sorts of engineering cultures I've seen, where this sort of rigorous thinking does not exactly abound.
Source generators hide too many details from the user.
I prefer to have the generated code to be the part of the code repo. That's why I use code templates instead of source generators. But a properly constructed ID type has a non-trivial amount of code: https://github.com/vborovikov/pwsh/blob/main/Templates/ItemT...
> a properly constructed ID type has a non-trivial amount of code
That is correct; I've looked at the generated code and it's non-trivial, especially when validation, serialisation, and casting concerns are present, and when you have multiple id types whose allowed casts can change over time (i.e. lock them down when the migration is complete).
Sadly I have not. I have played with it and it seems to hold up quite well.
I want it for a case where it seems very well suited - all customer ids are strings, but only very specific strings are customer ids. And there are other string ids around as well.
IMHO migration won't be hard - you could allow casts to/from the primitive type while you change code. Temporarily disallowing these casts will show you where you need to make changes.
I don't know yet how "close to the edges" you would have to go back to the primitive types in order for json and db serialisation to work.
But it would be easier to get in place in a new "green field" codebase.
I pitched it as a refactoring, but the other people were well, "antithetical" is a good word.
I've actually seen this before and didn't realize this is exactly what the goal was. I just thought it was noise. In fact, just today I wrote a function that accepted three string arguments and was trying to decide if I should force the caller to parse them into some specific types, or do so in the function body and throw an error, or just live with it. This is exactly the solution I needed (because I actually don't NEED the parsed values.)
This is going to have the biggest impact on my coding style this year.
My friend Lukas has written about this before in more detail, and describes the general technique as "Safety Through Incompatibility". I use this approach in all of my golang codebases now and find it invaluable — it makes it really easy to do the right thing and really hard to accidentally pass the wrong kinds of IDs around.
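A minimal sketch of that technique in Go (the names here are illustrative, not taken from Lukas's post):

package ids

// Each ID gets its own named type; the compiler now refuses an
// AccountID where a UserID is expected, and vice versa.
type UserID string
type AccountID string

// Calling LinkAccount(a, u) with the arguments swapped is a compile
// error, even though both IDs are strings underneath.
func LinkAccount(u UserID, a AccountID) error {
    // ...
    return nil
}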
Swift has a typealias keyword but it's not really useful for this since two distinct aliased types with the same underlying type can be freely interchanged. Wrong code may look wrong but it will still compile.
Wrapper structs are the idiomatic way to achieve this, and with ExpressibleByStringLiteral are pretty ergonomic, but I wonder if there's a case for something like a "strong" typealias ("typecopy"?) that indicates e.g. "this is just a String but it's a particular kind of String and shouldn't be mixed with other Strings".
Yeah, most languages I've used are like this. E.g. rust/c/c++.
I guess the examples in TFA are golang? It's kind of nice that you don't have to define those wrapper types; they do make things a bit more annoying.
In C++ you have to be extra careful even with wrapper classes, because types are allowed to implicitly convert by default. So if Foo has a constructor that takes a single int argument, then you can pass an int anywhere Foo is expected. Fine as long as you remember to mark your constructors as explicit.
Both clang-tidy and cpplint can be configured to require all single-argument constructors (except move, copy, and initializer-list constructors) to be marked explicit, in order to avoid this pitfall.
This sounds elegant in theory but very thorny in practice even with a standards change, at least in C++ (though I don't believe the issues are that particular to the language). Like how do you want the equivalent of std::cout << your_different_str to behave? What about with third-party functions and extension points that previously took strings?
In OOP languages as long as the type you want to specialize isn't final you can just create a subclass. It's cheap (no additional wrappers or boxes), easy, and you can specialize behavior if you want to.
Unfortunately for various good reasons Java makes String final, and String is one of the most useful types to specialize on.
No, I want to block both. I don't want to give devs the option of creating a function doSomething(String) that happens to accept MyType. If I need to call trim then I'll do the conversion to String explicitly first.
I'm on the opposite extreme here, in that I believe typing obsession is the root of many of our problems as an industry.
I think Rich Hickey was completely right, this is all information and we just need to get better at managing information like we are supposed to.
The downside of this approach is that these systems are tremendously brittle, as changing requirements make you contort your original data model to fit the new requirements.
Most OOP devs have seen at least one library with over 1000 classes. Rust doesn't solve this problem, no matter how much I love it. It's the same problem: comparing two things that are the same but have different types requires a bunch of glue code, which can itself lead to new bugs.
Data as code seems to be the right abstraction. Schemas give validation à la carte while still allowing information to be passed, merged, and managed using generic tools, rather than needing to build a whole API for every new type you define in your mega monolith.
A lot of us programmer folk are indefinitely in search of that one thing that will finally let us write the perfect, bug-free, high-performance software. We take these concepts to the extreme and convince ourselves that they will absolutely work as long as we strictly do it the Right Way and only the Right Way. Then we try to convince our fellow programmers that the Right Way will solve all of our problems and that it is the Only Way.
It will be great, it will be grand, it will be amazing.
A wise person once told me that if you ever find yourself saying "if only everyone would just do X...", then you should stop right there. Never, ever, in the history of the world has everyone done X. No matter how good an idea X is, there will always be some people who say "No, I'm going to do Y instead." Maybe they're stupid, maybe they're evil, maybe they're just ignorant... or maybe, just maybe, X was not the best thing for their particular needs and Y was actually better for them.
This is an important concept to keep in mind. It applies to programming, it applies to politics, it applies to nearly every situation you can think of. Any time you find yourself wishing that everyone would just do X and the world would be a better place, realize that that is never going to happen, and that some people will choose to do Y — and some of them will even be right to do so, because you do not (and cannot) know the specific needs of every human being on the planet, so X will not actually be right for some of them.
> It's the same problem: comparing two things that are the same but have different types requires a bunch of glue code, which can itself lead to new bugs.
Uhuh, so my age and my weight are the same (integers), but just have different types. Okay.
I was doing this and used it for a year in https://github.com/bbkane/warg/, but ripped it out since Go auto-casts underlying types to derived types in function calls:
type userID int64
func Work(u userID) {...}
Work(1) // Go accepts this
I think I recalled that correctly. Since things like that were most of what I was doing I didn't feel the safety benefit in many places, but had to remember to cast the type in others (iirc, saving to a struct field manually).
This is a little misleading. Go will automatically convert a numeric literal (which is a compile time idea not represented at runtime) into the type of the variable it is being assigned to.
Go will not automatically cast a variable of one type to another. That still has to be done explicitly.
func main() {
var x int64 = 1
Func(SpecialInt64(x)) // this will work
Func(x) // this will not work
}
type SpecialInt64 int64
func Func(x SpecialInt64) {
}
Arrggg- that's the best reason, so far, to avoid Go.
Almost nothing is a number. A length is not a number, an age is not a number, a phone number is not a number - sin(2inches) is meaningless, 30years^2 is meaningless, phone#*2 is meaningless, and 2inches+30years is certainly meaningless - but most of our languages permit us to construct, and use, and confuse these meaningless things.
This only happens for literal values. Mixing up variables of different types will result in a type error.
When you write 42 in Go, it’s not an int32 or int64 or some more specific type. It’s automatically inferred to have the correct type. This applies even for user-defined numeric types.
Yep in the same way it would allow `var u userID = 1` it allows `Work(1)` rather than insisting on `var u userID = userID(1)` and `Work(userID(1))`.
I teach Go a few times a year, and this comes up a few times a year. I've not got a good answer why this is consistent with such an otherwise-explicit language.
In order to run into this "issue" you'd have to do exactly what you did here - pass in a literal "1" to a function that expects a userID. That is not an issue of any sort.
This seems like a conclusion derived from the ideas in parse don't validate[1].
The goal is to encode the information you learn while parsing your data into your type system. This unlocks so many capabilities: better error handling, making illegal states unrepresentable, better compiler checking, better autocompletion etc.
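A minimal Go sketch of the parse-don't-validate idea (the regex check and names are illustrative; real code might consult a uuid library or the database instead):

package ids

import (
    "fmt"
    "regexp"
)

type UserID string

var uuidRe = regexp.MustCompile(
    `^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// ParseUserID is the only intended way to obtain a UserID, so the
// knowledge "this string is well-formed" is captured once, in the type.
func ParseUserID(raw string) (UserID, error) {
    if !uuidRe.MatchString(raw) {
        return "", fmt.Errorf("not a valid user id: %q", raw)
    }
    return UserID(raw), nil
}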
I've used the approach described for uuids on a project and I liked it. We were using typescript so we went further using template literal types [1]
type uuid = string; // assuming a simple alias for the raw UUID text
type UserId = `user:${uuid}`;
type OrgId = `org:${uuid}`;
This had the benefit that we could add validation (basic begins with kind of logic) and it was obvious upon visual inspection (e.g. in logs/debugging).
If writing the check is too tricky, sometimes it can just be easier to track the type of a value with the value (if you can be told the type externally) with tagged unions (AKA: Discriminated unions). See: https://www.typescriptlang.org/docs/handbook/typescript-in-5...
We were using Mongo and stored the ids with the prefix in the DB as the primary key. Pretty much everywhere we were passing them around as strings, never as 128 bit int, so there was no integrity checking outside of the app layer.
The only drawback was marshalling the types when they come out of the db layer. Since the db library types were string we had to hard cast them to the correct types, really my only pain. That isn't such a big deal, but it means some object creation and memory waste, like:
We normally didn't do it, but at that point you could have some `function isObjectId(id: string): id is ObjectId { return id.startsWith("object:"); }` wrapper for formal verification (and maybe throw exceptions on bad keys). And we were probably doing some type conversions anyway (e.g. `new Date(result.createdAt)`).
If we were reading stuff from the client or network, we would often do the verification step with proper error handling.
Not because it's a bad idea. Quite the contrary. I've sung the praises of it myself.
But because it's like the most basic way you can use a type system to prevent bugs. In both the sense used in the article, and in the sense that it is something you have to do to get the even more powerful tools that type systems offer brought to bear on the problem.
And yet, in the real world, I am constantly explaining this to people and constantly fighting uphill battles to get people to do it, and not bypass it by using primitives as much as possible then bashing it into the strict type at the last moment, or even just trying to remove the types.
Here on HN we debate the finer points of whether we should be using dependent typing, and in the real world I'm just trying to get people to use a Username type instead of a string type.
Not always. There are some exceptions. And considered over my entire career, the trend is positive overall. But there's still a lot of basic explanations about this I have to give.
I wonder what the trend of LLM-based programming will result in after another few years. Will the LLMs use this technique themselves, or will people lean on LLMs to "just" fix the problems from using primitive types everywhere?
Yeah I agree with wvenable. It's definitely safer but most languages have very poor support for doing it ergonomically. In fact I have yet to find one that makes it genuinely nice.
I think if it were better supported in the majority of strictly typed programming languages then it would be used more. Most languages make it a big hassle.
Does anyone know the term for this? I had "Type Driven Development" in my head, but I don't know if that's a broadly used term for this.
It's a step past normal "strong typing", but I've loved this concept for a while and I'd love to have a name to refer to it by so I can help refer others to it.
Using basic types for domain concepts is called 'primitive obsession'. It's been considered code smell for at least 25 years. So this would be... not being primitive obsessed. It isn't anything driven development.
Different people draw the line in different places for this. I've never tried writing code that takes every domain concept, no matter how small, and makes a type out of it. It's always been on my bucket list, though, to see how it works out. I just never had the time or in-the-moment inclination to go that far.
I think oftentimes it's enough to have enums for known ints, for example, and some parameter checking for ranges when known.
Some languages like C++ have been adding a contracts concept with which you can make these checks more formal.
As some people indicated, the auto-casting in many languages can make the implementation of these primitive-based types complicated and fragile, and provide more nuisance than value.
Yep! I recently started playing with Ada and they make tightly specifying your types based upon primitives pretty easy. You also have some control over auto conversion based upon the specifics of how you declare them.
The overall idea of using your type system to enforce invariants is called typeful programming [1]. The first few sentences of that paper are:
"There exists an identifiable programming style based on the widespread use of type information handled through mechanical typechecking techniques. This typeful programming style is in a sense independent of the language it is embedded in; it adapts equally well to functional, imperative, object-oriented, and algebraic programming, and it is not incompatible with relational and concurrent programming."
> The strongly typed identifier commonly wraps the data type used as the primary key in the database, such as a string, an integer or universally unique identifier (UUID).
It's taking the original idea behind Hungarian notation (now called "Apps Hungarian notation" to distinguish from "Systems Hungarian notation" which uses datatype) and moving it into the type system.
To keep building on history, I'd suggest Hungarian types.
They were already called "painted types" by the inventor of Hungarian notation himself. Simonyi used this term in his PhD thesis, Meta-programming, in 1976! (Published 1977.)
Meta-programming also introduced a notation which was the precursor to Hungarian Notation (pages 44-45), so painted types technically pre-date Hungarian Notation.
> These examples show that the idea of types is independent of how the objects belonging to the type are represented. All scalar quantities appearing above - column numbers, indices and so forth, could be represented as integers, yet the set of operations defined for them, and therefore their types, are different. We shall denote the assignment of objects to types, independent of their representations, by the term painting. When an object is painted, it acquires a distinguishing mark (or color) without changing its underlying representation. A painted type is a class of values from an underlying type, collectively painted a unique color. Operations on the underlying type are available for use on painted types as the operations are actually performed on the underlying representation; however, some operations may not make sense within the semantics of the painted type or may not be needed. The purpose of painting a type is to symbolize the association of the values belonging to the type with a certain set of operations and the abstract objects represented by them.
"Type driven development" is usually meant to say you will specify your system behavior in the types. Often by writing the types first and having the actual program determined by them. Some times so completely determined that you can use some software (not an LLM) to write it. (The name is a joke about the other TDD.)
Relevant terms are "Value object" (1) and avoiding "Primitive obsession" where everything is "stringly typed".
Strongly typed ids should be Value Objects, but not all value objects are ids. e.g. I might have a value object that represents an x-y co-ordinate, as I would expect an object with value (2,3) to be equal to a different object with the same value.
If you have a TypeError it's already too late. Decorating everything with beartype means I can catch things way in advance. It also forces me to specify types for all my functions, which really helps LLMs.
See for example in wdoc, my advanced personal RAG library:
Now imagine that inputB is received directly from some library you imported, so pyright might not be able to check its type, and inputB is actually a List. Then you will never get a crash in version 1, but its types are wrong, as inputB is a List.
Version 2, on the other hand, will crash, as Lists don't have a shape attribute. Notice also how func returns inputB, propagating the wrong type.
Sure, that means the code still works until you modify version 1, but any developer or LLM that reads func would get the wrong idea about how to modify such code. Also, this example is trivial, but it can become much, much more complicated of course.
This would not be caught in pyright but beartype would. I'm basically using beartype absolutely everywhere I can and it really made me way more sure of my code.
If you're not convinced I'm super curious about your reasoning!
PS: also try/except in python are extremely slow so figuring out types in advance is always AFAIK a good idea performance wise.
It makes sense I guess mostly if Python doesn't have good third party library typechecking... But how would beartype catch it? I mean a runtime error is a runtime error... You trade one for another?
I'm using typed IDs in TypeScript with template literal types, i.e. type UserId = `user_${string}`. I've written a blog post about it a while ago [1]. One can argue that this pollutes runtime; however, it is a feature to me. When the ID pops up in the logs, it is instantly obvious what the object is, and error messages are more meaningful. You can notice that Stripe uses such an approach in their API.
they constantly try to escape
from the darkness outside & within
by dreaming of type systems so perfect
that no one will need to be good
but the strings that are will shadow
the abstract datatype that pretends to be
There are benefits of such things, especially if it can be handled by the compiler so that it does not make the code inefficient. In some cases it might even automatically convert the type, but often it is better to not do so. Furthermore, there may be an operator to ignore the type and use the representation directly, which must be specified explicitly (in order to avoid bugs in the software involving doing it by mistake).
In the example, they are (it seems) converting between Celsius and Fahrenheit, using floating point. There is the possibility of minor rounding errors, although if you are converting between Celsius and Kelvin with integers only then these rounding errors do not occur.
In some cases, a function might be able to work with any units as long as the units match.
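In Go, for instance, a single generic function can accept either unit while refusing to mix them in one call; a rough sketch with illustrative Celsius/Fahrenheit types:

package temp

type Celsius float64
type Fahrenheit float64

// Mean compiles for Mean(c1, c2) and Mean(f1, f2), but rejects
// Mean(c1, f1): both arguments must share one unit type.
func Mean[T Celsius | Fahrenheit](a, b T) T {
    return (a + b) / 2
}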
> Public and even private functions should often avoid dealing in floats or integers alone
In some cases it makes sense to use those types directly, e.g. many kinds of purely mathematical functions (such as checking if a number is prime). When dealing with physical measurements, bit fields, ID numbers, etc., it does make sense to have types specifically for those things, although the compiler should allow overriding the requirement of the more specific type in specific cases via an explicit operator.
There is another article about string types, but I think the underlying problem there is using text-based formats, which leads to many of these problems, including the need for escaping, etc.
I’ve been using hacks to do this for a long time. I wish it was simpler in C++. I love C++ typing but hate the syntax and defaults. It’s so complicated to get started with.
The problem with this post is that the author is conflating two different things. Using a type system to capture the units of a measurement or metric is straightforwardly better than having them be implied. Stripping a numeric value and unit down to just a value involves an obvious loss of information. That situation is wholly different from just wrapping your UUID in some bespoke type which doesn't provide any extra information. They just look the same because you're mechanically doing something similar (i.e. wrapping some primitive or standard type in something else).
Not to mention that unless you want to make your wrappers monads, you're going to have to unwrap them at some point anyway, at which point you can still transpose the actual type when you have to call any external library/function.
I would love to know what the test suites of these 'many bugs in real systems' projects looked like. I suspect the test suite coverage wasn't very good.
> I would love to know what the test suites of these 'many bugs in real systems' projects looked like. I suspect the test suite coverage wasn't very good.
The same argument gets brought up in favor of dynamic typing. The point of typing is that you don't need all those repetitive tests.
Moreover, the coding feedback loop gets shorter since there's no need to wait until the tests run to find out a string was passed in instead of an int (or UserID).
Arguably, you can get rid of "primitive types" entirely. xtc-lang's "Turtles type system" does so: all the built-in types are defined in the standard library, and the definitions are infinitely recursive, but there is a fixpoint which we can use as if it were "primitive".
> The Ecstasy type system is called the Turtles Type System, because the entire type system is bootstrapped on itself, and -- lacking primitives -- solely on itself. An Int, for example, is built out of an Array of Bit, and a Bit is built out of an IntLiteral (i.e. 0 or 1), which is built out of a String, which is an Array of Char, and a Char is built out of an Int. Thus, an Int is built out of many Ints. It's turtles, the whole way down.
Unfortunately, this can be somewhat awkward to implement in certain structurally typed languages like TypeScript. I often find myself writing something along the lines of
type UserID = string & { readonly __tag: unique symbol }
I never understood why people are so keen to do that in TypeScript. With that definition a `UserID` can still be silently "coerced" to a `string` everywhere. So you only get halfway there to an encapsulated type.
I think it's a much better idea to do:
type UserID = { readonly __tag: unique symbol }
Now clients of `UserID` no longer knows anything about the representation. Like with the original approach you need a bit of casting, but that can be neatly encapsulated as it would be in the original approach anyway.
There was a post a decade or more ago, I think written with Java, that used variables like "firstname", "lastname", "fullname", and "nickname" in its example, including some functions to convert between them. Does this sound familiar to anyone?
The examples were a bit less contrived than this, encoding business rules where you'd want nickname for most UI but real name for official notifications, and the type system prevented future devs from using the wrong one when adding new UI or emails.
Go is a great language because it has distinct types by default. It's not about "making invalid states unrepresentable"; it's about recording relationships about a particular type of value and where it can be used. I.e. it doesn't matter that UserID is just a string; what matters is that now you can see which string values are UserIDs without making assumptions based on naming conventions.
Suppose you make two simple types, one for Kelvin (K) and the other for Fahrenheit (F) or degrees (D).
And you implement the conversions between them in the types.
But then you have something like
d: D = 10
for i = 1 ... 100000:
    k = f_Take_D_Return_K(d)
    d = g_Take_K_Return_D(k)
end
Then you will implicitly have many, many automatic conversions that are not useful.
How do you handle this? Is it easily caught by the compiler when the functions are way more complex?
In F#, which has measure types, the types are checked at compile time but erased at runtime, so they have no additional runtime cost. Measures are a kind of painted type.
[<Measure>] type degC
[<Measure>] type K

let degrees_to_kelvin (degrees : float<degC>) : float<K> =
    degrees * 1.0<K/degC> + 273.15<K>
let d = 10.0<degC>
let k : float<K> = degrees_to_kelvin d
The .NET runtime only sees `float`, as the measures have been erased, and constant folding will remove the `*1` that we used to change the measure. The `degrees_to_kelvin` call may also be inlined by the JIT compiler. We could potentially add `[<MethodImpl(MethodImplOptions.AggressiveInlining)>]` to force it to inline when possible, then constant folding may reduce the whole expression down to its result in the binary.
The downside to adding the SI into the type system is the SI is not a sound type system. For example:
[<Measure>] type m
[<Measure>] type s
[<Measure>] type kg
[<Measure>] type N = kg*m/s^2
[<Measure>] type J = kg*m^2/s^2
[<Measure>] type Nm = N*m
let func_expecting_torque (t : float<Nm>) = ...
let x = 10.0<J>
func_expecting_torque x
The type system will permit this: using torque where energy is expected, and vice-versa, because they have the same SI unit, but they don't represent the same thing, and ideally it should be rejected. A potential improvement is to include Siano's Orientational Analysis[1], which can resolve this particular unsoundness because the orientations of Nm and J would be incompatible.
I interpret your question as «given that I am doing many conversions between temperature, because that makes it easier to write correct code, then I worry that my code will be slow because I am doing many conversions».
My response is: these conversions are unlikely to be the slow step in your code, don’t worry about it.
I do agree though, that it would be nice if the compiler could simplify the math to remove the conversions between units. I don’t know of any languages that can do that.
That's exactly the problem: in the software I have in mind, the conversions are actually very slow, and I can't easily change the content of the functions that process the data. They are very mathematical; it would take too much time to rewrite everything.
For example, it's not my case but it's like having to convert between two image representations (matrix multiply each pixel) every time.
I'm scared that this kind of 'automatic conversion' slowness will be extremely difficult to debug and to monitor.
Why would it be difficult to monitor the slowness? Wouldn’t a million function calls to the from_F_to_K function be very noticeable when profiling?
On your case about swapping between image representations: let’s say you’re doing a FFT to transform between real and reciprocal representations of an image - you probably have to do that transformation in order to do the the work you need doing on reciprocal space. There’s no getting around it. Or am I misunderstanding?
Please don’t take my response as criticism, I’m genuinely interested here, and enjoying the discussion.
I have many functions written by many scientists in a single piece of software over many years; some expect one data format, others another. It's not always the same function that is called, but all the functions could have been written using a single data format. However, they chose the data format when writing the functions based on the application at hand at that moment and the possible acceleration of their algorithms with the selected data structure.
When I tried to refactor using types, this kind of problem became obvious, and forced more conversions than intended.
So I'm really curious because, apart from rewriting everything, I don't see how to avoid this problem. It's more natural for some applications to have data format 1 and for others data format 2. And forcing one over the other would make the application slow.
The problem arises only in 'hybrid' pipelines, when a new scientist needs to use some existing functions, some of them in the first data format and the others in the second.
As a simple example, you can write rotations in a software in many ways: some will use matrix multiplication, some Euler angles, some quaternions, some geometric algebra. It depends on the application at hand which one works best, as it maps better onto the mental model of the current application. For example, geometric algebra is way better to think about a problem with, but sometimes Euler angles are the output of a physical sensor. So some scientists will use the first, and others the second. (Of course, those kinds of conversions are quite trivial and we don't care that much, but suppose each conversion is very expensive for one reason or another.)
If I understood the problem correctly, you should try calculating each format of the data once and reusing it. Something like:
type ID struct {
    AsString   string
    AsInt      int
    AsWhatever Whatever
}

func NewID() ID {
    return ID{
        AsString:   calculateAsString(),
        AsInt:      calculateAsInt(),
        AsWhatever: calculateAsWhatever(),
    }
}
This does assume every representation will always be used, but if that's not the case it's a matter of using some manner of a generic only-once executor, like Go's sync.Once.
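A lazy variant of the same idea, sketched with Go's sync.OnceValue (1.21+); calculateAsIntFrom is a hypothetical helper standing in for the expensive conversion:

package ids

import "sync"

// LazyID computes the alternate representation at most once,
// on first use, rather than eagerly on construction.
type LazyID struct {
    AsString string
    AsInt    func() int // call to get the cached int form
}

func NewLazyID(s string) LazyID {
    return LazyID{
        AsString: s,
        // sync.OnceValue memoises the first result of the closure;
        // calculateAsIntFrom is a hypothetical helper.
        AsInt: sync.OnceValue(func() int { return calculateAsIntFrom(s) }),
    }
}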
This pattern is exactly the pattern I recommended two weeks ago in a thread about a nearly catastrophic OpenZFS bug https://news.ycombinator.com/item?id=44531524 in response to someone saying we should use AI to detect this class of bugs. I'm glad there are still people who think alike and opt for simpler, more deterministic solutions such as using the type system.
This is also an incredibly useful technique with LLMs. If you alias types (e.g. str to DateStr) the LLM can better infer which functions to select and how to compose them
This is an accidental benefit of golang for those coming from python, perl, or php. At first making structs and types is a pain. But within a few hundred lines it’s a blessing.
Being forced to think early on types has a payoff at the medium complexity scale
I've seen experienced programmers do this a lot. It's the kind of thing that someone thinks is annoying, without realizing that it was preventing them from doing something incorrect.
I think Rich Hickey has a point that bugs like this almost certainly get caught by running the program. If they make it into production it usually results in an obscure edge case.
I’m sure there are exceptions but unless you’re designing for the worst case (safety critical etc) rather than average case (web app), types come with a lot of trade offs.
I’ve been on the fence about types for a long time, but having built systems fast at a startup for years, I now believe dynamic typing is superior. Folks I know who have built similar systems and are excellent coders also prefer dynamic typing.
In my current startup we use typescript because the other team members like it. It does help replace comments when none are available, and it stops some bugs, but it also makes the codebase very hard to read and slows down dev.
A high quality test suite beats everything else hands down.
An engineer getting up to speed on a 10 year old web app that uses dynamic types will likely have a very different opinion.
No types anywhere, so making a change is SCARY! And all the original engineers have usually moved on. Fun times. Types are a form of forced documentation after all, and help catch an entire class of bugs. If you’re really lucky, the project has good unit tests.
I think dynamic typing is wonderful for making software quickly, and it can be a force multiplier for startups. I also enjoy it when creating small services or utilities. But for a large web app, you’ll pay a price eventually. Or more accurately…the poor engineer that inherits your code in 10 years will pay the price. God bless them if they try to do a medium sized refactor without types lol. I’ve been on both ends of the spectrum here.
Pros and cons. There’s _always_ a tradeoff for the business.
But most startups aren’t building for 10 years out. If you use a lot of typing, you’ll probably die way before then. But yeah if you’re building a code base for the long term then use types unless you’re disciplined enough to write comments and good code.
As for refactoring, that is exactly what test suites are for.
When a bug like this can cause real world harm, we can't just bumper car program our way out of things. As engineers we should be able to provide real guarantees.
> In any nontrivial codebase, this inevitably leads to bugs when, for example, a string representing a user ID gets used as an account ID
Inevitably is a strong word. I can't recall the last time I've seen such bug in the wild.
> or when a critical function accepts three integer arguments and someone mixes up the correct order when calling it.
Positional arguments suck and we should rely on named/keyword arguments?
I understand the line of reasoning here, but the examples are bad. Those aren't good reasons to introduce new types. If you follow this advice, you'll end up with an insufferable codebase where 80% LoC is type casting.
Types are like database schemas. You should spend a lot of time thinking about semantics, not simply introduce new types because you want to avoid (hypothetical) programmer errors.
"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures."
I've been using this technique in Rust for years, really helps catch bugs early and makes code more readable. Wish more languages had similar type systems.
Does this apply in Java where adding a type means every ID has to have a class instance in the heap? ChatGPT says I might want to wait for Project Valhalla value types.
Personally I like it, and it catches bugs right away, especially when there are multiple possible ids, e.g.
func AddMessage(u UserId, m MessageId)
If it's just
func AddMessage(userId, messageId string)
it's very easy to accidentally call as
AddMessage(messageId, userId)
and then best-case you are wasting time figuring out a test failure, and worst case trying to figure out the bug IRL.
V.S. an instant compile error.
I have seen errors like this many times, both written by myself and others. I think it's great to use the type system to eliminate this class of error!
(Especially in languages like Go that make it very low-friction to define the newtype.)
Another benefit if you're working with any sort of static data system is it makes it very easy to validate the data -- e.g. just recursively scan for instances of FooId and make sure they are actually foo, instead of having to write custom logic or schema for everywhere a FooId might occur.
Types give you static proof where tests only give partial inductive evidence. I cannot _fathom_ why people would prefer tests over types where types do the job, outside anything but sheer ignorance.
The compiler tests the type is correct wherever you use it. It is also documentation.
Still have tests! But types are great.
But sadly, in practice I don't often use a type per ID type because it is not idiomatic to code bases I work on. It's a project of its own to move a code base to be like that if it wasn't in the outset. Also most programming languages don't make it ergonomic.
The idea is to just wrap them in a unique type per intended use case, like AccountID, SessionID, etc., and inside they may contain a single String field.
I'm not familiar with Go. Please correct me, but this reads like object oriented programming i.e. OOP for every kind of data?
Coming from C++, this kind of type-wrapping class makes sense. But they are also a maintenance task with further issues, where often proper variable naming matters. Likely a good balance is the key.
This isn't an OO thing at all. In C, to contrast with Go, a typedef is an alias. You can use objects (general sense) of the type `int` interchangeably with something like `myId` which is created through `typedef int myId`.
That is, this is perfectly acceptable C:
int x = 10;
myId id = x; // no problems
In Go the equivalent would be an error because it will not, automatically, convert from one type to another just because it happens to be structurally identical. This forces you to be explicit in your conversion. So even though the type happens to be an int, an arbitrary int or other types which are structurally ints cannot be accidentally converted to a myId unless you somehow include an explicit but unintended conversion.
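Concretely, the Go side of that contrast looks like this (a minimal sketch):

package main

type myId int

func main() {
    x := 10      // x has type int
    var id myId
    // id = x    // compile error: cannot use x (type int) as type myId
    id = myId(x) // fine: the conversion is explicit
    _ = id
}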
I generally agree, but I think the real strength in types come from the way in which they act as documentation and help you refactor. If you see a well laid out data model in types you supercharge your ability to understand a complex codebase. Issues like the one in the example should have been caught by a unit test.
Also validation. In Java, you can have almost seamless validation on instantiation of your very objects. That's why having a class for IBAN instead of a String containing an IBAN is the right way to go.
Well, "newtype pattern" is itself a reinvention of a decades old concept. Presumably "newtype pattern" gets its name from the `newtype` keyword in Haskell, which was introduced in Haskell 1.3 (1996).
To the best of my knowledge, the earliest specific term for the concept is painted types (Simonyi, 1976)[0], but I believe this was a new term for a concept that was already known. Simonyi himself quotes (Hoare, 1970)[1]. Hoare doesn't provide a specific term like painted type for the type being defined, but he describes new types as being built from constituent types, where a singular constituent type is known as the base type.
Simonyi uses the term underlying type rather than base type. Hoare alludes to painted types by what they contain, though he doesn't explicitly require that the new type has the same representation as its base type, so they're not necessarily equivalent, although in practice they often are. Simonyi made this explicit, which is what we expect from newtype in Haskell, and specifically why we'd use `newtype` instead of `data` with a single (base) constituent.
If you're aware of any other early references, please share them.
> These examples show that the idea of types is independent of how the objects belonging to the type are represented. All scalar quantities appearing above - column numbers, indices and so forth, could be represented as integers, yet the set of operations defined for them, and therefore their types, are different. We shall denote the assignment of objects to types, independent of their representations, by the term painting. When an object is painted, it acquires a distinguishing mark (or color) without changing its underlying representation. A painted type is a class of values from an underlying type, collectively painted a unique color. Operations on the underlying type are available for use on painted types as the operations are actually performed on the underlying representation; however, some operations may not make sense within the semantics of the painted type or may not be needed. The purpose of painting a type is to symbolize the association of the values belonging to the type with a certain set of operations and the abstract objects represented by them.
> In most cases, a new type is defined in terms of previously defined constituent types; the values of such a new type are data structures, which can be built up from component values of the constituent types, and from which the component values can subsequently be extracted. These component values will belong to the constituent types in terms of which the structured type was defined. If there is only one constituent type, it is known as the base type.
Wait until you find out about Julia; the type system is brilliant. It encourages this kind of thing, allowing you to give more info about what the number represents. It ensures there are no downsides, because you can define functions that run on families of types.
Actually, not really. In this case UserId is still an integer, which means any method that takes an integer can also take a UserId. Which means your co-workers are likely to just use integer out of habit.
Also, you can still do integer things with them, such as adding two UserIds together.
This can solve a lot of problems, but it also introduces awkward situations where it is hard to make a square shape or panel, because the width measure must first be explicitly converted into a height measure in order to be used as such. That might be considered correct, but it is also expensively awkward and pedantic.
Most static type systems that I know of disappear at runtime. You literally cannot "use" them once deployed to production.
(TypeScript's Zod and Clojure's Malli are counterexamples, although not official offerings.)
Following OP's example, what prevents you from getting an AccountID parsed as a UserID at runtime, in production? In production it's all UUIDs, indistinguishable from one another.
A truly safe approach would use distinct value prefixes – one per object type. Slack does this I believe.
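A minimal sketch of that kind of runtime check in Go (the acct_ prefix and names are made up for illustration):

package ids

import (
    "fmt"
    "strings"
)

type AccountID string

// The value itself carries the object type via its prefix, so a user
// ID routed here fails fast instead of silently matching.
func ParseAccountID(s string) (AccountID, error) {
    if !strings.HasPrefix(s, "acct_") {
        return "", fmt.Errorf("not an account id: %q", s)
    }
    return AccountID(s), nil
}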
> Most static type systems that I know of disappear at runtime. You literally cannot "use" them once deployed to production.
That's part of the point of being static. If we can statically determine properties of the system and use that information in the derived machine code (or byte code or whatever), then we may be able to discard that information at runtime (though there are reasons not to discard it).
> Following OP's example, what prevents you from getting a AccountID parsed as a UserID at runtime, in production? In production it's all UUIDs, undistinguishable from one another.
If you're receiving information from the outside and converting it into data in your system you have to parse and validate it. If the UUID does not correspond to a UserID in your database or whatever, then the attempted conversion should fail. You'd have a guard like this:
if user_db.contains(UserID(uuid)) {
return UserID(uuid)
}
// signal an error or return a None, zero value, null, etc.
There are infinitely many runtime properties that are simply impossible to determine statically.
Static typing is just a tool, aiming to help with a subset of all possible problems you may find. If you think it's an absolute oracle of every possible problem you may find, sorry, that's just not true, and trivially demonstrable.
Your example already is a runtime check that makes no particular use of the type system. It's a simple "set contains" check (value-oriented, not type-oriented) which also is far more expensive than simply verifying the string prefix of a Slack-style object identifier.
Ultimately I'm not even saying that types are bad, or that static typing is bad. If you truly care about correctness, you'd use all layers at your disposition - static and dynamic.
*Checks watch*
We're going on 45 years now.
On Refinement Types, I'm not sure they are a good idea for general-purpose languages and would love to be challenged on this. Succinctly, I think they're a leaky abstraction. To elaborate: having something like a `OneThroughTen` type seems helpful at first, but in reality it spreads behaviour potentially all over the app, as opposed to having a single function with the desired behaviour. If a developer generates a number in multiple spots and one spot is missing a check and causes a bug, then hopefully a lesson was learned: don't do that; have a single spot for that logic instead. The heavy-handed complexity of Refinement Types is not worth it to solve this situation.
If there are any thoughts out there they would be greatly appreciated!
procedure Sum_Demo is
   subtype Index is Integer range 0 .. 10;
   subtype Small is Integer range 0 .. 10;
   I : Small := 0;
begin
   for J in 1 .. 11 loop
      I := I + 1;
   end loop;
end Sum_Demo;

This compiles, and the compiler will tell you: "warning: Constraint_Error will be raised at run time".
It's a stupid example for sure. Here's a more complex one:
This again compiles, but if you run it: raised CONSTRAINT_ERROR : sum_demo.adb:13 index check failed. It's a cute feature, but it's useless for anything complex.
While I'm not entirely convinced myself whether it is worth the effort, it offers the ability to express "a number greater than 0". Using type narrowing and intersection types, open/closed intervals emerge naturally from that. Just check `if (a > 0 && a < 1)` and its type becomes `(>0)&(<1)`, so the interval (0, 1).
I also built a simple playground that has a PoC implementation: https://nikeee.github.io/typescript-intervals/
[1]: https://github.com/microsoft/TypeScript/issues/43505
My specific use case is pattern matching http status codes to an expected response type, and today I'm able to work around it with this kind of construct https://github.com/mnahkies/openapi-code-generator/blob/main... - but it's esoteric, and feels likely to be less efficient to check than what you propose / a range type.
There's runtime checking as well in my implementation, but it's a priority for me to provide good errors at build time
mainly I find Raku (and the community) much -Ofun
Otherwise you could have type level asserts more generally. Why stop at a range check when you could check a regex too? This makes the difficulty more clear.
For the simplest range case (pure assignment) you could just use an enum?
https://play.rust-lang.org/?version=stable&mode=debug&editio...
rust-analyzer gives an error directly in IDE.
That power of course does come with a price: there is no static analyzer that automatically checks things, even though you can pretty much generate beautiful tests based on specs. I think e.g. Rust teams can have more junior devs safely contribute because the compiler enforces discipline, leaving less variability in code quality. Clojure teams need higher baseline discipline but can move incredibly fast when everyone's aligned.
It's saddening to see when Clojure gets outright dismissed for being "untyped", even though it absolutely can change one's perspective about type systems.
Among popular languages like golang, rust, or python, typescript has the most powerful type system.
How about a type with a number constrained between 0 and 10? You can already do this in typescript.
You can even programmatically define functions at the type level, so you can create a function that outputs a type between 0 and N. The issue here is that it's a bit awkward: you want these types to compose, right? If I add two constrained numbers, say one with a max value of 3 and another with a max value of 2, the result should have a max value of 5. TypeScript doesn't support this by default with ordinary addition, but you can create a function that does. The catch is that to create these functions you have to use tuples to do addition at the type level, and you need recursion as well; TypeScript recursion stops at 100, so there are limits. Additionally, it's not intrinsic to the type system: you'd need Peano numbers built into the number system, and built in by default for the entire language, for this to work perfectly. That means the code in the function body is not type checked, but if you assume that code is correct, then the function type checks when composed with the other primitives of your program.
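To make that concrete, here is a rough sketch of the tuple trick (my own illustration, not from any particular library): tuples stand in for Peano numbers, and arithmetic is read off tuple lengths.

```
// Build a tuple of length N by recursion.
type Tuple<N extends number, R extends unknown[] = []> =
  R['length'] extends N ? R : Tuple<N, [...R, unknown]>;

// Type-level addition: concatenate the two tuples and take the length.
type Add<A extends number, B extends number> =
  [...Tuple<A>, ...Tuple<B>]['length'];

type Five = Add<3, 2>; // resolves to the literal type 5

// "A number below N": enumerate 0..N-1 as a union of literal types.
type Enumerate<N extends number, Acc extends number[] = []> =
  Acc['length'] extends N ? Acc[number] : Enumerate<N, [...Acc, Acc['length']]>;

type Digit = Enumerate<10>; // 0 | 1 | 2 | ... | 9
const ok: Digit = 5;
// const bad: Digit = 10; // compile error
```

The recursion-depth limit mentioned above is exactly what caps how large N can get here.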
I get an error that I can't assign something that seems to me assignable, and to figure out why I need to study functions at type level using tuples and recursion. The cure is worse than the disease.
If you trust the type, then it's fine; the code is safer. In the world of the code itself, things are easier.
Of course like what you're complaining about, this opens up the possibility of more bugs in the world of types, and debugging that can be a pain. Trade offs.
In practice people usually don't go crazy with type-level functions. They can do small stuff, but usually nothing super crazy. So TypeScript by design sort of fits the complexity dynamic you're looking for: yes, you can write type-level functions that are super complex, but the language is not designed around it and doesn't promote that style either. But you CAN go a little deeper with types than in, say, a language with a less powerful type system like Rust.
I'll take a modern Hindley-Milner variant any day. Sophisticated enough to model nearly any type information you'll have need of, without blurring the lines or admitting the temptation of encoding complex logic in it.
https://youtu.be/0mCsluv5FXA
In practice nobody goes too crazy with it; you have a problem with a feature almost nobody uses. It's there, and Range<N> is about the upper bound of complexity I've seen in production, and even that is extremely rare.

There is no "temptation" to encode complex logic in it at all, as the language doesn't promote these features; they're just available if needed. It's not well known, but TypeScript types can easily be used one-to-one with any Hindley-Milner variant. It's the reputational baggage of JS and frontend that keeps this fact from being well known.

In short: TypeScript is more powerful than Hindley-Milner, a subset of it has one-to-one parity with it, and the parts that are more powerful than Hindley-Milner aren't popular or widely used, nor does the flow of the language itself promote their usage. The features are just there if you need them.

If you want a language where you do this stuff in practice, take a look at Idris. That language has these features built in AND it's an ML-style language like Haskell.
Static typing / dynamic typing refers to whether types are checked at compile time or runtime. "Static" = compile time (eg C, C++, Rust). "Dynamic" = runtime (eg Javascript, Ruby, Excel)
Strong / weak typing refers to how "wibbly wobbly" the type system is. x86 assembly language is "weakly typed" because registers don't have types. You can do (more or less) any operation with the value in any register. Like, you can treat a register value as a float in one instruction and then as a pointer during the next instruction.
Ruby is strongly typed because all values in the system have types, and types affect what you can do. If you treat a number like it's an array in Ruby, you get an error. (But the error happens at runtime, because Ruby is dynamically typed; typechecking only happens at runtime!)
Sure it stops you from running into "'1' + 2" issues, but won't stop you from yeeting VeryRawUnvalidatedResponseThatMightNotBeAuthorized to a function that takes TotalValidatedRequestCanUseDownstream. You won't even notice an issue until:
- you manually validate
- you call a method that is unavailable on the wrong object.
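For what it's worth, you can get that check back in TypeScript with branded types. A minimal sketch, with all names hypothetical; the runtime shape is identical and only the compile-time brand differs:

```
type RawResponse = { body: string } & { readonly __brand: 'raw' };
type ValidatedRequest = { body: string } & { readonly __brand: 'validated' };

declare function handleDownstream(req: ValidatedRequest): void;

function validate(r: RawResponse): ValidatedRequest | null {
  // real checks would go here
  return r.body.length > 0 ? (r as unknown as ValidatedRequest) : null;
}

declare const response: RawResponse;
// handleDownstream(response);      // compile error: wrong brand
const checked = validate(response);
if (checked !== null) handleDownstream(checked); // ok
```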
Related Stack Overflow post: https://stackoverflow.com/questions/2690544/what-is-the-diff...
So yeah I think we should just give up these terms as a bad job. If people mean "static" or "dynamic" then they can say that, those terms have basically agreed-upon meanings, and if they mean things like "the type system prohibits [specific runtime behavior]" or "the type system allows [specific kind of coercion]" then it's best to say those things explicitly with the details filled in.
It says:
> I give the following general definitions for strong and weak typing, at least when used as absolutes:
> Strong typing: A type system that I like and feel comfortable with
> Weak typing: A type system that worries me, or makes me feel uncomfortable
https://news.ycombinator.com/item?id=42367644
A month before that:
https://news.ycombinator.com/item?id=41630705
I've given up since then.
[1] https://doc.rust-lang.org/book/ch03-01-variables-and-mutabil...
let a = 1;
let a = '1';
Strong typing is about whether I can do 1 + '1'; variable names and rebinding have nothing to do with whether a language is strongly typed.
This is where the concept of “Correct by construction” comes in. If any of your code has a precondition that a UUID is actually unique then it should be as hard as possible to make one that isn’t. Be it by constructors throwing exceptions, inits returning Err or whatever the idiom is in your language of choice, the only way someone should be able to get a UUID without that invariant being proven is if they really *really* know what they’re doing.
(Sub UUID and the uniqueness invariant for whatever type/invariants you want, it still holds)
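A minimal sketch of that idiom in TypeScript, with a format check standing in for whatever invariant you actually care about (names hypothetical): the constructor is private, so the only way to obtain a UUID is through a factory that enforces the check.

```
class UUID {
  private constructor(readonly value: string) {}

  static parse(s: string): UUID | null {
    // Format only; a uniqueness invariant would need a stateful source.
    const re = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
    return re.test(s) ? new UUID(s) : null;
  }
}

const id = UUID.parse('123e4567-e89b-12d3-a456-426614174000'); // UUID | null
```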
This is one of the basic features of object-oriented programming that a lot of people tend to overlook these days in their repetitive rants about how horrible OOP is.
One of the key things OO gives you is constructors. You can't get an instance of a class without having gone through a constructor that the class itself defines. That gives you a way to bundle up some data and wrap it in a layer of validation that can't be circumvented. If you have an instance of Foo, you have a firm guarantee that the author of Foo was able to ensure the Foo you have is a meaningful one.
Of course, writing good constructors is hard because data validation is hard. And there are plenty of classes out there with shitty constructors that let you get your hands on broken objects.
But the language itself gives you direct mechanism to do a good job here if you care to take advantage of it.
Functional languages can do this too, of course, using some combination of abstract types, the module system, and factory functions as convention. But it's a pattern in those languages where it's a language feature in OO languages. (And as any functional programmer will happily tell you, a design pattern is just a sign of a missing language feature.)
Does this count as a missing language feature by requiring a "factory pattern" to achieve that?
Convention in OOP languages is (un?)fortunately to just throw an exception though.
Throwing an error is doing exactly that though; it's exactly the same thing in theory.
What you are asking for is just more syntactic sugar around error handling, otherwise all of that already exists in most languages. If you are talking about performance that can easily be optimized at compile time for those short throw catch syntactic sugar blocks.
Java even forces you to handle those errors in code, so don't say these are silent; there is no reason they need to be.
Nothing stops you from returning Result<CorrectObject, ConstructorError> from a CorrectObject::new(..) function, because it's just a regular function; struct field visibility takes care of you not being able to construct an invalid CorrectObject.
What sucks about OOP is that it also holds your hand into antipatterns you don't necessarily want, like adding behavior to what you really just wanted to be a simple data type because a class is an obvious junk drawer to put things.
And, like your example of a problem in FP, you have to be eternally vigilant with your own patterns to avoid antipatterns like when you accidentally create a system where you have to instantiate and collaborate multiple classes to do what would otherwise be a simple `transform(a: ThingA, b: ThingB, c: ThingC): ThingZ`.
Finally, as "correct by construction" goes, doesn't it all boil down to `createUUID(string): Maybe<UUID>`? Even in an OOP language you probably want `UUID.from(string): Maybe<UUID>`, not `new UUID(string)` that throws.
One way to think about exceptions is that they are a pattern matching feature that privileges one arm of the sum type with regards to control flow and the type system (with both pros and cons to that choice). In that sense, every constructor is `UUID.from(string): MaybeWithThrownNone<UUID>`.
In other words, exceptions are for cases where the programmer screwed up. While programmers screwing up isn't unusual at all, programmers like to think that they don't make mistakes, and thus in their eye it is unusual. That is what sets it apart from environmental failures, which are par for the course.
To put it another way, it is for signalling at runtime what would have been a compiler error if you had a more advanced compiler.
Just Java (and Javascript by extension, as it was trying to copy Java at the time), really. You do have a point that Java programmers have infected other languages with their bad habits. For example, Ruby was staunchly in the "return errors as values and leave exception handling for exceptions" camp before Rails started attracting Java developers, but these days all bets are off. But the "purists" don't advocate for it.
'null' (and to a large extent mutability) drives a gigantic hole through whatever you're trying to prove with correct-by-construction.
You can sometimes annotate against mutability in OO, but even then you're probably not going to get given any persistent collections to work with.
The OO literature itself recommends against using constructors like that, opting for static factory pattern instead.
In my book, that's the most important difference from C-, Zig-, or Go-style languages, which consider data structures to be mostly descriptions of memory layout.
In Haskell:
1. Create a module with some datatype
2. Don't export the datatype's constructors
3. Export factory functions that guarantee invariants
How is that more complicated than creating a class and adding a custom constructor? Especially if you have multiple datatypes in the same module (which in e.g. Java would force you to add multiple files, and if there's any shared logic, well, that will have to go into another extra file - thankfully some more modern OOP languages are more pragmatic here).
(Most) OOP languages treat a module (an importable, namespaced subunit of a program) and a type as the same thing, but why is this necessary? Languages like Haskell break this correspondence.
Now, what I'm missing from Haskell-type languages is parameterised modules. In OOP, we can instantiate classes with dependencies (via dependency injection) and then call methods on that instance without passing all the dependencies around, which is very practical. In Haskell, you can simulate that with currying, I guess, but it's just not as nice.
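A rough TypeScript analogue of that "parameterised module" idea, with hypothetical Mailer/UserStore interfaces: a factory closes over its dependencies once, and callers of the returned functions never pass them again.

```
interface Mailer { send(to: string, body: string): void; }
interface UserStore { save(email: string): void; }

function makeUserService(mailer: Mailer, store: UserStore) {
  return {
    register(email: string) {
      store.save(email);
      mailer.send(email, 'Welcome!');
    },
  };
}

// Wire the dependencies once, at the edge of the program:
// const users = makeUserService(smtpMailer, dbStore);
// users.register('a@example.com');
```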
I still follow TDD-with-a-test for all new features, all edge cases, and all bugs whose failure I can't trigger by changing the type system.
However, red-green-refactor-with-the-type-system is usually quick and can be used to provide hard guarantees against entire classes of bug.
It is always great when something is so elegantly typed that I struggle to think of how to write a failing test.
What drives me nuts is when there are tests left around that basically test the compiler and never were "red" then "greened". It makes me wonder if there is some subtle edge case I am missing.
Now I just think of types as the test suite’s first line of defense. Other commenters who mention the power of types for documentation and refactoring aren’t wrong, but I think that’s because types are tests… and good tests, at almost any level, enable those same powers.
However, I'm convinced that they're both part of the same class of thing, and that "TDD" or red/green/refactor or whatever you call it works on that class, not specifically just on tests.
Documentation is a funny one too - I use my types to generate API and other sorts of reference docs and tests to generate how-to docs. There is a seemingly inextricable connection between types and reference docs, tests and how-to docs.
You can always enforce nominal types if you really need it.
Welcome to typescript. Where generics are at the heart of our generic generics that throw generics of some generic generic geriatric generic that Bob wrote 8 years ago.
Because they can't reason about the architecture they built, they throw it at the type system to keep them in line. It works most of the time; Rust's is beautiful at barking at you when you're wrong. Ultimately it's us failing to design flexibility amidst ever-increasing complexity.
Remember when “Components” where “Controls” and you only had like a dozen of them?
Remember when a NN was only a few hundred thousand parameters?
As complexity increases with computing power, so must our understanding of it in our mental model.
However you need to keep that mental model in check, use it. If it’s typing, do it. If it’s rigorous testing, write your tests. If it’s simulation, run it my friend. Ultimately, we all want better quality software that doesn’t break in unexpected ways.
You might go with:
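Presumably something like this tagged union (a guess at the elided snippet; names are made up): each variant is meant to produce a different result type.

```
type Op =
  | { kind: 'toNumber'; input: string }  // should evaluate to number
  | { kind: 'toString'; input: number }  // should evaluate to string
  | { kind: 'negate'; input: boolean };  // should evaluate to boolean
```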
And so on. That looks nice, but when you try to pattern match on it and have your pattern matching return the types that are associated with the specific operation, it won't work. The reason is that TypeScript does not natively support GADTs. Libraries like ts-pattern use some tricks to get close-ish, at least.
And while this might not be very important for most application developers, it is very important for library authors, especially to make libraries interoperable with each other and extend them safely and typesafe.
The danger of that is of course that you provide a ladder over the wall you just built: instead of using the typed conversion, they now go the shortcut route via the numeric representation and may forget the conversion factor. In that case I'd argue it is best to always represent temperature as one unit (Kelvin or Celsius, depending on the math you need to do with it) and then just add a .display(Unit::Fahrenheit) method that returns a string. If you really want to convert to TemperatureF for a calculation, you would have to use a dedicated method that converts from one type to the other. The unit thing is of course just an example; for this, finished libraries like Python's pint (https://pint.readthedocs.io/en/stable/) exist.
One thing to consider as well is that you can mix up absolute values ("it is 28°C outside") and temperature deltas ("this is 2°C warmer than the last measurement"). If you're controlling high energy heaters mixing those up can ruin your day, which is why you could use different types for absolutes and deltas (or a flag within one type). Datetime libraries often do that as well (in python for example you have datetime for absolute and timedelta for relative time)
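A sketch of that absolute-vs-delta split, with illustrative Temperature/TemperatureDelta types rather than any particular library: store everything in one canonical unit and only convert for display.

```
class TemperatureDelta {
  constructor(readonly kelvin: number) {} // a delta of 1 K == a delta of 1 °C
}

class Temperature {
  private constructor(readonly kelvin: number) {}
  static fromCelsius(c: number) { return new Temperature(c + 273.15); }

  // Absolute + delta = absolute; adding two absolutes stays unrepresentable.
  plus(d: TemperatureDelta) { return new Temperature(this.kelvin + d.kelvin); }

  display(unit: 'K' | 'C' | 'F'): string {
    switch (unit) {
      case 'K': return `${this.kelvin} K`;
      case 'C': return `${this.kelvin - 273.15} °C`;
      case 'F': return `${(this.kelvin - 273.15) * 9 / 5 + 32} °F`;
    }
  }
}
```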
First, the library author cannot reasonably define what is and isn't a checked exception in their public API. That really is up to the decision of the client. This wouldn't be such a big deal if it weren't so verbose to handle exceptions though: if you could trivially convert an exception to another type, or even declare it as runtime, maybe at the module or application level, you wouldn't be forced to handle them in these ways.
Second, as to signature brittleness: standard advice is to create domain-specific exceptions anyway. Your code probably shouldn't be throwing IOExceptions. But Java makes converting exceptions unnecessarily verbose... see above.
Ultimately, I love checked exceptions. I just hate the ergonomics around exceptions in Java. I wish designers focused more on fixing that than throwing the baby out with the bathwater.
Personally I use checked exceptions whenever I can't use Either<> and avoid unchecked like a plague.
Yeah, it's pretty sad that Java's language designers completely deserted exception handling. I don't think there's been any improvement related to exceptions between Java 8 and 24.
To me they seem completely isomorphic?
But after experimenting a bit with checked exceptions, I realized how neglected exceptions are in Java:

- There's no way to handle checked exceptions other than a try-catch block

- They play very badly with APIs that use functional interfaces; many APIs don't provide a checked-throws variant

- A catch block can't use a generic / parameterized type; you need to catch Exception or Throwable and then operate on it at runtime
After rolling my own Either<L,R>, it felt like a customizable typesafe macro for exception handling. It addresses all the annoyances I had with checked exception handling, and it plays nicely with exhaustive pattern matching using `sealed`.
Granted, it has the drawback that sometimes I have to explicitly spell out types due to local type inference failing to do so. But so far it has been a pleasant experience of handling error gracefully.
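For readers who haven't seen the pattern, here's roughly what such an Either/Result looks like, transposed to TypeScript (parsePort is a made-up example): a tagged union, so the compiler forces both arms to be handled.

```
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

function parsePort(s: string): Result<number, string> {
  const n = Number(s);
  return Number.isInteger(n) && n > 0 && n < 65536
    ? { ok: true, value: n }
    : { ok: false, error: `not a valid port: ${s}` };
}

const r = parsePort('8080');
if (r.ok) {
  console.log(r.value);   // narrowed to number here
} else {
  console.error(r.error); // narrowed to string here
}
```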
Semantically, from a CS point of view (language semantics and type-system modelling), they are equivalent in purpose, as you are quite rightly asking.
try/catch has significantly more complex call sites because it affects control flow.
Can we build tools that helps us work with the boundary between isosemantic and isomorphic? Like any two things that are isosemantic should be translatable between each other. And so it represents an opportunity to make the things isomorphic.
https://news.ycombinator.com/item?id=44551088
https://news.ycombinator.com/item?id=44432640
> Your code probably shouldn't be throwing IOExceptions. But Java makes converting exceptions unnecessarily verbose
The problem just compounds, too. People start catching things they can't handle from the functions they're calling. The callers upstream can't possibly handle an error from the code you're calling; they have no idea why it's being called.
I also hate IOException. It's so extremely unspecific; it's the worst way to do exceptions. Did the entire disk die, or was the file just not found, or do I not have permission to write to it? IOException has no meaning.
Part of me secretly hopes Swift takes over because I really like its error handling.
[1] https://ericlippert.com/2008/09/10/vexing-exceptions/
So Java's checked exceptions force you to write verbose and pointless code in all the wrong places (the "in the middle" code that can't handle and doesn't care about the exception).
It doesn't, you can just declare that the function throws these as well, you don't have to handle it directly.
This is annoying enough to deal with in concrete code, but interfaces make it a nightmare.
To solve this, Rust does allow you to just Box<dyn Error> (or equivalents like anyhow). And Go has the Error interface. People who list out all concrete error types are just masochists.
It took until version 1.13 to get something better, and even now too many people still do errors.New("....."), because that's just how the Go world is.
A problem easily solved by writing business logic in pure java code without any IO and handling the exceptions gracefully at the boundary.
Think of the complaints around function coloring with async, how it's "contagious". Checked exceptions have the same function color problem. You either call the potential thrower from inside a try/catch or you declare that the caller will throw an exception.
And if you change a function deep in the call stack to return a different type on the happy path? Same thing. Yet, people don't complain about that and give up on statically type checking return values.
I honestly think the main reason some people will simultaneously enjoy using Result/Try/Either types in languages like Rust while also maligning checked exceptions is the mental model and semantics around the terminology. I.e., "checked exception" and "unchecked exception" are both "exceptions", so our brains lump those two concepts together; whereas returning a union type that has a success variant and a failure variant means that our brains are more willing to lump the failure return and the successful return together.
To be fair, I do think it's a genuine design flaw to have checked and unchecked exceptions both named and syntactically handled similarly. The return type approach is a better semantic model for modelling expected business logic "failure" modes.
Incidentally, for exceptions, Java had (b), but for a long time didn't have (a) (although I think this changed?), leading to (b) being abused.
That's the point! The whole reason for checked exceptions is to gain the benefit of knowing if a function starts throwing an exception that it didn't before, so you can decide how to handle it. It's a good thing, not a bad thing! It's no different from having a type system which can tell you if the arguments to a function change, or if its return type does.
In fact, at each layer, if you want to propagate an error, you have to convert it to one specific to that layer.
I also think it's a bit cleaner to have nicely pattern-matched handler blocks than bespoke handling at every level. That said, if unwrapped error results have a robust layout, then it's probably pretty equivalent.
Maybe you mean requests are failing on uncaught exceptions, in which case I'd say it's working well.
Or if they are unable to work, because they keep getting a maintenance page, as the load balancer redirects them after several HTTP 500 responses.
Anyway, do you prefer a critical workflow like payment to show a success but actually be an unhandled error?

I prefer a happy customer, and not having to deal with support calls.
Checked exceptions feel like a bad mix of error returns and colored functions to me.
But for one, Java checked exceptions don't work with generics.
In general, I think this largely falls down when you have code that wants to just move bytes around intermixed with code that wants to do fairly domain-specific calculations. I don't have a better way of phrasing that, at the moment. :(
There are cases where you have the data in hand but now you have to look for how to create or instantiate the types before you can do anything with it, and it can feel like a scavenger hunt in the docs unless there's a cookbook/cheatsheet section.
One example is where you might have to use createVector(x, y, z): Vector when you already have { x, y, z }. And only then can you createFace(vertices: Vector[]): Face even though Face is just { vertices }. And all that because Face has a method to flip the normal or something.
Another example is a library like Java's BouncyCastle where you have the byte arrays you need, but you have to instantiate like 8 different types and use their methods on each other just to create the type that lets you do what you wish was just `hash(data, "sha256")`.
Using the right architecture, you could make it so your core domain type and logic uses the strictly typed aliases, and so that a library that doesn't care about domain specific stuff converts them to their higher (lower?) type and works with that. Clean architecture style.
Unfortunately, that involves a lot of conversion code.
I know what a UUID (or a String) is. I don't know what an AccountID, UserID, etc. is. Now I need to know what those are (and how to make them, etc. as well) to use your software.
Maybe an elaborate type system is worth it, but maybe not (especially if there are good tests).
https://grugbrain.dev/#grug-on-type-systems
Presumably you need to know what an Account and a User are to use that software in the first place. I can't imagine a reasonable person easily understanding a getAccountById function which takes one argument of type UUID, but having trouble understanding a getAccountById function which takes one argument of type AccountId.
What he means is that by introducing a layer of indirection via a new type you hide the physical reality of the implementation (int vs. string).
The physical type matters if you want to log it, save to a file etc.
So now for every such type you add a burden of having to undo that indirection.
At which point "is it worth it?" is a valid question.
You made some (but not all) mistakes impossible but you've also introduced that indirection that hides things and needs to be undone by the programmer.
> There is a UI for memorialising users, but I assured her that the pros simply ran a bit of code in the PHP debugger. There’s a function that takes two parameters: one the ID of the person being memorialised, the other the ID of the person doing the memorialising. I gave her a demo to show her how easy it was....And that’s when I entered Clowntown....I first realised something was wrong when I went back to farting around on Facebook and got prompted to login....So in case you haven’t guessed what I got wrong yet, I managed to get the arguments the wrong way round. Instead of me memorialising my test user, my test user memorialised me.
It's literally the opposite. A string is just a bag of bytes you know nothing about. An AccountID is probably... wait for it... an ID of an Account. If you have the need to actually know the underlying representation you are free to check the definition of the type, but you shouldn't need to know that in 99% of contexts you'll want to use an AccountID in.
> Now I need to know what those are (and how to make them, etc. as well) to use your software.
You need to know what all the types are no matter what. It's just easier when they're named something specific instead of "a bag of bytes".
> https://grugbrain.dev/#grug-on-type-systems
Linking to that masterpiece is borderline insulting. Such a basic and easy to understand usage of the type system is precisely what the grug brain would advocate for.
I'd much rather deal with the 2nd version than the first. It's self-documenting and prevents errors like calling "foo(userId, accountId)" letting the compiler test for those cases. It also helps with more complex data structures without needing to create another type.
I now know that I never know whether "a UUID" is stored or represented as a GUIDv1 or a UUIDv4/UUIDv7.
I know it's supposed to be "just 128 bits", but somehow I had a bunch of issues running an old Java servlets + old Java persistence + old MS SQL stack that insisted, every now and then, that it would be "smart" to flip the endianness of said UUID/GUID to "help me" when "converting" between java.util.UUID and MS SQL Transact-SQL uniqueidentifier. It got to the point where the endpoints had to manually "fix" the endianness and insert/select/update/delete both the "original" and the "fixed" versions of the identifiers to get the expected results back.
(My educated guess it's somewhat similar to those problems that happens when your persistence stack is "too smart" and tries to "fix timezones" of timestamps you're storing in a database for you, but does that wrong, some of the time.)
They are generated with different algorithms; if you find these distinctions semantically useful to operations, carry that distinction into the type.
Seems like 98% of the time it wouldn’t matter.
I generally agree that it's easy to over-do, but can be great if you have a terse, dense, clear language/framework/docs, so you can instantly learn about UserID.
It is however useful to return a UUID type instead of a [16]byte, or an HTMLNode instead of a string, etc. These discriminate real, computational differences. For example, the method that gives you a string representation of a UUID doesn't care about the surrounding domain it is used in.
Distinguishing a UUID from an AccountID, or UserID is contextual, so I rather communicate that in the aggregate. Same for Celsius and Fahrenheit. We also wouldn't use a specialized type for date times in every time zone.
The main problem with these is how do you actually get the verification needed when data comes in from outside the system. Check with the database every time you want to turn a string/uuid into an ID type? It can get prohibitively expensive.
Yes, that’s exactly the point. If you don’t know how to acquire an AccountID you shouldn’t just be passing a random string or UUID into a function that accepts an AccountID hoping it’ll work, you should have acquired it from a source that gives out AccountIDs!
never escape anything, either
just hand my users a raw SQL connection
https://news.ycombinator.com/item?id=44677515
I have tried to bring that to the prolog world [2] but I don't think my fellow prolog programmers are very receptive to the idea ^^.
[1] https://mpusz.github.io/mp-units/latest/
[2] https://github.com/kwon-young/units
https://kotlinlang.org/docs/inline-classes.html
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3320.htm
https://doc.rust-lang.org/rust-by-example/generics/new_types...
My biggest problem has been people not specifying their units. On our own code end I'm constantly getting people to suffix variables with the units. But there's still data from clients, standard library functions, etc. where the units aren't specified!
I’ve also done variations of this in TypeScript and Rust.
[1] enum class from C++11; classic enums have too many implicit conversions to be of any use.
They're fairly useful still (and since C++11 you can specify their underlying type), you can use them as namespaced macro definitions
Kinda hard to do "bitfield enums" with enum class
The name means "Value Object Generator" as it uses Source generation to generate the "Value object" types.
That readme has links to similar libraries and further reading.
Once you have several of these types, and they have validation and other concerns then the cost-benefit might flip.
FYI, In modern c#, you could try using "readonly record struct" in order to get lots of equality and other concerns generated for you. It's like a "whole library" but it's a compiler feature.
However I disagree in this case: if you have the problem that the library solves and it is ergonomic, then why not use it? Your "5-line-of-code example" does not cover validation, serialisation, and casting concerns. As another commenter put it: "a properly constructed ID type has a non-trivial amount of code".
If you don't need more lines of code than that, then do your thing. But in the example that I looked at, I definitely would. As I said elsewhere in the thread, it is where all customer ids are strings, but only very specific strings are customer ids.
The larger point is that people who write C# and are reading this thread should know that these toolkits exist - that URL links to other similar libraries and further reading - so they can then make their own informed choices.
I prefer to have the generated code to be the part of the code repo. That's why I use code templates instead of source generators. But a properly constructed ID type has a non-trivial amount of code: https://github.com/vborovikov/pwsh/blob/main/Templates/ItemT...
That is correct, I've looked at the generated code and it's non-trivial, especially when validation, serialisation and casting concerns are present. And when you have multiple id types, and the allowed casts can change over time (i.e. lock it down when the migration is complete)
That's why I'd want it to be common, tested code.
I want it for a case where it seems very well suited - all customer ids are strings, but only very specific strings are customer ids. And there are other string ids around as well.
IMHO Migration won't be hard - you could allow casts to/from the primitive type while you change code. Temporarily disallowing these casts will show you where you need to make changes.
I don't know yet how "close to the edges" you would have to go back to the primitive types in order for JSON and DB serialisation to work.
But it would be easier to get in place in a new "green field" codebase. I pitched it as a refactoring, but the other people were well, "antithetical" is a good word.
This is going to have the biggest impact on my coding style this year.
https://lukasschwab.me/blog/gen/deriving-safe-id-types-in-go...
https://lukasschwab.me/blog/gen/safe-incompatibility.html
Wrapper structs are the idiomatic way to achieve this, and with ExpressibleByStringLiteral are pretty ergonomic, but I wonder if there's a case for something like a "strong" typealias ("typecopy"?) that indicates e.g. "this is just a String but it's a particular kind of String and shouldn't be mixed with other Strings".
I guess the examples in TFA are golang? It's kind of nice that you don't have to define those wrapper types, they do make things a bit more annoying.
In C++ you have to be extra careful even with wrapper classes, because types are allowed to implicitly convert by default. So if Foo has a constructor that takes a single int argument, then you can pass an int anywhere Foo is expected. Fine as long as you remember to mark your constructors as explicit.
In OOP languages as long as the type you want to specialize isn't final you can just create a subclass. It's cheap (no additional wrappers or boxes), easy, and you can specialize behavior if you want to.
Unfortunately for various good reasons Java makes String final, and String is one of the most useful types to specialize on.
I think Rich Hickey was completely right, this is all information and we just need to get better at managing information like we are supposed to.
The downside of this approach is that these systems are tremendously brittle: changing requirements make you contort your original data model to fit the new requirements.

Most OOP devs have seen at least one library with over 1000 classes. Rust doesn't solve this problem, no matter how much I love it. It's the same problem: comparing two things that are the same but have different types requires a bunch of glue code, which can itself lead to new bugs.

Data as code seems to be the right abstraction. Schemas give validation à la carte while still allowing information to be passed, merged, and managed using generic tools, rather than needing to build a whole API for every new type you define in your mega monolith.
This is an important concept to keep in mind. It applies to programming, it applies to politics, it applies to nearly every situation you can think of. Any time you find yourself wishing that everyone would just do X and the world would be a better place, realize that that is never going to happen, and that some people will choose to do Y — and some of them will even be right to do so, because you do not (and cannot) know the specific needs of every human being on the planet, so X will not actually be right for some of them.
Uhuh, so my age and my weight are the same (integers), but just have different types. Okay.
Go will not automatically cast a variable of one type to another. That still has to be done explicitly.
https://go.dev/play/p/4eNQOJSmGqD

Almost nothing is a number. A length is not a number, an age is not a number, a phone number is not a number - sin(2inches) is meaningless, 30years^2 is meaningless, phone#*2 is meaningless, and 2inches+30years is certainly meaningless - but most of our languages permit us to construct, and use, and confuse these meaningless things.
When you write 42 in Go, it’s not an int32 or int64 or some more specific type. It’s automatically inferred to have the correct type. This applies even for user-defined numeric types.
I teach Go a few times a year, and this comes up a few times a year. I've not got a good answer why this is consistent with such an otherwise-explicit language.
The goal is to encode the information you learn while parsing your data into your type system. This unlocks so many capabilities: better error handling, making illegal states unrepresentable, better compiler checking, better autocompletion etc.
[1]https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
1. https://www.typescriptlang.org/docs/handbook/2/template-lite...
But depending on the format it can sometimes be tricky to narrow a string back down to that format.
We have type guards to do that narrowing. (see: https://www.typescriptlang.org/docs/handbook/2/narrowing.htm..., but their older example is a little easier to read: https://www.typescriptlang.org/docs/handbook/advanced-types....)
If writing the check is too tricky, sometimes it can just be easier to track the type of a value with the value (if you can be told the type externally) with tagged unions (AKA: Discriminated unions). See: https://www.typescriptlang.org/docs/handbook/typescript-in-5...
And if the formats themselves are generated at runtime and you can use the "unique" keyword to make sure different kinds of data are treated as separate (see: https://www.typescriptlang.org/docs/handbook/symbols.html#un...).
You can combine `unique symbol` with tagged unions and type predicates to make it easier to tell them apart.
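Roughly, combining them might look like this (Email and isEmail are hypothetical): a unique-symbol brand plus a type predicate that does the runtime narrowing.

```
declare const emailBrand: unique symbol;
type Email = string & { readonly [emailBrand]: true };

function isEmail(s: string): s is Email {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(s);
}

declare const input: string;
if (isEmail(input)) {
  // `input` is Email in this branch; a plain string won't typecheck
  // wherever Email is required.
}
```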
I think it's a pretty good idea. I'm just wondering how this translated to other systems.
The only drawback was marshalling the types when they came out of the DB layer. Since the DB library's types were strings, we had to hard-cast them to the correct types; really my only pain. That isn't such a big deal, but it means some object creation and memory waste, like:
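Reconstructing the shape of that (the Db interface and ObjectId brand here are assumptions, not the actual code):

```
type ObjectId = string & { readonly __brand: 'ObjectId' };
interface Db { get(key: string): Promise<{ id: string; createdAt: string }>; }

async function load(db: Db, key: string) {
  const row = await db.get(key); // the driver hands back plain strings
  return {
    id: row.id as ObjectId,             // the hard cast described above
    createdAt: new Date(row.createdAt), // a fresh object per row: the waste
  };
}
```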
We normally didn't do it, but at that point you could have some `function isObjectId(id: string): id is ObjectId { return id.startsWith("object:"); }` wrapper for verification (and maybe throw exceptions on bad keys). And we were probably doing some type conversions anyway (e.g. `new Date(result.createdAt)`). If we were reading stuff from the client or network, we would do the verification step with proper error handling.
Not because it's a bad idea. Quite the contrary. I've sung the praises of it myself.
But because it's like the most basic way you can use a type system to prevent bugs. In both the sense used in the article, and in the sense that it is something you have to do before the even more powerful tools that type systems offer can be brought to bear on the problem.
And yet, in the real world, I am constantly explaining this to people and constantly fighting uphill battles to get people to do it, and not bypass it by using primitives as much as possible then bashing it into the strict type at the last moment, or even just trying to remove the types.
Here on HN we debate the finer points of whether we should be using dependent typing, and in the real world I'm just trying to get people to use a Username type instead of a string type.
Not always. There are some exceptions. And considered over my entire career, the trend is positive overall. But there's still a lot of basic explanations about this I have to give.
I wonder what the trend of LLM-based programming will result in after another few years. Will the LLMs use this technique themselves, or will people lean on LLMs to "just" fix the problems from using primitive types everywhere?
It's a step past normal "strong typing", but I've loved this concept for a while and I'd love to have a name to refer to it by so I can help refer others to it.
[1] https://doc.rust-lang.org/rust-by-example/generics/new_types...
Different people draw the line in different places for this. I've never tried writing code that takes every domain concept, no matter how small, and made a type out of it. It's always been on my bucket list though to see how it works out. I just never had the time or in-the-moment inclination to go that far.
Some languages, like C++, have a contracts concept where you can make these checks more formal.

As some people have indicated, the auto-casting in many languages can make the implementation of these primitive-based types complicated and fragile, and provide more nuisance than value.
"There exists an identifiable programming style based on the widespread use of type information handled through mechanical typechecking techniques. This typeful programming style is in a sense independent of the language it is embedded in; it adapts equally well to functional, imperative, object-oriented, and algebraic programming, and it is not incompatible with relational and concurrent programming."
[1] Luca Cardelli, Typeful Programming, 1991. http://www.lucacardelli.name/Papers/TypefulProg.pdf
[2] https://news.ycombinator.com/item?id=18872535
https://en.wikipedia.org/wiki/Strongly_typed_identifier
> The strongly typed identifier commonly wraps the data type used as the primary key in the database, such as a string, an integer or universally unique identifier (UUID).
To keep building on history, I'd suggest Hungarian types.
Meta-programming also introduced a notation which was the precursor to Hungarian Notation (page 44,45), so painted types technically pre-date Hungarian Notation.
https://web.archive.org/web/20170313211616/http://www.parc.c...
Relevant quote:
> These examples show that the idea of types is independent of how the objects belonging to the type are represented. All scalar quantities appearing above - column numbers, indices and so forth, could be represented as integers, yet the set of operations defined for them, and therefore their types, are different. We shall denote the assignment of objects to types, independent of their representations, by the term painting. When an object is painted, it acquires a distinguishing mark (or color) without changing its underlying representation. A painted type is a class of values from an underlying type, collectively painted a unique color. Operations on the underlying type are available for use on painted types as the operations are actually performed on the underlying representation; however, some operations may not make sense within the semantics of the painted type or may not be needed. The purpose of painting a type is to symbolize the association of the values belonging to the type with a certain set of operations and the abstract objects represented by them.
Relevant terms are "Value object" (1) and avoiding "Primitive obsession" where everything is "stringly typed".
Strongly typed ids should be Value Objects, but not all value objects are ids. e.g. I might have a value object that represents an x-y co-ordinate, as I would expect an object with value (2,3) to be equal to a different object with the same value.
1) https://martinfowler.com/bliki/ValueObject.html
https://en.wikipedia.org/wiki/Value_object
https://beartype.readthedocs.io/en/latest/
Or see their page about performance: https://beartype.readthedocs.io/en/latest/faq/#faq-realtime
See for example in wdoc, my advanced personal RAG system:
https://github.com/thiswillbeyourgithub/wdoc/blob/main/wdoc/...
So you can have for example:
```
import numpy as np

def func(inputA: np.ndarray, inputB: np.ndarray) -> np.ndarray:
    return np.concatenate((inputA, inputB))
```
But then you want to modify the code for some reason way later and do this:
```
def func(inputA: np.ndarray, inputB: np.ndarray) -> np.ndarray:
    if len(inputB.shape) == 1:
        return np.concatenate((inputA, inputB))
    else:
        return inputB
```
Now imagine that inputB is received directly from some library you imported, so pyright might not be able to check its type, and inputB is actually a List. Then you will never get a crash with version 1, but its types are wrong, since inputB is a List.

Version 2, on the other hand, will crash, since Lists don't have a shape attribute. Notice also how func returns inputB, propagating the wrong type.

Sure, that means the code still works until you modify version 1, but any developer or LLM that reads func would get the wrong idea about how to modify such code. And this example is trivial; it can become much, much more complicated, of course.
This would not be caught in pyright but beartype would. I'm basically using beartype absolutely everywhere I can and it really made me way more sure of my code.
If you're not convinced I'm super curious about your reasoning!
PS: also try/except in python are extremely slow so figuring out types in advance is always AFAIK a good idea performance wise.
Basically it's pure python ultra optimized code that calls "isinstance(a, b)" all the time everywhere. If there is a mismatch it crashes.
Note that you can also set it to warn instead of crash.
[1] https://www.velopen.com/blog/adding-type-safety-to-object-id...
In the example, they are (it seems) converting between Celsius and Fahrenheit, using floating point. There is the possibility of minor rounding errors, although if you are converting between Celsius and Kelvin with integers only then these rounding errors do not occur.
In some cases, a function might be able to work with any units as long as the units match.
> Public and even private functions should often avoid dealing in floats or integers alone
In some cases it makes sense to use those types directly, e.g. many kind of purely mathematical functions (such as checking if a number is prime). When dealing with physical measurements, bit fields, ID numbers, etc, it does make sense to have types specifically for those things, although the compiler should allow to override the requirement of the more specific type in specific cases by an explicit operator.
There is another article about string types, but I think the underlying problem is the use of text-based formats, which leads to many of these problems, including the need for escaping, etc.
https://github.com/Mk-Chan/libchess/blob/master/internal/Met... https://github.com/Mk-Chan/libchess/blob/master/Square.h
The same argument gets brought up in favor of dynamic typing. The point of typing is that you don't need all those repetitive tests.
Moreover, the coding feedback loop gets shorter since there's no need to wait until the tests run to find out a string was passed in instead of an int (or UserID).
[1] https://typing.python.org/en/latest/spec/aliases.html
There is no duck, just primitive types organized duck-wise.
The sooner you embrace the truth of mereological nihilism the better your abstractions will be.
Almost everything at every layer of abstraction is structure.
Understanding this will allow you to still use types, just not abuse them because you think they are "real".
> The Ecstasy type system is called the Turtles Type System, because the entire type system is bootstrapped on itself, and -- lacking primitives -- solely on itself. An Int, for example, is built out of an Array of Bit, and a Bit is built out of an IntLiteral (i.e. 0 or 1), which is built out of a String, which is an Array of Char, and a Char is built out of an Int. Thus, an Int is built out of many Ints. It's turtles, the whole way down.
[1]:https://xtclang.blogspot.com/2019/06/an-introduction-to-ecst...
I think it's a much better idea to do:
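A guess at the elided snippet, using a unique-symbol brand so the representation is fully hidden (names hypothetical):

```
declare const userIdBrand: unique symbol;
type UserID = { readonly [userIdBrand]: never };

// The casts live in exactly two places and nowhere else:
function toUserID(raw: string): UserID {
  return raw as unknown as UserID;
}
function rawUserID(id: UserID): string {
  return id as unknown as string;
}
```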
Now clients of `UserID` no longer know anything about the representation. Like the original approach, you need a bit of casting, but that can be neatly encapsulated, as it would be in the original approach anyway.

The examples were a bit less contrived than this, encoding business rules where you'd want the nickname for most UI but the real name for official notifications, and the type system prevented future devs from using the wrong one when adding new UI or emails.
Especially and particularly attributes/fields/properties in an enterprise solution.
You want to associate various metadata - including at runtime - with a _value_ and use that as attribute/field/property in a container.
You want to be able to transport and combine these values in different ways, especially if your business domain is subject to many changes.
If you are tempted to use "classes" for this, you will sign up for significant pain later down the road.
Moreover: you can separate types based on admitted values and perform runtime checks. Percentage, Money, etc.
Suppose you make two simple types, one for Kelvin (K) and the other for Fahrenheit (F) or degrees (D).
And you implement the conversions between them in the types.
But then you have something like
d: D = 10;
For i=1...100000:
end

Then you will implicitly have many, many automatic conversions that are not useful. How do you handle this? Is it easily caught by the compiler when the functions are much more complex?
The downside to adding the SI into the type system is the SI is not a sound type system. For example:
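A TypeScript-flavoured sketch of the issue, with a hypothetical dimension-only encoding: torque and energy both reduce to kg·m²/s², so the types cannot tell them apart.

```
type Joule       = { kg: 1; m: 2; s: -2 }; // energy
type NewtonMetre = { kg: 1; m: 2; s: -2 }; // torque: structurally identical

declare function addEnergy(a: Joule, b: Joule): Joule;
declare const torque: NewtonMetre;

const oops = addEnergy(torque, torque); // compiles, but is physically nonsense
```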
The type system will permit this: using torque where energy is expected, and vice versa, because they have the same SI unit, but they don't represent the same thing, and ideally it should be rejected. A potential improvement is to include Siano's Orientational Analysis[1], which can resolve this particular unsoundness because the orientations of N·m and J would be incompatible.

[1]: https://www.doc88.com/p-9099799188322.html
My response is: these conversions are unlikely to be the slow step in your code, don’t worry about it.
I do agree though, that it would be nice if the compiler could simplify the math to remove the conversions between units. I don’t know of any languages that can do that.
For example, it's not my case but it's like having to convert between two image representations (matrix multiply each pixel) every time.
I'm scared that this kind of 'automatic conversion' slowness will be extremely difficult to debug and to monitor.
On your case about swapping between image representations: let's say you're doing an FFT to transform between real and reciprocal representations of an image. You probably have to do that transformation in order to do the work you need done in reciprocal space; there's no getting around it. Or am I misunderstanding?
Please don’t take my response as criticism, I’m genuinely interested here, and enjoying the discussion.
When I tried to refactor using types, this kind of problem became obvious, and it forced more conversions than intended.

So I'm really curious, because apart from rewriting everything, I don't see how to avoid this problem. It's more natural for some applications to have data format 1 and for others data format 2, and forcing one over the other would make the application slow.

The problem arises only in 'hybrid' pipelines, when a new scientist needs to use some existing functions, some in the first data format and the others in the second.
As a simple example, you can write rotations in a software in many ways, some will use matrix multiply, some Euler angles, some quaternions, some geometric algebra. It depends on the application at hand which one works the best as it maps better with the mental model of the current application. For example geometric algebra is way better to think about a problem, but sometimes Euler angles are output from a physical sensor. So some scientists will use the first, and the others the second. (of course, those kind of conversions are quite trivial and we don't care that much, but suppose each conversion is very expensive for one reason or another)
I didn't find it a criticism :)
I agree that would be a good solution, though my data is huge, and it assumes the data doesn't change, or doesn't change much.
Also relevant https://refactoring.guru/smells/primitive-obsession
That refactoring guru raccoon reminds me of Minix for some reason.
Being forced to think early on types has a payoff at the medium complexity scale
But if you have a function that works with different types you should make it more reusable.
It’s a good marker to yourself or to a review agent
I think Rich Hickey has a point that bugs like this almost certainly get caught by running the program. If they make it into production, it usually results in an obscure edge case.
I’m sure there are exceptions but unless you’re designing for the worst case (safety critical etc) rather than average case (web app), types come with a lot of trade offs.
I’ve been on the fence about types for a long time, but having built systems fast at a startup for years, I now believe dynamic typing is superior. Folks I know who have built similar systems and are excellent coders also prefer dynamic typing.
In my current startup we use typescript because the other team members like it. It does help replace comments when none are available, and it stops some bugs, but it also makes the codebase very hard to read and slows down dev.
A high quality test suite beats everything else hands down.
No types anywhere, so making a change is SCARY! And all the original engineers have usually moved on. Fun times. Types are a form of forced documentation after all, and help catch an entire class of bugs. If you’re really lucky, the project has good unit tests.
I think dynamic typing is wonderful for making software quickly, and it can be a force multiplier for startups. I also enjoy it when creating small services or utilities. But for a large web app, you’ll pay a price eventually. Or more accurately…the poor engineer that inherits your code in 10 years will pay the price. God bless them if they try to do a medium sized refactor without types lol. I’ve been on both ends of the spectrum here.
Pros and cons. There’s _always_ a tradeoff for the business.
But most startups aren’t building for 10 years out. If you use a lot of typing, you’ll probably die way before then. But yeah if you’re building a code base for the long term then use types unless you’re disciplined enough to write comments and good code.
As for refactoring, that is exactly what test suites are for.
That is certainly correct... but that doesn't make it a good thing. One wants to catch bugs before the program is running, not after.
I understand the line of reasoning here, but the examples are bad. Those aren't good reasons to introduce new types. If you follow this advice, you'll end up with an insufferable codebase where 80% LoC is type casting.
Types are like database schemas. You should spend a lot of time thinking about semantics, not simply introduce new types because you want to avoid (hypothetical) programmer errors.
"It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures."
[0]: https://wiki.c2.com/?PrimitiveObsession
Vs. an instant compile error.
I have seen errors like this many times, both written by myself and others. I think it's great to use the type system to eliminate this class of error!
(Especially in languages like Go that make it very low-friction to define the newtype.)
Another benefit if you're working with any sort of static data system is it makes it very easy to validate the data -- e.g. just recursively scan for instances of FooId and make sure they are actually foo, instead of having to write custom logic or schema for everywhere a FooId might occur.
The compiler tests the type is correct wherever you use it. It is also documentation.
Still have tests! But types are great.
But sadly, in practice I don't often use a type per ID type because it is not idiomatic to code bases I work on. It's a project of its own to move a code base to be like that if it wasn't in the outset. Also most programming languages don't make it ergonomic.
Coming from C++, these kinds of types implemented with classes make sense. But they are also a maintenance task with further issues, where often proper variable naming matters. Likely a good balance is the key.
That is, passing a plain int wherever a typedef'd ID type is expected is perfectly acceptable C.
In Go the equivalent would be an error, because it will not automatically convert from one type to another just because it happens to be structurally identical. This forces you to be explicit in your conversion. So even though the type happens to be an int, an arbitrary int or other types which are structurally ints cannot be accidentally converted to a myId unless you somehow include an explicit but unintended conversion.

This helped me! Especially because you started with typedef from C, so I could relate. Others just downvote and don't explain.
A strong enough type system would be a lot more useful.
To the best of my knowledge, the earliest specific term for the concept is painted types (Simonyi, 1976)[0], but I believe this was a new term for a concept that was already known. Simonyi himself quotes (Hoare, 1970)[1]. Hoare doesn't provide a specific term like painted type for the type being defined, but he describes new types as being built from constintuent types, where a singular constituent type is known as the base type.
Simonyi uses the term underlying type rather than base type. Hoare alludes to painted types by what they contain - though he doesn't explicitly require that the new type has the same representation as its base type - so they're not necessarily equivalent, though in practice they often are. Simonyi made this explicit, which is what we expect from newtype in Haskell - and specifically why we'd use `newtype` rather than `data` with a single (base) constituent.
If you're aware of any other early references, please share them.
---
[0]:https://web.archive.org/web/20170313211616/http://www.parc.c...
> These examples show that the idea of types is independent of how the objects belonging to the type are represented. All scalar quantities appearing above - column numbers, indices and so forth, could be represented as integers, yet the set of operations defined for them, and therefore their types, are different. We shall denote the assignment of objects to types, independent of their representations, by the term painting. When an object is painted, it acquires a distinguishing mark (or color) without changing its underlying representation. A painted type is a class of values from an underlying type, collectively painted a unique color. Operations on the underlying type are available for use on painted types as the operations are actually performed on the underlying representation; however, some operations may not make sense within the semantics of the painted type or may not be needed. The purpose of painting a type is to symbolize the association of the values belonging to the type with a certain set of operations and the abstract objects represented by them.
[1]:https://www.cs.cornell.edu/courses/cs4860/2018fa/lectures/No...
> In most cases, a new type is defined in terms of previously defined constituent types; the values of such a new type are data structures, which can be built up from component values of the constituent types, and from which the component values can subsequently be extracted. These component values will belong to the constituent types in terms of which the structured type was defined. If there is only one constituent type, it is known as the base type.
Also, you can still do integer things with them, such as
> nonsense = UserId(1) + UserId(2)
    from typing import NewType

    UserId = NewType('UserId', int)
    some_id = UserId(524313)
(TypeScript's Zod and Clojure's Malli are counterexamples, although not official offerings.)
Following OP's example, what prevents you from getting an AccountID parsed as a UserID at runtime, in production? In production it's all UUIDs, indistinguishable from one another.
A truly safe approach would use distinct value prefixes – one per object type. Slack does this I believe.
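A hedged sketch of what such a prefix scheme might look like (the "user_" prefix and all names are invented for illustration):

    import (
        "fmt"
        "strings"
    )

    type UserID string

    // ParseUserID accepts only identifiers carrying the user prefix,
    // so an account ID can never be mistaken for a user ID at runtime.
    func ParseUserID(raw string) (UserID, error) {
        if !strings.HasPrefix(raw, "user_") {
            return "", fmt.Errorf("not a user id: %q", raw)
        }
        return UserID(raw), nil
    }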
That's part of the point of being static. If we can statically determine properties of the system and use that information in the derived machine code (or byte code or whatever), then we may be able to discard that information at runtime (though there are reasons not to discard it).
> Following OP's example, what prevents you from getting an AccountID parsed as a UserID at runtime, in production? In production it's all UUIDs, indistinguishable from one another.
If you're receiving information from the outside and converting it into data in your system you have to parse and validate it. If the UUID does not correspond to a UserID in your database or whatever, then the attempted conversion should fail. You'd have a guard like this:
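(The guard itself appears to be missing from the comment; a sketch of the kind of thing meant, where knownUsers is a stand-in for a real database lookup:)

    import "fmt"

    type UserID string

    var knownUsers map[string]bool // stand-in for the real user store

    // ToUserID converts a raw UUID into a UserID only if it actually
    // identifies a known user; otherwise the conversion fails.
    func ToUserID(raw string) (UserID, error) {
        if !knownUsers[raw] { // a "set contains" style check
            return "", fmt.Errorf("no user with id %q", raw)
        }
        return UserID(raw), nil
    }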
Static typing is just a tool, aiming to help with a subset of all possible problems you may find. If you think it's an absolute oracle of every possible problem you may find, sorry, that's just not true, and trivially demonstrable.
Your example already is a runtime check that makes no particular use of the type system. It's a simple "set contains" check (value-oriented, not type-oriented), which is also far more expensive than simply verifying the string prefix of a Slack-style object identifier.
Ultimately I'm not even saying that types are bad, or that static typing is bad. If you truly care about correctness, you'd use all the layers at your disposal - static and dynamic.