> Language designers who studied the async/await experience in other ecosystems concluded that the costs of function coloring outweigh the benefits and chose different paths.
Not really. The author provides Go as evidence, but Go's CSP-based approach far predates the popularity of async/await. Meanwhile, Zig's approach still has function coloring, it's just that one color is "I/O function" and the other is "non-I/O function". And this isn't a problem! Function coloring is fine in many contexts, especially in languages that seek to give the user low-level control! I feel like I'm taking crazy pills every time people harp on function coloring as though it were something deplorable. It's just a bad way of talking about effect systems, which are extremely useful. And sure, if you want a high-level managed language like Go with an intrusive runtime, then you can build an abstraction that dynamically papers over the difference at some runtime cost; that is probably the uniformly correct choice for high-level languages, like dynamic or scripting languages (although it must be said that Go's approach to concurrency in general leaves much to be desired; I'm begging people to learn about structured concurrency).
CSP is a theory about synchronization and implies nothing about green threads or M:N scheduling. Go could have used OS threads and called it CSP.
Certainly it’s true that Go invented neither: both Erlang and Haskell had truly parallel green threads without function coloring before Go or Node existed.
I agree with you, but the big difference between function arguments and effect systems is that the tools we have for composing functions with arguments are a lot simpler to deal with than the tools we have for composing effects.
You could imagine a programming language that expressed “comptime” as a function argument of a type that is only constructible at compile-time. And one for runtime as well, and then functions that can do both can take the sum type “comptime | runtime”.
That is an unfair characterization of Zig. The OP correctly points out:
> Function signatures don’t change based on how they’re scheduled, and async/await become library functions rather than language keywords.
The functions have the same calling conventions regardless of IO implementation. Functions return data and not promises, callbacks, or futures. Dependency injection is not function coloring.
> The functions have the same calling conventions regardless of IO implementation.
Okay, then by that definition, Rust doesn't have colored functions either, because `async fn foo() -> Bar {` in Rust is just syntax sugar for `fn foo() -> impl Future<Output=Bar> {` (that's a normal function that returns a future), and `Future` is just a normal trait that provides a `poll` method; there's no different calling convention anywhere. So which is it? Either Rust and Zig both have colored functions, or neither do.
These things _are_ function colouring, but they show function colouring isn't scary or hard.
The original function colouring essay was much more about JavaScript's implementation than a general statement.
If JavaScript had exposed a way for a synchronous function to call back into the runtime to wait for an async function to complete, it would still be just as coloured, but no one would be complaining about colour (deadlocks yes, but that's another kettle of fish).
I mean Java's Loom feels like the 'ultimate' example of the latter for the _ordinary_ programmer, in that it effectively leaves you just doing what looks like completely normal threads however you so please, and it all 'just works'.
Java had green threads in 1997, removed them in 2000 and brought them back properly now as virtual threads.
I'm kinda glad they've sat out the async mania; with virtual threads/goroutines, the async stuff just feels like lipstick on a pig. Debugging, stack traces etc. are just jumbled.
They are not perfectly fine. If a task panics then you will get the right stack trace, but there is no way to get a stack trace for a task that’s currently waiting. (At least not without intrusive hacks.)
Java didn't really "sit it out". It launched CompletableFutures, CompletionStages, Sources and Sinks, arguably even streams. All of those are standard library forms of async programming. People tried to make it catch on, but the experience of using it (the runtime wrapping all your errors in completion exceptions and destroying your call stacks) just made it completely useless.
Every Java codebase using something like Flux serves as a datapoint in favor of this argument - they're an abomination to read, reason about or (heaven help) debug.
I'm curious how escape analysis works with virtual threads. With the asynchronous model, an object local to a function will be migrated to the old generation heap while the external call gets executed. With virtual threads I imagine the object remains in the virtual thread "stack", therefore reducing pressure in garbage collection.
The initial Loom didn't really provide the semantics and ergonomics of async/await which is why they immediately started working on structured concurrency.
And for my money I prefer async/await to the structured concurrency stuff...
In my experience people complain about it because they are coming from a blocking first mindset. They're trying to shoehorn async calls into an inherently synchronous structure.
A while back I just started leaning in. I write a lot of Python at work, and anytime I have to use a library that relies on asyncio, I just write the entire damn app as an asynchronous one. Makes function coloring a non-issue. If I'm in a situation where the two have to coexist, the async runtime gets its own thread and communication back and forth is handled at specific boundaries.
>In my experience people complain about it because they are coming from a blocking first mindset. They're trying to shoehorn async calls into an inherently synchronous structure.
There's no "inherently synchronous structure", at least not in Javascript. The nature is synchronous, asynchronous is an illusion built on top of it. Which is why you can easily block an "asynchronous" program:
while (true) {}
in any async function will do.
JavaScript execution is synchronous on a single call stack. That's why they added Workers which is different to async.
Rust's Tokio and co are also blocking. You need threads to get something that isn't inherently synchronous with merely a facade of cooperative asynchronicity.
> They're trying to shoehorn async calls into an inherently synchronous structure.
You can make any async system synchronous. It's much harder to make a sync system asynchronous. (Misquoting from something Erlang-related).
There are many cases when I don't care if a function call is asynchronous. I'm happy to wait for the result. Yet too many systems tell me I can't, for no good reason.
Unfortunately, asynchronous programming is almost always discussed in terms of network I/O. However, there are many more use cases for concurrency. Coroutines can be extremely useful for modelling state machines or any kind of process that happens over time. Every time you need to do X, then wait for N (milli)seconds, then do Y, etc., coroutines provide a very ergonomic solution. If your language supports stackful coroutines (e.g. Lua or Ruby), you don't even need to color your functions: you can just write regular functions that yield back to the scheduler anywhere in the call stack.
To give a concrete example: computer music languages, such as SuperCollider, need concurrency to implement musical scheduling. Imagine a musical sequence where you play a note, wait N beats, play another note, etc. Often you want to play many such sequences simultaneously. Stackful coroutines provide a very elegant solution to this problem. Every independent musical sequence can be modelled by a coroutine that yields every time it needs to wait. The yielded value is interpreted by the scheduler as a delta time after which the function should be resumed. In this sense, SuperCollider users have been doing async programming since the early 2000s, long before it became mainstream.
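A rough JavaScript sketch of that scheduling idea, using a (stackless) generator rather than a true stackful coroutine; playNote and the note numbers are placeholders:

    // Toy scheduler: the value a sequence yields is treated as a delay in ms
    // before the scheduler resumes it.
    function spawn(gen) {
      const step = () => {
        const { value, done } = gen.next();
        if (!done) setTimeout(step, value);   // resume after the requested delta time
      };
      step();
    }

    function* melody() {
      while (true) {
        playNote(60); yield 500;   // play, then wait 500 ms
        playNote(64); yield 250;
        playNote(67); yield 250;
      }
    }

    spawn(melody());   // spawn several of these and the sequences run "simultaneously"

In a stackful language the yield could also come from a helper function deeper in the call stack, which is exactly what the generator version can't do.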
Having lived through the changes from callback hell, early promises and then async/await I only ever found each step an improvement and the negatives are very minor when actually working with them.
Now function colouring is interesting but not for the reason these articles get excited about. Recolouring is easy and has basically no impact on code maintenance. BUT if you need that code path to really fly then marking it as async is a killer, as all those tiny little promises add tiny delays in the form of many tasks, which add up to performance problems on hot code paths. This is particularly frustrating if functions are sometimes async, like lazy loaders or similar cache things. To get around this you can either use callbacks instead or use selective promise chaining to only use promises when you actually get a promise. Both strategies can be messy and trip up people who don’t understand these careful design decisions.
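A minimal sketch of the selective-chaining idea (the cache and names are invented for illustration): wrap the value in .then() only when the loader actually handed back a promise, so the hot synchronous path never pays for an extra microtask.

    const cache = new Map();

    // Synchronous on a hit, returns a Promise only on a miss.
    function load(key) {
      if (cache.has(key)) return cache.get(key);
      return fetch(`/data/${key}`)
        .then(r => r.json())
        .then(value => (cache.set(key, value), value));
    }

    function withValue(key, use) {
      const v = load(key);
      // Only schedule a task when we actually got a Promise.
      return v instanceof Promise ? v.then(use) : use(v);
    }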
One other fun thing is indexeddb plays terribly with promises, as it uses a “transactions close at end of task” mechanism, making certain common patterns impossible with promises due to how they behave with the task system. Although some API designers have come up with ways around this to give you promise interfaces for databases. Normally by using callbacks internally and only doing one operation per transaction.
> all those tiny little promises add tiny delays in the form of many tasks.
That depends on the language/framework. In some languages, `await foo()` is equivalent to `Future f = foo(); await f`. In others (e.g. Python), it's a primitive operation and you have to use a different syntax if you want to create a future/task. In Trio (an excellent Python alternative to asyncio), there isn't even the concept of a future at all!
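JavaScript is in the first camp; a sketch (foo and doOtherWork are stand-ins):

    async function demo() {
      const p = foo();          // calling foo() starts it right away and hands back a Promise
      doOtherWork();            // other synchronous work can run before we wait
      const result = await p;   // awaiting later only waits for foo() to finish
      return result;
    }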
> all those tiny little promises add tiny delays in the form of many tasks
Is this because the functions are async or is that because most of the time async is used for things that are I/O like and therefore susceptible to these kinds of delays?
I will agree - async rust on an operating system isn’t all that impressive - it’s a lot easier to just have well defined tasks and manually spawn threads to do the work.
However, in embedded rust async functions are amazing! Combine it with a scheduler like rtic or embassy, and now hardware abstractions are completely taken care of. Serial port? Just two layers of abstraction and you have a DMA system that shoves bytes out UART as fast as you can create them. And your terminal thread will only occupy as much time as it needs to generate the bytes and spit them out, no spin locking or waiting for a status register to report ready.
you’re speaking to the core of the issue. No other language tried to satisfy both running on an embedded system and general purpose computing. Async rust tried to, and came up with a solution that is not great for the majority of programmers writing rust.
I wish to God that the rust library devs would admit to this fact - say that async rust should stay for embedded runtime use cases, but we shouldn’t be forcing async across the majority of general purpose computing libraries. It’s just not a pleasant experience to write or read. And it really doesn’t give any performance benefits.
I write reams of async rust for a living, and completely disagree with this characterization. The concurrency primitives in the futures crate are able to elegantly model the large majority of places where concurrency is needed, and they are nicely composable.
More than once, we have wanted to improve the performance of some path and been able to lift the sequential model into a stream, evaluated concurrently with some max buffer size. From there, converting to true parallel execution is just a matter of wrapping the looped futures in Tasks.
Obviously just sprinkling an async on it isn’t going to make anything faster (it just converts your function into a state-machine generator that then needs to be driven to completion). But being able to easily and progressively move code from sequential to concurrent to parallel execution makes for significant performance gains.
Despite my panning of async elsewhere on this thread, I agree with you here. Embassy is a thing of beauty and a great use of Rust's async. Much of my embedded career was bogged down managing a pile of state machines. With async/await and embassy, that just goes away.
The discussion around async await always focuses on asynchronous use-cases, but I see the biggest benefits when writing synchronous code. In JS, not having await in front of a statement means that nothing will interfere with your computation. This simplifies access to shared state without race conditions.
The other advantage is a rough classification in the type system. Not marking a function as async means that the author believes it can be run in a reasonable amount of time and is safe to run eg. on a UI main thread. In that sense, the propagation through the call hierarchy is a feature, not a bug.
I can see that maintaining multiple versions of a function is annoying for library authors, but on the other hand, functions like fs.readSync shouldn’t even exist. Other code could be running on this thread, so it's not acceptable to just freeze it arbitrarily.
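A small sketch of what that buys you (deposit and auditLog are invented names): as long as there is no await between reading and writing shared state, no other task can be interleaved.

    let balance = 100;

    async function deposit(amount) {
      const current = balance;       // no await between read and write:
      balance = current + amount;    // no other task can run in between, so this is effectively atomic
    }

    async function depositSlowly(amount) {
      const current = balance;
      await auditLog(amount);        // suspension point: other tasks may change `balance` here
      balance = current + amount;    // can now clobber a concurrent update (lost update)
    }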
Maybe I am missing something. But the function coloring problem is basically the tension that async can dominate call hierarchies and the sync code in between loses its beneficial properties to a degree. It's at least awkward to design a system that smoothly blends sync code that executes fast with async code that actually requires it.
Saying that fs.readSync shouldn't exist is really weird. Not all code written benefits from async nor even requires it. Running single threaded, sync programs is totally valid.
'readSync' does two different things - tells the OS we want to read some data and then waits for the data to be ready.
In a good API design, you should expose functions that each do one thing and can easily be composed together. The 'readSync' function doesn't meet that requirement, so it's arguably not necessary - it would be better to expose two separate functions.
This was not a big issue when computers only had a single processor or if the OS relied on cooperative multi-threading to perform I/O. But these days the OS and disk can both run in parallel to your program so the requirement to block when you read is a design wart we shouldn't have to live with.
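Node's promise-based fs API already lets you express the two steps separately; a sketch (doOtherWork is a stand-in):

    import { open } from 'node:fs/promises';

    async function readHeader(path) {
      const fh = await open(path, 'r');
      const buf = Buffer.alloc(4096);
      const pending = fh.read(buf, 0, buf.length, 0); // step 1: ask the OS to read
      doOtherWork();                                  // the thread is free in the meantime
      const { bytesRead } = await pending;            // step 2: wait for the data to be ready
      await fh.close();
      return buf.subarray(0, bytesRead);
    }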
The application in question is frozen for that period though, that's the wait they're referring to.
Even websites had this problem with freezing the browser in the early AJAX days, when people would do a synchronous XMLHttpRequest without understanding it.
he was referring to fs.readSync (node), which also has fs.read, which is async. there is also no parallelism in node.
i don't see it as very useful or elegant to integrate any form of parallelism or concurrency into every imaginable api. depends on context of course. but generalized, just no. if a kind of io takes a microsecond, why bother.
Sync options are useful. If everything is on the net probably less so. But if you have a couple of 1ms io ops that you want to get done asap, it's better to get them done asap.
You can definitely have a race condition in JS. Being single-threaded means you don't have parallelism, but you still have concurrency, and that's enough to have race conditions. For example you might have some code that behaves differently depending on which promise resolves first.
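A sketch of such a race in plain single-threaded JS (search is a placeholder for any network call):

    let latest = null;

    async function refresh(query) {
      const result = await search(query);  // network timing decides which await resumes first
      latest = result;                     // last writer wins, not necessarily the last query issued
    }

    refresh('a');
    refresh('ab');   // if 'a' resolves after 'ab', stale results overwrite the fresh ones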
Sure, but concurrent != parallel. You can't have data races with a single thread of execution - a while loop writing i=0 or i=1 on each iteration is not a data race.
Two async functions doing so is not a data race either.
You should really look up the definition of race condition; it has nothing to do with parallel processing. Parallel processing just makes it harder to deal with.
You're mixing up quite a few somewhat related but different concepts: data races, race conditions, concurrency and parallelism.
Concurrency is needed for race conditions, parallelism is needed for data races. Many single threaded runtimes including JS have concurrency, and hence the potential for race conditions, but don't have parallelism and hence no data races.
Concurrency with a single thread of execution runs with complete mutual exclusion, so no: "pure" single-threaded concurrency is definitely race condition free.
What we may argue over (and it becomes more of a what definition to use): IO/external event loop/signal handlers. These can cause race conditions even in a single threaded program, but one may argue (this is sort of where I am) that these are then not single threaded. The kernel IO operation is most definitely not running on the same thread of execution as JS.
I think I have been fairly consistent in the definition of a data race as a type of race condition, where a specific shared memory is written to while other(s) read it with no synchronization mechanism. This can be safe (most notably OpenJDK's implementation is tear-free, no primitive or reference pointer may ever be observed as a value not explicitly set by a writer), or unsafe (c/c++/rust with unsafe, surprisingly go) where you have tearing and e.g. a pointer data race can cause the pointer to appear as a value that was never set by anyone, causing a segfault or worse.
> OS threads are expensive: an operating system thread typically reserves a megabyte of stack space and takes roughly a millisecond to create.
It's typically less than a hundred kilobytes and (on the systems I've benchmarked using std::thread) it takes 60usec (wall time in userspace) to create and destroy a thread.
Threads have gotten so fast that paying the async function coloring price makes very little sense for most software.
It's the stack space allocated to each thread that prevents you from spawning more than a thousand threads. Strategies like a thread per network connection do not scale.
I regularly spawn several thousands of threads in the C++ servers I write, and they perform well. At least 40% of FAANG companies just reduce the size of their per-thread stacks. "Thread per-connection" works just fine, and when you need to go faster, thread pools work even better without coloring all your functions.
There is an extremely small set of domains where simple threading doesn't suffice, and async/await is too high a price to pay across the entire software ecosystem just to slightly optimize those domains.
> 1. On a system that is handling 10k concurrent requests, the 10GB of RAM is going to be a fraction of what is installed.
My example (and the c10k problem) is 10k concurrent connections, not 10k concurrent requests.
> 2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
Yes, and that's both memory and CPU usage that isn't needed when using a better concurrency model. That's why no high-performance server software uses a huge number of threads, and many use the reactor pattern.
> Yes, and that's both memory and cpu usage that isn't needed
No, it literally is not. The "memory" is just entries in a page table in the kernel and MMU. It shouldn't worry you at all.
Nor is the CPU the kernel uses to manage those threads necessarily going to be less efficient than someone's handrolled async runtime. In fact, given it gets more eyes... likely more efficient.
The sole argument I can see is avoiding a handful of syscalls and not crossing the kernel<->userspace blood-brain barrier too much.
> > Yes, and that's both memory and cpu usage that isn't needed
> No, it literally is not. The "memory" is just entries in a page table in the kernel and MMU. It shouldn't worry you at all.
Only if you never free one of those stacks. TLB flushes can be quite expensive.
Except if those threads are actually faulting in all of that memory and making it resident, they'd be doing the same thing, just on the heap, for a classic async coroutine style application.
You don't pay for stack space you don't use unless you disable overcommit. And if you disable overcommit on modern linux the machine will very quickly stop functioning.
The amount of stack you pay for on a thread is proportional to the maximum depth that the stack ever reached on the thread. Operating systems can grow the amount of real memory allocated to a thread, but never shrink it.
It’s a programming model that has some really risky drawbacks.
> Operating systems can grow the amount of real memory allocated to a thread, but never shrink it.
Operating systems can shrink the memory usage of a stack.
madvise(page, size, MADV_DONTNEED);
Leaves the memory mapping intact but the kernel frees underlying resources. Subsequent accesses get either new zero pages or the original file's pages.
Linux also supports mremap, which is essentially a kernel version of realloc. Supports growing and shrinking memory mappings.
Whether existing systems make use of this is another matter entirely. My language uses mremap for growth and shrinkage of stacks. C programs can't do it because pointers to stack allocated objects may exist.
> C programs can't do it because pointers to stack allocated objects may exist.
They sure shouldn't exist to the unused region of the stack though; if they do, that's a bug (because anything could claim that memory now). You should be free and clear to release stack pages past your current stack pointer.
Stack memory is never unmapped until the thread terminates as far as I know. I don’t know of any kernel that does this, for precisely the reason you arrive at by the very last sentence.
Yeah, none of this makes sense to me. Allocating memory for stack space is not expensive (and the default isn't even 1MB??) because you're just creating a VMA and probably faulting in one or two pages.
They also say:
>The system spends time managing threads that could be better spent doing useful work.
What do they think the async runtime in their language is doing? It's literally doing the same thing the kernel would be doing. There's nothing that intrinsically makes scheduling 10k coroutines in userspace more efficient than the kernel scheduling 10k threads. Context switches are really only expensive when the switch is happening between different processes; the overhead of a context switch on a CPU between two threads in the same process is very small (and they're not free when done in userspace anyway).
There are advantages to doing scheduling in the kernel and there are advantages to doing scheduling in userspace, but this article doesn't really touch on any of the actual pros and cons here, it just assumes that userspace scheduling is automatically more efficient.
Not the runtime per se, but cooperative scheduling has the advantage that tasks do not yield at adverse code points, e.g., right before giving up a lock, or performing an I/O request. Of course the lack of preemption has its own downsides, but with thread-per-request you tend to run into tail latency issues much earlier than context switching overhead.
It's a cargo cult and a bias I see all over the place.
I feel like we're now, what, 20, 25 years on and people still haven't adjusted themselves to the fact that the machines we have now are multicore, have boatloads of cache, or how that cache is shared (or not) between cores.
Nor is there apparently a real understanding of the difference between VSS and RSS.
Nor of the fact that modern machines are really really fast if you can keep stuff in cache. And so you really should be focused on how you can make that happen.
1 megabyte stacks mean ten thousand threads require 10 gigabytes of RAM just for the stacks. The entire point of the asynchronous programming paradigm is to reclaim all of those gigabytes by not allowing stacks to develop at all, by stealthily turning everything into a hidden form of cooperative multitasking instead.
While that's strictly true, resident memory in this context is a function of worst case memory usage by the code executing on those stacks. Seems wise to assume worst case performance when discussing this.
The program could use one page's worth of stack space, which is optimal. The program could use like 200 bytes of stack space, which wastes the rest of the page. The program could recurse all the way to 9.9 MB of stack usage, stop just before overflow and then unwind back to constant 200 bytes stack space usage, and never touch all those pages ever again.
The author doesn't fully justify the assertion but it does have sound basis.
While virtual memory allocation does not require physical allocation, it immediately runs into the kinds of performance problems that huge pages are designed to solve. On modern systems, you can burn up most of your virtual address space via casual indifference to how it maps to physical memory and the TLB space it consumes. Spinning up thousands of stacks is kind of a pathological case here.
10µs is an eternity for high-performance software architectures. That is also around the same order of magnitude as disk access with modern NVMe. An enormous amount of effort goes into avoiding blocking on NVMe disk access with that latency for good reason. 10µs is not remotely below the noise floor in terms of performance.
> Why is reserving a megabyte of stack space "expensive"?
Guess it's not a huge issue in these 64-bit days, but back in the 32-bit days it was a real limitation to how many threads you could spin up due to the limited address space.
Of course most applications which hit this would override the 1MB default.
There's much ridiculous hatred for OS threads based on people's biases of operating systems and hardware from 20 years ago.
So much so that they'll sign themselves up for async frameworks that thread steal at will and bounce things all over cores causing cache line bouncing and associated memory stalls, not understanding what this is doing to their performance profile.
And endure complexity, etc. through awkward async call chains and function colouring.
Most people's applications would be totally fine just spawning OS threads and using them without fear and dropping into a futex when waiting on I/O; or using the kernel's own async completion frameworks. The OS scheduler is highly efficient, and it is very good at managing multiple cores and even being aware asymmetrical CPU hierarchies, etc.. Likely more efficient than half the async runtimes out there.
I don't think those benches are much of a flex, even by the author's own description you'd be fine with any of them. They all have acceptable performance and don't show any order of magnitude differences or non-linear scaling problems.
Further, the benches that are showing best there are non-thread-stealing scenarios, not tokio.
I also suspect simply tuning the thread-based workloads more aggressively would have the same effect.
When I profile high throughput tokio applications there's way too much contention on shared atomics, mostly inside tokio's scheduler itself. On lower core count machines and where the workload is I/O heavy, this is probably fine. So, yes, web servers.
But I'm very interested in applications that scale on machines with lots of cores and where CPU is a large part of the equation.
How many systems are there that can't just spawn a thread for each task they have to work on concurrently? This has to be a system that is A) CPU or memory bound (since async doesn't make disk or network IO faster) and B) must work on ~tens of thousands of tasks concurrently, i.e. can't just queue up tasks and work on only a small number concurrently. The only meaningful example I can come up with are load balancers, embedded software and perhaps something like browsers. But e.g. an application server implementing a REST API that needs to talk to a database anyway to answer each request doesn't really qualify, since the database connection and the work the database itself does are likely much more resource intensive than the overhead of a thread.
I'm not sure this is the correct mental model of what async solves.
Async precisely improves disk/network I/O-bound applications because synchronous code has to waste a whole thread sitting around waiting for an I/O response (each with its own stack memory and scheduler overhead), and in something like an application server there will be many incoming requests doing so in parallel. Cancellation is also easier with async
CPU-bound code would not benefit because the CPU is already busy, and async adds overhead
I have some test code that runs a comparison of Hyper pre-async (aka thread per request) vs async (via Tokio), and the pre-async version is able to process more requests per second in every scenario (I/o, CPU complex tasks, shared memory).
I'll publish my results shortly. I did these as baselines because I'm finishing testing of the User Managed Concurrency Groups proposal for the Linux kernel, which is an extension to provide faster kernel threads (which beat both of them).
The UMCG implementation allows kernel thread context switches to happen in 150-200 microseconds, compared to the 1500-2000 microseconds for normal kernel thread context switches. My goal is to show that if UMCG could be merged into Linux then it would be competitive with async Rust without the headache.
I'll have to check my work computer on Monday. It was an 8-CPU virtual machine on an M1 Mac. The UMCG and normal-thread versions were set to 1024 threads on the server; the Tokio version was 2 threads per core. Just from the top of my head - the I/O bound requests topped out around 40k/second for the Tokio version, 60k/second for the normal Hyper version, and 80k/second for the UMCG Hyper version.
I'm pretty close to being done - I'm hoping to publish the entire GitHub repository with tests for the community to validate by next week.
UMCG is essentially an open source version of Google Fibers, which is their internal extension to the Linux kernel for "light weight" threads. It requires you to build a user space scheduler, but that allows you to create different types of schedulers. I can not remember which scheduler showed ^ results but I have at least 6 different UMCG schedulers I was testing.
So essentially you get the benefits of something like tokio where you can have different types of schedulers optimized for different use cases, but the power of kernel threads which means easy cancellation, easy programming (at least in rust). It's still a linux thread with an entire 8mb(?) stack size, but from my testing it's far faster than what Tokio can provide, without the headache of async/await programming.
I read this argument ("async is for I/O-bound applications") often, but it makes no sense to me. If your app is I/O bound, how does reducing the work the (already idling!) CPU has to spend on context switching improve the performance of the system?
IO bound might mean latency but not throughput, so you can up concurrency and add batching, both of which require more concurrent requests in flight to hit your real limit. IO bound might also really mean contention for latches on the database, and different types of requests might hit different tables. Basically, I see people say they're IO bound long before they're at the limit of a single disk, so obviously they are not IO bound. Modern drives are absurdly fast. If everyone were really IO bound, we'd need 1/1000 the hardware we needed 10-15 years ago.
It sounds like you're assuming both pieces are running on the same server, which may not be the case (and if you're bottlenecked on the database it probably shouldn't be, because you'd want to move that work off the struggling database server)
Assuming for the sake of argument that they are together, you're still saving stack memory for every thread that isn't created. In fact you could say it allows the CPU to be idle, by spending less time context switching. On top of that, async/await is a perfect fit for OS overlapped I/O mechanisms for similar reasons, namely not requiring a separate blocking thread for every pending I/O (see e.g. https://en.wikipedia.org/wiki/Overlapped_I/O, https://stackoverflow.com/a/5283082)
Right, I think the argument should be that transitioning from a synchronous to asynchronous programming model can improve the performance of a previously CPU/Memory-bound system so that it saturates the IO interface.
If the system is CPU-bound doing useful work, that's not the case. Async shines when there are a lot of "tasks" that are not doing useful work, because they are waiting (e.g. on I/O). Waiting threads waste resources. That's what async greatly improves.
The simplest example is that you can easily be wasteful in your use of threads. If you just write blocking code, you will block the thread while waiting on io, and threads are a finite resource.
So avoiding that would mean a server can handle more traffic before running into limits based on thread count.
Inversion of thought pattern: Why is a thread such a waste that we can't have one per concurrent request? Make threads less wasteful instead. Go took things in this direction.
Pretty much anything that needs performance and has a lot of relatively light operations is not a candidate for spawning a thread. Context switching and the cost of threads is going to kill performance. A server spawning a thread per request for relatively lightweight request is going to be extremely slow. But sure, if every REST call results in a 10s database query then that's not your bottleneck. A query to a database can be very fast though (due to caches, indices, etc.) so it's not a given that just because you're talking to a database you can just spin up new threads and it'll be fine.
EDIT: Something else to consider is what if your REST call needs to make 5 queries. Do you serialize them? Now your latency can be worse. Do you launch a thread per query? Now you need to a) synchronize b) pay 5x the thread cost. Async patterns or green threads or coroutines enable more efficient overlapping of operations and potentially better concurrency (though a server that handles lots of concurrent requests may already have "enough" concurrency anyways).
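In promise terms that's roughly the difference between the two shapes below (inside an async handler; q1..q5 are placeholder queries):

    // Serialized: total latency is roughly the sum of the five queries.
    const r1 = await q1();
    const r2 = await q2();
    // ...and so on

    // Overlapped on a single thread: total latency is roughly the slowest query.
    const [a, b, c, d, e] = await Promise.all([q1(), q2(), q3(), q4(), q5()]);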
Server applications don’t spawn threads per request, they use thread pools. The extra context switching due to threads waiting for I/O is negligible in practice for most applications. Asynchronous I/O becomes important when the number of simultaneous requests approaches the number of threads you can have on your system. Many applications don’t come close to that in practice.
There’s a benefit in being able to code the handling of a request in synchronous logic. A case has to be made for the particular application that it would cause performance or resource issues, before opting for asynchronous code that adds more complexity.
Thread pools are another variation on the theme. But if your threads block then your pool saturates and you can't process any more requests. So thread pools still need non-blocking operations to be efficient or you need more threads. If you have thread pools you also need a way of communicating with that pool. Maybe that exists in the framework and you don't worry about it as a developer. If you are managing a pool of threads then there's a fair amount of complexity to deal with.
I totally agree there are applications for which this is overkill and adds complexity. It's just a tool in the toolbox. Video games famously are just a single thread/main loop kind of application.
There’s also a really good operational benefit if you have limits like total RAM, database connections, etc. where being able to reason about resource usage is important. I’ve seen multiple async apps struggle with things like that because async makes it harder to reason about when resources are released.
Basically it’s the non-linear execution flow creating situations which are harder to reason about. Here’s an example I’m trying to help a Node team fix right now: something is blocking the main loop long enough that some of the API calls made in various places are timing out or getting auth errors due to the signature expiring between when the request was prepared and when it is actually dispatched, because that gap is sporadically tens of seconds instead of milliseconds. Because it’s all async calls, there are hundreds of places which have to be checked, whereas if it was threaded this class of error either wouldn’t be possible or would be limited to the same thread or an explicit synchronization primitive for something like a concurrency limit on the number of simultaneous HTTP requests to a given target. Also, the call stack and other context is unhelpful until you put effort into observability for everything, because you need to know what happened between hitting await and the exception deep in code which doesn’t share a call stack.
No such thing. In a preemptive multitasking OS (that's basically all of them today) you will get context switching regardless of what you do. Most modern OS's don't even give you the tools to mess with the scheduler at all; the scheduler knows best.
That's not accurate. Preemptive multitasking just means your thread will get preempted. Blocking still incurs additional context switching. The core your thread is running on isn't just going to sit idle while your thread blocks.
I think it's another case of the whole industry being driven by the needs of the very small number of systems that need to handle >10k concurrent requests.
Also very useful for applications that have heavily animated graphical user interfaces, where you don't want I/O to interfere with UI animations on the UI thread (e.g. ALL Android apps, or any UI framework that has a message pump (all of them?)).
Or biases inherited from deploying on single or dual core 32-bit systems from 20 years ago.
Honestly, it's a mostly obsolete approach. OS threads are fast. We have lots of cores. The cost of bouncing around on the same core and losing L1 cache coherence is higher than the cost of firing up a new OS thread that could land on a new core.
The kernel scheduler gets tuned. Language specific async runtimes are unlikely to see so many eyeballs.
Function coloring also only applies to a few select languages. If your runtime allows you can call an async function from a sync function by pausing execution of the current function/thread whenever you're waiting for some async op.
Libraries like Tokio (mentioned in the article) have support for this built-in. Goroutines sidestep the issue completely. C# Tasks are batteries included in that regard. In fact function colors aren't an issue in most languages that have async/await. JavaScript is the odd one out, mostly due to being single-threaded. Can't really be made to work in a clean way in existing JS engines.
“Function coloring” is an imaginary issue in the first place. Or rather it's a real phenomenon, but absolutely not limited to async and people don't seem to care about it at all except when talking about async.
Take Rust: you return `Result<T,E>`, you are coloring your function the same way as you are when using `async`. Same for Option. Errors as return values in Go: again, function coloring.
One of your nested function starts taking a "serverUrl" input parameter instead of reading an environment variable: you've colored your function and you now need to color the entire call stack (taking the url parameter themselves).
All of them are exactly as annoying, as you need to rewrite the entire call stack's function signature to accommodate for the change, but somehow people obsess about async in particular as if it was something special.
It's not special, it's just the reflection that something can either be explicit and require changing many function signatures at once when making a change, or be implicit (with threads, exceptions or global variables) which is less work, but less explicit in the code, and often more brittle.
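Concretely, a toy sketch of the parameter case (get and SERVER_URL are placeholders):

    // Before: reads a global, callers are unaffected.
    function fetchUser(id)   { return get(SERVER_URL + '/users/' + id); }
    function loadProfile(id) { return fetchUser(id); }

    // After: serverUrl becomes a parameter, and every function up the
    // call stack grows the same parameter just to pass it along.
    function fetchUser(serverUrl, id)   { return get(serverUrl + '/users/' + id); }
    function loadProfile(serverUrl, id) { return fetchUser(serverUrl, id); }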
Function coloring does not mean that functions take parameters and have return values. Result<T,E> is not a color. You can call a function that returns a Result from any other function. Errors as return values do not color a function, they're just return values.
Async functions are colored because they force a change in the rest of the call stack, not just the caller. If you have a function nested ten levels deep and it calls a function that returns a Result, and you change that function to no longer return a result because it lost all its error cases, you only have to change the direct callers. If you are ten layers deep in a stack of synchronous functions and suddenly need to make an asynchronous call, the type signature of every individual function in the stack has to change.
You might say "well, if I'm ten layers deep in a stack of functions that don't return errors and have to make a call that returns the error, well now I have to change the entire stack of functions to return the error", but that's not true. The type change from sync to async is forced. The error is not. You could just discard it. You could handle it somehow in one of the intervening calls and terminate the propagation of the type signature changes half way up. The caller might log the error and then fail to propagate it upwards for any number of reasons. You aren't being forced to this change by the type system. You may be forced to change by the rest of the software engineering situation, but that's not a "color".
For similar reasons, the article is incorrect about Go's "context.Context" being a coloration. It's just a function parameter like anything else. If you're ten layers deep into non-Context-using code and you need to call a function that takes a context, you can just pass it one with context.Background() that does nothing context-relevant. You may, for other software engineering reasons, choose to poke that use of a context up the stack to the rest of the functions. It's probably a good idea. But you're not being forced to by the type system.
"Coloration" is when you have a change to a function that doesn't just change the way it interacts with the functions that directly call it. It's when the changes forcibly propagate up the entire call stack. Not just when it may be a good idea for other reasons but when the language forces the changes.
It is not, in the maximally general sense, limited to async. It's just that sync/async is the only such color that most languages in common use expose.
> If you are ten layers deep in a stack of synchronous functions and suddenly need to make an asynchronous call, the type signature of every individual function in the stack has to change.
If you are ten nested functions deep in sync code and want to call an async function you could always choose to block the thread to do it, which stops the async color from propagating up the stack. That's kind of a terrible way to do it, but it's sort of the analog of ignoring errors when that innermost function becomes fallible.
So I don't buy that async colors are fundamentally different.
You can exit an async/IO monad just like you can exit an error monad: you have a thread blocking run(task) that actually executes everything until the future resolves. Some runtimes have separate blocking threadpools so you don't stall other tasks.
If you have something in a specific language that does not result in having to change the entire call stack to match something about it, then you do not have a color. Sync/async isn't a "color" in all languages. After all, it isn't in thread-based languages or programs anyhow.
Threading methodology is unrelated though. How exactly the call stack is scheduled is orthogonal to the question of whether or not making a call to a particular function results in type changes being forced on all function in the entire stack.
There may also be cases where you can take "async" code and run it entirely out of the context of any sort of scheduler, where it can simply be turned into the obvious sync code. While that does decolor the resulting call (or, if you prefer, recolor it back into the "sync" color) it doesn't mean that async is not generally a color in code where that is not an option. Solving concurrency by simply turning it off certainly has a time and place (e.g., a shell script may be perfectly happy to run "async" code completely synchronously because it may be able to guarantee nothing will ever happen concurrently), but that doesn't make the coloration problem go away when that is not an option.
Here's the list of requirements:
1. Every function has a color.
2. The way you call a function depends on its color.
3. You can only call a red function from within another red function.
4. Red functions are more painful to call.
5. Some core library functions are red.
You are complaining about point 3. You are saying if there's any way to call a red function from a blue function then it's not real. The type change from sync to async is not forced any more than changing T to Result<T,E>. You just get a Promise from the async function. So you logically think that async is not a color. You think even a Haskell IO-value can be used in a pure function if you don't actually do the IO or if you use unsafePerformIO. This is nonsense. Anything that makes the function hard to use can be a color.
You can still use a function that returns Result in a function that uses Option.
And Result and Option usually mean something else. Option is a value or none; None doesn't necessarily mean the function failed. Result is the value or an error message. You can have Result<Option, Error>.
That's different from async, where you can't just call the other type.
Function coloring is an effect. If the language makes a distinction between sync and async, then it has that effect. Just because there are escape hatches to get around one effect doesn't really change this fact.
Like in Haskell there is the IO monad used to denote the IO effect. And there are unsafe ways to actually execute it - does that make everything in Haskell impure?
I wish the “Function coloring” meme died. It made sense in the context of the original blog post (which was about callback hell, hence the “4. Red functions are more painful to call” section in the original blog post), but doesn't make sense in the context of async/await. There's literally nothing special with async, it's just an effect among many others.
As soon as you start using function arguments instead of using a global variable, you are coloring your function in the exact same way. Yet I don't think anyone would make the case that we should stop using function arguments and use global variables instead…
I think the lesson is to be careful about introducing incompatibility via the type system. When you introduce distinctions, you reduce compatibility. Often that’s deliberate (two functions shouldn’t be interchangeable because it introduces a bug) but the result is lots of incompatible code, and, often, duplicate code.
Effects are another way of making functions incompatible, for better or worse. It can be done badly. Java fell into that trap with checked exceptions. They meant well, but it resulted in fragmentation.
Sometimes it’s worth making an effort to make functions more compatible by standardizing types. By convention, all functions in Go that return an error use the same type. It gives you less information about what errors can actually happen, but that means the implementation of a function can be modified to return a new error without breaking callers.
Another example is standardizing on a string type. There are multiple ways strings can be implemented, but standardization is more important.
You can also use type inference with union types like ZIO. So you could e.g. return a Result where the error type is `DatabaseError | InvalidBirthdayError`. If you're in an error monad anyway, and you add a new error type deep in the call stack, it can just infer itself into the union up the stack to wherever you want to handle it.
That will help locally, but for a published API or a callback function where you don't know the callers, it's still going to break people if you change a union type. It doesn't matter if it's inferred or not.
IIRC ZIO's solution is actually to return a generic E >: X | Y. The caller providing the callback knows what else is on E, and they're the only one that knows it, so only they could've handled it anyway. You still get type inference.
Or if you mean that returning a new error breaks API compatibility, then yes that's the point. If now you can error in a different way, your users now need to handle that. But if it's all generic and inferred, it can still just bubble up to wherever they want to do that with no changes to middle layers.
If you declare specific error types and callers only write handlers for specific cases, then adding a new error breaks them. If you just declare a base error type in your API, they have to write a generic error handler or it doesn't type check.
In this way, declaring a type guides people to write calling code that doesn't break, provided you set it up that way. It makes things easier for the implementation to change.
Sometimes you do need handlers for specific errors, but in Go you always need to write generic error handling, too.
(A type variable can do something similar. It forces the implementation to be generic because the type isn't known, or is only partially known.)
Usually in Scala errors subtype Throwable, so as long as new union members continue to do so, if you wanted a generic handler (e.g. you just log and return) you could handle that. But you can also be more specific with actual logic, with the benefit that if you choose to do so and the underlying implementation changes, you detect it.
Go also basically requires you to write actually generic error code (if err != nil return err), so it feels like errors are more work than they are. In Scala (especially ZIO) propagation is all automatic and you just handle it wherever you'd like. The only code that cares about errors is the part generating them and the part handling them. But it's all tracked in the type system so you can always see what exact errors are possible from what methods. And it's all simple return values so no trickiness with exceptions (e.g. across async boundaries).
I mean, the very point of a type system is to introduce distinctions and reduce compatibility (compatibility of incorrectly typed programs).
Throwing the baby out with the bathwater like Go sort of does with its error handling is no solution. The proper solution is a better type system (e.g. a Result type with generics handles what Go can't).
For effects though, we need type systems that support them - but these are only available in research languages so far. You can actually just be generic in effects (e.g. an fmap function applying a lambda to a list could just "copy" the effect of the lambda to the whole function - this can be properly written down and enforced by the compiler).
Async/await will be equivalent to parameters when they are first class and can be passed in as parameters. Language syntax and semantics are not equivalent and colored functions are colored by the syntax. Zig avoided colored functions by doing something very much like this.
Sure. But syntax sugar that allows you to write functions whose content isn't indented 300 columns to the right whose flow of control is much easier to reason about. <shrugs>
The thread treats async/await as one design pattern. It hasn’t aged the same way across languages.
C# async/await on top of the TPL is doing different work from JavaScript’s promise model. The C# version composes with cancellation tokens, structured exception handling, and a real thread pool underneath. JavaScript coloured the language because it had to: single-threaded runtime, no alternative. Rust’s async is closer to C++’s coroutines: a state machine at the call site, no runtime by default, executors as libraries. Three different things wearing the same syntax.
C++ shipped without async for thirty years and got along on thread pools and condition variables. The async crowd is right that thread-per-connection breaks at c10k. The thread-per-connection crowd is right that almost nobody is at c10k. Both are right about different problems.
The question that matters isn’t whether async was a mistake. It’s whether each language imported the cost (function colouring, runtime overhead, debugger pain) for the workload it actually has. JavaScript had to. C# had a strong case. Rust’s case for embedded is excellent. Python’s case is the most contested of the four and the one that gets defended hardest.
Async is a Javascript hack that inexplicably got ported to other languages that didn't need it.
The issue arose because Javascript didn't have threads, and processing events from the DOM is naturally event driven. To be fair, it's a rare person who can deal with the concurrency issues threads introduce, but the separate stacks threads provide are a huge boon. They allow you to turn event driven code into sequential code.
window.addEventListener("keydown", foo);
// Somewhere far away
function foo(char_event) { process_the_character(char_event.key); }
becomes:
while (char = read())
process_the_character(char);
The latter is an easy to read, linear sequence of code that keeps all the concerns in one place; the former rapidly becomes a huge entangled mess of event processing functions.
The history of Javascript described in the article is just a series of attempts to replace the horror of event driven code with something that looks like the sequential code found in a normal program. At any step in that sequence, the language could have introduced green threads and the job would have been done. And it would have been done without new syntax and without function colouring. But if you keep refining the original hacks they were using in the early days and don't take the somewhat drastic step of introducing a new concept to solve the problem (separate stacks), you end up where they did - at async and await. Mind you, async and await do create a separate stack of sorts - but it's implemented as a chain of objects malloc'ed on the heap instead of the much more efficient stack structure.
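For comparison, the async/await version of the same loop reads just as linearly; each await simply becomes a promise continuation allocated on the heap rather than a stack frame (nextKey is a stand-in for a promise-returning key reader):

    async function readLoop() {
      while (true) {
        const char = await nextKey();   // suspends here; resumes when the promise resolves
        process_the_character(char);
      }
    }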
I can see how the javascript community fell into that trap - it's the boiling frog scenario. But Python? Python already had threads - and had the examples of Go and Erlang to show how well they worked compared to async/await. And as for Rust - that's beyond inexplicable. Rust had green threads in the early days and abandoned them in favour of async/await. Granted the original green thread implementation needed a bit of refinement - making every low level call choose between event driven and blocking on every invocation was a mistake. Rust now has a green thread implementation that fixes that mistake, which demonstrates it wasn't that hard to do. Yet they didn't do it at the time.
It sounds like Zig with its pluggable I/O interface finally got it right - they injected I/O as a dependency at compile time. No "coloured" async keywords, and the compiler monomorphises the right code. Every library using I/O only has to be written once - what a novel concept! It's a pity it didn't happen in Rust.
> async/await came out of C# (well at least the JS version of it).
Having been instrumental in accelerating bringing async/await to JS, it definitely was the case that it came from C# and we eagerly were awaiting its arrival in JS and worked with Igalia to focus on it and make it happen across browsers more quickly so people could actually depend on it.
Yep and I loved when C# introduced it. I worked on a system in C# that predated async/await and had to use callbacks to make the asynchronous code work. It was a mess of overnested code and poor exception handling, since once the code did asynchronous work the call stack became disconnected from where the try-catches could take care of them. async/await allowed me to easily make the code read and function like equivalent synchronous code.
> async/await came out of C# (well at least the JS version of it).
Not sure if inspired by it, but async/await is just like Haskell's do-notation, except specialized for one type: Promise/Future. A bit of a shame. Do-notation works for so many more types.
- for lists, it behaves like list-comprehensions.
- for Maybes it behaves like optional chaining.
- and much more...
All other languages pile on extra syntax sugar for that. It's really beautiful that such seemingly unrelated concepts have a common core.
Similarly F#'s computation expressions predate C#'s syntax, and there is some evidence that C# language designers were looking at F#'s computation expressions. Since the Linq work, C# has been very aware of Monads, and very slow and methodical about how it approaches them. Linq syntax is a subtly compromised computation expression and async/await is a similar compromise.
It's interesting to wonder about the C# world where those things were more unified.
It's also interesting to explore in C# all the existing ways that Linq syntax can be used to work with arbitrary monads and also Task<T> can be abused to use async/await syntax for arbitrary monads. (In JS, it is even easier to bend async/await to arbitrary monads given the rules of a "thenable" are real simple.)
> use async/await syntax for arbitrary monads. (In JS, it is even easier to bend async/await to arbitrary monads given the rules of a "thenable" are real simple.)
I tried once to hack list comprehensions into JS by abusing async/await. You can monkey patch `then` onto Array and define it as flatMap and IIRC you can indeed await arrays that way, but the outer async function always returns a regular Promise. You can't force it to return an instance of the patched Array type.
JavaScript got async in 2017, Python in 2015, and C# in 2012. Python actually had a version of it in 2008 with Twisted's @inlineCallbacks decorator - you used yield instead of await, but the semantics were basically the same.
> And as for Rust - that's beyond inexplicable. Rust has green threads in the early days and abandoned them in favour of async / await.
There was a fair bit of time between the two, to the point I'm not sure the latter can be called much of a strong motivation for the former. Green threads were removed pre-1.0 by the end of 2014 [0], while work on async/await proper started around 2017/2018 [1].
In addition, I think the decision to remove green threads might be less inexplicable than you might otherwise expect if you consider how Rust's chosen niche changed pre-1.0. Off the top of my head no obligatory runtime and no FFI/embeddability penalties are the big ones.
> Rust now has a green thread implementation that fixes that mistake
As part of the runtime/stdlib or as a third-party library?
But for a long time (I think even until today, despite there now being an optional free-threaded build) CPython used a Global Interpreter Lock (GIL), which paradoxically makes programs run slower when more threads are used. It's a bad idea to allow sharing of all data structures across threads in high-level safe programming languages.
JS's solution is much better, it has worker threads with message passing mechanisms (copying data with structuredClone) and shared array buffers (plain integer arrays) with atomic operation support. This is one of the reasons why JavaScript hasn't suffered the performance penalty as much as Python has.
No, you appear to have no idea what you're talking about here. Rust abandoned green threads for good reason, and no, the problems were not minor but fundamental, and had to do with C interoperability, which Go sacrifices upon the altar (which is a fine choice to make in the context of Go, but not in the context of Rust). And no, Rust does not today have a green thread implementation. Furthermore, Rust's async design is dramatically different from Javascript, while it certainly supports typical back-end networking uses it's designed to be suitable for embedded contexts/freestanding contexts to enable concurrency even on systems where threads do not exist, of which the Embassy executor is a realization: https://embassy.dev/
> At any step in that sequence, the language could have introduced green threads and the job would have been done.
The job wouldn’t have been done. They would have needed threads. And mutexes. And spin locks. And atomics. And semaphores. And message queues. And - in my opinion - the result would have been a much worse language.
Multithreaded code is often much harder to reason about than async code, because threads can interleave executions and threads can be preempted anywhere. Async - on the other hand - makes context switching explicit. Because JS is fundamentally single threaded, straight code (without any awaits) is guaranteed to run uninterrupted by other concurrent tasks. So you don’t need mutexes, semaphores or atomics. And no need to worry about almost all the threading bugs you get if you aren’t really careful with that stuff. (Or all the performance pitfalls, of which there are many.)
Just thinking about mutexes and semaphores gives me cold sweats. I’m glad JS went with async await. It works extremely well. Once you get it, it’s very easy to reason about. Much easier than threads.
Once you write enough code, you'll realize you need synchronization primitives for async code as well. In pretty much the same cases as threaded code.
You can't always choose to write straight code. What you're trying to do may require IO, and then that introduces concurrency, and the need for mutual exclusion or notification.
Examples: If there's a read-through cache, the cache needs some sort of lock inside of it. An async webserver might have a message queue.
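In Rust terms (just a sketch, assuming the tokio crate), the read-through cache case looks something like this: even though everything is async, the shared map still needs a lock, here an async mutex that can be held across an await.

    use std::collections::HashMap;
    use std::sync::Arc;
    use tokio::sync::Mutex;

    // Sketch only: a shared read-through cache guarded by an async lock.
    #[derive(Clone, Default)]
    struct Cache {
        inner: Arc<Mutex<HashMap<String, String>>>,
    }

    impl Cache {
        async fn get_or_fetch(&self, key: &str) -> String {
            let mut map = self.inner.lock().await; // async mutex: can be held across .await
            if let Some(v) = map.get(key) {
                return v.clone();
            }
            let v = fetch_from_backend(key).await; // stand-in for real I/O
            map.insert(key.to_string(), v.clone());
            v
        }
    }

    async fn fetch_from_backend(key: &str) -> String {
        format!("value-for-{key}")
    }

    #[tokio::main]
    async fn main() {
        let cache = Cache::default();
        let (a, b) = tokio::join!(cache.get_or_fetch("k"), cache.get_or_fetch("k"));
        println!("{a} {b}");
    }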
The converse is also true. I've been writing some multithreaded code recently, and I don't want to or need to deal with mutexes, so, I use other patterns instead, like thread locals.
Now, for sure the async equivalents look and behave a lot better than the threaded ones. The Promise static methods (any, all, race, etc) are particularly useful. But, you could implement that for threads. I believe that this convenience difference is more due to modernity, of the threading model being, what 40, 50, 60 years old, and given a clean-ish slate to build a new model, modern language designers did better.
But it raises the idea: if we rethought OS-level preemptible concurrency today (don't call it threads!), could we modernize it and do better even than async?
> Once you write enough code, you'll realize you need synchronization primitives for async code as well. In pretty much the same cases as threaded code.
I've been programming for 30 years, including over a decade in JS. You need sync primitives in JS sometimes, but they're trivial to write in javascript because the code is run single threaded and there's no preemption.
> What you're trying to do may require IO
It's usually possible to factor your code in a way that separates business logic and IO. Then you can make your business logic all completely synchronous.
Interleaving IO and logic is a code smell.
> The Promise static methods (any, all, race, etc) are particularly useful. But, you could implement that for threads. I believe that this convenience difference is more due to modernity, of the threading model being, what 40, 50, 60 years old, and given a clean-ish slate to build a new model, modern language designers did better.
Then why don't we see any better designs amongst modern languages?
New languages have an opportunity to add newer, better threading primitives. Yet it's almost always the same stuff: atomics, mutexes and semaphores. Even Rust uses the same primitives, just with a borrow checker this time. Arguably message passing (Erlang, Go) is better. But Go still has shared mutable memory and mutexes in its sync library.
> But it raises the idea: if we rethought OS-level preemptible concurrency today (don't call it threads!), could we modernize it and do better even than async?
I'd love to see some thought put into this. Threading doesn't seem like a winner to me.
Now you are comparing single threaded code with multi threaded, which is a completely different axis to async vs sync. Just take a look at C#'s async, where you have both async and multi threading, with all the possible combinations of concurrency bugs you can imagine.
Of course I'm comparing them. Threading and async are two solutions to the same problem: How do you write high performance event driven systems like network services? How do you solve the C10K problem (or more recently the C10M problem)?
If you use a thread per connection (or green threads like Go), you don't also need async. If you have async (eg nodejs), you can get great performance without threads. You're right that they can also be combined - either within a single process (like tokio in rust). Or via multi-process configurations (eg one nodejs instance per core, all behind nginx). But they don't need to be. Go (green threads) and Nodejs (async, single threads) both work well.
Of course we're comparing them. We all want to know who wore it better
> The job wouldn’t have been done. They would have needed threads. And mutexes. And spin locks. And atomics. And semaphores. And message queues. And - in my opinion - the result would have been a much worse language.
My point is that you do need mutexes, spin locks, etc with async as well, given that you have a multi threaded platform. So no, we have basically 2x2 stuff we are talking about with very different properties (async/sync x single/multi threaded).
> Rust has green threads in the early days and abandoned them in favour of async / await. Granted the original green thread implementation needed a bit of refinement - making every low level choose between event driven and blocking on every invocation was a mistake.
That's a mischaracterization. They were abandoned because having green threads introduces a non-trivial runtime. It means Rust can't run on exotic architectures.
> It sounds like Zig with its pluggable I/O interface finally got it right
That remains to be seen. It looks good, with emphasis on looks. Who knows what interesting design constraints and limitation that entails.
Looking at comptime, which is touted as Zig's mega feature, it does come at the expense of a more strictly typed system.
Intuition is relative: when I first encountered unix-style synchronous, threaded IO, I found it awkward and difficult to reason about. I had grown up on the callback-driven classic Mac OS, where you never waited on the results of an IO call because that would freeze the UI; the asynchronous model felt like the normal and straightforward one.
It is a tool. Some tools make you more productive after you have learned how to use them.
I find it interesting how in software, I repeatedly hear people saying "I should not have to learn, it should all be intuitive". In every other field, it is a given that experts are experts because they learned first.
> I find it interesting how in software, I repeatedly hear people saying "I should not have to learn, it should all be intuitive". In every other field, it is a given that experts are experts because they learned first.
Other fields don't have the same ability to produce unlimited incidental complexity, and therefore not the same need to rein it in. But I don't think there's any field which (as a whole) doesn't value simplicity.
I feel like it's missing my point. Using a chainsaw is harder than using a manual saw, but if you need to cut many trees it's a lot more efficient to first learn how to use the chainsaw.
Now if you take the chainsaw without spending a second thinking about learning to use it, and start using it like a manual saw... no doubt you will find it worse, but that's the wrong way to approach a chainsaw.
And I am not saying that async is "strictly better" than all the alternatives (in many situations the chainsaw is inferior to alternatives). I am saying that it is a tool. In some situations, I find it easier to express what I want with async. In others, I find alternatives better. At the end of the day, I am the professional choosing which tool I use for the job.
Right but how do you expose your state machine and epoll logic to callers? As a blocking function? As a function that accepts continuations and runs on its own thread? Or with no interface such that anyone who wants to interoperate with you has to modify your state machine?
I find state machines plus some form of message passing more intuitive than callbacks or any abstraction that is based on callbacks. Maybe I'm just weird.
When I did not know how to program, neither async nor message passing were intuitive. I had to learn, and now those are tools I can use when they make sense.
I never thought "programming languages are a failure, because they are not intuitive to people who don't know how to program".
My point being that I don't judge a tool by how intuitive it is to use when I don't know how to use it. I judge a tool by how useful it is when I know how to use it.
Obviously factoring in the time it took to learn it (if it takes 10 years to master a hammer, probably it's not a good hammer), but if you're fine with programming, state machines and message passing, I doubt that it will take you weeks to understand how async works. Took me less than a few hours to start using them productively.
Not true. I’ve used both, and I often prefer the explicitness of async await. It’s easier to reason about. The language guarantees that functions which aren’t async can’t be preempted - and that makes a lot of code much easier to write because you don’t need mutexes, atomics and semaphores everywhere. And that in turn often dramatically improves performance.
At least in JS. I don’t find async in rust anywhere near as nice to use. But that’s a separate conversation.
It forces programmers to learn completely different ways of doing things, makes the code harder to understand and reason about, purely in order to get better performance.
Which is exactly the wrong thing for language designers to do. Their goal should be to find better ways to get those performance gains.
> It forces programmers to learn completely different ways of doing things, makes the code harder to understand and reason about, purely in order to get better performance.
Technically, promises/futures already did that in all of the mentioned languages. Async/await helped make it more user friendly, but the complexity was already there long before async/await arrived
If I want sequential execution, I just call functions like in the synchronous case and append .await.
If I want parallel and/or concurrent execution, I spawn futures instead of threads and .await them.
If I want to use locks across await points, I use async locks, anything else?
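A small sketch of the first two cases (assuming tokio; the names are placeholders):

    use std::time::Duration;

    async fn step(label: &str) -> String {
        tokio::time::sleep(Duration::from_millis(50)).await; // stand-in for I/O
        label.to_string()
    }

    #[tokio::main]
    async fn main() {
        // Sequential: call and .await in order, just like the synchronous code.
        let a = step("a").await;
        let b = step("b").await;

        // Concurrent: spawn tasks instead of threads and .await their handles.
        let c = tokio::spawn(step("c"));
        let d = tokio::spawn(step("d"));
        let (c, d) = (c.await.unwrap(), d.await.unwrap());

        println!("{a} {b} {c} {d}");
    }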
Really? async/await is the model that makes it really easy to ignore all the subtleties of asynchronous code and just go with it. You just need to trial and error where/when to put async/await keywords. It's not hard to learn. Just effort. If something goes wrong, then "that's just how things go these days".
You can call async methods without immediately calling await. You can naively await as late as possible. They'll run in parallel, or at least however the call was configured.
In Javascript, promises are eager and start executing immediately. They return control back to the caller when they need to wait. So in practice, all of your promises are running concurrently as soon as you create them.
In Rust, futures are lazy and don't start executing until they are awaited. You have to use various features of your chosen runtime to run multiple futures concurrently (functions like `spawn` or `select`). But that interface isn't standardized, and that leads to the ecosystem fragmentation issue discussed in the article. There was an attempt to standardize the interface in the `futures` crate, but none of the major runtimes actually implement the interface.
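A tiny illustration of that laziness (again assuming tokio):

    async fn work() -> u32 {
        println!("running");
        42
    }

    #[tokio::main]
    async fn main() {
        let fut = work();   // nothing printed yet: the future is inert
        println!("created");
        let n = fut.await;  // the body only executes once it is awaited/polled
        println!("got {n}");

        // Concurrency needs explicit help from the runtime, e.g. join! or spawn.
        let (x, y) = tokio::join!(work(), work());
        println!("{x} {y}");
    }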
I’m not really smart on this subject but I started during callback hell and now use async in Node and front-end and I find it to be just superb. Sometimes I have to reason about queued tasks vs. micro tasks and all that but most of the time it just does what I expect and keeps the code very clean.
> async/await introduced entirely new categories of bugs that threads don’t have. O’Connor documents a class of async Rust deadlocks he calls “futurelocks”
I didn't coin that term, the Oxide folks did: https://rfd.shared.oxide.computer/rfd/0609. I want to emphasize that I don't think futurelocks represent a "fundamental mistake" or anything like that in Rust's async model. Instead, I believe they can be fixed reliably with a combination of some new lint rules and some replacement helper functions and macros that play nicely with the lints. The one part of async Rust that I think will need somewhat painful changes is Stream/AsyncIterator (https://github.com/rust-lang/rust/issues/79024#issuecomment-...), but those aren't yet stable, so hopefully some transition pain is tolerable there.
> The pattern scales poorly beyond small examples. In a real application with dozens of async calls, determining which operations are independent and can be parallelized requires the programmer to manually analyze dependencies and restructure the code accordingly.
I think Rust is in an interesting position here. On the one hand, running things concurrently absolutely does take deliberate effort on the programmer's part. (As it does with threads or goroutines.) But on the other hand, we have the borrow checker and its strict aliasing rules watching our back when we do choose to put in that effort. Writing any sort of Rust program comes with cognitive overhead to keep the aliasing and mutation details straight. But since we pay that overhead either way (for better or worse), the additional complexity of making things parallel or concurrent is actually a lot less.
> At the function level, adding a single i/o call to a previously synchronous function changes its signature, its return type, and its calling convention. Every caller must be updated, and their callers must be updated.
This is part of the original function coloring story in JS ("you can only call a red function from within another red function") that I think gets over-applied to other languages. You absolutely can call an async function from a regular function in Rust, by spinning up a runtime and using `block_on` or similar. You can also call a regular function from an async function by using `spawn_blocking` or similar. It's not wonderful style to cross back and forth across that boundary all the time, and it's not free either. (Tokio can also get mad at you if you nest runtimes within one another on the same thread.) But in general you don't need to refactor your whole codebase the first time you run into a mismatch here.
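A rough sketch of crossing that boundary in both directions with tokio (the function names here are made up):

    // Plain (sync) code can drive an async function by standing up a runtime...
    fn sync_entry_point() -> String {
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(fetch_greeting())
    }

    async fn fetch_greeting() -> String {
        // ...and async code can push blocking or CPU-bound work onto a dedicated pool.
        let n = tokio::task::spawn_blocking(|| expensive_sync_work())
            .await
            .unwrap();
        format!("result: {n}")
    }

    fn expensive_sync_work() -> u64 {
        (1..=1_000u64).sum() // stand-in for blocking work
    }

    fn main() {
        println!("{}", sync_entry_point());
    }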
> This was bad enough that Node.js eventually changed unhandled rejections from a warning to a process crash, and browsers added unhandledrejection events. A feature designed to improve error handling managed to create an entirely new class of silent failures that didn’t exist with callbacks.
Zig is just doing vtable-based effect programming. This is the way to go for far more than async, but it also needs aggressive compiler optimization to avoid actual runtime dispatch.
I would really hate to work with a blue/red function system. I would have to label all my functions and get nothing in return. But, labelling my functions with some useful information that I care about, that can tell me interesting things about the function without me having to read the function itself and all the functions that it calls, I'd consider a win. I happen to care about whether my functions do IO or not, so the async label has been nothing short of a blessing.
Async ruined Rust for me, even though I write exactly the kind of highly concurrent servers to which it's supposed to be perfectly suited. It degrades API surfaces to the worst case: `Send + Sync + 'static`, because APIs have to be prepared to run on multithreaded executors, and this infects your other Rust types and APIs because each of these async edges is effectively a black hole for the borrow checker.
Don't get me started on how you need to move "blocking" work to separate thread pools, including any work that has the potential to take some CPU time, not even necessarily IO. I get it, but it's another significant papercut, and your tail latency can be destroyed if you missed even one CPU-bound algorithm.
These may have been the right choices for Rust specifically, but they impair quality of life way too much in the course of normal work. A few years ago, I had hope this would all trend down, but instead it seems to have asymptoted to a miserable plateau.
I make use of all of those, but still prefer avoiding Async, for the typical coloring reason. I can integrate the above things into a code base with low friction; Async poses a compatibility barrier.
But yes, once you go dining on other people's crates you definitely get the impression that you have to, because tokio gets its fingerprints all over everything.
But also there are non-thread stealing runtimes that don't require Send/Sync on the Future. Just nobody uses them.
This is dead true to me. I write systems code. Rust is supposed to be a systems language. Because I do work that is effectively always written as if it's in kernel mode and distributed over the network, everything I do is async by default. And the ergonomics around async are just miserable, littered with half-finished implementations and definitions (i.e. async streams, async traits), and a motley bunch of libraries that are written to use particular modes of tokio that don't necessarily play well together. It's a giant bodge that would be excusable if that wasn't supposed to be part of the core applicability of the language. Not to mention that the whole borrow-checker business becomes largely useless, so you end up implicitly adding Arc+Mutex and Pin to your list of wrapper type signatures.
What bothers me the most is that, aside from async, I _really_ do like the language and appreciate what it's trying to do. Otherwise I would just turn away from the whole mess. This just landed really badly.
I completely agree. I really like rust, but all the async stuff is so half baked. It’s shocking coming from the JavaScript ecosystem. Async feels - comparatively - incredibly simple in JS. Even async streams are simple in JS and they work great. And I don’t have to wait 10 years for the linker to process all of tokio for a 1 line change.
I think they mean tokio::spawn’s signature forces libraries that want to be easy to use with it to expose send+sync APIs (and thus use Arc+Mutex internally)
> Async ruined Rust for me, even though I write exactly the kind of highly concurrent servers to which it's supposed to be perfectly suited. It degrades API
It refers to async with Tokio specifically, and everyone else is responding as if it is talking about async in Rust in general. That's a mischaracterization. You don't have to use the Tokio executor.
It's a bit like saying, "Graphics in Rust are ruined. I need to use Vulkan to make my game."
No, you don't. If your use case is unique enough, use something else. There are bindings for OpenGL, DirectX, etc.
> This is a promise (JavaScript) or future (Java, Rust, etc). The concept dates to Baker and Hewitt in 1977, but it took the C10K pressure of the 2010s to push it into mainstream programming.
Almost. JavaScript adopted async because it was a programming language designed to slot into someone else's event loop. Other programming languages, at least on the server, that needed lightweight threading didn't bother with any of this, they just shipped their own managed stacks. But UI code practically demands to own its own event loop and requires everything else live as callbacks inside of it. And JavaScript, because it was designed to live in a browser, inherited these same semantics.
It seems unfair to spend so much time in this article talking about JavaScript and Java without mentioning that async/await first appeared in .NET, and _broadly speaking_ works pretty well there.
This was a hardware- and OS-level problem first. All of that had to be solved before higher-level abstractions in languages like Go and JavaScript could tackle it. The author skipped this entirely.
In real life, when a request handler calls an async/coloured/whatnot function, it lets the call proceed and is immediately ready to process the next request. The backend then has no problem creating an ever-growing number of asyncs in flight. In real life those asyncs would most likely end up calling a database. The end result is that the backend simply overwhelms the database and the other resources that have to maintain the state of those countless asyncs in flight.
This whole thing is basically snake oil. The best thing the backend can do instead is have a dedicated thread pool where each real thread has its own queue of limited size. Each element in the queue would contain the input and output state of a request and the code to deal with it. Once a queue grows over a certain size, the backend should simply return an error code (too busy) immediately. A much more sound strategy in my opinion.
There are more complex cases of course (like computationally expensive requests with no I/O that take a long time). Handling those would require some extra logic. Async stuff, however, will not help there either.
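For what it's worth, a bare-bones, std-only Rust sketch of that bounded-queue idea (the sizes and names are arbitrary): a fixed pool of worker threads, each with a bounded queue, and immediate rejection when a queue is full.

    use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
    use std::thread;

    struct Request(u32);

    fn main() {
        let mut workers: Vec<SyncSender<Request>> = Vec::new();
        let mut handles = Vec::new();

        for id in 0..4 {
            let (tx, rx) = sync_channel::<Request>(16); // queue of limited size
            workers.push(tx);
            handles.push(thread::spawn(move || {
                for Request(n) in rx {
                    // handle the request synchronously (e.g. talk to the database)
                    println!("worker {id} handled request {n}");
                }
            }));
        }

        for n in 0..100 {
            let worker = &workers[n as usize % workers.len()];
            match worker.try_send(Request(n)) {
                Ok(()) => {}
                // queue full: reject immediately instead of piling up work in flight
                Err(TrySendError::Full(_)) => println!("too busy, rejecting request {n}"),
                Err(TrySendError::Disconnected(_)) => break,
            }
        }

        drop(workers); // close the queues so the workers exit
        for h in handles {
            h.join().unwrap();
        }
    }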
There is a small "straw man" bias here. Callbacks are not the only alternative to Promises. There exist state machines and event loops too.
I play around with real-time audio and use a state machine/event loop. It's a very powerful, if verbose, method for real-time programming; I cannot see how async/await could achieve the same ends.
Not a fan of async in other languages (I avoid it in rust and python like the plague), but it feels like a straight upgrade in JS. I’ve never once regretted its addition. In my experience it’s extremely rare for things to get more complicated than an await followed by a Promise.all(). Unhandled rejections are super obvious to a human as performing a .then() chain is uncommon in the days of await. And linters will pick it up if you miss it. Function coloring isn’t an issue as all of the Node stdlib that I’ve seen provides async functionality (back in the day you could accidentally call a synchronous file system operation and break the event loop). You end up with everything returning a promise except for some business logic at the leafs of the dependency graph. A Node app is mostly i/o anyway, thus the functions mostly return Promises. The await keyword is homomorphic across promises and other values. And type checking (who isn’t using typescript?) will catch most API changes where something becomes async. I can’t say it’s perfect, but it’s really not a problem for me.
> No mention of Novell Netware. This was a solved problem decades ago and Windows had it for almost as long.
Solved at the OS layer, sure. This article is about solving it at the language layer.
It's been solved for a long time at the OS layer. I mean, didn't VAX have something like overlapped IO in 1980? Unix had asynchronous IO as well with `select`.
It’s a slop alright. But it also missed the next mainstream iteration which is Java virtual threads / Goroutines. Those do away with coloring by attacking the root of the problem: that OS threads are expensive.
Sure, it comes with its own issues like large stacks (10k copies of nearly the same stack?) and I predict a memory coloring in the future (stack variables or even whole frames that can opt out of being copied across virtual threads).
Java had green threads in 1997, removed them in 2000 and brought them back properly now as virtual threads.
I'm kinda glad they've sat out the async mania, with virtual threads/goroutines, the async stuff just feels like lipstick on a pig. Debugging, stacktrackes etc. are just jumbled.
Like their purpose/implementation everything is just so different, they don't share anything at all.
https://docs.rs/tokio/latest/tokio/runtime/struct.Handle....
I assume that answers your question.
And for my money I prefer async/await to the structured concurrency stuff..
https://openjdk.org/jeps/505
There are also related discussions on other platforms that are worth reading.
A while back I just started leaning in. I write a lot of Python at work, and anytime I have to use a library that relies on asyncio, I just write the entire damn app as an asynchronous one. Makes function coloring a non-issue. If I'm in a situation where the two have to coexist, the async runtime gets its own thread and communication back and forth is handled at specific boundaries.
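For comparison, the same dedicated-runtime-thread pattern sketched in Rust terms (just an illustration, assuming tokio), with a channel as the boundary:

    use std::sync::mpsc;
    use std::thread;

    async fn async_work() -> String {
        "done".to_string() // stand-in for whatever the async library does
    }

    fn main() {
        let (tx, rx) = mpsc::channel::<String>();

        // The async world gets its own thread and runtime...
        thread::spawn(move || {
            let rt = tokio::runtime::Runtime::new().unwrap();
            rt.block_on(async move {
                let msg = async_work().await;
                tx.send(msg).unwrap();
            });
        });

        // ...and the synchronous side only sees messages at the boundary.
        println!("{}", rx.recv().unwrap());
    }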
There's no "inherently synchronous structure", at least not in Javascript. The nature is synchronous, asynchronous is an illusion built on top of it. Which is why you can easily block an "asynchronous" program:
a synchronous busy loop in any async function will do. JavaScript execution is synchronous on a single call stack. That's why they added Workers, which are different from async.
Rust's Tokio and co are also blocking. You need threads to get something that isn't inherently synchronous with merely a facade of cooperative asynchronicity.
Yes, having to rewrite literally all of your code because you need to use an async function somewhere is an issue.
An even bigger issue is that now you have two (incompatible!) versions of literally every library dependency.
I was talking about when writing from scratch.
You can make any async system synchronous. It's much harder to make a sync system asynchronous. (Misquoting from something Erlang-related).
There are many cases when I don't care if a function call is asynchronous. I'm happy to wait for the result. Yet too many systems tell me I can't, for no good reason.
To give a concrete example: computer music languages, such as SuperCollider, need concurrency to implement musical scheduling. Imagine a musical sequence where you play a note, wait N beats, play another note, etc. Often you want to play many such sequences simultaneously. Stackful coroutines provide a very elegant solution to this problem. Every independent musical sequence can be modelled by a coroutine that yields every time it needs to wait. The yielded value is interpreted by the scheduler as a delta time after which the function should be resumed. In this sense, SuperCollider users have been doing async programming since the early 2000s, long before it became mainstream.
Now function colouring is interesting but not for the reason these articles get excited. Recolouring is easy and has basically no impact on code maintenance. BUT if you need that code path to really fly then marking it as async is a killer, as all those tiny little promises add tiny delays in the form of many tasks. Which add up to performance problems on hot code paths. This is particularly frustrating if functions are sometimes async, like lazy loaders or similar cache things. To get around this you can either use callbacks instead or use selective promise chaining to only use promises when you get a promise. Both strategies can be messy and trip up people who don’t understand these careful design decisions.
One other fun thing is indexeddb plays terribly with promises, as it uses a “transactions close at end of task” mechanism, making certain common patterns impossible with promises due to how they behave with the task system. Although some API designers have come up with ways around this to give you promise interfaces for databases. Normally by using callbacks internally and only doing one operation per transaction.
That depends on the language/framework. In some languages, `await foo()` is equivalent to `Future f = foo(); await f`. In others (e.g. Python), it's a primitive operation and you have to use a different syntax if you want to create a future/task. In Trio (an excellent Python alternative to asyncio), there isn't even the concept of a future at all!
This is a solved problem in C#. You can use ValueTask<T> instead of Task<T> and no promise will be allocated if it never awaits.
Is this because the functions are async or is that because most of the time async is used for things that are I/O like and therefore susceptible to these kinds of delays?
However, in embedded rust async functions are amazing! Combine it with a scheduler like rtic or embassy, and now hardware abstractions are completely taken care of. Serial port? Just two layers of abstraction and you have a DMA system that shoves bytes out UART as fast as you can create them. And your terminal thread will only occupy as much time as it needs to generate the bytes and spit them out, no spin locking or waiting for a status register to report ready.
I wish to God that the Rust library devs would admit to this fact - say that async Rust should stay for embedded-runtime use cases, but we shouldn't be forcing async across the majority of general purpose computing libraries. It's just not a pleasant experience to write nor read. And it really doesn't give any performance benefits.
More than once, we have wanted to improve the performance of some path and been able to lift the sequential model into a stream, evaluated concurrently with some max buffer size. From there, converting to true parallel execution is just a matter of wrapping the looped futures in Tasks.
Obviously just sprinkling an async on it isn’t going to make anything faster (it just converts your function into a state-machine generator that then needs to be driven to completion). But being able to easily and progressively move code from sequential to concurrent to parallel execution makes for significant performance gains.
The other advantage is a rough classification in the type system. Not marking a function as async means that the author believes it can be run in a reasonable amount of time and is safe to run eg. on a UI main thread. In that sense, the propagation through the call hierarchy is a feature, not a bug.
I can see that maintaining multiple versions of a function is annoying for library authors, but on the other hand, functions like fs.readSync shouldn’t even exist. Other code could be running on this thread, so it's not acceptable to just freeze it arbitrarily.
Saying that fs.readSync shouldn't exist is really weird. Not all code written benefits from async nor even requires it. Running single threaded, sync programs is totally valid.
In a good API design, you should expose functions that each do one thing and can easily be composed together. The 'readSync' function doesn't meet that requirement, so it's arguably not necessary - it would be better to expose two separate functions.
This was not a big issue when computers only had a single processor or if the OS relied on cooperative multi-threading to perform I/O. But these days the OS and disk can both run in parallel to your program so the requirement to block when you read is a design wart we shouldn't have to live with.
No, it tells the OS "schedule the current thread to wake up when the data read task is completed".
Having to implement that with other OS primitives is a) complex and error-prone, and b) not atomic.
Even websites had this problem with freezing the browser in the early AJAX days, when people would do a synchronous XMLHttpRequest without understanding it.
i don't see it as very useful or elegant to integrate any form of parallelism or concurrency into every imaginable api. depends on context of course. but generalized, just no. if a kind of io takes a microsecond, why bother.
Maybe, but is it useful to have sync options?
You can still write single threaded programs
Sync options are useful. If everything is on the net probably less so. But if you have a couple of 1ms io ops that you want to get done asap, it's better to get them done asap.
But in ordinary JS there just can't be a race condition, everything is single threaded.
Two async functions doing so is not a data race either.
Serially, completely synchronously overwriting values is none of these categories though.
Concurrency is needed for race conditions, parallelism is needed for data races. Many single threaded runtimes including JS have concurrency, and hence the potential for race conditions, but don't have parallelism and hence no data races.
What we may argue over (and it becomes more of a what definition to use): IO/external event loop/signal handlers. These can cause race conditions even in a single threaded program, but one may argue (this is sort of where I am) that these are then not single threaded. The kernel IO operation is most definitely not running on the same thread of execution as JS.
I think I have been fairly consistent in the definition of a data race as a type of race condition, where a specific shared memory is written to while other(s) read it with no synchronization mechanism. This can be safe (most notably OpenJDK's implementation is tear-free, no primitive or reference pointer may ever be observed as a value not explicitly set by a writer), or unsafe (c/c++/rust with unsafe, surprisingly go) where you have tearing and e.g. a pointer data race can cause the pointer to appear as a value that was never set by anyone, causing a segfault or worse.
It's typically less than a hundred kilobytes and (on the systems I've benchmarked using std::thread) it takes 60usec (wall time in userspace) to create and destroy a thread.
Threads have gotten so fast that paying the async function coloring price makes very little sense for most software.
There is an extremely small set of domains where simple threading doesn't suffice, and async/await is too high a price to pay across the entire software ecosystem just to slightly optimize those domains.
you skip the function coloring.
Virtual threads in JVM are similar.
Why is reserving a megabyte of stack space "expensive"?
> and takes roughly a millisecond to create
I'm not sure where this number is from, it seems off by a few orders of magnitude. On Linux, thread creation is closer to 10 microseconds.
Because if you use one thread for each of your 10,000 idle sockets you will use 10GB to do nothing.
So you'll want to use a better architecture such as a thread pool.
And if you want your better architecture to be generic and ergonomic, you'll end up with async or green threads.
1. On a system that is handling 10k concurrent requests, the 10GB of RAM is going to be a fraction of what is installed.
2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
My example (and the c10k problem) is 10k concurrent connections, not 10k concurrent requests.
> 2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
Yes, and that's both memory and cpu usage that isn't needed when using a better concurrency model. That's why no high-performance server software use a huge amount of threads, and many use the reactor pattern.
No, it literally is not. The "memory" is just entries in a page table in the kernel and MMU. It shouldn't worry you at all.
Nor is the CPU used by the kernel to manage those threads going to be necessarily less efficient than someone's handrolled async runtime. In fact given it gets more eyes... likely more.
The sole argument I can see is just avoiding a handful of syscalls and excessive crossing of the kernel<->userspace brain blood barrier too much.
Only if you never free one of those stacks. TLB flushes can be quite expensive.
I've written massively concurrent systems where each connection only handled maybe a few kilobytes of data.
Async io is a massive win in those situations.
This describes many rest endpoints. Fetch a few rows from a DB, return some JSON.
You don't pay for stack space you don't use unless you disable overcommit. And if you disable overcommit on modern linux the machine will very quickly stop functioning.
It’s a programming model that has some really risky drawbacks.
Operating systems can shrink the memory usage of a stack.
This leaves the memory mapping intact but the kernel frees the underlying resources. Subsequent accesses get either new zero pages or the original file's pages. Linux also supports mremap, which is essentially a kernel version of realloc: it supports growing and shrinking memory mappings.
Whether existing systems make use of this is another matter entirely. My language uses mremap for growth and shrinkage of stacks. C programs can't do it because pointers to stack-allocated objects may exist. They sure shouldn't point into the unused region of the stack though; if they do, that's a bug (because anything could claim that memory now). You should be free and clear to release stack pages past your current stack pointer.
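On Linux, the keep-the-mapping-but-release-the-pages operation described above is madvise (MADV_DONTNEED, or MADV_FREE for a lazier variant). A hedged sketch via the libc crate, assuming a Linux target:

    use std::ptr;

    fn main() {
        const LEN: usize = 1 << 20; // a 1 MiB anonymous mapping standing in for a stack
        unsafe {
            let addr = libc::mmap(
                ptr::null_mut(),
                LEN,
                libc::PROT_READ | libc::PROT_WRITE,
                libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
                -1,
                0,
            );
            assert_ne!(addr, libc::MAP_FAILED);

            // Fault the pages in by touching them...
            ptr::write_bytes(addr as *mut u8, 0xAB, LEN);

            // ...then tell the kernel their contents are disposable. The mapping
            // stays valid; the physical pages can be reclaimed.
            assert_eq!(libc::madvise(addr, LEN, libc::MADV_DONTNEED), 0);

            // Accessing the region again is still legal and yields zeroed pages.
            assert_eq!(*(addr as *const u8), 0);

            libc::munmap(addr, LEN);
        }
    }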
They also say:
>The system spends time managing threads that could be better spent doing useful work.
What do they think the async runtime in their language is doing? It's literally doing the same thing the kernel would be doing. There's nothing that intrinsically makes scheduling 10k coroutines in userspace more efficient than the kernel scheduling 10k threads. Context switches are really only expensive when the switch is happening between different processes; the overhead of a context switch on a CPU between two threads in the same process is very small (and they're not free when done in userspace anyway).
There are advantages to doing scheduling in the kernel and there are advantages to doing scheduling in userspace, but this article doesn't really touch on any of the actual pros and cons here, it just assumes that userspace scheduling is automatically more efficient.
I feel like we're now, what, 20, 25 years on and people still haven't adjusted themselves to the fact that the machines we have now are multicore, have boatloads of cache, or how that cache is shared (or not) between cores.
Nor is there apparently a real understanding of the difference between VSS and RSS.
Nor of the fact that modern machines are really really fast if you can keep stuff in cache. And so you really should be focused on how you can make that happen.
The program could use one page's worth of stack space, which is optimal. The program could use like 200 bytes of stack space, which wastes the rest of the page. The program could recurse all the way to 9.9 MB of stack usage, stop just before overflow and then unwind back to constant 200 bytes stack space usage, and never touch all those pages ever again.
While virtual memory allocation does not require physical allocation, it immediately runs into the kinds of performance problems that huge pages are designed to solve. On modern systems, you can burn up most of your virtual address space via casual indifference to how it maps to physical memory and the TLB space it consumes. Spinning up thousands of stacks is kind of a pathological case here.
10µs is an eternity for high-performance software architectures. That is also around the same order of magnitude as disk access with modern NVMe. An enormous amount of effort goes into avoiding blocking on NVMe disk access with that latency for good reason. 10µs is not remotely below the noise floor in terms of performance.
Guess it's not a huge issue in these 64-bit days, but back in the 32-bit days it was a real limitation to how many threads you could spin up due to the limited address space.
Of course most applications which hit this would override the 1MB default.
So much so that they'll sign themselves up for async frameworks that thread steal at will and bounce things all over cores causing cache line bouncing and associated memory stalls, not understanding what this is doing to their performance profile.
And endure complexity, etc. through awkward async call chains and function colouring.
Most people's applications would be totally fine just spawning OS threads and using them without fear and dropping into a futex when waiting on I/O; or using the kernel's own async completion frameworks. The OS scheduler is highly efficient, and it is very good at managing multiple cores and even being aware asymmetrical CPU hierarchies, etc.. Likely more efficient than half the async runtimes out there.
https://vorner.github.io/async-bench.html
Further, the benches that are showing best there are non-thread-stealing scenarios, not tokio.
I also suspect simply tuning the thread-based workloads more aggressively would have the same effect.
When I profile high throughput tokio applications there's way too much contention on shared atomics, mostly inside tokio's scheduler itself. On lower core count machines and where the workload is I/O heavy, this is probably fine. So, yes, web servers.
But I'm very interested in applications that scale on machines with lots of cores and where CPU is a large part of the equation.
Equally, if a megabyte of stack is a lot for your use case, can't you just ask pthreads to reserve less? I believe it goes down to like 16k
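Yes - pthreads exposes the stack-size attribute, and the same knob is available in e.g. Rust's std (a sketch; the 64 KiB figure is arbitrary and platforms enforce a minimum):

    use std::thread;

    fn main() {
        let handle = thread::Builder::new()
            .stack_size(64 * 1024) // request 64 KiB instead of the platform default
            .spawn(|| {
                // keep per-thread stack usage shallow to stay inside the budget
                println!("hello from a small-stack thread");
            })
            .expect("failed to spawn thread");
        handle.join().unwrap();
    }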
Async precisely improves disk/network I/O-bound applications because synchronous code has to waste a whole thread sitting around waiting for an I/O response (each with its own stack memory and scheduler overhead), and in something like an application server there will be many incoming requests doing so in parallel. Cancellation is also easier with async
CPU-bound code would not benefit because the CPU is already busy, and async adds overhead
See e.g. https://learn.microsoft.com/en-us/aspnet/web-forms/overview/... and https://learn.microsoft.com/en-us/aspnet/web-forms/overview/...
I'll publish my results shortly. I did these as baselines because I'm testing finishing the User Managed Concurrency Groups proposal to the linux kernel which is an extension to provide faster kernel threads (which beat both of them)
The UMCG implementation allows kernel thread context switches to happen in 150-200 microseconds, compared to the 1500-2000 microseconds for normal kernel thread context switches. My goal is to show that if UMCG could be merged into the Linux runtime then it would be competitive with async Rust without the headache.
I'm pretty close to being done - I'm hoping to publish the entire GitHub repository with tests for the community to validate by next week.
UMCG is essentially an open source version of Google Fibers, which is their internal extension to the linux core for "light weight" threads. It requires you to build a user space scheduler, but that allows you to create different types of schedulers. I can not remember which scheduler showed ^ results but I have at least 6 different UMCG schedulers I was testing.
So essentially you get the benefits of something like tokio where you can have different types of schedulers optimized for different use cases, but the power of kernel threads which means easy cancellation, easy programming (at least in rust). It's still a linux thread with an entire 8mb(?) stack size, but from my testing it's far faster than what Tokio can provide, without the headache of async/await programming.
Using async for languages like Rust or C++ is cargo cult by people who don't know what the hell they're doing.
[Caveat: there's a use case for async if you're doing embedded development where you don't have threads or call stacks at all.]
Assuming for the sake of argument that they are together, you're still saving stack memory for every thread that isn't created. In fact you could say it allows the CPU to be idle, by spending less time context switching. On top of that, async/await is a perfect fit for OS overlapped I/O mechanisms for similar reasons, namely not requiring a separate blocking thread for every pending I/O (see e.g. https://en.wikipedia.org/wiki/Overlapped_I/O, https://stackoverflow.com/a/5283082)
So avoiding that would mean a server can handle more traffic before running into limits based on thread count.
I mean, I suppose we could move the scheduling and tracking out of kernel mode and into user mode...
But then guess what we've just reinvented?
EDIT: Something else to consider is what if your REST call needs to make 5 queries. Do you serialize them? Now your latency can be worse. Do you launch a thread per query? Now you need to a) synchronize, b) take 5x the thread cost. Async patterns or green threads or coroutines enable more efficient overlapping of operations and potentially better concurrency (though a server that handles lots of concurrent requests may already have "enough" concurrency anyways).
There’s a benefit in being able to code the handling of a request in synchronous logic. A case has to be made for the particular application that it would cause performance or resource issues, before opting for asynchronous code that adds more complexity.
I totally agree there are applications for which this is overkill and adds complexity. It's just a tool in the toolbox. Video games famously are just a single thread/main loop kind of application.
Why does async make it harder to reason about when resources are released?
No such thing. In a preemptive multitasking OS (that's basically all of them today) you will get context switching regardless of what you do. Most modern OS's don't even give you the tools to mess with the scheduler at all; the scheduler knows best.
Because they wrote “thread per task” which I assume to mean something like “each os thread handles the work submitted by one user”.
This is beside the point but, something like io_uring is still significantly better than doing threadpool nvme io.
Linux kernel uses 8k stacks (TBH, it's been a while), but there's also some copy-on-write overhead. Still, this is not the C10k problem...
Honestly, it's a mostly obsolete approach. OS threads are fast. We have lots of cores. The cost of bouncing around on the same core and losing L1 cache coherence is higher than the cost of firing up a new OS thread that could land on a new core.
The kernel scheduler gets tuned. Language specific async runtimes are unlikely to see so many eyeballs.
Libraries like Tokio (mentioned in the article) have support for this built-in. Goroutines sidestep the issue completely. C# Tasks are batteries included in that regard. In fact function colors aren't an issue in most languages that have async/await. JavaScript is the odd one out, mostly due to being single-threaded. Can't really be made to work in a clean way in existing JS engines.
Take Rust: you return `Result<T,E>`, you are coloring your function the same way as you are when using `async`. Same for Option. Errors as return values in Go: again, function coloring.
One of your nested functions starts taking a "serverUrl" input parameter instead of reading an environment variable: you've colored your function and you now need to color the entire call stack (each caller taking the url parameter themselves).
All of them are exactly as annoying, as you need to rewrite the entire call stack's function signature to accommodate for the change, but somehow people obsess about async in particular as if it was something special.
It's not special, it's just the reflection that something can either be explicit and require changing many function signatures at once when making a change, or be implicit (with threads, exceptions or global variables) which is less work, but less explicit in the code, and often more brittle.
Async functions are colored because they force a change in the rest of the call stack, not just the caller. If you have a function nested ten levels deep and it calls a function that returns a Result, and you change that function to no longer return a result because it lost all its error cases, you only have to change the direct callers. If you are ten layers deep in a stack of synchronous functions and suddenly need to make an asynchronous call, the type signature of every individual function in the stack has to change.
You might say "well, if I'm ten layers deep in a stack of functions that don't return errors and have to make a call that returns the error, well now I have to change the entire stack of functions to return the error", but that's not true. The type change from sync to async is forced. The error is not. You could just discard it. You could handle it somehow in one of the intervening calls and terminate the propagation of the type signature changes half way up. The caller might log the error and then fail to propagate it upwards for any number of reasons. You aren't being forced to this change by the type system. You may be forced to change by the rest of the software engineering situation, but that's not a "color".
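A small Rust sketch of that asymmetry (the names are invented): a caller can absorb a `Result` locally and keep a plain signature, but a caller of an `async fn` either becomes async itself or has to bring in an executor.

    use std::num::ParseIntError;

    fn parse(s: &str) -> Result<i32, ParseIntError> {
        s.parse()
    }

    // The Result "colour" can be stopped right here: handle or default the error,
    // and this function's own signature stays plain.
    fn parse_or_zero(s: &str) -> i32 {
        parse(s).unwrap_or(0)
    }

    async fn fetch_len() -> usize {
        4 // stand-in for an awaited I/O call
    }

    // By contrast, calling an async fn forces this caller to become async too
    // (or to pull in an executor and block), and so on up the stack.
    async fn caller_becomes_async() -> usize {
        fetch_len().await
    }

    fn main() {
        println!("{}", parse_or_zero("oops"));
    }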
For similar reasons, the article is incorrect about Go's "context.Context" being a coloration. It's just a function parameter like anything else. If you're ten layers deep into non-Context-using code and you need to call a function that takes a context, you can just pass it one with context.Background() that does nothing context-relevant. You may, for other software engineering reasons, choose to poke that use of a context up the stack to the rest of the functions. It's probably a good idea. But you're not being forced to by the type system.
"Coloration" is when you have a change to a function that doesn't just change the way it interacts with the functions that directly call it. It's when the changes forcibly propagate up the entire call stack. Not just when it may be a good idea for other reasons but when the language forces the changes.
It is not, in the maximally general sense, limited to async. It's just that sync/async is the only such color that most languages in common use expose.
well, this isn't really true - at least for Rust:
    runtime.block_on(async {});
https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.htm...
This is a language specific problem, not a language pattern one.
So I don't buy that async colors are fundamentally different.
Threading methodology is unrelated though. How exactly the call stack is scheduled is orthogonal to the question of whether or not making a call to a particular function results in type changes being forced on all function in the entire stack.
There may also be cases where you can take "async" code and run it entirely out of the context of any sort of scheduler, where it can simply be turned into the obvious sync code. While that does decolor the resulting call (or, if you prefer, recolor it back into the "sync" color) it doesn't mean that async is not generally a color in code where that is not an option. Solving concurrency by simply turning it off certainly has a time and place (e.g., a shell script may be perfectly happy to run "async" code completely synchronously because it may be able to guarantee nothing will ever happen concurrently), but that doesn't make the coloration problem go away when that is not an option.
Here's the list of requirements: 1. Every function has a color. 2. The way you call a function depends on its color. 3. You can only call a red function from within another red function. 4. Red functions are more painful to call. 5. Some core library functions are red.
You are complaining about point 3. You are saying that if there's any way to call a red function from a blue function then it's not real. The type change from sync to async is not forced any more than changing T to Result<T,E>; you just get a Promise from the async function. So you logically think that async is not a color. You think even a Haskell IO value can be used in a pure function if you don't actually do the IO, or if you use unsafePerformIO. This is nonsense. Anything that makes the function hard to use can be a color.
And Result and Option usually mean something else. Option is a value or none; none doesn't necessarily mean the function failed. Result is the value or an error. You can have Result<Option<T>, E>.
That's different from async, where you can call the other type.
Like in Haskell there is the IO monad used to denote the IO effect. And there are unsafe ways to actually execute it - does that make everything in Haskell impure?
As soon as you start using function arguments instead of using a global variable, you are coloring your function in the exact same way. Yet I don't think anyone would make the case that we should stop using function arguments and use global variables instead…
Effects are another way of making functions incompatible, for better or worse. It can be done badly. Java fell into that trap with checked exceptions. They meant well, but it resulted in fragmentation.
Sometimes it’s worth making an effort to make functions more compatible by standardizing types. By convention, all functions in Go that return an error use the same type. It gives you less information about what errors can actually happen, but that means the implementation of a function can be modified to return a new error without breaking callers.
Another example is standardizing on a string type. There are multiple ways strings can be implemented, but standardization is more important.
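The rough Rust analog of Go's error convention (a sketch, nothing Go-specific; `load_widget` is a made-up function) is standardizing on one boxed error type, so an implementation can start failing in new ways without breaking its callers' signatures:

    use std::error::Error;

    // One conventional error type for everything, like Go's `error` interface.
    type AnyError = Box<dyn Error + Send + Sync>;

    fn load_widget(path: &str) -> Result<String, AnyError> {
        // Both the I/O error and the parse error convert into the same boxed
        // type, so adding a new failure mode later doesn't change the signature.
        let raw = std::fs::read_to_string(path)?;
        let id: i64 = raw.trim().parse()?;
        Ok(format!("widget #{id}"))
    }

    fn main() {
        match load_widget("widget.txt") {
            Ok(w) => println!("{w}"),
            Err(e) => eprintln!("error: {e}"),
        }
    }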
Or if you mean that returning a new error breaks API compatibility, then yes that's the point. If now you can error in a different way, your users now need to handle that. But if it's all generic and inferred, it can still just bubble up to wherever they want to do that with no changes to middle layers.
In this way, declaring a type guides people to write calling code that doesn't break, provided you set it up that way. It makes things easier for the implementation to change.
Sometimes you do need handlers for specific errors, but in Go you always need to write generic error handling, too.
(A type variable can do something similar. It forces the implementation to be generic because the type isn't known, or is only partially known.)
Go also basically requires you to write actually generic error code (if err != nil return err), so it feels like errors are more work than they are. In Scala (especially ZIO) propagation is all automatic and you just handle it wherever you'd like. The only code that cares about errors is the part generating them and the part handling them. But it's all tracked in the type system so you can always see what exact errors are possible from what methods. And it's all simple return values so no trickiness with exceptions (e.g. across async boundaries).
Throwing the baby out with the bathwater, like Go sort of does with its error handling, is no solution. The proper solution is a better type system (e.g. a result type with a generic error handles what Go can't).
For effects, though, we need type systems that support them - and those are only available in research languages so far. You can actually just be generic in effects (e.g. an fmap function applying a lambda to a list could just "copy" the effect of the lambda to the whole function - this can be properly written down and enforced by the compiler).
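You can already approximate this for the error "effect" in ordinary Rust (a sketch; `try_map` is a made-up helper): the function is generic over the lambda's error type, so whatever the lambda can fail with, the whole function can fail with, and the compiler checks it.

    // Hypothetical helper: applies a fallible closure to every element and
    // "copies" the closure's error type E into its own signature.
    fn try_map<T, U, E>(
        items: Vec<T>,
        mut f: impl FnMut(T) -> Result<U, E>,
    ) -> Result<Vec<U>, E> {
        let mut out = Vec::with_capacity(items.len());
        for item in items {
            out.push(f(item)?);
        }
        Ok(out)
    }

    fn main() {
        // The closure's error type (ParseIntError) becomes try_map's error
        // type, with no extra annotation at the call site.
        let parsed = try_map(vec!["1", "2", "nope"], |s| s.parse::<i32>());
        assert!(parsed.is_err());
    }

What research effect systems add is doing the same thing for arbitrary effects (async, exceptions, nondeterminism) rather than only for effects you can encode as a return type.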
That isn't function colouring, but rather plain incompatible APIs/runtime. You could have the equivalent with non-async ecosystems.
C# async/await on top of the TPL is doing different work from JavaScript’s promise model. The C# version composes with cancellation tokens, structured exception handling, and a real thread pool underneath. JavaScript coloured the language because it had to: single-threaded runtime, no alternative. Rust’s async is closer to C++’s coroutines: a state machine at the call site, no runtime by default, executors as libraries. Three different things wearing the same syntax.
C++ shipped without async for thirty years and got along on thread pools and condition variables. The async crowd is right that thread-per-connection breaks at c10k. The thread-per-connection crowd is right that almost nobody is at c10k. Both are right about different problems.
The question that matters isn’t whether async was a mistake. It’s whether each language imported the cost (function colouring, runtime overhead, debugger pain) for the workload it actually has. JavaScript had to. C# had a strong case. Rust’s case for embedded is excellent. Python’s case is the most contested of the four and the one that gets defended hardest.
The issue arose because Javascript didn't have threads, and processing events from the DOM is naturally event driven. To be fair, it's a rare person who can deal with the concurrency issues threads introduce, but the separate stacks threads provide are a huge boon. They allow you to turn event driven code into sequential code.
The latter is an easy-to-read linear sequence of code that keeps all the concerns in one place; the former rapidly becomes a huge entangled mess of event-processing functions. The history of Javascript described in the article is just a series of attempts to replace the horror of event driven code with something that looks like the sequential code found in a normal program. At any step in that sequence, the language could have introduced green threads and the job would have been done. And it would have been done without new syntax and without function colouring. But if you keep refining the original hacks they were using in the early days and don't take the somewhat drastic step of introducing a new concept to solve the problem (separate stacks), you end up where they did - at async and await. Mind you, async and await do create a separate stack of sorts - but it's implemented as a chain of objects allocated on the heap instead of the much more efficient stack structure.
I can see how the javascript community fell into that trap - it's the boiling frog scenario. But Python? Python already had threads - and had the examples of Go and Erlang to show how well they worked compared to async / await. And as for Rust - that's beyond inexplicable. Rust had green threads in the early days and abandoned them in favour of async / await. Granted, the original green thread implementation needed a bit of refinement - making every low-level call choose between event driven and blocking on every invocation was a mistake. Rust now has a green thread implementation that fixes that mistake, which demonstrates it wasn't that hard to do. Yet they didn't do it at the time.
It sounds like Zig with its pluggable I/O interface finally got it right - they made I/O a dependency injected at compile time. No "coloured" async keywords, and the compiler monomorphises the right code. Every library using I/O only has to be written once - what a novel concept! It's a pity it didn't happen in Rust.
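A rough Rust analog of that idea (not Zig's actual interface; the `Io` trait and both implementations below are made up): pass the I/O implementation in as a generic parameter and let monomorphization pick the concrete code at compile time.

    // Hypothetical I/O interface injected as a dependency.
    trait Io {
        fn read_line(&mut self) -> String;
    }

    // Production implementation backed by real stdin.
    struct RealStdin;
    impl Io for RealStdin {
        fn read_line(&mut self) -> String {
            let mut buf = String::new();
            std::io::stdin().read_line(&mut buf).expect("stdin read failed");
            buf
        }
    }

    // Test/alternative implementation with canned input.
    struct FakeIo(Vec<String>);
    impl Io for FakeIo {
        fn read_line(&mut self) -> String {
            self.0.pop().unwrap_or_default()
        }
    }

    // Library code is written once against the interface; the compiler
    // monomorphizes a copy for whichever implementation the caller injects.
    fn greet(io: &mut impl Io) -> String {
        format!("hello, {}", io.read_line().trim())
    }

    fn main() {
        let mut io = FakeIo(vec!["world".to_string()]);
        println!("{}", greet(&mut io));
    }

As I understand it, the difference is that Zig threads such an I/O value through the standard library itself, so blocking, thread-pool, and evented implementations all satisfy one interface.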
There are a bunch of use cases for it outside of implementing concurrency in a single threaded runtime.
Pretty much every GUI toolkit I've ever used was a single-threaded event loop doing the GUI updates.
Green threads are a very controversial design choice that even the JVM backed out of.
Having been instrumental in accelerating the arrival of async/await in JS: it definitely came from C#, and we were eagerly awaiting it and worked with Igalia to focus on it and make it happen across browsers more quickly so people could actually depend on it.
Not sure if it was inspired by it, but async/await is just like Haskell's do-notation, except specialized for one type: Promise/Future. A bit of a shame. Do-notation works for so many more types.
- for lists, it behaves like list-comprehensions.
- for Maybes it behaves like optional chaining.
- and much more...
All other languages pile on extra syntax sugar for that. It's really beautiful that such seemingly unrelated concepts have a common core.
It's interesting to wonder about the C# world where those things were more unified.
It's also interesting to explore in C# all the existing ways that Linq syntax can be used to work with arbitrary monads and also Task<T> can be abused to use async/await syntax for arbitrary monads. (In JS, it is even easier to bend async/await to arbitrary monads given the rules of a "thenable" are real simple.)
I tried once to hack list comprehensions into JS by abusing async/await. You can monkey patch `then` onto Array and define it as flatMap and IIRC you can indeed await arrays that way, but the outer async function always returns a regular Promise. You can't force it to return an instance of the patched Array type.
Did they? Project Loom has stabilized around Java 21, no?
There was a fair bit of time between the two, to the point I'm not sure the latter can be called much of a strong motivation for the former. Green threads were removed pre-1.0 by the end of 2014 [0], while work on async/await proper started around 2017/2018 [1].
In addition, I think the decision to remove green threads might be less inexplicable than you might otherwise expect if you consider how Rust's chosen niche changed pre-1.0. Off the top of my head, no obligatory runtime and no FFI/embeddability penalties are the big ones.
> Rust now has a green thread implementation that fixes that mistake
As part of the runtime/stdlib or as a third-party library?
[0]: https://github.com/rust-lang/rust/issues/17325
[1]: https://without.boats/blog/why-async-rust/
But for a long time (I think even till today, despite there now being an optional free-threaded build) CPython used a Global Interpreter Lock (GIL), which paradoxically can make programs run slower when more threads are used. It's a bad idea to allow sharing all the data structures across threads in high-level safe programming languages.
JS's solution is much better: it has worker threads with message-passing mechanisms (copying data with structuredClone) and shared array buffers (plain integer arrays) with atomic operation support. This is one of the reasons why JavaScript hasn't suffered the performance penalty as much as Python has.
No, you appear to have no idea what you're talking about here. Rust abandoned green threads for good reason, and no, the problems were not minor but fundamental, and had to do with C interoperability, which Go sacrifices upon the altar (which is a fine choice to make in the context of Go, but not in the context of Rust). And no, Rust does not today have a green thread implementation. Furthermore, Rust's async design is dramatically different from Javascript, while it certainly supports typical back-end networking uses it's designed to be suitable for embedded contexts/freestanding contexts to enable concurrency even on systems where threads do not exist, of which the Embassy executor is a realization: https://embassy.dev/
The job wouldn’t have been done. They would have needed threads. And mutexes. And spin locks. And atomics. And semaphores. And message queues. And - in my opinion - the result would have been a much worse language.
Multithreaded code is often much harder to reason about than async code, because threads can interleave executions and threads can be preempted anywhere. Async - on the other hand - makes context switching explicit. Because JS is fundamentally single threaded, straight code (without any awaits) is guaranteed to run uninterrupted by other concurrent tasks. So you don’t need mutexes, semaphores or atomics. And no need to worry about almost all the threading bugs you get if you aren’t really careful with that stuff. (Or all the performance pitfalls, of which there are many.)
Just thinking about mutexes and semaphores gives me cold sweats. I’m glad JS went with async await. It works extremely well. Once you get it, it’s very easy to reason about. Much easier than threads.
You can't always choose to write straight code. What you're trying to do may require IO, and then that introduces concurrency, and the need for mutual exclusion or notification.
Examples: If there's a read-through cache, the cache needs some sort of lock inside of it. An async webserver might have a message queue.
The converse is also true. I've been writing some multithreaded code recently, and I don't want to or need to deal with mutexes, so, I use other patterns instead, like thread locals.
Now, for sure the async equivalents look and behave a lot better than the threaded ones. The Promise static methods (any, all, race, etc) are particularly useful. But, you could implement that for threads. I believe that this convenience difference is more due to modernity, of the threading model being, what 40, 50, 60 years old, and given a clean-ish slate to build a new model, modern language designers did better.
But it raises the idea: if we rethought OS-level preemptible concurrency today (don't call it threads!), could we modernize it and do better even than async?
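For what it's worth, the "all" part is already easy with plain OS threads; here's a tiny Rust sketch using scoped threads (the two `expensive_*` functions are placeholders):

    use std::thread;

    fn expensive_step_one() -> u64 { 1 + 1 } // placeholder work
    fn expensive_step_two() -> u64 { 2 + 2 } // placeholder work

    fn main() {
        // Roughly Promise.all for two tasks: spawn both, wait for both.
        let (a, b) = thread::scope(|s| {
            let h1 = s.spawn(expensive_step_one);
            let h2 = s.spawn(expensive_step_two);
            (h1.join().unwrap(), h2.join().unwrap())
        });
        println!("{a} {b}");
    }

"race" is the genuinely harder one, since you can't safely cancel an OS thread mid-flight - which is one place the async model earns its keep.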
I've been programming for 30 years, including over a decade in JS. You need sync primitives in JS sometimes, but they're trivial to write in javascript because the code is run single threaded and there's no preemption.
> What you're trying to do may require IO
Its usually possible to factor your code in a way that separates business logic and IO. Then you can make your business logic all completely synchronous.
Interleaving IO and logic is a code smell.
> The Promise static methods (any, all, race, etc) are particularly useful. But, you could implement that for threads. I believe that this convenience difference is more due to modernity, of the threading model being, what 40, 50, 60 years old, and given a clean-ish slate to build a new model, modern language designers did better.
Then why don't we see any better designs amongst modern languages?
New languages have an opportunity to add newer, better threading primitives. Yet, its almost always the same stuff: Atomics, mutexes and semaphores. Even Rust uses the same primitives, just with a borrow checker this time. Arguably message passing (erlang, go) is better. But Go still has shared mutable memory and mutexes in its sync library.
> But it raises the idea: if we rethought OS-level preemptible concurrency today (don't call it threads!), could we modernize it and do better even than async?
I'd love to see some thought put into this. Threading doesn't seem like a winner to me.
If you use a thread per connection (or green threads like Go), you don't also need async. If you have async (eg nodejs), you can get great performance without threads. You're right that they can also be combined - either within a single process (like tokio in rust). Or via multi-process configurations (eg one nodejs instance per core, all behind nginx). But they don't need to be. Go (green threads) and Nodejs (async, single threads) both work well.
Of course we're comparing them. We all want to know who wore it better
> The job wouldn’t have been done. They would have needed threads. And mutexes. And spin locks. And atomics. And semaphores. And message queues. And - in my opinion - the result would have been a much worse language.
My point is that you do need mutexes, spin locks, etc. with async as well, given that you have a multi-threaded platform. So no, we are basically talking about a 2x2 grid of options with very different properties (async/sync x single/multi-threaded).
That's a mischaracterization. They were abandoned because having green threads introduces a non-trivial runtime. It would mean Rust couldn't run on exotic architectures.
> It sounds like Zig with its pluggable I/O interface finally got it right
That remains to be seen. It looks good, with emphasis on looks. Who knows what interesting design constraints and limitation that entails.
Looking at comptime, which is touted as Zig's mega feature, it does come at the expense of a more strictly typed system.
I understand that some devs don’t want to learn async programming. It’s unintuitive and hard to learn.
On the other hand I feel like saying “go bloody learn async, it’s awesome and massively rewarding”.
Funny, because it was supposed to be more intuitive than handling concurrency manually.
I find it interesting how in software, I repeatedly hear people saying "I should not have to learn, it should all be intuitive". In every other field, it is a given that experts are experts because they learned first.
Other fields don't have the same ability to produce unlimited incidental complexity, and therefore not the same need to rein it in. But I don't think there's any field which (as a whole) doesn't value simplicity.
Now if you take the chainsaw without spending a second thinking about learning to use it, and start using it like a manual saw... no doubt you will find it worse, but that's the wrong way to approach a chainsaw.
And I am not saying that async is "strictly better" than all the alternatives (in many situations the chainsaw is inferior to alternatives). I am saying that it is a tool. In some situations, I find it easier to express what I want with async. In others, I find alternatives better. At the end of the day, I am the professional choosing which tool I use for the job.
I never thought "programming languages are a failure, because they are not intuitive to people who don't know how to program".
My point being that I don't judge a tool by how intuitive it is to use when I don't know how to use it. I judge a tool by how useful it is when I know how to use it.
Obviously factoring in the time it took to learn it (if it takes 10 years to master a hammer, probably it's not a good hammer), but if you're fine with programming, state machines and message passing, I doubt that it will take you weeks to understand how async works. Took me less than a few hours to start using them productively.
But concurrency is hard, and there's only so much your syntax can do about it.
If you come from callbacks it is (almost) purely an upgrade; coming from threads it is more mixed.
At least in JS. I don’t find async in rust anywhere near as nice to use. But that’s a separate conversation.
After you’ve learned the paradigm and bedded it down with practice.
If you don't need the performance and scalability then it is not unreasonable to argue that async isn't worth the engineering effort.
It forces programmers to learn completely different ways of doing things, makes the code harder to understand and reason about, purely in order to get better performance.
Which is exactly the wrong thing for language designers to do. Their goal should be to find better ways to get those performance gains.
And the designers of Go and Java did just that.
Technically, promises/futures already did that in all of the mentioned languages. Async/await helped make it more user friendly, but the complexity was already there long before async/await arrived
If I want sequential execution, I just call functions like in the synchronous case and append .await. If I want parallel and/or concurrent execution, I spawn futures instead of threads and .await them. If I want to use locks across await points, I use async locks, anything else?
You can call async methods without immediately calling await. You can naively await as late as possible. They'll run in parallel, or at least however the call was configured.
In Javascript, promises are eager and start executing immediately. They return control back to the caller when they need to wait. So in practice, all of your promises are running concurrently as soon as you create them.
In Rust, futures are lazy and don't start executing until they are awaited. You have to use various features of your chosen runtime to run multiple futures concurrently (functions like `spawn` or `select`). But that interface isn't standardized, which leads to the ecosystem fragmentation issue discussed in the article. There was an attempt to standardize the interface in the `futures` crate, but none of the major runtimes actually implement the interface.
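Roughly, in code (a sketch that assumes tokio; `fetch_a` and `fetch_b` are made-up async functions):

    async fn fetch_a() -> u32 { 1 }
    async fn fetch_b() -> u32 { 2 }

    #[tokio::main]
    async fn main() {
        // Sequential: each future only starts running when it is awaited.
        let a = fetch_a().await;
        let b = fetch_b().await;

        // Concurrent: drive both futures at once. `join!` polls them together;
        // `tokio::spawn` would instead hand each one to the runtime as its
        // own task.
        let (a2, b2) = tokio::join!(fetch_a(), fetch_b());

        println!("{a} {b} {a2} {b2}");
    }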
It uses an N:M threading model - where N virtual threads are mapped to M system threads and it's all hidden away from you.
All the other languages just leak their abstractions to you, java quietly doesn't.
Sure, Java is a kinda ugly language; you can use a different JVM language, all good.
Don't get me wrong, love python, rust, dart etc, but JVM is nice for this.
I didn't coin that term, the Oxide folks did: https://rfd.shared.oxide.computer/rfd/0609. I want to emphasize that I don't think futurelocks represent a "fundamental mistake" or anything like that in Rust's async model. Instead, I believe they can be fixed reliably with a combination of some new lint rules and some replacement helper functions and macros that play nicely with the lints. The one part of async Rust that I think will need somewhat painful changes is Stream/AsyncIterator (https://github.com/rust-lang/rust/issues/79024#issuecomment-...), but those aren't yet stable, so hopefully some transition pain is tolerable there.
> The pattern scales poorly beyond small examples. In a real application with dozens of async calls, determining which operations are independent and can be parallelized requires the programmer to manually analyze dependencies and restructure the code accordingly.
I think Rust is in an interesting position here. On the one hand, running things concurrently absolutely does take deliberate effort on the programmer's part. (As it does with threads or goroutines.) But on the other hand, we have the borrow checker and its strict aliasing rules watching our back when we do choose to put in that effort. Writing any sort of Rust program comes with cognitive overhead to keep the aliasing and mutation details straight. But since we pay that overhead either way (for better or worse), the additional complexity of making things parallel or concurrent is actually a lot less.
> At the function level, adding a single i/o call to a previously synchronous function changes its signature, its return type, and its calling convention. Every caller must be updated, and their callers must be updated.
This is part of the original function coloring story in JS ("you can only call a red function from within another red function") that I think gets over-applied to other languages. You absolutely can call an async function from a regular function in Rust, by spinning up a runtime and using `block_on` or similar. You can also call a regular function from an async function by using `spawn_blocking` or similar. It's not wonderful style to cross back and forth across that boundary all the time, and it's not free either. (Tokio can also get mad at you if you nest runtimes within one another on the same thread.) But in general you don't need to refactor your whole codebase the first time you run into a mismatch here.
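A small sketch of both directions (assuming tokio; the function names are made up):

    use std::time::Duration;

    // Some CPU-heavy or otherwise blocking synchronous work.
    fn crunch_numbers() -> u64 {
        std::thread::sleep(Duration::from_millis(10)); // stand-in for real work
        42
    }

    async fn handler() -> u64 {
        // async -> sync: push blocking work onto tokio's blocking thread pool
        // so it doesn't stall the async worker threads.
        tokio::task::spawn_blocking(crunch_numbers)
            .await
            .expect("blocking task panicked")
    }

    fn main() {
        // sync -> async: stand up a runtime and block on the async entry point.
        let rt = tokio::runtime::Runtime::new().expect("failed to build runtime");
        println!("{}", rt.block_on(handler()));
    }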
Java has this too.
Don't get me started on how you need to move "blocking" work to separate thread pools, including any work that has the potential to take some CPU time, not even necessarily IO. I get it, but it's another significant papercut, and your tail latency can be destroyed if you missed even one CPU-bound algorithm.
These may have been the right choices for Rust specifically, but they impair quality of life way too much in the course of normal work. A few years ago, I had hope this would all trend down, but instead it seems to have asymptoted to a miserable plateau.
But yes, once you go dining on other people's crates you definitely get the impression that you have to, because tokio gets its fingerprints all over everything.
But also there are non-work-stealing runtimes that don't require Send/Sync on the Future. Just nobody uses them.
Because Rust is the language that tokio ate.
Which ones?
https://github.com/bytedance/monoio
https://github.com/DataDog/glommio
What bothers me the most is that, aside from async, I _really_ do like the language and appreciate what it's trying to do. Otherwise I would just turn away from the whole mess. This just landed really badly.
This isn't true at all. The Send+Sync+'static requirement is basically a limitation of tokio::spawn.
https://emschwartz.me/async-rust-can-be-a-pleasure-to-work-w...
Change the executor, and the bound changes.
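For instance (a sketch still using tokio, but through its single-threaded LocalSet API rather than the work-stealing `spawn`): a `!Send` future, such as one holding an `Rc`, can be spawned with `spawn_local`.

    use std::rc::Rc;

    #[tokio::main(flavor = "current_thread")]
    async fn main() {
        let local = tokio::task::LocalSet::new();
        local
            .run_until(async {
                // Rc is !Send, so this future couldn't go to tokio::spawn, but
                // spawn_local has no Send bound because the task never migrates
                // to another thread.
                let shared = Rc::new(5);
                let handle = tokio::task::spawn_local(async move { *shared + 1 });
                assert_eq!(handle.await.unwrap(), 6);
            })
            .await;
    }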
> Async ruined Rust for me, even though I write exactly the kind of highly concurrent servers to which it's supposed to be perfectly suited. It degrades API
The quote refers to async in Rust, but everyone is responding as if its problems are inherent to async in Rust. That's a mischaracterization. You don't have to use the Tokio executor.
It's a bit like saying, "Graphics in Rust are ruined. I need to use Vulkan to make my game."
No, you don't. If your use case is unique enough, use something else. There are bindings for OpenGL, DirectX, etc.
Almost. JavaScript adopted async because it was a programming language designed to slot into someone else's event loop. Other programming languages, at least on the server, that needed lightweight threading didn't bother with any of this, they just shipped their own managed stacks. But UI code practically demands to own its own event loop and requires everything else live as callbacks inside of it. And JavaScript, because it was designed to live in a browser, inherited these same semantics.
My trite slop bashing was days ago:
https://news.ycombinator.com/item?id=47862726
This whole thing is basically snake oil. The best thing a backend can do instead is have a dedicated thread pool where each real thread has its own queue of limited size. Each element in the queue would contain the input and output state of a request and the code to deal with it. Once a queue grows over a certain size the backend should simply return an error code immediately (too busy). A much more sound strategy in my opinion.
There are more complex cases of course (like computationally expensive requests with no I/O that take a long time). Handling those would require some extra logic. Async stuff, however, will not help here either.
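A minimal sketch of that strategy in Rust (the request type and queue size are made up): one worker thread with a bounded queue, and a non-blocking `try_send` that turns into an immediate "too busy" rejection when the queue is full.

    use std::sync::mpsc::{sync_channel, TrySendError};
    use std::thread;

    fn main() {
        // Bounded queue: at most 64 requests waiting for this worker.
        let (tx, rx) = sync_channel::<String>(64);

        let worker = thread::spawn(move || {
            for request in rx {
                // ... handle the request synchronously here ...
                println!("handled {request}");
            }
        });

        // Admission control: if the queue is full, reject immediately
        // instead of letting work pile up.
        match tx.try_send("GET /index".to_string()) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => eprintln!("503: too busy"),
            Err(TrySendError::Disconnected(_)) => eprintln!("worker has shut down"),
        }

        drop(tx); // close the channel so the worker's loop ends
        worker.join().unwrap();
    }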
1. It makes asynchronous programming look synchronous. I do not like things being other than they appear. The point was touched on with the example, but it is a serious thing when the mental model is wrong.
2. On a related issue, CPU-bound code can block the thread of execution and stop any concurrency in its tracks.
In Rust there is the added problem of shoehorning it into the memory model, which has led to a lot of hairy code and tortured paradigms (e.g. Pin).
I play around with real-time audio, and use a state machine/event loop. It's a very powerful, if verbose, method to do real-time programming; I cannot see how async/await could achieve the same ends.
The next decade will be a proliferation of hackers having fun with io_uring coming up with all sorts of patterns.
Solved at the OS layer, sure. This article is about solving it at the language layer.
It's been solved for a long time at the OS layer. I mean, didn't VAX have something like overlapped IO in 1980? Unix had asynchronous IO as well with `select`.
It's 2026 and I'm starting to hate the internet.
Sure, it comes with its own issues, like large stacks (10k copies of nearly the same stack?), and I predict a memory coloring in the future (stack variables or even whole frames that can opt out of being copied across virtual threads).