Most of my programming these days is on Cortex-M, using Rust. The article is another prior in my suspicion that Async embedded Rust is designed with toy use cases in mind (blinky); I don't know how it would work in practical firmware because I haven't seen examples.
Another possibility is that my brain and learning patterns don't work well with Async. I have a hard time understanding it, and it feels like a layer of obfuscation or misdirection. It's an abstraction that may be more suitable for other learning styles, or different program structures or requirements than the ones I've worked on.
I'd love to see an RTOS for Rust, but am worried it will be Async.
(I am referring to the Async/Await pattern; not asynchronous programming or concurrency)
At work, we have a non-async RTOS https://hubris.oxide.computer/ though it is a bit harder to use than "add a dependency in Cargo.toml" like most projects.
The original designer of Hubris, Cliff, has his own async RTOS that he uses for personal projects. He has recently been writing some blog posts on this that may be of interest to you:
Re RTIC: I'm using V1 of it, which is non-Async. It seems to offer smarter locking than critical sections + Mutex, since it can let a higher-priority task interrupt a lower-priority one, depending on which resources are locked.
Async Rust definitely does work for non-toy use cases. As a data point, we use Embassy for all production firmware at my startup (https://akiles.app/en), using async tasks for everything: Bluetooth, TCP/IP networking, motor control, user interface (LEDs, keypad), a key-value database in flash, stats collection... Async helps with battery life too, since it allows putting the core to sleep when no task has work to do; this lets us build devices with 1-2 years of battery life.
There are other companies using Embassy in production. Sadly, most firmware is not open source. There are a few non-toy open-source projects using Embassy, though:
RTIC (Real-Time Interrupt-driven Concurrency) is an RTOS written in Rust with specific hardware acceleration for Cortex-M devices. Well, depending on what your definition of an RTOS really is. If you don't like async, then it's a pretty good bet for getting concurrency working on an Arm Cortex-M microcontroller.
This article about async in Python helped me understand it pretty well, since it explains them in terms of coroutines, which are very intuitive for me: https://mleue.com/posts/yield-to-async-await/
Another thing that helps me get it is comparing it to continuation-passing style, where you never return from a function. Instead, you take an argument that's basically a function pointer bound to an environment, and at the end of the function, instead of returning, you call that input function, giving it another function and environment as input, repeating the cycle. It's very similar to the transformation of JavaScript's callbacks-within-callbacks-within-callbacks pattern into the async/await pattern.
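A minimal Rust sketch of the continuation-passing idea, written for illustration. It assumes two hypothetical steps, `read_sensor` and `scale`, shown in both direct style and CPS. Rust doesn't guarantee tail calls, so unlike "true" CPS the continuation's result is threaded back out as the generic `R`. The point is the nested closures, which is the shape async/await flattens.

```rust
// Direct style: each (hypothetical) step returns its result to the caller.
fn read_sensor() -> u32 {
    42 // stand-in for a real sensor read
}

fn scale(raw: u32) -> u32 {
    raw * 10
}

// Continuation-passing style: instead of returning a value to the caller,
// each step hands its result to `k`, "the rest of the program" as a closure.
fn read_sensor_cps<R>(k: impl FnOnce(u32) -> R) -> R {
    k(42)
}

fn scale_cps<R>(raw: u32, k: impl FnOnce(u32) -> R) -> R {
    k(raw * 10)
}

/// Direct-style pipeline.
fn pipeline_direct() -> u32 {
    scale(read_sensor())
}

/// The same pipeline in CPS: one nested closure per step, i.e. the
/// "callbacks within callbacks" shape.
fn pipeline_cps() -> u32 {
    read_sensor_cps(|raw| scale_cps(raw, |scaled| scaled))
}
```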
The thing is, we tried coroutines in C (embedded) y-e-a-r-s ago. They were all the rage for a bit. There were a couple of different macros/libraries you could use with Duff's device and other trickery to get coroutine-ish things in C.
Maybe the implementation just wasn't where it needed to be compared with these newer/slicker/more integrated versions, but my (and others') issue with them wasn't the weaknesses/caveats of the implementation, but rather the mess of spaghetti they made as your coroutine use grew to any degree. In onesie-twosie nice demo cases (look ma, I get some data from the intertubes with this syncy thing), they're great, but my experience was that they're a mess at scale.
I'm happy to be proven wrong. I get to use them a bunch in Kotlin, and I'm trying not to be a victim of my experience. I'm still on the fence.
> but mine (and others') issues with them wasn't the weakness/caveats of the implementation, but rather with the mess of spaghetti it made as your coroutine use grew with any degree.
Rust async is a bit different than in other languages. It's more like sugar over state machines instead of sugar over callbacks.
This is what makes it work nicely on embedded. The compiler-generated state machines are structs with fixed size so they can be statically allocated. Callbacks would have to be heap-allocated and garbage-collected/refcounted.
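To make "sugar over state machines" concrete, here is a hand-written sketch of roughly what the compiler produces for an `async` block that waits on an event twice. It is an enum with one variant per suspension point and the live variables stored inline, so its size is fixed at compile time. `WaitTwice` and the `Cell<bool>` "event" are hypothetical. The real generated type is anonymous, and its layout is up to the compiler.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Stand-in for the anonymous type the compiler would generate for roughly
/// `async { wait(event).await; wait(event).await; 2 }`.
enum WaitTwice<'a> {
    First { event: &'a Cell<bool> },
    Second { event: &'a Cell<bool> },
    Done,
}

impl<'a> Future for WaitTwice<'a> {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // WaitTwice only holds a reference, so it is Unpin and get_mut is allowed.
        let this = self.get_mut();
        loop {
            match *this {
                WaitTwice::First { event } => {
                    // A real leaf future would stash cx.waker() so an interrupt
                    // could wake the task; this sketch just expects a re-poll.
                    if !event.replace(false) {
                        return Poll::Pending; // suspended at the first await
                    }
                    *this = WaitTwice::Second { event };
                }
                WaitTwice::Second { event } => {
                    if !event.replace(false) {
                        return Poll::Pending; // suspended at the second await
                    }
                    *this = WaitTwice::Done;
                    return Poll::Ready(2);
                }
                WaitTwice::Done => panic!("polled after completion"),
            }
        }
    }
}

/// Waker that does nothing; enough to poll by hand without an executor.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}
```

Each `Pending` return corresponds to an `.await` point, and the only memory the task needs while suspended is the enum itself.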
> It's more like sugar over state machines instead of sugar over callbacks
They are equivalent [1]. There are Scheme compilers (Scheme being a language with first-class continuations and often heap-allocated stack frames) that compile everything down to a giant C switch statement.
[1] well, continuations are strictly more powerful of course, but the stackless subset needed for async/await is the same.
This misses the actual async part, which is more like polling a task queue. Callback-hell sugar is only a thing in languages that already have event loops built in (i.e. JavaScript, which I assume is also what you're referring to).
The point is that it actually is insightful to think of it as callback sugar. That will give you a better understanding of how the threading is handled when the calling method yields to the callee, and conversely how it must be different when the caller isn't yielded.
It generally applies to event loops, not just to languages with a built-in event loop. E.g. it certainly also applies to raw boost::asio (which uses callbacks), libuv, libevent, Qt, GObject, Netty, etc.
There are no threading implications to async in Rust. The executor you're using may add some requirements on your futures because it wants to run them on multiple threads[1], but that's not related to async itself, and you can always use a single-threaded executor if you don't want these limitations (and they don't apply to embedded anyway).
[1] namely, your futures will need to be Send + 'static.
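The "polling a task queue" view and the point about `Send` can both be seen in a minimal single-threaded executor, sketched here for illustration. It uses `std`'s `Box` and `VecDeque` for brevity, whereas embedded executors typically use static task storage. It also busy-polls, where a real executor would sleep until a waker fires. Because tasks never leave the thread, the task type has no `Send` bound, and a task holding an `Rc` is accepted. `run`, `YieldNow` and `worker` are hypothetical names.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A task is a pinned, boxed future. No `Send` bound: tasks never change threads.
type Task = Pin<Box<dyn Future<Output = ()>>>;

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Pop a task, poll it, and push it back if it is still pending.
/// Busy-polls; a real executor would sleep (e.g. WFI) until a waker fires.
fn run(tasks: Vec<Task>) {
    let mut queue: VecDeque<Task> = tasks.into();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while let Some(mut task) = queue.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }
}

/// Future that is pending exactly once, giving other tasks a turn.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Example task sharing a non-thread-safe `Rc<RefCell<_>>`, so its future is `!Send`.
async fn worker(name: &'static str, log: Rc<RefCell<Vec<String>>>) {
    for i in 0..2 {
        log.borrow_mut().push(format!("{name}{i}"));
        YieldNow(false).await;
    }
}
```

Running two `worker` tasks interleaves them round-robin: `a0, b0, a1, b1`.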
I'm not as familiar with Rust's implementation, but even in C# it's mostly true. Threads only come into play when a callback is not directly awaited. There's more to it, but it starts you down the right path, I think.
Has anyone seriously tried structured concurrency for single-threaded async Rust? The pattern where main effectively leaks a task seems kind of gross to me, and it’s exactly what structured concurrency tries to solve.
(I realize that Rust’s async is inherently semi-structured.)
You can use futures combinators like `join`, `select` and `with_timeout` with async Rust (crates like `embassy-futures` or `futures` have implementations of these). They work nicely even in no-std embedded.
They're different tools for different use cases. Structured concurrency is nice when doing related actions concurrently where one might need to cancel the other, while unstructured task spawning makes more sense if they're truly unrelated tasks, that live for the entire duration of the program or where you don't care for how long they live (for example concurrently handling requests in a server).
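For the structured side, a minimal hand-rolled `join` shows the idea. Both child futures live inside the parent future and cannot outlive it, and the parent completes only when both have. This is an illustrative sketch with hypothetical names (`join2`, `yield_now`, `block_on`), not the implementation from `embassy-futures` or `futures`. The busy-polling `block_on` exists only to run it on a host.

```rust
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Minimal `join` sketch: polls both futures until both have finished.
async fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let mut a = pin!(a);
    let mut b = pin!(b);
    let mut out_a: Option<A::Output> = None;
    let mut out_b: Option<B::Output> = None;
    poll_fn(|cx| {
        if out_a.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(cx) {
                out_a = Some(v);
            }
        }
        if out_b.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(cx) {
                out_b = Some(v);
            }
        }
        if out_a.is_some() && out_b.is_some() {
            Poll::Ready((out_a.take().unwrap(), out_b.take().unwrap()))
        } else {
            Poll::Pending
        }
    })
    .await
}

/// Pending once (after requesting a wake-up) so the other branch gets a turn.
async fn yield_now() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
    .await
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling `block_on`, just enough to drive the sketch on a host.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```

The two branches interleave at each `yield_now().await`, and `join2` returns both outputs as a tuple once the slower one finishes.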
There's also the simplicity of bare-metal embedded Rust, because you can mostly[1] use Cargo, depend on the right embedded_hal crates for your target, and be up and running with a hardware-backed async runtime and everything. Zephyr, Tock, Mynewt, etc. all have bespoke build/deployment systems and driver ecosystems. That said, an RTOS makes actually deploying working products much more achievable, so pick your poison.
1: I think you still need xargo/cargo-xbuild for no-std embedded sysroots, but there may have been some upstream development on that front since I last played in this sandbox.
Embedded-HAL is interesting to me - in principle it's a great concept; writing hardware-agnostic, portable code.
I don't use it in my firmwares because:
- All of the embedded-HAL libs I've found (eg drivers) have had notable limitations or ergonomics problems.
- It appears unsuitable for use with DMA, which I use for all runtime IO
- The APIs tend to be a mess due to heavy use of typestates
> It appears unsuitable for use with DMA, which I use for all runtime IO
The `embedded-hal-async` traits can be implemented with DMA, the Embassy HALs do so. This is a problem with the `nb` traits only.
> The APIs tend to be a mess due to heavy use of typestates
This is an issue with some of the HAL crates implementing the traits, not with the `embedded-hal` traits themselves. I also dislike the heavy use of typestates/generics; Embassy tries to implement the traits while keeping typestates to a minimum.
Building core/std/whatever using unstable flags and --target works for me these days. I can do everything with cargo and make it even more ergonomic with a toolchain file.
Now if Cargo had a way of producing multiple variants of the same project (e.g. different --targets with pre-set features) just with `cargo build` then things would be a lot better.
As of now, a Makefile is still needed to bring it all together.
> I think you still need xargo/cargo-xbuild for no-std embedded sysroots, but there may have been some upstream development on that front since I last played in this sandbox.
Cargo's build-std flag is the official way now. It's unstable, but working its way to stable. AFAIK xargo/cargo-xbuild aren't getting any more development in favor of build-std.
> you can mostly[1] use Cargo, depend on the right embedded_hal crates for your target, and be up and running with a hardware-backed async runtime and everything
Can one use Rust's embedded_hal from a different programming language?
If you're asking why coroutines exist, it's because kernel-space context switching and synchronization have very high overhead by comparison. You use those features when you want parallelism, not concurrency.
For parallelism, you also usually want a userspace task scheduler because simply spawning threads in kernelspace is much slower. Cilk and Go provide this built into their language runtime.
we seem to be talking about cooperative multitasking here
correct me if i'm wrong, but i think that even with an almost standard abi it's five instructions and 27 cortex-m3 clock cycles; here we reserve r10 for the current task pointer (with the bonus that you can use it for a tls base register) and round-robin among the tasks in a circularly linked list:
yield: push {r4-r9, r11, lr} @ save all callee-saved regs except r10
str sp, [r10], #4 @ save stack pointer in current task
ldr r10, [r10] @ load pointer to next task
ldr sp, [r10] @ switch to next task's stack
pop {r4-r9, r11, pc} @ return into yielded context there
i think in gcc that's more or less -mpic-register=r10 -msingle-pic-base, though that isn't what those options are meant for
if you're willing and able to deform the abi a little by having fewer callee-saved registers, something you might want to do anyway for efficiency, you can cut r4-r9 down to r4-r5 and shave that down to i think 19 clocks? if you're willing to eliminate callee-saved registers entirely you can get to 15, but that seems like a heavy price to pay
if you have to use the standard abi it's i think 37 clocks on a cortex-m3 or better:
i feel like something in the range of 20-40 clocks is really not that expensive? i mean an ldr or str is 3 or 4 clocks if it's not accessing tcm, right? and on cortex-m0 every taken branch is 3 clocks because you don't have branch prediction?
i mean it's not going to compete with barrel processors on context switch time, but if it's really the cost of accessing seven fields of structs on the heap, it hardly seems like it's "a hell of a lot more expensive" than even zero
if we're talking about preemptive context switching, that's a different ball of wax; you have to save all the registers that you use at all, not just callee-saved registers, and depending on your application maybe vfp registers as well as integer registers
i should clarify that while i have tested both of these context-switching subroutines, and they do seem to work in simple programs, i haven't tested them on actual cortex-m hardware, much less measured their real-world performance
I believe that Tock (tockos.org) and Theseus (https://github.com/theseus-os/Theseus) are in this area a bit as well, just from an actual OS perspective.
I don't know much about this area, but it would be wonderful if these could work with the Libre Computer boards, like the Amlogic S905X (Le Potato) or the Rockchip-based ones, since they're so much cheaper than a Pi.
It would be great to get a dedicated simple system up to toy with a Rust OS for only $30.
I think the nature of microcontroller programming is already asynchronous. I mean, it's not that the only alternative to "async Rust" is an RTOS; a classic loop program with interrupts is already asynchronous. The only difference is you have to decide how you want to waste time waiting for events, and the same "energy saving" effect can also be achieved with a simple state machine and __WFI(), resulting in cleaner code with no dependencies whatsoever.
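A minimal sketch of the "simple state machine + WFI" approach, with the state that async would keep implicitly across `.await` points spelled out as an explicit enum. It assumes a hypothetical button-triggered LED that stays on for three timer ticks. On a Cortex-M target, ISRs would post the events and the loop would sleep with `cortex_m::asm::wfi()` between them. On the host, the events come from a slice instead.

```rust
/// Events an ISR might post (hypothetical; on target these would be
/// flags set from interrupt handlers, e.g. via atomics).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    ButtonPressed,
    TimerTick,
}

/// Explicit state: what an async task would otherwise hold across `.await`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Idle,
    LedOn { ticks_left: u8 },
}

/// One transition, as a pure function of (state, event).
fn step(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::ButtonPressed) => State::LedOn { ticks_left: 3 },
        (State::LedOn { ticks_left }, Event::TimerTick) if ticks_left <= 1 => State::Idle,
        (State::LedOn { ticks_left }, Event::TimerTick) => State::LedOn {
            ticks_left: ticks_left - 1,
        },
        (s, _) => s, // ignore anything else
    }
}

/// Host-side stand-in for the superloop: on target, the loop body would
/// call `cortex_m::asm::wfi()` and then handle whatever event the ISR posted.
fn run_superloop(events: &[Event]) -> State {
    let mut state = State::Idle;
    for &event in events {
        state = step(state, event);
    }
    state
}
```

This stays manageable while there are a few states per activity. The manual bookkeeping mentioned in the replies grows with every additional place a task has to wait.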
Also, it seems to me that Embassy takes over interrupt handlers? If that's the case, what if I want to deal with interrupts in a different way? Or what happens with interrupts that serve many pins, like EXTI9_5_IRQHandler(), if, say, I want to use pin 9 with Async Rust but handle the rest as a regular interrupt?
Yeah, you can always write async code by hand. It's a pain because you have to manually yield (as in, save your context, return to the main loop, and poll for the event to complete yourself) every time you wait for IO or a timer. For simple enough systems it's manageable; in larger ones it quickly becomes a pain.
So, cooperative multitasking with no control of actual scheduling? Cute, but a real RTOS is the solution to most any "i need to do more than one thing" in the embedded world.
You can run multiple executors at different interrupt priority levels (with multiple tasks per executor), which allows tasks on the higher priority executor to interrupt other tasks. Here's an example https://github.com/embassy-rs/embassy/blob/main/examples/nrf...
The cursive italics are apparently a feature of the Victor Mono [1] font used for the full page. While it'd be amusing in Tumblr context (where cursive is used for hyperbolic emphasis), I can't fathom why one would consider it in a code context.
You can change it (at least on Safari) by going into developer tools, clicking any node, and removing "Victor Mono" from --font-family
Trying to shoehorn Rust this and Rust that into every freaking thing makes more problems than it solves. Case in point is async Rust on microcontrollers. Looks ugly as sin.
Cooperative multitasking in embedded has to be the worst idea ever. Anyone who has done any serious work in embedded systems will tell you how bad it is when a faulty task blocks everything. Please stop advertising async in embedded.
It really comes down to memory requirements. If you can afford to give every task its own stack and dispatch it directly from prioritized interrupts, that's great. Even better if you can use memory protection hardware to isolate the tasks from each other.
But if you have many long running tasks that only need to keep a handful of bytes of state when they're waiting, a mechanism like rust async can allow for huge memory savings while mostly retaining the same code style as stackful tasks.
And you can even use both approaches in the same program, with isolated stacks for realtime critical tasks and cooperative state machines for everything else.
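A rough way to see the memory argument is to measure an async task's state machine with `std::mem::size_of_val`. This measures only the future's state, not any stack. The sketch assumes a hypothetical `NextSample` leaf future, which is always ready on the host, and compares against an illustrative 1 KiB per-task stack. Exact future sizes depend on the compiler and are not guaranteed.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

/// Hypothetical leaf future standing in for "wait for the next ADC sample"
/// (always ready in this host sketch).
struct NextSample;

impl Future for NextSample {
    type Output = u16;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u16> {
        Poll::Ready(512)
    }
}

/// Long-running task whose only state across `.await` is a sum and a loop counter.
async fn sensor_task() -> u32 {
    let mut sum: u32 = 0;
    for _ in 0..8 {
        sum += u32::from(NextSample.await);
    }
    sum / 8
}

/// Illustrative per-task stack size for a stackful RTOS thread (an assumption).
const STACK_BYTES: usize = 1024;

/// Size of the task's state machine; creating the future does not run it.
fn suspended_task_bytes() -> usize {
    std::mem::size_of_val(&sensor_task())
}
```

On typical compilers this future is a few dozen bytes at most, far below the assumed stack.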
Watchdogs shouldn't be triggered in the normal course of events. They are a last line of defense for exceptional circumstances that will render a device inoperative, not a cure-all for poor system design.
Sure, but a system where a task has ceased to execute is in pretty much all circumstances a system where the watchdog (or some other assert) should trigger. (the whole toyota case involved them getting raked over the coals because their system did not in fact detect a failed task and do this).
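One common way to make a stalled task trip the watchdog is per-task check-ins. Each task sets its own bit when it makes progress, and a periodic supervisor feeds the hardware watchdog only if every expected bit is set. This is a standard pattern sketched for illustration, not taken from the thread. `check_in`, `supervise` and the `feed` closure, which stands in for a HAL's watchdog-feed call, are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Bitmask of tasks that have checked in since the last supervisor pass.
static CHECKINS: AtomicU32 = AtomicU32::new(0);

/// Called by task `id` (0..32) whenever it completes a unit of work.
fn check_in(id: u32) {
    CHECKINS.fetch_or(1u32 << id, Ordering::Relaxed);
}

/// Supervisor, e.g. run from a periodic timer. Feeds the watchdog only if
/// every task in `expected` checked in; otherwise it withholds the feed so
/// the hardware watchdog eventually resets the chip. Returns whether it fed.
fn supervise(expected: u32, feed: impl FnOnce()) -> bool {
    let seen = CHECKINS.swap(0, Ordering::Relaxed);
    let all_alive = (seen & expected) == expected;
    if all_alive {
        feed();
    }
    all_alive
}
```

With cooperative tasks, a task stuck in a busy loop stops checking in, and so does every task it starves, so the reset follows within one watchdog period.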
I... know. I'm saying that a stalled chip is a stalled chip. I don't know why async would stall your chip randomly unless you had a bug, which would stall the chip anyway.
Yikes, no kidding. Triggering a watchdog is a “stop the dev cycle and everyone figure out what show-stopping bug is lurking in our code” time.
I have no desire to use embedded Rust async. In my production embedded code I use C++ with none of the allocating containers (honestly it's more "C with classes" than modern C++), plus FreeRTOS. I would happily trade out C++ for the memory safety of Rust.
I would pick up a lightweight RTOS written in non-async rust if one existed and I could have faith in it, but other than personal projects I can’t advocate for building a hardware project on something like this yet.
Actually I never found cooperative multitasking a real issue. You don't want faulty tasks in your embedded application anyway. With cooperative multitasking it at least is easy to spot which task is the culprit.
If I understand the article correctly, it seems that in this case the multitasking also isn't controlled by calling yield in custom user code, but rather always from the implementation that makes async/await work.
> You don't want faulty tasks in your embedded application anyway.
Faults are a way of life in software, they are unavoidable because each and every piece of software rests on a bunch of assumptions and if any one of those or a combination of them do not hold then you have a fault. Failure to envision that fault and a way to deal with it in embedded systems can cause damage to property, injury and loss of life.
None of this is simple, not in theory and definitely not in practice.
Sure, but an RTOS doesn't help you much with safety-critical guarantees. When you're worrying about that, you're also worrying about redundancy of the hardware your software is running on, for example. The only thing an RTOS really helps with is making it a bit easier to argue that some realtime guarantees can be met even if other code running on the device may have worst-case runtimes longer than your deadline. This doesn't really become a mitigation for a truly defective task, though, since at that point all bets are off unless you have some good task isolation. But this is not all embedded systems, not even all safety-critical ones, and if you are not using an RTOS there are other solutions that can give you the same or better options in that case (for example, placing the critical code in a high-priority interrupt handler).