Asynchronous Rust on Cortex-M microcontrollers (memfault.com)
187 points by picture on July 19, 2023 | hide | past | favorite | 68 comments


Most of my programming these days is on Cortex-M, using Rust. The article is another data point for my suspicion that async embedded Rust is designed with toy use cases in mind (blinky); I don't know how it would work in practical firmware because I haven't seen examples.

Another possibility is that my brain and learning patterns don't work well with Async. I have a hard time understanding it, and it feels like a layer of obfuscation or misdirection. It's an abstraction that may be more suitable for other learning styles, or different program structures or requirements than the ones I've worked on.

I'd love to see an RTOS for Rust, but am worried it will be Async.

(I am referring to the Async/Await pattern; not asynchronous programming or concurrency)


At work, we have a non-async RTOS https://hubris.oxide.computer/ though it is a bit harder to use than "add a dependency in Cargo.toml" like most projects.

The original designer of Hubris, Cliff, has his own async RTOS that he uses for personal projects. He recently has been writing some blog posts on this that may be of interest to you:

* http://cliffle.com/blog/async-inversion/

* http://cliffle.com/blog/composed-concurrency-in-drivers/

The first is a general introduction to async, and the second is about applying it to write an I2C driver.

There already are some general RTOSes for Rust, and they do tend to use async:

* https://rtic.rs/2/book/en/

* https://github.com/embassy-rs/embassy (this one is used in the article)

* https://tockos.org/


Cliff's articles are outstanding!

Re RTIC: I'm using v1 of it, which is non-async; it seems to offer smarter locking than using critical sections + Mutex, since it's capable of interrupting a lower-priority task with a higher one, depending on which resources are locked.


Async Rust definitely does work for non-toy use cases. As a data point, we use Embassy for all production firmware at my startup (https://akiles.app/en), using async tasks for everything: Bluetooth, TCP/IP networking, motor control, user interface (LEDs, keypad), a key-value database in flash, stats collection... Async helps with battery life too, since it allows putting the core to sleep when no task has work to do; it lets us build devices with 1-2 years of battery life.

There are other companies using Embassy in production; sadly, their firmware is usually not open source. There are a few non-toy open-source projects using Embassy though:

- https://github.com/nviennot/turbo-resin

- https://github.com/ivmarkov/ruwm

- https://github.com/dkhayes117/propane-monitor-embassy

(disclaimer: Embassy maintainer, Akiles CTO here)


RTIC (Real-Time Interrupt-driven Concurrency) is an RTOS written in Rust with specific hardware acceleration for Cortex-M devices. Well, depending on what your definition of an RTOS really is. If you don't like async then it's a pretty good bet to get concurrency working on an Arm Cortex-M microcontroller.

https://rtic.rs/1/book/en/preface.html


This article about async in Python helped me understand it pretty well, since it explains them in terms of coroutines, which are very intuitive for me: https://mleue.com/posts/yield-to-async-await/

Another thing that helps me get it is comparing it to the continuation passing style, where you never return from a function, you just take an argument that's basically a function pointer bound to an environment, and at the end of the function, instead of returning, you call your input function, giving it another function and environment as input, repeating the cycle. It's very similar to the transformation of callbacks within callbacks within callbacks pattern in JavaScript to the async/await pattern.


The thing is, we tried coroutines in C (embedded) y-e-a-r-s ago. It was all the rage for a bit. There were a couple different macros/libraries you could use with Duff's device and other trickery to get coroutine-ish things in C.

Maybe the implementation just wasn't up to where it needs to be with these newer/slicker/more integrated versions, but mine (and others') issues with them weren't the weaknesses/caveats of the implementation, but rather the mess of spaghetti they made as your coroutine use grew to any degree. In onesie twosies under nice demo cases (look ma, I get some data from the intertubes with this syncy thing), they're great, but my experience was that they're a mess when scaled.

I'm happy to be proven wrong. I get to use them a bunch in Kotlin, I'm trying not to be a victim of my experience. I'm still on the fence.


Those libraries were always somewhat of a hack. Async Rust is an official language-backed syntax.


OP did say:

> but mine (and others') issues with them wasn't the weakness/caveats of the implementation, but rather with the mess of spaghetti it made as your coroutine use grew with any degree.


Once you realize async/await is just sugar over the familiar callback hell, a lot of the mystery fades away and it's easier to grok.


Rust async is a bit different than in other languages. It's more like sugar over state machines instead of sugar over callbacks.

This is what makes it work nicely on embedded. The compiler-generated state machines are structs with fixed size so they can be statically allocated. Callbacks would have to be heap-allocated and garbage-collected/refcounted.

(disclaimer: Embassy maintainer here)


> It's more like sugar over state machines instead of sugar over callbacks

they are equivalent [1]. There are Scheme compilers (a language with first-class continuations and often heap-allocated stack frames) that compile everything down to a giant C switch statement.

[1] well, continuations are strictly more powerful of course, but the stackless subset needed for async/await is the same.


This misses the actual async part, which is more like polling a task queue. Callback hell sugar is only a thing in languages that already have event loops built in (i.e. JavaScript, which I assume is also what you're referring to)


The point is that it actually is insightful to think of it as callback sugar. That will give you better understanding of how the threading is handled when the calling method yields to the callee and conversely how it must be different when the caller isn't yielded.


It generally applies to event loops, not just to languages with a built-in event loop. E.g. it certainly also applies to raw boost::asio which uses callbacks, libuv, libevent, Qt, GObject, Netty, etc.


That’s true in JS but less so in a language like Rust where there are threading implications.


There are no threading implications to async in Rust. The executor you're using may add some requirements on your futures because it wants to run them on multiple threads[1], but that's not related to async itself, and you can always use a single-threaded executor if you don't want these limitations (and they don't apply to embedded anyway).

[1] namely, your futures will need to be Send + 'static.


I'm not as familiar with Rust's implementation, but even in C# it's mostly true. Threads are only hit when a callback is not directly awaited. There's more to it, but it starts you down the right path, I think.


Or realize CPS, channel and actor model are basically equivalent.


    spawner.spawn(blinky(led)).unwrap();
Has anyone seriously tried structured concurrency for single-threaded async Rust? The pattern where main effectively leaks a task seems kind of gross to me, and it’s exactly what structured concurrency tries to solve.

(I realize that Rust’s async is inherently semi-structured.)


IIRC there have been some proposals floating around, but one major blocker is async drop. This is the article I'm thinking of specifically: https://blog.yoshuawuyts.com/tree-structured-concurrency/


You can use futures combinators like `join`, `select`, `with_timeout` with async Rust. (crates like `embassy-futures` or `futures` have implementations of these). They work nicely even in no-std embedded.

They're different tools for different use cases. Structured concurrency is nice when doing related actions concurrently where one might need to cancel the other, while unstructured task spawning makes more sense if they're truly unrelated tasks, that live for the entire duration of the program or where you don't care for how long they live (for example concurrently handling requests in a server).

(disclaimer: Embassy maintainer here)


I could be wrong, but doesn't every multi-task setup / RTOS essentially support this? Wake up, check flags, do work if required.

Of course if you're bare-metal programming, it might be nice to have a lightweight mechanism for this.


There's also the simplicity of bare embedded in Rust, because you can mostly[1] use Cargo, depend on the right embedded_hal crates for your target, and be up and running with a hardware-backed async runtime and everything. Zephyr, Tock, Mynewt, etc. all have bespoke build/deployment systems and driver ecosystems. That said, an RTOS makes actually shipping working products much more achievable, so pick your poison.

1: I think you still need xargo/cargo-xbuild for no-std embedded sysroots, but there may have been some upstream development on that front since I last played in this sandbox.


Embedded-HAL is interesting to me - in principle it's a great concept; writing hardware-agnostic, portable code.

I don't use it in my firmwares because:

  - All of the embedded-HAL libs I've found (eg drivers) have had notable limitations or ergonomics problems.
  - It appears unsuitable for use with DMA, which I use for all runtime IO
  - The APIs tend to be a mess due to heavy use of typestates


> It appears unsuitable for use with DMA, which I use for all runtime IO

The `embedded-hal-async` traits can be implemented with DMA, the Embassy HALs do so. This is a problem with the `nb` traits only.

> The APIs tend to be a mess due to heavy use of typestates

This is an issue of some HAL crates implementing the traits, not with the `embedded-hal` traits themselves. I also dislike the heavy use of typestates/generics, Embassy tries to implement the traits while keeping typestates at a minimum.

(disclaimer: Embassy maintainer here)


Building core/std/whatever using unstable flags and --target works for me these days. I can do everything with cargo and make it even more ergonomic with a toolchain file.

Now if Cargo had a way of producing multiple variants of the same project (e.g. different --targets with pre-set features) just with `cargo build` then things would be a lot better.

As of now, a Makefile is still needed to bring it all together.


We were using cargo-make for something like this, though eventually we switched to using Cargo's experimental support for artifact dependencies: https://doc.rust-lang.org/nightly/cargo/reference/unstable.h...


> I think you still need xargo/cargo-xbuild for no-std embedded sysroots, but there may have been some upstream development on that front since I last played in this sandbox.

Cargo's build-std flag is the blessed way now. Unstable, but working its way to stable. AFAIK xargo/cargo-xbuild aren't getting any more development in favor of build-std.


That's good to hear, I've used that in the past to drive down codesize even on non embedded targets by compiling for size and rebuilding the stdlib.


> you can mostly[1] use Cargo, depend on the right embedded_hal crates for your target, and be up and running with a hardware-backed async runtime and everything

Can one use Rust's embedded_hal from a different programming language?


IIUC you can make some api in rust that uses the HAL, which ties your code into board specific stuff.

That api you write can be exposed via C abi so you can link C or languages that have an FFI to C to the rust code. https://docs.rust-embedded.org/book/interoperability/rust-wi...


You call this ugly mess of code simple? Yeah, right!


If you're asking why do coroutines exist, it's because context switching in kernel space and synchronization have very high overhead by comparison. You use those features when you want parallelism, not concurrency.

For parallelism, you also usually want a userspace task scheduler because simply spawning threads in kernelspace is much slower. Cilk and Go provide this built into their language runtime.


Context switches on Cortex-M are relatively cheap with optimizations to limit the amount of data that has to be stacked.


Relative to what? They might be cheaper than some architectures, but they're still a hell of a lot more expensive than not doing them.


we seem to be talking about cooperative multitasking here

correct me if i'm wrong, but i think that even with an almost standard abi it's five instructions and 27 cortex-m3 clock cycles; here we reserve r10 for the current task pointer (with the bonus that you can use it for a tls base register) and round-robin among the tasks in a circularly linked list:

    yield:  push {r4-r9, r11, lr}   @ save all callee-saved regs except r10
            str sp, [r10], #4       @ save stack pointer in current task
            ldr r10, [r10]          @ load pointer to next task
            ldr sp, [r10]           @ switch to next task's stack
            pop {r4-r9, r11, pc}    @ return into yielded context there
i think in gcc that's more or less -mpic-register=r10 -msingle-pic-base, though that isn't what those options are meant for

if you're willing and able to deform the abi a little by having fewer callee-saved registers, something you might want to do anyway for efficiency, you can cut r4-r9 down to r4-r5 and shave that down to i think 19 clocks? if you're willing to eliminate callee-saved registers entirely you can get to 15, but that seems like a heavy price to pay

if you have to use the standard abi it's i think 37 clocks on a cortex-m3 or better:

    yield:  push {r4-r12, lr}
            ldr r9, =current_task_pointer
            ldr r10, [r9]
            str sp, [r10], #4
            ldr r10, [r10]
            str r10, [r9]
            ldr sp, [r10]
            pop {r4-r12, pc}
i feel like something in the range of 20-40 clocks is really not that expensive? i mean an ldr or str is 3 or 4 clocks if it's not accessing tcm, right? and on cortex-m0 every taken branch is 3 clocks because you don't have branch prediction?

i mean it's not going to compete with barrel processors on context switch time, but if it's really the cost of accessing seven fields of structs on the heap, it hardly seems like it's "a hell of a lot more expensive" than even zero

if we're talking about preemptive context switching, that's a different ball of wax; you have to save all the registers that you use at all, not just callee-saved registers, and depending on your application maybe vfp registers as well as integer registers


i should clarify that while i have tested both of these context-switching subroutines, and they do seem to work in simple programs, i haven't tested them on actual cortex-m hardware, much less measured their real-world performance


It's a different and very old story.

Protothreads are classic now: https://dunkels.com/adam/pt/index.html

Contiki OS is also a piece of embedded development classic: https://en.wikipedia.org/wiki/Contiki

In general, it's more about syntax than semantics. Semantics is generally difficult and rarely changes.


I believe that Tock (tockos.org) and Theseus (https://github.com/theseus-os/Theseus) are in this area a bit as well, just from an actual OS perspective.

I don't know much about this area, but it would be wonderful if these could work with the Libre Computer boards, like the Amlogic S905X (Le Potato) or the Rockchip ones, since they're so much cheaper than a Pi. Would be great to get a dedicated simple system up to toy with a Rust OS for only $30.


I think the nature of microcontroller programming is already asynchronous. I mean, it's not that the only alternative to "async Rust" is an RTOS; a classic loop program with interrupts is already asynchronous. The only difference is you have to decide how you want to waste time waiting for events, and the same "energy saving" effect can also be achieved with a simple state machine and __WFI(), resulting in cleaner code with no dependencies whatsoever.

Also, it seems to me that Embassy takes over interrupt handlers? If that's the case, what if I want to deal with interrupts in a different way? Or what happens with interrupts that deal with many pins, like EXTI9_5_IRQHandler(), as if I want to use pin 9 with Async Rust, but the rest as a regular interrupt?

And Pin type is a poor name choice for embedded.

Also:

    ExtiInputFuture::new(self.pin.pin.pin.pin(), self.pin.pin.pin.port(), true, false).await


Yeah, you can always write async code by hand. It's a pain because you have to manually yield (as in, save your context, return to the main loop, and poll for the event to complete yourself) every time you wait for IO or a timer. For simple enough systems it's manageable; in larger ones it quickly becomes a pain.


So, cooperative multitasking with no control of actual scheduling? Cute, but a real RTOS is the solution to most any "i need to do more than one thing" in the embedded world.


You can run multiple executors at different interrupt priority levels (with multiple tasks per executor), which allows tasks on the higher priority executor to interrupt other tasks. Here's an example https://github.com/embassy-rs/embassy/blob/main/examples/nrf...


I knew all that time I spent programming the Mac back in 1987 would come in handy. Async programming: cooperative multitasking never goes out of style.


Edit: commented on wrong post, meant for https://hackertimes.com/item?id=36791506

Offtopic:

The cursive italics are apparently a feature of the Victor Mono [1] font used for the full page. While it'd be amusing in a Tumblr context (where cursive is used for hyperbolic emphasis), I can't fathom why one would consider it in a code context.

You can change it (at least on Safari) by going into developer tools, clicking any node, and removing "Victor Mono" from --font-family

[1] https://rubjo.github.io/victor-mono/


That may be specific to your machine, I don't see any italics on mine, and the CSS just looks like `font-family: 'Source Code Pro', monospace;`.


Trying to shoehorn Rust this and Rust that into every freaking thing makes more problems than it solves. Case in point is async rust in microcontrollers. Looks ugly as a sin.


Cooperative multitasking in embedded has to be the worst idea ever. Anyone who has done any serious work in embedded systems will tell you how bad it is when a faulty task blocks everything. Please stop advertising async in embedded.


It really comes down to memory requirements. If you can afford to give every task its own stack and dispatch it directly from prioritized interrupts, that's great. Even better if you can use memory protection hardware to isolate the tasks from each other.

But if you have many long running tasks that only need to keep a handful of bytes of state when they're waiting, a mechanism like rust async can allow for huge memory savings while mostly retaining the same code style as stackful tasks.

And you can even use both approaches in the same program, with isolated stacks for realtime critical tasks and cooperative state machines for everything else.


... anything faulty in an embedded world can block the chip.

This is why watchdog timers exist. Async does not shut off the watchdog. Not sure what your point is.


Watchdogs shouldn't be triggered in the normal course of events. They are a last line of defense for exceptional circumstances that will render a device inoperative, not a cure-all for poor system design.


Sure, but a system where a task has ceased to execute is in pretty much all circumstances a system where the watchdog (or some other assert) should trigger. (the whole toyota case involved them getting raked over the coals because their system did not in fact detect a failed task and do this).


I... know. I'm saying that a stalled chip is a stalled chip. I don't know why async would stall your chip randomly unless you had a bug, which would stall the chip anyway.


As I said, anyone with serious embedded experience will understand my point. People who think triggering watchdog is a normal thing won’t.


Yikes, no kidding. Triggering a watchdog is a “stop the dev cycle and everyone figure out what show-stopping bug is lurking in our code” time.

I have no desire to use embedded rust async, I use C++ with none of the allocating containers, and honestly it’s more of a C with classes style than modern C++, plus FreeRTOS in my production embedded code. I would happily trade out C++ for the memory safety of rust.

I would pick up a lightweight RTOS written in non-async rust if one existed and I could have faith in it, but other than personal projects I can’t advocate for building a hardware project on something like this yet.


I never said that, please stop the "no true scotsman" tone. I'm aware of what a watchdog timer is and what its purpose is.

My point is that async doesn't introduce stalls unless you have a bug - a bug that could introduce a stall anyway. I don't see how async changes that.

I write firmware and operating systems for a living, by the way.


Think of a watchdog timer in the same way you think about that big red handle on a train, 'emergency use only, abuse will be punished'.


You have a task. You do an infinite loop in the task. The task is now blocking lower prio tasks in the RTOS. If the task is broken, it's broken.


I know what a watchdog timer is. I don't see how your point refutes mine. A stall is a stall. Async doesn't change that.


Actually, I never found cooperative multitasking to be a real issue. You don't want faulty tasks in your embedded application anyway. With cooperative multitasking it at least is easy to spot which task is the culprit.

If I understand the article correctly, it seems that in this case the multitasking also isn't controlled by calling yield in custom user code, but rather always from the implementation that makes async/await work.


> You don't want faulty tasks in your embedded application anyway.

Faults are a way of life in software, they are unavoidable because each and every piece of software rests on a bunch of assumptions and if any one of those or a combination of them do not hold then you have a fault. Failure to envision that fault and a way to deal with it in embedded systems can cause damage to property, injury and loss of life.

None of this is simple, not in theory and definitely not in practice.


Sure, but an RTOS doesn't help you much with safety critical guarantees. When you're worrying about that you're also worrying about redundancy of the hardware your software is running on, for example. The only thing that an RTOS is really helpful for is making it a bit easier to argue that some realtime guarantees can be met even if other code running on the device may have worst-case runtimes longer than your deadline (this doesn't really become a mitigation for a truly defective task, though, since at that point all bets are off unless you have some good task isolation). But this is not all embedded systems, not even all safety-critical ones, and there are other solutions available if you are not using an RTOS which can give you the same or better options in that case(like placing the critical code in a high-priority interrupt handler, for example).


If at all possible for such stuff I would use an FPGA and not software.


The WLCSP Lattice FPGAs are waiting for you! Only 2€ a pop for 1k LUTs!


> With cooperative multitasking it at least is easy to spot which task is the culprit.

Yeah I would love to spot that my car’s LED task is faulty while trying to brake on a 100mph highway.


Judging by your pushy comments throughout this thread I'm not really convinced you know what Rust async actually does.


There are many domains in embedded. You're probably referring to safety-critical and hard real-time contexts?

Because cooperative multitasking is bog-standard in OS kernels.



