I haven't dug that deep into WebRTC configuration yet, but I think it can be achieved with some sophisticated SDP transforms.
I have an idea for how to make it more usable with the current setup.
There is a new feature coming to Electron in a future release:
https://github.com/electron/electron/issues/23923
They claim it will make it possible to run desktopCapturer without a mouse pointer.
So I was thinking that once it is released, we can make two separate streams: one for the entire desktop at really high quality, and a second stream with just a tiny fraction of the desktopCapturer output around the mouse, together with its position. Then we merge these streams together and deliver the result to the client web browser.
This is basically how Astropad works. They deliver a high-quality picture of the entire desktop, but the fast mouse pointer movement goes in its own small box. You can see it when watching videos: the pointer has a slight bounding box around it when you are watching, say, some Forza Horizon videos and hovering the mouse over them.
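
To make the merging step a bit more concrete, here is a rough sketch of the only math I think it needs: mapping the small cursor box from desktop coordinates onto whatever size the merged picture is drawn at. All names here (`Size`, `CursorUpdate`, `cursorOverlayRect`) are made up for illustration, and I am assuming the cursor stream is a square region centered on the pointer; nothing in this block is Electron or WebRTC API.

```typescript
// Hypothetical types for illustration only.
interface Size { width: number; height: number; }
interface Rect { x: number; y: number; width: number; height: number; }
interface CursorUpdate {
  x: number;       // pointer position in desktop pixels
  y: number;
  boxSize: number; // side of the square cursor region, in desktop pixels (assumed)
}

// Map the cursor box from desktop coordinates to canvas coordinates,
// centering it on the pointer and keeping it inside the canvas.
function cursorOverlayRect(desktop: Size, canvas: Size, cursor: CursorUpdate): Rect {
  const scaleX = canvas.width / desktop.width;
  const scaleY = canvas.height / desktop.height;
  const width = cursor.boxSize * scaleX;
  const height = cursor.boxSize * scaleY;
  const clamp = (v: number, lo: number, hi: number): number =>
    Math.min(Math.max(v, lo), hi);
  return {
    x: clamp(cursor.x * scaleX - width / 2, 0, canvas.width - width),
    y: clamp(cursor.y * scaleY - height / 2, 0, canvas.height - height),
    width,
    height,
  };
}
```

Whoever does the merging would first draw the high-quality desktop frame onto a canvas and then draw the current cursor-stream frame into the returned rectangle on top of it.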