I've been experimenting with this too, in the context of the Windows front-end for xi editor. It's absolutely true that the compositor adds a frame of latency, but I have a very different take than "turn it off."
First, it's possible to design an app around the compositor. Instead of sending a frame to the system, send a tree of layers. When updating the content or scrolling, just send a small delta to that tree. Further, instead of sending a single frame, send a small (~100ms) slice of time and attach animations so the motion can be silky smooth. In my experiments so far (which are not quite ready to be made public but hopefully soon), this gets you window resizing that almost exactly tracks the mouse (as opposed to lagging at least one frame behind), low latency, and excellent power usage.
Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface. These already exist on mobile, where power is important, but making it work for the desktop is challenging. When that happens, you get your frame back.
So I think the answer is to move forward, embrace the compositor, and solve the engineering challenges, rather than move backwards, even though adding the compositor did regress latency metrics.
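To make the "tree of layers plus small deltas" idea concrete, here is a minimal sketch in Python. It is not the xi front-end code and not any real compositor API: `Layer`, `Delta`, `find`, and `apply_delta` are hypothetical names invented for illustration. The point it shows is that a scroll only changes one layer's offset, so the app sends that one field rather than a repainted frame.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Layer:
    """One node in a hypothetical retained layer tree."""
    layer_id: int
    offset: Tuple[float, float] = (0.0, 0.0)
    content_version: int = 0
    children: List["Layer"] = field(default_factory=list)


@dataclass
class Delta:
    """A small update addressed to one layer; None means 'unchanged'."""
    layer_id: int
    offset: Optional[Tuple[float, float]] = None
    content_version: Optional[int] = None


def find(root: Layer, layer_id: int) -> Optional[Layer]:
    """Depth-first search for a layer by id."""
    if root.layer_id == layer_id:
        return root
    for child in root.children:
        hit = find(child, layer_id)
        if hit is not None:
            return hit
    return None


def apply_delta(root: Layer, delta: Delta) -> bool:
    """Apply a delta in place; returns False if the layer does not exist."""
    layer = find(root, delta.layer_id)
    if layer is None:
        return False
    if delta.offset is not None:
        layer.offset = delta.offset
    if delta.content_version is not None:
        layer.content_version = delta.content_version
    return True


# Scrolling moves the text layer without repainting its contents.
text = Layer(layer_id=2)
window = Layer(layer_id=1, children=[text, Layer(layer_id=3)])
apply_delta(window, Delta(layer_id=2, offset=(0.0, -48.0)))
```

Only the offset changes here; `content_version` stays the same, which is the case where the compositor can reuse the already-painted surface.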
The issue with sending a tree of layers is that this prevents some important optimizations: avoiding overdraw with early Z becomes impossible, because your app doesn't know anything about the positions of the scrollable layers and so has to be conservative and paint all of their contents. So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making. (Note that today, almost all apps overdraw like crazy anyway, but that should be fixed.) :)
> Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface.
This seems like the best solution to me. It allows apps to paint content intelligently to prevent overdraw while avoiding any latency.
It might be lots of overdraw in the general web case, but for a text editor almost everything the app renders is going to be shown on the screen. The exception is a bit of stuff just outside the scroll viewport, so it can respond instantly to scroll requests. This feels like a good tradeoff to me.
> So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making.
I would jump at the chance to make that trade. My 8-year-old CPU is rarely taxed by normal usage. What's the point in having a faster computer if it feels slower?
Overdraw doesn't just use some CPU. If that were all, it'd be an easy problem to solve. It causes extra data to be transferred over various buses, extra stops in a pipeline of operations that is already cramped trying to provide a frame every few ms, and probably other problems that I don't know about. This problem isn't as easy as 'throw some more CPU at it'.
I think it’s only that complex and taxing for GUI frameworks that mix CPU and GPU rendering. Like Win32 GDI or GTK+.
For modern GUI frameworks that were designed for the GPU, like MS WPF/XAML or Mozilla WebRender, overdraw just uses some GPU resources, but that’s it. GPUs are designed for much more complex scenes. In-game stuff like particles, lights, volumetric effects, and foliage involves tons of overdraw. In comparison, a 2D GUI with overdraw is a very light workload for the hardware.
Overdraw actually matters quite a bit for integrated GPUs, especially on HiDPI. On high-end NVIDIA or AMD GPUs, sure, 2D overdraw doesn't matter too much. But when power is a concern, you don't want to be running on the discrete GPU, so it's worth optimizing overdraw.
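A rough back-of-the-envelope for why HiDPI makes this worse, as a Python sketch. The numbers are assumptions chosen for illustration (a 3840x2160 panel, 4 bytes per pixel, 60 Hz), and the function name is made up. It counts only color writes to the framebuffer and ignores caches, framebuffer compression, and the reads that blending needs, so treat it as a lower-bound estimate rather than a measurement.

```python
def overdraw_write_bandwidth(width, height, bytes_per_pixel, refresh_hz, overdraw_factor):
    """Bytes per second written if every pixel is painted overdraw_factor times per frame."""
    return width * height * bytes_per_pixel * refresh_hz * overdraw_factor


# Assumed display: 3840x2160, 32-bit color, 60 Hz.
for factor in (1, 2, 4):
    gb_per_s = overdraw_write_bandwidth(3840, 2160, 4, 60, factor) / 1e9
    print(f"overdraw x{factor}: {gb_per_s:.2f} GB/s")
```

Each extra full-screen layer adds roughly another 2 GB/s of writes under these assumptions, which is small for a discrete card but a more noticeable share of the memory bandwidth an integrated GPU shares with the CPU.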
That seems like a dubious claim to me. In my experience maxing out memory or pcie bandwidth is rarely a bottleneck, and extremely unlikely / impossible with low CPU usage.
Do you have a link or more detail about 'already cramped pipelines'?
Depends heavily on the GPU and the scene being rendered. On Android I've seen low-end GPUs bottlenecked by the shader ALU of all things, even though the 2D renderer only produces trivial shaders. Turns out it's easier to whack off shader compute cores than it is to muck with the memory bus in some cases.
Most common situation is the GPU isn't taxed at all, though, and doesn't even leave idle clocks. We end up just being limited by GL draw call overhead.
You seem to be implying that rendering of a text editor has to be all 2D and that rendering a frame pushes the memory bandwidth to its limits, both of which can't possibly be true. Why can games run at 144 Hz and above but a text editor can't afford to overdraw for decreased latency?
> Instead of sending a frame to the system, send a tree of layers.
Which API does one use to do this?
I'm slightly surprised that hardware overlays aren't already a feature, especially given that they're handled by the graphics card. I know there's a special API for video overlay, especially DRM video (where part of the requirement is that the system doesn't allow it near the compositor where it could be screenshotted). Can you do video "genlock" on Windows? (edit: https://msdn.microsoft.com/en-us/library/windows/desktop/dd7... )
I'm also wondering how VR/AR stuff handles this on Windows.
It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.
There's a bunch of stuff in the interface to support video and also integrated 3D content ("create swapchain for composition"), but I don't know how well it works. In my experiments and reading through the Chromium bug tracker, Microsoft's implementation of all this stuff is far from perfect, and it's hard, for example, to completely avoid artifacts on resizing.
> It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.
All hardware does some overlays, though not a large number. (You can see Intel GPU overlay setup at [1].) One scanout overlay is already in use on all major OSes, to draw the mouse cursor.
Typically hardware will have the base layer, a cursor layer, and possibly a video overlay. But in many cases, the video "overlay" is actually emulated by the driver using a shader to do the colorspace conversion and the texture hardware to do scaling.
People generally aren't composing other effects on top of a video overlay. Making it work in the general case requires hardware that can do all the things the compositing engine is doing in software.
The use case mentioned in Jesse's talk (linked in my root comment) is displaying a notification from some other app while playing a game. They added the "flip_discard" swapchain effect so that the OS can paint the notification on top of the game content before flipping it in hardware. This is something of a hack; I think you're right that the endgame for this is that the hardware can indeed implement the full compositing stack. I'm not sure how far away we are from this.
In a text editor, we don't want to compose a whole frame just because a character was inserted. The situation of drawing directly into the visible frame buffer at any time we like without caring about V-sync is pretty much ideal.
If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.
> If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.
A new character won't trigger a draw call, though. Rather, the three characters will queue up in an internal buffer, and when the next draw is due, all three will be drawn.
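A minimal sketch of that queue-then-draw behavior in Python. `CoalescingEditor`, `on_key`, and `on_vsync` are hypothetical names, not any real toolkit's API; the assumption is that the platform calls `on_vsync` once per frame and `on_key` whenever input arrives.

```python
from collections import deque


class CoalescingEditor:
    """Toy editor that buffers keystrokes and draws at most once per frame."""

    def __init__(self):
        self.pending = deque()
        self.text = ""
        self.draw_count = 0

    def on_key(self, ch):
        """Input handler: record the keystroke, do not draw."""
        self.pending.append(ch)

    def on_vsync(self):
        """Frame callback: drain everything queued, then draw once if anything changed."""
        if not self.pending:
            return False
        while self.pending:
            self.text += self.pending.popleft()
        self.draw_count += 1
        return True


editor = CoalescingEditor()
for ch in "abc":
    editor.on_key(ch)  # three keystrokes arrive within one frame
editor.on_vsync()      # all three are applied in a single draw
```

With this structure, three fast keystrokes cost one frame of waiting in total rather than one frame each.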
That's how it should work. But when you look at the scatter plots, note how there are multiple bands under Windows 10. E.g. for gvim, not only is there an extra overall delay, but extra clusters of additional latency.
How are you testing the mouse cursor following the window borders on resize? I've read that Windows turns off the hardware sprite mouse cursor when resizing windows so that it can software render it to always line up properly.
Visual inspection for now. I've got an Arduino and a high-speed camera, so my plan for the next step is to send mouse and keyboard events from the Arduino, blinking an LED at the same time, then capture both the LED and the monitor in the video. Then a bit of image analysis. This is the only way to be quantitative and capture all the sources of latency.
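The image-analysis step could be as simple as the following Python sketch. This is an assumption about how one might do it, not the actual measurement code: it presumes the video has already been reduced to one brightness value per camera frame for the LED region and one for the screen region, and the traces and the 240 fps camera rate below are made up for illustration.

```python
def first_crossing(samples, threshold):
    """Index of the first sample at or above threshold, or None if it never happens."""
    for i, value in enumerate(samples):
        if value >= threshold:
            return i
    return None


def latency_ms(led, screen, camera_fps, threshold=0.5):
    """Time from the LED lighting up to the screen region changing, in milliseconds."""
    led_frame = first_crossing(led, threshold)
    screen_frame = first_crossing(screen, threshold)
    if led_frame is None or screen_frame is None:
        return None
    return (screen_frame - led_frame) * 1000.0 / camera_fps


# Hypothetical per-frame brightness traces (0.0 = dark, 1.0 = bright).
led = [0.0, 0.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
screen = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.8]
print(latency_ms(led, screen, camera_fps=240))
```

The resolution of the result is one camera frame, so a faster camera gives a finer latency estimate.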
My reason for asking: I had adjusted the screen brightness using software, but the mouse cursor is still brighter than white anywhere else. When I drag a window, the cursor turns dim.
Oh. This explains why the cursor flickers briefly and gets drawn partially across multiple monitors when at the split instead of the normal mouse cursor which is drawn only on one monitor at a given time.
Edit: here's the talk I was referencing that mentions overlays: https://www.youtube.com/watch?v=E3wTajGZOsA