I've been experimenting with this too, in the context of the Windows front-end f...

pcwalton · on Nov 21, 2017

The issue with sending a tree of layers is that this prevents some important optimizations: avoiding overdraw with early Z becomes impossible, because your app doesn't know anything about the positions of the scrollable layers and so has to be conservative and paint all of their contents. So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making. (Note that today, almost all apps overdraw like crazy anyway, but that should be fixed.) :)

> Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface.

This seems like the best solution to me. It allows apps to paint content intelligently to prevent overdraw while avoiding any latency.

raphlinus · on Nov 21, 2017

It might be lots of overdraw in the general web case, but for a text editor almost everything the app renders is going to be shown on the screen. The exception is a bit of stuff just outside the scroll viewport, so it can respond instantly to scroll requests. This feels like a good tradeoff to me.
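To put a rough number on "a bit of stuff just outside the scroll viewport", here is a minimal sketch. The function name and the line counts (a 50-line viewport with 5 extra lines above and below) are illustrative assumptions, not figures from the editor being discussed.

```python
def offscreen_fraction(viewport_lines: int, margin_lines: int) -> float:
    """Fraction of rendered lines that are not currently visible.

    Assumes the app renders the viewport plus `margin_lines` extra lines
    both above and below it, so scrolling can be answered immediately.
    """
    rendered = viewport_lines + 2 * margin_lines
    return (rendered - viewport_lines) / rendered


# Hypothetical numbers: 50 visible lines, 5 lines of margin on each side.
print(f"{offscreen_fraction(50, 5):.1%} of rendered lines are off-screen")
```

Under those assumptions about one sixth of the rendered content is hidden at any moment, which is the cost being traded for instant scroll response.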

CyberDildonics · on Nov 21, 2017

> So you trade a frame of latency for lots of overdraw, which is a tradeoff I'm not really comfortable making.

I would jump at the chance to make that trade. My 8 year old CPU is rarely taxed by normal usage. What's the point in having a faster computer if it feels slower?

roel_v · on Nov 22, 2017

Overdraw doesn't just use some CPU. If that were all, it'd be an easy problem to solve. It causes extra data to be transferred over various buses, extra stops in a pipeline of operations that is already cramped trying to provide a frame every few ms, and probably other problems that I don't know about. This problem isn't as easy as 'throw some more CPU at it'.

Const-me · on Nov 22, 2017

I think it’s only that complex and taxing for GUI frameworks that mix CPU and GPU rendering. Like Win32 GDI or GTK+.

For modern GUI frameworks that were designed for the GPU, like MS WPF/XAML or Mozilla WebRender, overdraw just uses some GPU resources, and that’s it. GPUs are designed for much more complex scenes. In-game stuff like particles, lights, volumetric effects, and foliage involves tons of overdraw. In comparison, a 2D GUI with overdraw is a very light workload for the hardware.

pcwalton · on Nov 22, 2017

Overdraw actually matters quite a bit for integrated GPUs, especially on HiDPI. On high-end NVIDIA or AMD GPUs, sure, 2D overdraw doesn't matter too much. But when power is a concern, you don't want to be running on the discrete GPU, so it's worth optimizing overdraw.

CyberDildonics · on Nov 22, 2017

That seems like a dubious claim to me. In my experience, memory or PCIe bandwidth is rarely the bottleneck, and saturating it is extremely unlikely, if not impossible, with low CPU usage.

Do you have a link or more detail about 'already cramped pipelines'?

pcwalton · on Nov 22, 2017

2D rendering on GPUs is almost entirely memory bound.

Source: I've been working on GPU 2D rendering full time for years now.
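For a sense of scale, here is a back-of-the-envelope sketch of how framebuffer write traffic grows with overdraw. Everything in it is an assumption for illustration: a 3840x2160 target, 4 bytes per pixel, 60 frames per second, and every pixel written `overdraw` times per frame. It counts writes only and ignores blending reads, texture fetches, and caches, so treat it as a lower bound, not a measurement.

```python
def framebuffer_traffic_gb_per_s(width: int, height: int,
                                 bytes_per_pixel: int, fps: float,
                                 overdraw: float) -> float:
    """Write traffic in GB/s if each pixel is written `overdraw` times per frame."""
    bytes_per_frame = width * height * bytes_per_pixel * overdraw
    return bytes_per_frame * fps / 1e9


# Hypothetical overdraw factors on an assumed 4K, 32-bit, 60 fps target.
for overdraw in (1, 3, 5):
    gb = framebuffer_traffic_gb_per_s(3840, 2160, 4, 60, overdraw)
    print(f"overdraw x{overdraw}: about {gb:.1f} GB/s of writes")
```

With these assumed numbers a single pass over the screen is already about 2 GB/s, and the figure scales linearly with the overdraw factor. Whether that matters depends on how much memory bandwidth the GPU in question actually has available.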

jreck · on Nov 29, 2017

Depends heavily on the GPU and the scene being rendered. On Android I've seen low-end GPUs bottlenecked by the shader ALU of all things, even though the 2D renderer only produces trivial shaders. Turns out it's easier to whack off shader compute cores than it is to muck with the memory bus in some cases.

Most common situation is the GPU isn't taxed at all, though, and doesn't even leave idle clocks. We end up just being limited by GL draw call overhead.

CyberDildonics · on Nov 23, 2017

You seem to be implying that rendering of a text editor has to be all 2D and that rendering a frame pushes the memory bandwidth to its limits, both of which can't possibly be true. Why can games run at 144 Hz and above but a text editor can't afford to overdraw for decreased latency?

pjc50 · on Nov 21, 2017

> Instead of sending a frame to the system, send a tree of layers.

Which API does one use to do this?

I'm slightly surprised that hardware overlays aren't already a feature, especially given that they're handled by the graphics card. I know there's a special API for video overlay, especially DRM video (where part of the requirement is that the system doesn't allow it near the compositor where it could be screenshotted). Can you do video "genlock" on Windows? (edit: https://msdn.microsoft.com/en-us/library/windows/desktop/dd7... )

I'm also wondering how VR/AR stuff handles this on Windows.

raphlinus · on Nov 21, 2017

DirectComposition. It's been there since Windows 8, and is used by Chrome among other apps (see https://bugs.chromium.org/p/chromium/issues/detail?id=524838).

It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.

There's a bunch of stuff in the interface to support video and also integrated 3D content ("create swapchain for composition"), but I don't know how well it works. In my experiments and reading through the Chromium bug tracker, Microsoft's implementation of all this stuff is far from perfect, and it's hard, for example, to completely avoid artifacts on resizing.

pcwalton · on Nov 21, 2017

> It's possible some hardware already does overlays, given that the talk I linked above was 2 years ago. I haven't researched this carefully.

All hardware does some overlays, though not a large number. (You can see Intel GPU overlay setup at [1].) One scanout overlay is already in use on all major OS's, to draw the mouse cursor.

[1]: https://github.com/torvalds/linux/blob/e60e1ee60630cafef5e43...

crzwdjk · on Nov 23, 2017

Typically hardware will have the base layer, a cursor layer, and possibly a video overlay. But in many cases, the video "overlay" is actually emulated by the driver using a shader to do the colorspace conversion and the texture hardware to do scaling.

kevin_thibedeau · on Nov 21, 2017

People generally aren't composing other effects on top of a video overlay. Making it work in the general case requires hardware that can do all the things the compositing engine is doing in software.

raphlinus · on Nov 21, 2017

The use case mentioned in Jesse's talk (linked in my root comment) is displaying a notification from some other app while playing a game. They added the "flip_discard" swapchain effect so that the OS can paint the notification on top of the game content before flipping it in hardware. This is something of a hack; I think you're right that the endgame for this is that the hardware can indeed implement the full compositing stack. I'm not sure how far away we are from this.

kazinator · on Nov 21, 2017

In a text editor, we don't want to compose a whole frame just because a character was inserted. The situation of drawing directly into the visible frame buffer at any time we like without caring about V-sync is pretty much ideal. If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.

jhasse · on Nov 21, 2017

> If the user inserts three characters very rapidly, but each draw has to wait 1/60th of a second for a V-sync, that will be visible.

A new character won't trigger a draw call, though. Rather, the three characters will queue up in an internal buffer, and when the next draw is due, all three will be drawn.
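The queue-then-draw behaviour described here can be sketched as follows. This is a toy model, not how any particular editor or toolkit does it: the `Editor` class, `on_key`, and `on_frame` are hypothetical names, and `on_frame` stands in for whatever callback fires once per display refresh.

```python
from collections import deque


class Editor:
    """Toy editor that coalesces key presses into one draw per frame."""

    def __init__(self) -> None:
        self.pending = deque()  # key presses received since the last frame
        self.text = ""
        self.draw_calls = 0

    def on_key(self, ch: str) -> None:
        # Input only queues the character; it does not draw.
        self.pending.append(ch)

    def on_frame(self) -> bool:
        # Called once per refresh: drain the queue, then draw at most once.
        if not self.pending:
            return False  # nothing changed, so skip the draw
        while self.pending:
            self.text += self.pending.popleft()
        self.draw_calls += 1
        return True


editor = Editor()
for ch in "abc":      # three characters typed within one frame interval
    editor.on_key(ch)
editor.on_frame()
print(editor.text, editor.draw_calls)  # abc 1
```

Three key presses arriving within one frame interval cost one draw, so in this model the worst-case added wait is a single frame rather than three.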

kazinator · on Nov 21, 2017

That's how it should work. But when you look at the scatter plots, note how there are multiple bands under Windows 10. E.g. for gvim, not only is there an extra overall delay, but extra clusters of additional latency.

CyberDildonics · on Nov 21, 2017

How are you testing the mouse cursor following the window borders on resize? I've read that Windows turns off the hardware sprite mouse cursor when resizing windows so that it can software-render it to always line up properly.

raphlinus · on Nov 21, 2017

Visual inspection for now. I've got an Arduino and a high-speed camera, so my plan for the next step is to send mouse and keyboard events from the Arduino, blinking an LED at the same time, then capture both the LED and the monitor in the video. Then a bit of image analysis. This is the only way to be quantitative and capture all the sources of latency.
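The "bit of image analysis" ultimately reduces to counting video frames between the LED lighting up and the screen changing. A minimal sketch of that last step, assuming the two frame indices have already been extracted from the footage; the function names, the 960 fps camera rate, and the frame numbers are made up for illustration:

```python
def latency_ms(led_frame: int, screen_frame: int, camera_fps: float) -> float:
    """End-to-end latency from the frame where the LED lights up to the
    frame where the monitor first shows the change."""
    if screen_frame < led_frame:
        raise ValueError("screen changed before the LED lit up")
    return (screen_frame - led_frame) * 1000.0 / camera_fps


def resolution_ms(camera_fps: float) -> float:
    """Smallest latency difference the camera can distinguish (one frame)."""
    return 1000.0 / camera_fps


# Hypothetical capture: LED visible at frame 120, glyph visible at frame 168.
print(latency_ms(120, 168, 960.0))  # 50.0
print(round(resolution_ms(960.0), 2))
```

The camera frame rate bounds the precision: every measurement is only good to about one camera frame, which is why a high-speed camera is needed here.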

CyberDildonics · on Nov 21, 2017

The implication of what I was saying is that Windows will always show the cursor lining up with the border of a window during resizing.

raphlinus · on Nov 21, 2017

Would that it were so.

eptcyka · on Nov 27, 2017

Hmm, were you inspired by, or part of, https://github.com/google/walt ?

phinnaeus · on Nov 21, 2017

Would you be testing this with a high refresh rate monitor (120 Hz+)? Would that matter?

raphlinus · on Nov 21, 2017

Yes and yes. My son has a 144Hz gaming monitor, and my main monitor for coding is a Dell 4k. We'll test both.

jenscow · on Nov 21, 2017

I also believe that is true.

My reason being, I had adjusted the screen brightness using software, but the mouse cursor was still brighter than white anywhere else. When I dragged a window, the cursor turned dim.

ygra · on Nov 21, 2017

Oh. This explains why the cursor flickers briefly and gets drawn partially across multiple monitors when at the split instead of the normal mouse cursor which is drawn only on one monitor at a given time.

snvzz · on Nov 23, 2017

> attach animations so the motion is silky smooth

Yeah, because there isn't enough latency already.

Hell no.

rjsw · on Nov 22, 2017

> Further, Microsoft engineers have said that hardware overlays are coming, in which the graphics card pulls content from the windows as it's sending the video out the port, rather than taking an extra frame time to paint the windows onto a composited surface.

So they are copying the Xerox Alto.