Perhaps a given rendered frame ends up waiting for multiple vsyncs? For example, the application or OpenGL driver waits for vsync before rendering into its buffer, and then the compositor waits for the next vsync before actually compositing and flipping.
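To put rough numbers on that: every stage that waits for a vsync boundary before starting can add up to one refresh interval. The toy model below is only an illustration. The function names and stage timings are made up, it assumes a ~60 Hz display (16,667 µs period, rounded), and it assumes the result becomes visible on the first vsync after the last stage finishes. It compares one vsync wait against two.

```python
PERIOD_US = 16_667  # assumed ~60 Hz refresh interval, rounded to whole microseconds


def next_vsync(t_us: int, period_us: int = PERIOD_US) -> int:
    """First vsync boundary at or after t_us (integer ceiling division)."""
    return -(-t_us // period_us) * period_us


def latency_us(input_us: int, stages_us: list[int], period_us: int = PERIOD_US) -> int:
    """Toy model: each stage (app render, compositor pass, ...) waits for a vsync
    before starting; the frame is shown on the first vsync after the last stage."""
    t = input_us
    for stage in stages_us:
        t = next_vsync(t, period_us) + stage
    return next_vsync(t, period_us) - input_us


# Input arrives 1 µs after a vsync; hypothetical 4 ms render, 1 ms composite.
print(latency_us(1, [4000]), latency_us(1, [4000, 1000]))  # 33333 50000
```

In this model the second vsync wait adds exactly one refresh interval (about 16.7 ms), even though the compositing work itself only takes 1 ms.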
This is a common source of delay in composited apps/games, yes. Ideally you want a completed frame ready for the compositor at least a few milliseconds before the next vertical sync arrives, but it's easy to screw that up, especially if you're getting fancy. Triple buffering also enters the picture here (though mostly for games): in the bad old days you had exactly two buffers, and if both were in use (one being scanned out to the monitor, the other holding your most recently completed frame) everything had to grind to a halt and wait before rendering or game code could continue. Triple buffering solved this by adding a third buffer, so your code spends less time spinning and waiting on the GPU, but at the cost of an entire frame's worth of extra display latency. If someone is careless they could definitely end up with triple buffering enabled for their app (for example, if they're rendering with a media-oriented framework that turns it on).
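A rough simulation of the latency side of that trade-off. It assumes a FIFO presentation queue (completed frames are shown strictly in order, roughly like Vulkan's FIFO present mode), a renderer that is faster than the display and blocks whenever no buffer is free, and made-up integer-millisecond timings; `fifo_latency` is a hypothetical helper, not any real API. It only shows the latency cost and does not model the case where triple buffering pays off, i.e. frame times hovering around the refresh interval.

```python
from collections import deque


def fifo_latency(buffer_count: int, render_ms: int, period_ms: int, frames: int = 50) -> float:
    """Average ms from render start to scanout with a FIFO queue of completed
    frames; the renderer blocks when every non-front buffer is busy (toy model)."""
    queue = deque()          # render start times of completed frames, oldest first
    free = buffer_count - 1  # one buffer is always the front buffer being scanned out
    rendering_since = None   # render start time of the in-flight frame, if any
    latencies = []
    t = 0
    while len(latencies) < frames:
        # Vsync: flip to the OLDEST completed frame; the old front buffer is freed.
        if t % period_ms == 0 and queue:
            latencies.append(t - queue.popleft())
            free += 1
        # The in-flight frame finishes and joins the back of the queue.
        if rendering_since is not None and t - rendering_since >= render_ms:
            queue.append(rendering_since)
            rendering_since = None
        # Start a new frame only if a buffer is free; otherwise the renderer waits.
        if rendering_since is None and free > 0:
            free -= 1
            rendering_since = t
        t += 1
    steady = latencies[10:]  # skip warm-up frames
    return sum(steady) / len(steady)


# Hypothetical 4 ms render on a 16 ms refresh: double vs triple buffering.
print(fifo_latency(2, 4, 16), fifo_latency(3, 4, 16))  # 16.0 32.0
```

With two buffers each frame reaches the screen 16 ms after it starts rendering; with three, the extra completed frame sits in the queue and latency settles at 32 ms, one full refresh interval more.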
The 'Fast Sync' option NVIDIA added to their drivers in the last year or two is a fix for the triple buffering problem: you get spare buffers, but instead of adding a frame of latency, the GPU always grabs the most recently completed frame for scanout. Of course, if a compositor is involved, the compositor itself now needs to pick the most recently completed frame from each application, and then also use this feature when presenting the composited desktop to the GPU. I don't think any modern compositor does this at present.
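The "most recently completed frame wins" policy can be sketched as a small simulation. This is a toy model with hypothetical names and integer-millisecond timings, not NVIDIA's implementation: it assumes three buffers (front, one pending, one in flight) so the renderer never has to block, and at each vsync it scans out only the newest completed frame, discarding any older one that was never shown. This is roughly what Vulkan calls mailbox presentation.

```python
def mailbox_latency(render_ms: int, period_ms: int, frames: int = 50) -> float:
    """Average ms from render start to scanout when each vsync takes only the
    newest completed frame (toy model of 'Fast Sync'-style presentation)."""
    latest = None           # render start time of the newest unshown frame
    rendering_since = None  # render start time of the in-flight frame
    latencies = []
    t = 0
    while len(latencies) < frames:
        # Vsync: scan out the newest completed frame, if there is one.
        if t % period_ms == 0 and latest is not None:
            latencies.append(t - latest)
            latest = None
        # A finished frame replaces any older completed frame that was never shown.
        if rendering_since is not None and t - rendering_since >= render_ms:
            latest = rendering_since
            rendering_since = None
        # Assumed 3 buffers: there is always a spare, so rendering never blocks.
        if rendering_since is None:
            rendering_since = t
        t += 1
    steady = latencies[10:]  # skip warm-up frames
    return sum(steady) / len(steady)


# Hypothetical 4 ms render on a 16 ms refresh.
print(mailbox_latency(render_ms=4, period_ms=16))  # 8.0
```

Here the frame picked at each vsync started rendering only 8 ms earlier and the renderer never stalls; the price is that most rendered frames (three out of four in this example) are thrown away without ever being displayed.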
This looks way more like badly designed animations than some fundamental problem coming from the hardware.