
This is still bad engineering on Netflix's part. You can't and shouldn't rely on your audio handler getting called on time via a timer in order to keep playback stable, especially not in a non-latency-sensitive use case, which Netflix's very much is.

If you're doing real time audio, your processing loop needs to be directly driven by audio hardware buffer events/interrupts (through as many abstraction layers as you want, but a traceable event chain nonetheless, no sleeping guesswork or timers - and ideally all those threads should be marked real-time, never allocate any memory, and never lock or block on anything else). This is what serious real time audio systems like JACK do. And if you're not, you need to be able to buffer way more than one frame's worth of audio - at least 100ms, preferably much more. Not just for robustness, but also for battery life.
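To make the buffering point concrete, here's a toy simulation (a sketch with made-up numbers, not Netflix's actual parameters) of why a deep buffer survives the kind of scheduling stall that underruns a one-block buffer:

```python
def underruns(buffer_ms: int, stall_ms: int, refill_period_ms: int = 15) -> bool:
    """Toy model: the device drains 1 ms of audio per step; the app tops
    the buffer up every refill_period_ms, except during a stall when it
    cannot run at all. Returns True if the hardware ever runs dry."""
    level = buffer_ms            # start with a full buffer
    stall_until = -1
    for t in range(1000):        # simulate one second
        if t == 500:             # a scheduling stall hits mid-playback
            stall_until = t + stall_ms
        level -= 1               # device consumes 1 ms of audio
        if level < 0:
            return True          # underrun: audible glitch
        if t >= stall_until and t % refill_period_ms == 0:
            level = buffer_ms    # app refills to full
    return False

assert underruns(buffer_ms=15, stall_ms=40)       # one block deep: glitches
assert not underruns(buffer_ms=100, stall_ms=40)  # 100 ms deep: rides it out
```

The same 40 ms stall that destroys a one-block buffer is invisible with 100 ms queued.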

Sure, there was an Android bug here, but the way Netflix designed this is fragile and, as someone who has messed with audio/video programming enough, wrong. Had they done things properly, they would've been insulated from this OS bug.

This kind of bug is best used as a learning experience: what happened here consistently and resulted in completely destroyed playback on one device, is something that has always been happening sporadically and hurting all of your users whenever something causes the CPU to stall or slow for long enough to miss the deadline anyway. Instead of just fixing the bug, make your software robust against these cases, and then all your users win.



Android does not have a functioning audio API. And really doesn't have the ability to mark threads as real-time that will stick. I've worked on Android game ports and done audio stack work, the whole thing is a shitshow and barely functions as documented. What their support team recommends in one version is quickly broken by hacks done by chip vendors and random bugs added to OS versions. Nobody at Google even tests basic scheduler behavior such that an extra 40ms can sneak in completely unnoticed. I've seen stuff far worse than this, they once broke their own cert test and nobody noticed for half a year.


Sure, but this is all painful for real-time use cases, like audio production and games. Playing video is not one of those, and there the more you buffer the better the battery life thanks to reduced interrupt load and longer CPU idle states.


>Android does not have a functioning audio API.

I work on an app which relies on fairly low latency real-time audio (triggered by a MIDI keyboard) on both iOS and Android and have had more than my fair share of headaches with Android audio, but have to say that the Oboe audio API in the newer versions of Android has made things quite a lot better. We use JUCE as an abstraction layer over the audio APIs.

> And really doesn't have the ability to mark threads as real-time that will stick.

Yeah, we found this is the cause of a lot of real-time audio issues on Android. If you use systrace you can see that your audio thread is hopping from core to core. We get around this by manually setting the affinity of the audio thread to a certain core (working out which core to use requires some guesswork, sadly; there's meant to be a way to ask for the "best" core but it doesn't really seem to work).
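For the curious, a minimal sketch of the manual pinning (Linux-only; in a real Android app this would be a sched_setaffinity call from native code, and picking the core is the guesswork mentioned above):

```python
import os
import threading

def pin_current_thread(core: int) -> None:
    # pid 0 means "the calling thread" for the underlying
    # sched_setaffinity(2) syscall (Linux/Android only).
    os.sched_setaffinity(0, {core})

# Real code would probe for the "best" (big/performance) core by
# benchmarking, since the platform hints are unreliable; here we just
# take the highest-numbered core we're allowed to run on.
chosen = max(os.sched_getaffinity(0))

observed = {}
def audio_thread():
    pin_current_thread(chosen)
    observed["affinity"] = os.sched_getaffinity(0)  # this thread's mask

t = threading.Thread(target=audio_thread)
t.start()
t.join()
```

Once pinned, systrace should show the audio thread staying put instead of migrating between cores.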


Out of curiosity, are the audio APIs any better on iOS? Core audio seemed to work pretty well the last time I used it on OS X, so I assume it does, but I'd be curious to hear your take on it.


Much better. In general, "creative" APIs (e.g. audio playback, camera/image processing) tend to have well-rounded APIs on iOS.


Probably unrelated observation. I use a MacBook for personal use and a Windows laptop for work.

For many things, they both work well in general (Windows is better for "document stuff" and file management, Mac a little better in window management etc., but comparable in general).

But when it comes to media stuff (think: cropping a lossless audio file and compressing the result into an AAC or MP3, resizing 20 images, transcoding a movie from one format to the other, removing one page from a PDF), on Mac it can usually be done with built in tools. On Windows you often need either a prohibitively expensive software suite or to go on an ad-infested crapware hunt resulting in a borderline unusable tool.


> Mac a little better in window management

Can you elaborate on this? I've been using a Mac for work for a couple of years now and I find the out-of-the-box window management experience pretty bad compared to Windows. No snapping to sides or corners, no quick resize to standard sizes other than maximum, and most importantly no easy way to change between multiple windows of the same program using keyboard shortcuts. I've been using Magnet [1] and Contexts [2] to improve things, but it's a shame I have to use paid third party apps for basic functionality like this.

[1] https://magnet.crowdcafe.com/ [2] https://contexts.co/


Windows definitely improved significantly with Win10 in this regard. But Exposé (or whatever it's called now) invoked via a swipe gesture is fantastic. Swipe up with 4 fingers and I see all windows. Swipe down with 4 and I see all windows of the current app. Combine this with Spotlight to quickly find and open apps/documents.

But, everyone has a different workflow. I can imagine that if you're a heavier shortcuts user, Windows works better.


Expose is pretty good! But the Windows 10 task view [1] (Windows+Tab) does practically the same thing.

[1] https://www.windowscentral.com/how-use-task-view-windows-10

Spotlight is great and puts Windows 10's search to shame, although it feels like it has gotten worse in the past few years.


I found Spectacle (since replaced by Rectangle [0], I hear?) to be a treasure in this regard. Key bindings to maximize and resize, and even change what desktop a window is on.

I picked it because it was closest to the window-snapping / sizing bindings I had in XFCE years ago, but it basically makes OSX window management no longer frustrating.

0: https://rectangleapp.com/


It might not be quite what you want, but Command-(Shift-)-` should cycle through an app’s windows.


Only works if your keyboard layout includes backtick unfortunately.


At least on Gnome (which AFAIK copied this from Mac) it's actually mapped to "the key above Tab", even if that key is something other than backtick.


They improved on it while copying; this does not work in macOS.


You can change the shortcut in System Preferences > Keyboard > Shortcuts > Keyboard > Move focus to next window.

Try also using “Move focus to active or next window”. I have that mapped to Option-Command-` (and implicitly Option-Shift-Command-` for reverse order).


Contexts is nice but hasn't been updated since 2018! If you know a similar nice alternative, let me know.


They have said they're working on a Big Sur compatible version (it works currently, but the rendering is a bit broken).

That's fine by me to be honest—I paid for it once and it's done everything I need it to ever since.


> no easy way to change between multiple windows of the same program using keyboard shortcuts

CMD + ` (backtick)


I used various window management helpers over the years but recently settled on Hammerspoon.


This is my experience as well. I recently bought a Windows machine for the first time in a very long time, as I want access to all 3 big operating systems (Mac, Windows, Linux). I forgot how terrible Windows can be with downloading stuff from the internet.

I wanted to download a program so that I could control my fans a bit better. Before I knew it, it took over my browser and would redirect me to other sites, couldn't remove the program either. Norton didn't do its job, and I didn't do mine because I probably unchecked one box too few during the installation process. The only way to fix this was to wipe the disk completely.

I've never had this experience on Mac or Linux.


Sounds like the problem exists between keyboard and chair.

Executing strange binary files from untrusted sources could ruin your day on Mac and Linux as well. It is only less common because market share is so small compared to Windows, and as such these systems are less of a target.


How do you establish that a binary is trustworthy?

I know what I do, I generally try to find user reviews, but someone had to install it first.


You can upload the binary to virustotal and see if it got flagged.


Oh, that’s an interesting tool. Thanks.


Just as a heads up to anyone out there: if you ever get infected, run Malwarebytes (free) first and that should probably fix your woes, or else you can download Bitdefender (also free) and run that. I see a lot of people using Norton or whatever else and imo they're all just a waste of resources and time.

I personally just uninstall Malwarebytes and Bitdefender after I clean up, make sure Windows Defender's up and running, and I'm ready to download all kinds of shit off the internet again.

I think the best way to find niche software on windows is to search for open source alternatives.

Making a system restore point from time to time can help greatly too on windows! (make sure you change the setting so that it doesn't delete your old system restore points due to low disk space making the whole thing useless)


I think that if you have malware, then formatting the hard drive is the minimum of what should be done.

I would not assume that the malware is actually purged, and unpacking a backup archive and running a script to reinstall software is not so troublesome (assuming that you have backups and a script to automate installing your programs, but if you are on HN then you probably have them).


Well, I normally do :P But this was a fairly fresh install, so I didn't set it up yet.

I use Clonezilla for backups.


> But when it comes to media stuff (think: cropping a lossless audio file and compressing the result into an AAC or MP3, resizing 20 images, transcoding a movie from one format to the other, removing one page from a PDF), on Mac it can usually be done with built in tools.

Which tools are these? On Mac I ended up using Audacity for cropping audio files, ffmpeg (cli) to compress it, and used either ffmpeg or Handbrake for transcoding. I don't remember what I used for pdf page split/combine, but I do remember that it was a pain.

Honest question - as I still use Mac occasionally, it'd be great to know what are those builtin tools so I can actually use the features the OS has.


For cropping audio files, QuickTime can do it. Look in its menus. Also transcoding video files (to what extent this might be “compression” depends on your use case).

PDF page manipulation is the easiest thing on macOS, also quite robust usually. Just open the document in Preview and open the page sidebar. You can reorder or remove pages there; or drag and drop pages from other documents.


>For cropping audio files, QuickTime can do it. Look in its menus. Also transcoding video files

QuickTime Pro could do this all very well. QuickTime X, however? It can trim, but the UI is bad and the encoding options are so limited I stopped using it altogether.


It often comes down to what we are used to. I switched from Windows to Mac on purpose: being a 20+ year Windows power user (software engineer), I wanted to learn the Mac to be a bit more versatile. It took me a year just to become a decent macOS user and I still "hate it". For me Windows is still better, as for me it is easier and I know all the tools, etc.

I am still to discover a built in set of tools on either platform. I use 3rd party to accomplish my daily tasks on both.


On a very basic level Quicktime can do that. And Preview lets you easily remove pages from PDFs - select the page in the sidebar, press Delete.

Apart from that, more advanced 1st party apps for editing (for certain values of advanced) are available free - iMovie for video, GarageBand for audio, Pages for documents, etc.


I personally know people who switched to Mac because of Print/Save as PDF. Not just because of that but it was the proverbial last straw. (That was before it was built in in Windows.)


> Mac a little better in window management

I need to echo the sibling comment. I was overwhelmed with how bad Mac window management is in a multi-monitor setup. This is compounded by Apple's apparent hatred of keyboard shortcuts and power users. Speaking to my longtime Mac-loving coworkers, they displayed Sapir-Whorf-like inability to understand why the missing features/behaviors are even a problem.

In the Covid/WFH world, I'm happy to let the 16" MBP collect dust while I use computers that adapt to me, rather than forcefully adapting myself to one particular computer's eccentricities.


ffmpeg is open source and cross-platform, no need for infested crapware.


GUI apps are easier to pick up and learn than CLI commands with cryptic namings.


You end up using FFmpeg all the time on Mac anyway, because the built-in tools only really support Apple-approved formats, and if you're dealing with video captured elsewhere you end up with piles of WebM containers full of non-Apple-endorsed codecs.

Didn't use to be this way in the QuickTime Pro ecosystem but since QuickTime X you can't just work with any old video files.


Meh, pick your poison. I was just pointing out there are more alternatives than crapware and expensive tools.


I still remember how bewildered I was when I realized how easy it was on my first Mac to trim or convert an audio file, crop or resize a picture, build a pdf from images or sign a damn pdf ^^


> On Windows you often need either a prohibitively expensive software suite or go on an ad infested crapware hunt resulting in a borderline unusable tool.

For basic audio/video stuff on Windows, try http://avidemux.sourceforge.net/


I've used FFMPEG for these kind of things with no problem on Windows and *nix systems.

If you're willing to spend the money, Adobe Media Encoder works perfectly fine on Windows as well.


As a person who develops a video player SDK on iOS, I can say that video processing is horribly undocumented, buggy, and randomly changes behavior depending on device and OS version. And Apple being Apple, you pretty much have no channel to report issues.


Metal is a pleasure to work with, versus the "build your own API" experience of using Vulkan on Android.


Much better. Last time I checked, there was a baseline audio latency on Android that makes realtime music apps (and decent audio responsiveness in games) basically impossible, vs on iOS where there's a giant ecosystem of them.

See this blogpost (from 2018) that seems to confirm my recollections:

https://www.synthtopia.com/content/2018/02/17/10-years-later...


> Based on MAQ stats, devices from Apple & Google are suitable for live audio use, and devices from other manufacturers generally are likely to have glitchiness and latency that makes them unsuitable.

Seems to be more of an issue with manufacturers than a specific issue with Android, as Google's own phones get low latencies.


That's kind of a moot point - most Android devices are not Google's own phones, so you can't ship a reliable user experience on most Android devices.


My point was more that if Google can do it, other manufacturers can do it.

The issue with other manufacturers is that they concentrate a lot more on features that are visible to users whereas features like these aren't necessarily visible directly to the user and aren't given attention even though they're possible to implement.


I haven't used it, to be quite honest, the games I worked on had the Mac/iOS ports done by a different team, but since I was usually called in to help diagnose customer issues in live and never had to touch the Mac/iOS audio stuff, it probably worked well enough!


You can tell by the plethora of realtime audio production apps for iOS that it’s pretty solid. CoreAudio is the best in its league.


No wonder Apple is lightyears ahead in A/V-land. At least with reliable OS and expected behaviour of hardware you don't need to worry about this.


AAudio is supposed to overcome that, although I never used it; I only care about graphics stuff.

Isn't that the case in your experience?


Author here: Fair criticism. There are some details that are missing in the interest of keeping the story simple.

1) The audio and video playback is driven by the hardware, using the audio stream as the master clock.

2) Ninja does fill the Android and hardware buffers and keep them full. In the pathological case, the hardware and Android buffers had emptied.

3) When the Android buffer isn't full, the thread yields to Android (to be a good citizen) but asks to be invoked again right away. The 40ms "background thread" delay broke this behavior. The comment about "changing this behavior involved deeper changes than I was prepared to make" was when I explored changing this behavior (copying multiple samples per invocation) and decided it was more likely to introduce more bugs.
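For what it's worth, the arithmetic of why the 40ms re-invocation delay starved playback (15ms is the block size from the article; the function itself is just illustrative):

```python
def shortfall_per_second(block_ms: float, min_gap_ms: float) -> float:
    """Audio shortfall per wall-clock second when each handler call
    copies block_ms of audio but calls arrive at least min_gap_ms apart."""
    produced = 1000.0 * block_ms / min_gap_ms
    return max(0.0, 1000.0 - produced)

# Re-armed immediately, a 15 ms block per call keeps up exactly...
assert shortfall_per_second(15, 15) == 0.0
# ...but with a 40 ms background-thread delay the handler can supply
# only 375 ms of audio per second: guaranteed, unrecoverable underruns.
assert shortfall_per_second(15, 40) == 625.0
```

As long as each call copies one fixed block, the minimum callback gap caps the production rate, so no amount of buffering upstream can save playback once the gap exceeds the block duration.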


Excellent article btw!

But why did the problem only appear on this particular TV and not any other using the same version of Android? Was this the only TV to use Lollipop?


Yes. This was the only TV device on L. It was already 2 years old at this point, which for pure Android TV would not be allowed by Google. Since this was an AOSP device, the operator could stay on an older version.


Quick simple question for you. What does the TV testing landscape look like at a Netflix? Is it hundreds of various TVs in a room on a variety of networks or am I imagining too much here?


TV and set top box testing is shared between our partners (manufacturers and operators) and Netflix. Our partners must run our test suite before submitting for certification. We don't have all the devices under active testing all the time, but almost all of the devices have been tested at Netflix at some point. Most operator devices require VPNs to the operator network to function.

So yes, hundreds of devices on a variety of networks, but at any given moment most of the devices are in storage.


The "buffering" parts of the graph show the "fill the buffer" behavior described in (3). The time between calls is very short and large amounts of data are moved.


How does the buffering proceed quickly, but then the (slower than usual) periodic calls only copy a limited amount of data? That's the part that doesn't make sense to me. Normally if you're filling buffers, you copy as much data as is needed to keep the buffer full, no matter when you're invoked.


The handler always copies the same amount of data. It has to be called more often to copy a lot of data, which is what usually happens when the Android buffer isn't full. I agree with you that this isn't usually how this is done. Doing it this way means that Android has more freedom to schedule other tasks during the buffering phase.


Okay, so the real problem here is there was effectively a yield() inside a buffer fill loop with a small block size, and Android decided to make each of those cost 40ms. The article made it sound like you were always waiting 15ms, but you were actually waiting 0ms when there was more work to do.

I still think the blocksize should've been larger for various reasons in this case (it's a trade-off still, usually larger block sizes are mildly more CPU-efficient, besides preventing pathological scheduling cases like this one) but this explanation makes more sense.


Yes, that's all correct. The reason for the small block size has to do with design decisions made in the rest of the streaming stack, which is shared between all TV devices.


That is indeed interesting. The pipeline should be filling up if the OS scheduler delays the buffering for 40ms, giving enough data on the next call. Except if the buffering isn't greedy enough or aborted by the OS in some way.


>You can't and shouldn't rely on your audio handler getting called on time via a timer in order to keep playback stable, especially not on a non latency sensitive use case

Why not? According to [1], using timers is how Windows, CoreAudio, and PulseAudio all work under the hood, and on Windows and in PulseAudio it replaced the previous interrupt-based implementations. On the app-end of the APIs, Windows' WASAPI code example uses Sleep polling [2], and PulseAudio's write callback is optional and VLC doesn't use it [3], foobar2000 has a polling-based output mode[4], Windows has specific APIs for audio thread scheduling [5], etc.

Is this a specific deficiency of Android?

[1] https://fedoraproject.org/wiki/Features/GlitchFreeAudio

[2] https://docs.microsoft.com/en-us/windows/win32/coreaudio/ren...

[3] http://www.videolan.org/developers/vlc/modules/audio_output/...

[4] http://wiki.hydrogenaud.io/index.php?title=Foobar2000:Compon...

[5] https://docs.microsoft.com/en-us/windows/win32/procthread/mu...
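The polling pattern those APIs document can be sketched roughly like this (a schematic loop against a stand-in device, not any real API; the numbers and names are illustrative):

```python
import time

BUFFER_FRAMES = 4800   # 100 ms at 48 kHz
RATE = 48_000

def render_loop(get_padding, write_frames, periods: int) -> None:
    """Shape of the timer-driven pattern: sleep roughly half the buffer
    duration, query how many frames are still queued ("padding" in
    WASAPI terms), and top the buffer back up. No interrupt-driven
    callback involved, just a timer and a fill-level query."""
    for _ in range(periods):
        free = BUFFER_FRAMES - get_padding()
        write_frames(free)
        time.sleep(BUFFER_FRAMES / (2 * RATE))  # ~50 ms

# A stand-in "device" that drains half a buffer between our wakeups:
queued = 0
def get_padding():
    global queued
    queued = max(0, queued - BUFFER_FRAMES // 2)  # hardware drained this
    return queued
def write_frames(n):
    global queued
    queued += n

render_loop(get_padding, write_frames, periods=5)
```

Because the app always refills to full rather than copying a fixed block, a late wakeup just means a slightly bigger top-up, not an underrun.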


The way to get the best latency on a device is not for your processing loop to be directly driven by audio hardware events/interrupts. This is for many reasons:

1) events/interrupts can be delayed (usually by (bad) drivers disabling interrupt for a long time).

2) latency is not adjustable. You may want to give your processing more time if you know more work needs to be done, or the system is under variable load.

3) you will usually finish generating your audio too early. E.g.: if the interrupt fires every 2ms, but your processing is 1ms long, you are adding an unnecessary 1ms of latency by starting as soon as the interrupt is received.

4) the device will consume more power as it cannot know when it will next wake up (theoretically; I do not think it makes a difference in practice)

Modern mobile devices use a DMA to copy audio data from the application processor (AP) to the audio DSP. This DMA periodically copies the content of an AP buffer to a DSP buffer; this is called a DMA burst.

You want to track this burst and wake up just enough time before it to generate your audio data and write it to the AP buffer, plus a safety margin. This allows you to track the system performance & application load to adjust your safety margin and optimize latency. It also allows the scheduler to know far in advance when your app will need to be woken up.

The Android AAudio API [1] implements what I just described, as well as memory-mapping the AP buffer into the application process to achieve zero copy. It is the way to achieve the lowest latency on Android.

I believe Apple low latency APIs uses a similar no interrupt design.

Source: worked 3 years on the Android audio framework at Google.

Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.

[1]: https://developer.android.com/ndk/guides/audio/aaudio/aaudio
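A toy timing model of the burst-tracking scheme described above (all numbers are illustrative, not real device timings):

```python
def wakeup_time(next_burst_ms: float, processing_ms: float,
                margin_ms: float) -> float:
    """Schedule the audio thread to wake just early enough to generate
    its data before the next DMA burst, plus a safety margin, instead
    of starting the moment the previous period begins."""
    return next_burst_ms - (processing_ms + margin_ms)

# With a burst every 10 ms, 1 ms of processing and a 2 ms margin, the
# thread sleeps until t=7 and its data is ready at t=8, just before the
# t=10 burst. Waking at t=0 would have finished at t=1 and let the
# samples sit for 9 ms: added latency for nothing.
assert wakeup_time(10.0, 1.0, 2.0) == 7.0

# If the system gets jittery, widen the margin; if it's quiet, shrink it:
assert wakeup_time(10.0, 1.0, 4.0) < wakeup_time(10.0, 1.0, 2.0)
```

This is point 3 from the comment in miniature: the latency saved is exactly the idle time between finishing generation and the burst.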


The Android Aaudio API is newer than the device in question.

The previous options are either SLES or AudioDriver, both of which are bad in their own ways.


You are right, AAudio is available since Oreo only and the device is 3 versions older. Additionally, Netflix's video playback is not a low-latency use case, so it shouldn't use AAudio even on recent devices.

My comment was about the parent's claim that low-latency apps should be scheduled by interrupt.


Saying in hindsight, once the error has been found, that something was the wrong approach and should never have been done like this is easy. But every one of us is daily coding simple „tricks" to avoid implementing a more complex solution, because the simple approach seems to be working, until some months/years later it does not anymore. And then someone will point their fingers at you/us and say it's obvious it should never have been done like that...


This isn't just hindsight - anyone who knows audio coding or really anything about thread-sleeping will have the same reaction of disgust to the design of that audio buffering. It is a design that is bad in ways it never needed to be, that would not have saved time in coding.


This kind of attitude, that there is only one "right" way to do it, and any other method is disgusting, is why there are at least three different, incompatible, audio subsystems for Linux. Every time a self-proclaimed expert on audio decides that the current most popular implementation is wrong, they just start writing (and leave incomplete) a different one.

This same jousting over technical windmills happens in other userspace elements of Linux. Linus, for better or worse, keeps a tight rein on what's allowed in the kernel, but when it comes to audio, window management, display, UX, everything is a shitshow of mixed metaphors and dysfunctional interoperability.


Replace audio with security and you get the reason why everything is so good, everyone does what they think is best for security and security bugs slowly lessen.

Wait... That's not how that works.


Honest question (I don't know much about audio coding), but from what I've read here on HN, Netflix offer some of the best remuneration packages - so don't they have some of the best developers working there?

Surely they'd have picked this up if it were that straight forward?


> offer some of the best remuneration packages - so don't they have some of the best developers working there?

Having worked at several firms with top-tier employee compensation, I can wholeheartedly assure you that the best pay does not result in the best developers (or any other job function).

It does generally result in a higher “floor” of employee, but that’s about it.


Have you seen how broken the hiring process is at these bigger places?

If you spend your time learning about the details of audio playback what kind of additional time do you have to learn leetcode?


Can't say I'm surprised; I have a lot of other small beefs with the Netflix app that collectively hint to me the developers who coded it may not have been top of their game.

Some examples:

- Unacceptably high latency for any sort of seek operation - even if it's just going back a few seconds to what you just watched and ought to still be somewhere in a local buffer.

- Trying to seek multiple times sometimes forces you to wait for the previous operation to finish instead of "accumulating" your taps to determine a new time target

- Grabbing the playback marker and dragging it left and right is very imprecise, it should converge into "logarithmic" behaviour for fine movements

- Flakey Chromecast button. More than half the time I have to kill the app and relaunch it to get the cast button to show up.

- Occasionally flakey UI layout (random bugs which present when moving between screens or searching, or flickering due to layout engine doing multiple repaints over the same controls)

- Likes to plaster up things I don't want. They seem to prioritize "discovery" over "user intent", which would work if only they surfaced articles I'm interested in (which is rare). When I just want to find a "comedy" or get to the series I'm watching, it sometimes feels like I'm fighting against the UI

- Constantly moving my cheese - it's like the UI designers have ADD and can't commit

Amazon Prime is just as bad (maybe worse). I wish these companies didn't control the vertical - i.e. just provide content, and let other vendors make the client software.

/rant


Good points, but I have to say that even with all that, Netflix app/site is by far the best of the streaming services that I've used. HBO Nordic, Viaplay and YouTube movies have pretty bad UI/UX problems and their streams become borderline unwatchable with subpar internet connections, whereas Netflix has in my experience still been able to push through video in those conditions with good enough quality that I can still enjoy it.


> Unacceptably high latency for any sort of seek operation - even if it's just going back a few seconds to what you just watched and ought to still be somewhere in a local buffer.

Unacceptable for what use-case? I had assumed this was somewhat intentional on Netflix's part, or at least not a priority for them.


For the use case of watching a movie?

You seek back a scene or two while watching something when your mind has wandered away and you missed what was going on a few seconds ago. For some non-neurotypical people it's a really common thing to do while watching a longer movie as it may be hard to follow what's going on otherwise.


I'd generalize this to:

If your application is timing critical, and you aren't using things specifically designed for time-critical applications (like an RTOS), then you should be doing as much low-level implementation yourself as possible.



Or the other way: don't make your application timing critical if it does not have to be. As others point out the hardware provides a buffer and the app only fills in one frame in advance instead of filling the buffer completely when possible.


> And if you're not, you need to be capable of buffer filling way more than one frame's worth of audio - at least 100ms, preferably much more.

Author addressed this point and agrees with it:

> Why don’t you just copy more data each time the handler is called? This was a fair criticism, but changing this behavior involved deeper changes than I was prepared to make

In general, "real time audio" usually refers to low-latency audio, which uninteractive video playback is not.


Yes, he addressed it, but just said he was unwilling to do anything about it as it would have resulted in too much change/work.

That doesn't answer the issue of why it was implemented like that in the first place.


Well, it's been my experience that you only find out the proper way to implement something after you do it one or a couple of times.


>If you're doing real time audio, your processing loop needs to be directly driven by audio hardware buffer events/interrupts

Is this realistic? I don't know much about providing software for a wide range of hardware, but I would imagine this would require either Netflix or the integrator to write drivers for Netflix to communicate to the hardware dsp right (or standardize)? It seems more feasible to piggyback on a platform that has done that already and is already widely deployed.


If I'm reading correctly, they're saying that Netflix is not doing realtime audio, so they need not have this level of precision. They should buffer more than 40ms of audio data.


I think they are doing so ("The Netflix application is complex, but at its simplest it streams data from a Netflix server, buffers several seconds worth of video and audio data on the device").

The system was built to fill the (hardware) buffers with 15ms worth of audio at a time. The problem is that the handler was not invoked in time (sometimes it took 40ms). Some suggested that Netflix could in that case fill 40ms of audio data into the HW, but they didn't want to because it required low-level modifications.


Except buffering wasn't the problem, an OS-managed thread was.


The reason it caused a problem is because Netflix wasn't loading more than a frame worth of data at a time. That's a strategy that makes sense for a game streaming app like Stadia or Steam Link where every frame of latency matters, but for a movie/TV streaming app it really doesn't.

The point is that there was never any good reason to make the Netflix app so timing-dependent. Had Netflix not done this they wouldn't have even noticed.

---

That said, if a company was releasing an Android 5.0 powered device in late 2017 when Android 8.0 had already been released I can't help but say they brought this on themselves by being idiotically out of date. There's no excuse for launching a device with an OS three years and three versions out of date.

Everybody in this story is doing really dumb things.


> wasn't loading more than a frame worth of data at a time

So the real question should be, does Netflix have a valid reason for only delivering a single frame of video and audio at a time? Is it bad software design, or good software design for an unknown problem (to us)?

I've seen plenty of things that looked stupid out of context, then made sense later on after understanding why the "stupid" thing was done in the first place.


Stupid thought: especially given Netflix's history as one of the earliest services streaming licensed content, it would not surprise me if this was at some point a way to minimize how much unencrypted data sat in memory, or similar.

The other thought would be whether the architecture was somehow related to how one had to do things in Silverlight (do any Silverlight devs still remember how one would do this sort of thing? Are any Silverlight devs still around?)


It's probably easier to keep the audio and video in sync with a simple implementation. They can't really drift out of sync if all that's being done is handling one frame at a time.


Though it sounds like they aren't even getting that advantage, because the audio thread is separate, and the described problem is starvation of the audio buffer alone.


It probably makes it easier to implement things like variable frame rates.


"There's no excuse for launching a device with an OS three years and three versions out of date." Really? Why not? I mean, if you only launch for the newest OS version, you cut out a lot of devices which were not or cannot be upgraded. Maybe in the US this is not a problem, but in Europe I wouldn't be surprised if this actually cut out a larger-than-50% market share.


You have misunderstood. The hardware device that was misbehaving had not yet been released to the public, yet it was running an ancient and out-of-date OS build. That's what I'm saying is idiotic.

Netflix can and should support their software on OS releases as old as makes sense for them, but a hardware vendor introducing a new device to market on an outdated OS is inexcusable. Android 5.x received its final update in early 2015, over two full years before this mystery TV box was to be released. There is no good reason it couldn't have been on Android 7 or 8.


It was the interaction between the expected amount of required buffering and the thread scheduling behavior that caused the problem. As typically happens when there's a video programmer around, the audio timing is considered less important, and there's a belief that "we can always get the audio we need for the video". Ergo, not much need for buffering. In this case, the odd/incorrect behavior of the thread scheduler exposed the optimistic assumption that you can rely on waking up every 15ms and handling the audio.


And further, video latency is much less noticeable than audio latency. ~nobody will notice one dropped frame, but ~everyone will notice 16ms of audio missing, because you get a blatant click. Therefore, it is always important to prioritize audio buffering over video buffering.


oh, yes, I had to use some videoconferencing software recently that appears to drop audio before video. It was basically unusable.


I see no reason that interrupts are needed at all. On a properly working system, the system clock will track the audio clock quite well. If the audio hardware tells you that it will need more data in x ms, it will be very close to x ms as seen by the CPU. If you use a deadline scheduler, you don't want to wait for an interrupt — you tell the scheduler that you need to run before a specific deadline, and that's that. A hardware interrupt is pure overhead, especially on platforms like x86 with silly interrupt latency.

(A good kernel can account for the time it takes to wake from idle and compensate when programming a timer (as could a good CPU, but I’ve never heard of hardware that does this) and can wake a user program a bit early if the CPU is otherwise idle and would take a short nap. A sound hardware interrupt can’t do these tricks.)


This is both true and false at the same time.

(1) It is absolutely not true that the system clock will track the audio clock "quite well". Real world numbers would typically involve drift measurable in seconds within a low integer number of hours.

(2) It absolutely is true that on most sensible audio interface designs, you don't need interrupts except early after device open. At that point, you use them to set up a DLL (Delay Locked Loop) that will enable you to use the system clock to know where the audio hardware is reading/writing to in the buffer used to move data to/from the hardware. Once the DLL is correctly configured, you can just be woken by the system timer, determine the current audio hardware state and read/write data appropriately.
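A toy sketch of that idea (our own minimal first-order DLL, not code from any real driver): feed it the system-clock time observed at each hardware period boundary, and it converges on the true period, letting you predict the next boundary with an ordinary timer.

```python
class AudioDLL:
    """Minimal first-order delay-locked loop: estimates the true hardware
    period from observed timestamps so that a system timer can stand in
    for hardware interrupts."""

    def __init__(self, nominal_period_s, bandwidth=0.1):
        self.w = nominal_period_s   # running estimate of the true period
        self.b = bandwidth          # loop gain: how quickly drift is tracked
        self.t_next = None          # predicted system time of next boundary

    def update(self, t_now):
        """Call with the system time seen at each hardware period boundary;
        returns the predicted time of the next one."""
        if self.t_next is None:
            self.t_next = t_now + self.w
            return self.t_next
        err = t_now - self.t_next   # how far off the last prediction was
        self.w += self.b * err      # nudge the period estimate toward truth
        self.t_next = t_now + self.w
        return self.t_next
```

Feeding it boundaries from hardware whose real period is 15.1 ms while the nominal one is 15 ms, the estimate converges on 15.1 ms within a couple hundred periods; real implementations typically use a second-order loop for better jitter rejection.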


But on (2): unless you're a low-level app directly driving the hardware, you shouldn't do this, and you especially shouldn't do it without good guarantees that system timers are well behaved. Effectively, using a DLL and system timers instead of straight IRQs is a low-level implementation detail, but one that belongs either in the driver or in a very low-level piece of code anyway (like JACK or some other audio daemon), not in user applications. I was sort of lumping all of this together when I said hardware "events"; locking a DLL with a different time base to the playback hardware's buffer pointer state is still effectively that, and should be done, but only at a level low enough where you know it'll work. Which definitely isn't Android's high-level thread scheduling framework.


For (1), by “quite well” I mean well enough to avoid buffer underruns. So I think we agree.


No, we don't agree. The drift will exceed the buffer size in less than an hour, or certainly could with many hardware configurations.


>"A hardware interrupt is pure overhead, especially on platforms like x86 with silly interrupt latency."

Relatively speaking, maybe. But I have a device I've designed that communicates with the PC using interrupt transfers at a rate of 250 requests/s. I've watched it on an oscilloscope and it is stable as a rock. Sure, if you overload the PC it might get into trouble, but it is a game-like application, so when the app is running the PC is not doing much else. I recently ported the same thing to a Raspberry Pi 4.


Hah, you must never have tried this on an otherwise idle stock Sandy Bridge before Linux added mitigations :)

x86 interrupts are slow but at least reasonably consistent on a non-idle system. If your system is deeply enough into its various idle states, this is not at all true any more. I’ve seen latencies over 10ms on Sandy Bridge with C1E enabled.


Linux was on the Pi 4. On the PC it was actually Windows. The first version went out some 7 years ago, I think, and has been rock solid since the beginning. They must've done something right ;)


It isn't on Android, but as I said, Netflix isn't a real-time app, so Android's deficiencies in the low latency audio department should not concern them. They should just buffer more audio per wakeup.


> as I said, Netflix isn't a real-time app

I honestly had a tough time parsing the sentence when you said that due to the triple negative, so it's possible that others might have misunderstood it.


Multiple negatives are common in some languages and absolutely do connote a distinct meaning from the simplified language.

https://linguistics.stackexchange.com/questions/15334/what-i...


Yep, I don't disagree! I think they can still be tough to parse in English though, even as a native speaker


That sentence was indeed a bit awkward, but I hope the meaning gets across since it wouldn't really make much sense the other way around, I think :-)


I'm not a native speaker and the sentence is honestly absolutely unparseable for me and reads like a contradiction. Is it even... correct?


I think it's grammatically correct, yes! But being grammatically correct isn't always the same as being easy to understand; some correct things can be hard to understand, and some incorrect things can be pretty easy to understand!


> You can't and shouldn't rely on your audio handler getting called on time via a timer in order to keep playback stable, especially not on a non latency sensitive use case, which Netflix very much isn't.

I don't understand this criticism, but I am likely missing something.

From what I gathered from the article, Ninja is specifically for firmware. It buffers frames, and delivers those frames whenever requested.

> your processing loop needs to be directly driven by audio hardware buffer events/interrupts

Again, I could be missing the thrust of the criticism, but isn't that exactly what's happening, here? Ninja supplies an endpoint, and it's the hardware that calls it?


> It buffers frames, and delivers those frames whenever requested.

Going by the article, they use a plain Android thread that runs at fixed 15 ms intervals and buffers exactly one frame in advance. For 30 fps this is more often than needed (yay battery life), and for 60 fps it just asks for trouble when your OS doesn't guarantee real-time behavior.
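To put numbers on that (a back-of-the-envelope sketch; only the 15 ms and 40 ms figures come from the article):

```python
def headroom_ms(fps, wakeup_ms=15.0):
    """Slack between one frame's duration and a fixed wakeup interval."""
    return 1000.0 / fps - wakeup_ms

# At 30 fps a frame lasts ~33.3 ms, so a 15 ms wakeup leaves ~18.3 ms of
# slack (and wakes roughly twice per frame). At 60 fps a frame lasts
# ~16.7 ms, leaving only ~1.7 ms; the observed 40 ms wakeups leave none.
```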

> Ninja supplies an endpoint, and it's the hardware that calls it?

As far as I understand, it provides a cross-platform endpoint, called by a platform-specific wrapper. In the Android case this was a background worker, and Android considered background workers not real-time critical.


Thanks for taking the time to explain! Much appreciated.

> They use a plain android thread that runs at fixed 15ms intervals and buffers exactly one frame in advance

From the article: "...buffers several seconds worth of video and audio data on the device, then delivers video and audio frames one-at-a-time to the device’s playback hardware"

And: "under normal playback the thread copies one frame of data into the Android playback API, then tells the thread scheduler to wait 15 ms and invoke the handler again"

And: "When you create an Android thread, you can request that the thread be run repeatedly, as if in a loop, but it is the Android Thread scheduler that calls the handler, not your own application."

But from the OP: "...you need to be capable of buffer filling way more than one frame's worth of audio - at least 100ms, preferably much more. Not just for robustness, but also for battery life."

I suspect there is a misapprehension here. "...under normal playback the thread copies one frame of data into the Android playback API, then tells the thread scheduler to wait 15 ms and invoke the handler again". I suspect one can call that handler more often than every 15 ms to fill the playback buffer, and then during normal playback add frames one at a time to keep the buffer full and no faster, but on a quick reading it does sound like what the OP stated.


I agree this is bad engineering: don't use Android threads for real-time stuff. This should be running at module level.

A better approach, if the micro you are using allows it, is to have an interrupt when (for example) the I2S buffer is empty. I would then point the DMA to fetch the next buffer (already processed and mixed) and fire the DMA transfer.

> at least 100ms

This calls for huge latency problems. But I understand the approach if your buffer-filling/reading procedures are slow or unreliable.

I disagree with the timer thing. There are systems that provide precise timers for media (for example Win32 multimedia timers[1], that I never used but I know they exist).

[1] https://docs.microsoft.com/en-us/windows/win32/multimedia/ab...


>This calls for huge latency problems

It doesn't mean there is 100 ms of latency; it just means that 100 ms of audio is buffered, so that you have ~100 ms of leeway about when your app's audio thread is scheduled. Changes to the audio stream, such as stop/start/volume control, can be achieved with much lower latency by rewriting the buffer, or by applying the changes lower down the stack where the buffers are smaller, or both. By default PulseAudio will buffer ~2000 ms of audio from clients [1]

[1] https://freedesktop.org/software/pulseaudio/doxygen/structpa...


It DOES mean there is 100ms of latency.

Anything above 10ms (some say max. 20ms) for real-time audio processing (especially musical instruments) is prohibitive. Imagine an electronic drum set: if you hit the snare and the audio is output 100ms later from the speakers, I bet you'll notice it :)


Ah but GP wasn't talking about real-time in that context ("And if you're not...").

You can still have 100ms buffer without 100ms latency: within 10ms of the drum being hit, write 100ms of the drum sample into the playback buffer and immediately trigger its playback (or write it into the buffer starting at the cursor position that is just about to be played).

The only trouble is when you need to modify some of that 100ms before it is played back, for example if the user hits another drum 50ms later. In that case it becomes more complex, you'd have to overwrite some of the existing buffer with a new mix of both drum samples. The complexity is not worth it for that kind of app.
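A tiny sketch of that rewrite-and-mix idea (hypothetical function name; plain lists standing in for a real audio buffer):

```python
def mix_into_buffer(buffer, play_pos, event, offset):
    """Mix `event` samples into `buffer` starting `offset` samples ahead of
    the current playback position; only not-yet-played data is touched."""
    start = play_pos + offset
    for i, sample in enumerate(event):
        if start + i < len(buffer):
            buffer[start + i] += sample  # sum with, don't replace, queued audio
```

So a second drum hit arriving 50 ms in gets summed into the part of the queue that hasn't reached the hardware yet, which is exactly the added complexity described above.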

For a simple app like a video player, the audio stream is much more predictable so you can buffer more. Volume changes and pausing can still be applied with no perceivable latency by modifying the existing buffered data [1]

[1] https://www.freedesktop.org/wiki/Software/PulseAudio/Documen...


> within 10ms of the drum being hit, write 100ms of the drum sample into the playback buffer

You can only do that if you can predict the future and fill the buffer with data you don't have yet (samples from the future). Otherwise, you still have to wait for 100 ms of samples to output. So if you have to wait 100 ms for the samples, then the output happens after 100 ms, hence 100 ms of latency.

100 ms of samples can be fine for a video player. For real-time you (usually) do this: 2 buffers of 10 ms each. While one buffer is playing, you fill the other buffer with real-time data. After 10 ms has passed, you start playing the buffer with real-time data, while the other buffer gets filled.
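That two-buffer scheme, sketched (a toy model with made-up names, ignoring the actual DMA/interrupt plumbing):

```python
class PingPong:
    """Two equal buffers: the hardware reads one while software fills the
    other; each period boundary swaps their roles."""

    def __init__(self, frames_per_buffer):
        self.bufs = [[0.0] * frames_per_buffer, [0.0] * frames_per_buffer]
        self.playing = 0  # index of the buffer the hardware is reading

    def fill(self, samples):
        """Write the next block into the buffer the hardware is NOT reading."""
        idle = 1 - self.playing
        self.bufs[idle][:] = samples
        return idle

    def swap(self):
        """Called at each period boundary: hardware moves to the other buffer."""
        self.playing = 1 - self.playing
        return self.playing
```

With 10 ms buffers, worst-case output latency stays bounded by one buffer (~10 ms), at the cost of a hard 10 ms deadline to refill the idle buffer every period.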


The point is that in a video player, you can predict the future: the video will continue playing.

> Otherwise, you still have to wait for 100ms of samples to output

No, you don’t, not if you can write into the buffer at an arbitrary point. This is what the whole page I linked is about.



