Any idea what the training data for this is? Looking at the model code, it seems to be pretty much a straight copy-paste of Karpathy's nanoGPT, so the training data is the most interesting part. Pretty amazing anyway.
I found a secret demo page that shows, in real time, how they assess any sound file's mood swings along with the number of detected laughs, coughs, etc. I'm guessing that capability is involved somehow.