Honestly? The exact problem I'm dealing with at work right now.
We're trying to re-write our recommender for artist music stations at iHeartRadio (aka "I'll listen to Drake or Kendrick Lamar's station at the gym today"). Just today, I tried adding negative sampling to the matrix I'm factorizing, hoping it encourages spread in the embeddings learned for artists in certain types of genres.
I have a MS, but not a lot of research experience. It would have taken me a while to find this solution on my own. However, the moment I described this problem to my manager - a PhD graduate with several years of research and industry experience - he immediately suggested negative sampling.
What I learned during my MS helped me grok the math immediately. We're adding noise to the training set and penalizing vector lengths to avoid overfitting. Easy! Identifying a solution worth exploring? Not easy, at least without a degree or significant experience.
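For anyone curious what that looks like in practice, here's a minimal sketch of negative sampling for implicit matrix factorization: treat unobserved (user, artist) cells as zero-valued "negative" examples alongside the observed plays, then run SGD with an L2 penalty on the factor vectors. All the specifics here (matrix sizes, the 4:1 negative ratio, learning rate, penalty strength) are invented for illustration, not anything from iHeartRadio's actual system.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, k = 50, 40, 8
# Toy implicit-feedback matrix: 1 where a user played an artist, 0 otherwise.
R = (rng.random((n_users, n_items)) < 0.05).astype(float)

observed = list(zip(*np.nonzero(R)))   # positive (user, item) pairs
n_neg = 4 * len(observed)              # assumed ratio: 4 negatives per positive
negatives = []
while len(negatives) < n_neg:
    u, i = rng.integers(n_users), rng.integers(n_items)
    if R[u, i] == 0:                   # sample unobserved cells as zeros
        negatives.append((u, i))

P = 0.1 * rng.standard_normal((n_users, k))   # user factor vectors
Q = 0.1 * rng.standard_normal((n_items, k))   # item factor vectors
lr, lam = 0.05, 0.01                          # learning rate, L2 penalty

pairs = observed + negatives
for epoch in range(30):
    for idx in rng.permutation(len(pairs)):
        u, i = pairs[idx]
        err = R[u, i] - P[u] @ Q[i]
        # SGD step; the lam terms penalize vector lengths (L2 regularization).
        P[u] += lr * (err * Q[i] - lam * P[u])
        Q[i] += lr * (err * P[u] - lam * Q[i])

# After training, observed pairs should score higher than sampled negatives,
# i.e. the embeddings have spread apart rather than collapsing.
pos_score = np.mean([P[u] @ Q[i] for u, i in observed])
neg_score = np.mean([P[u] @ Q[i] for u, i in negatives])
```

Without the sampled zeros, the factorization only ever sees positive entries and can trivially fit them by making all embeddings similar; the negatives are what force spread.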
(There's also the chance I should know this, in which case I have some reading to do. ¯\_(ツ)_/¯)
Aren't you sort of glossing over the fact that he is in high up machine learning position at a company that specializes in recommender systems? Doesn't that by itself increase the likelihood that he deeply understands implicit and explicit matrix factorization?
I am a good ways through my masters (second CS degree, first specializing in ML), and the more I learn, the more I realize that on any given topic, there is no guarantee the PhD in the room has the most expertise. Machine learning is a broad field with many subfields, methodologies, and applications. It is a bit like computer systems or software engineering: nobody knows it all; the experts have intimate knowledge of a specific subset of the field. Of course, you can move around over time, but it takes years to build up expertise in even two or three subfields of machine learning.
Side note: sounds like we do similar work. I work at Vevo, also do a lot of matrix factorization to learn latent factors of items such as artists, videos, etc.
> Aren't you sort of glossing over the fact that he is in high up machine learning position at a company that specializes in recommender systems? Doesn't that by itself increase the likelihood that he deeply understands implicit and explicit matrix factorization?
Sure, but someone in that position needs years of experience in recommender systems, as well as experience working with researchers.
Folks are hanging on to the PhD part of my claim, instead of the "PhD or experience" part. The fact is, a PhD plus prior industry work means the person has close to a decade of relevant background, grad degree or not. They will unstick a co-worker far faster than an experienced backend developer with, say, a year of Keras experience.
> Side note: sounds like we do similar work. I work at Vevo, also do a lot of matrix factorization to learn latent factors of items such as artists, videos, etc.
Seems like it! Email me if you'd like to chat some more offline (it's in my profile).
That has little to do with a PhD, it's the kind of thing you get with experience leading to a deeper understanding.
3D programming started as a field where only PhDs had any deep understanding of what was going on, simply because they had experience when nobody else did. You see this pattern repeated frequently in any complex domain.
The PhD is sufficient but not necessary here, right? A PhD researcher's job description is basically "learn necessary math, become a domain expert, and publish papers advancing that domain." It's difficult (but possible) to gain the same experience in industry if you don't have a graduate degree. Which company would pay you to work through Bishop or Goodfellow for a few months? Even a principal DS doesn't get that deal, much less a junior/associate.
Also remember: my comment addressed non-vanilla cases. In your example, this is the difference between a researcher advancing 3D programming and someone using Unity or Unreal.
I would say a PhD is sufficient to advance the field. That's no small thing, but it only really overlaps at the start, when just about anything advances the field and you need a broad focus.
Machine learning for sorting peas at high speed is a very well-trodden area at this point, with a lot of industry-specific domain knowledge. I expect self-driving cars, for example, to reach a similar state in ~10-25 years.
The risk with a PhD is that you miss the specific wave. But if you want to stay on the bleeding edge, it's probably well worth it.
You can spend many months working through papers and books without a company paying you for that. That's something that I continually do and have always done, in my own time (and many different fields). Sufficient and not necessary indeed.
Ali Rahimi alludes to the problem of Google engineers simply needing to tweak models that were previously tuned by Google researchers who do have well-developed intuition [0]. Because the intuitions in explicit form are at best heuristic and not necessarily even consistent, signing up to improve a model without them might result in spending indefinite time and compute resources without any guarantee of positive results. Which is a terrible perf-theoretic strategy...
Model divergence, nonsense predictions. The whole black art of ML (specifically neural nets) is coaxing them into working.
If you take some sophisticated deep neural net and try to train it on a binary classification task where tails occurs 99% of the time, then, unless you specifically take measures to correct for this class imbalance, the net will just learn to always predict tails.