It's the same thing: predict the next pixel or next token (just as you would with regular images), or infill masked tokens (MAE is particularly cool lately). Those objectives induce the abstractions and understanding that downstream tasks tap into.
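To make the MAE-style objective concrete, here's a minimal sketch: mask most of an image's patches, have a model infill them from the visible ones, and score reconstruction only on the masked patches. All names here are illustrative, and the "model" is a trivial mean predictor; a real MAE uses a ViT encoder/decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches, flattened."""
    h, w = img.shape
    patches = img.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    return patches.reshape(-1, p * p)  # (num_patches, p*p)

def mae_loss(img, predict, p=4, mask_ratio=0.75):
    """Mask a random subset of patches and score reconstruction on them only."""
    patches = patchify(img, p)
    n = patches.shape[0]
    masked = rng.choice(n, size=int(n * mask_ratio), replace=False)
    visible = np.setdiff1d(np.arange(n), masked)
    # The model sees only the visible patches and must infill the rest.
    recon = predict(patches[visible], masked, p)
    # MSE on the masked patches only, as in MAE.
    return float(np.mean((recon - patches[masked]) ** 2))

def mean_predictor(visible_patches, masked_idx, p):
    """Stand-in model: predict the mean visible patch for every masked slot."""
    return np.tile(visible_patches.mean(axis=0), (len(masked_idx), 1))

img = rng.normal(size=(32, 32))
print(mae_loss(img, mean_predictor))
```

The same shape works for "predict the next pixel/token": swap the random mask for a causal one and the MSE for a cross-entropy over a pixel/token vocabulary.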
It's incredibly hard to disambiguate and accurately label images using the reports (this is my area of research).
Reports are also not a substitute for ground-truth labels, and you don't always have histopathologic or clinical outcomes.
You also have drift in knowledge and patient trends: people are on immunotherapy now, and we are seeing complications and patterns we didn't see five years ago. A renal cyst that would have warranted follow-up to exclude malignancy before 2018 is now considered definitively benign, so those reports are not directly usable.
You would have to connect this, non-trivially, to some form of knowledge base to disambiguate, and that knowledge base doesn't currently exist.
And then there's hallucination.
Right now, if you could even just extract actionable findings, accurately summarize reports, and integrate that into the workflow, you could have a billion-dollar company.
Nuance (now owned by Microsoft) can't even accurately autofill my dictation template by mapping free text to subject headings.
What's the medical imaging equivalent to "predict the next word"?