
Yeah, tree-based models are great for tabular datasets that are primarily numeric, with only a few categorical variables. But as soon as your categorical variables have 1000+ potential values that would need one-hot encoding, or you have any natural language text associated with your rows, deep learning almost always outperforms, especially once you have over 50K instances, in my experience.
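
For context, one common way DL models sidestep giant one-hot vectors is a learned embedding per categorical column. A minimal PyTorch sketch (all sizes and values here are made up for illustration):

    import torch
    import torch.nn as nn

    # A categorical column with 1200 distinct values: instead of a
    # 1200-wide one-hot vector, learn a dense 16-dim embedding per value.
    embed = nn.Embedding(num_embeddings=1200, embedding_dim=16)

    # Batch of integer-encoded category ids, shape [batch]
    cat_ids = torch.tensor([3, 517, 42])
    dense = embed(cat_ids)  # shape [3, 16]; fed into the rest of the net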

The major downside of DL is the slow training, and therefore the slow iteration feedback loop. Couple that with an exponentially growing number of hparams to tune, and you get something very powerful but costly in terms of time to use.

But if you want the best possible accuracy, and data collection isn't expensive, DL is the way to go. Just expect to spend 10x the amount of time tuning it vs trees to get a 10% to 20% reduction in error.
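
To make the tuning cost concrete, here's a sketch of what a search loop looks like (Optuna here; the search space is illustrative, and train_and_validate is a hypothetical helper that runs one full training run and returns validation error):

    import optuna

    def objective(trial):
        # Illustrative slice of a DL search space; real spaces are far larger.
        lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
        n_layers = trial.suggest_int("n_layers", 1, 6)
        dropout = trial.suggest_float("dropout", 0.0, 0.5)
        # Hypothetical helper: one full train + validation cycle, the slow part.
        return train_and_validate(lr, n_layers, dropout)

    study = optuna.create_study(direction="minimize")
    study.optimize(objective, n_trials=100)  # 100 trials = 100 training runs

Each DL trial can take orders of magnitude longer than a tree-model trial, which is where the time cost bites.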



>categorical variables have a 1000+ potential values that need 1-hot encoding

You typically do not need to one-hot encode categorical variables, as common implementations like LightGBM and CatBoost have efficient native ways to handle them. Googling around, I can't easily find cases where people get better results with GBM + one-hot, and I haven't either, though I haven't worked with 1000+-value categorical variables much.
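
For what it's worth, a minimal sketch of the native handling via LightGBM's sklearn API (the toy data is made up, and min_child_samples is lowered only so the tiny frame actually splits):

    import lightgbm as lgb
    import pandas as pd

    # Toy frame; "city" stands in for a high-cardinality categorical column.
    df = pd.DataFrame({
        "city": pd.Categorical(["tokyo", "lima", "oslo", "lima"]),
        "amount": [10.0, 3.5, 7.2, 1.1],
        "label": [1, 0, 1, 0],
    })

    # With pandas 'category' dtype, LightGBM treats the column natively --
    # no one-hot encoding step needed.
    model = lgb.LGBMClassifier(n_estimators=50, min_child_samples=1)
    model.fit(df[["city", "amount"]], df["label"])

CatBoost does the same if you pass cat_features to fit().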

>deep learning almost always outperforms

This isn't the case in the article we are commenting on, nor on Kaggle, but given that DL models occasionally (though rarely) outperform, I'm willing to believe this is one of those cases. Any recommendations for which DL models in particular I should use to test this claim?



