Curious how you guys got training data for this. Did someone have to go through and rate whether or not each sentence was high quality? And how many training examples did you use? You say it was "difficult to develop a large set," but I'm curious how large that set actually was.
Edit: Also, do you think more data or a "better" or "more sophisticated" model would improve the results? I would guess more data would trump a better model, but I'm not sure.
I actually had this issue recently when trying to get training data for a project of mine as well [0], so I built an app [1] as a way to more easily classify documents.
Basically, I have simpler interfaces and the ability for multiple people to quickly answer questions like this on a set of data. The results are easily exportable in the end as well. If you're interested in using that to get some more data on sentences, let me know. I'm really curious how much better the results get with more data, and this could help.
This is a really great idea. If there is something you can share along these lines, that would be amazing. I know CrowdFlower has a great "internal only" tool, which is kind of similar to what you are designing, but you have to pay for it. I think there is a huge need for a generic tool along the lines of what you have started to build.
Haven't heard of CrowdFlower, but yeah, this is along those lines. Pretty similar. I could definitely make something quick to fit this specifically. I've been looking for other uses for what I'm building, and this fits exactly. Shoot me an email at the address listed on my profile and I can get going.