Curious how you guys got training data for this. Did someone have to go through and rate whether or not each sentence was high quality? And how many training examples did you use? You say it was "difficult to develop a large set," but I'm curious how large that set actually was.
Edit: Also, do you think more data or a "better" or "more sophisticated" model would improve the results? I would guess more data would trump a better model, but I'm not sure.
I actually had this issue recently when trying to get training data for a project of mine as well [0], so I built an app [1] as a way to more easily classify documents.
Basically, I have simpler interfaces and the ability for multiple people to quickly answer questions like this on a set of data. The results are easily exportable in the end as well. If you're interested in using that to get some more data on sentences, let me know. I'm really curious how much better the results get with more data, and this could help.
This is a really great idea. If there is something you can share along these lines, that would be amazing. I know CrowdFlower has a great "internal only" tool, which is kind of similar to what you are designing, but you have to pay for it. I think there is a huge need for a generic tool along the lines of what you have started to build.
Haven't heard of CrowdFlower, but yeah, this is along those lines. Pretty similar. I could definitely make something quick to fit this specifically. I've been looking for other uses for what I'm building, and this fits exactly. Shoot me an email at the address listed on my profile and I can get going.