Anecdotally, I got better NER results with spaCy than with OpenNLP or CoreNLP using their respective default models, and spaCy was easier to install (though I'm biased, being more familiar with Python tooling and documentation style). I was ultimately implementing in Java, so I did use OpenNLP for sentence splitting, but I retrained its NER on data bootstrapped from spaCy: first classifying with the default/vanilla model, then manually correcting the labels where they were wrong. That's similar to what the Prodigy tool aims to facilitate.
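The bootstrap-then-correct workflow above can be sketched in plain Python. This is a hypothetical illustration, not the actual pipeline the commenter used: in practice the predicted spans would come from spaCy's default NER model (the `ents` attribute of a processed `Doc`); here they are hard-coded so the example is self-contained, and the `bootstrap_annotations` helper and its correction format are made up for this sketch.

```python
def bootstrap_annotations(text, predicted_spans, corrections=None):
    """Merge vanilla-model predictions with manual corrections into
    spaCy-style training data: (text, {"entities": [(start, end, label)]}).

    corrections maps a (start, end) character span to a fixed label,
    or to None to drop a false-positive span entirely.
    """
    corrections = corrections or {}
    entities = []
    for start, end, label in predicted_spans:
        fixed = corrections.get((start, end), label)
        if fixed is not None:
            entities.append((start, end, fixed))
    return (text, {"entities": entities})


text = "Apple opened a store in Paris."
# Suppose the vanilla model mislabeled "Apple" as PERSON; we correct it to ORG
# and keep the correct GPE prediction for "Paris" as-is.
predicted = [(0, 5, "PERSON"), (24, 29, "GPE")]
example = bootstrap_annotations(text, predicted, corrections={(0, 5): "ORG"})
```

The resulting tuples can then be fed to whatever trainer you're using; the commenter converted them for OpenNLP, but the same corrected spans would also work as spaCy training examples.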
They have some hard comparisons in some of their earlier blog posts on how spaCy stacks up against the other popular open-source NLP libraries. In my experience it has been much easier to use, and faster, than the likes of Stanford's library or NLTK. In general it's aimed at production or commercial use, whereas the other libraries I typically hear mentioned target a more academic audience.
OpenNLP is... well, I've never heard of anyone using it (except once, in an ensemble). I think NLTK is more widely used.
Stanford CoreNLP gives good accuracy and is pretty much the benchmark for English. BUT it isn't great software. It falls over if you pass it large amounts of text, the code is dreadful, it's hard to integrate (even from Java, because of its own wacky config system), various parts aren't integrated (e.g., SUTime), it doesn't have an embedding representation, and it is pretty slow.
Having said all that, I still use it sometimes. But spaCy is much nicer to use, and 99% of the time (probably more) the slightly lower accuracy is offset by things like having word embeddings readily available right alongside the word tags.
I think it's pretty fair to say Spacy is the leading open-source NLP tool.
I have to agree. Explosion/spaCy puts out a lot of marketing BS. That said, I think spaCy itself is actually pretty solid, and if you're in the NLP field you should give it a try.
Self-respecting language geeks keep up with the times. What's your case for "leading open-source"? Here's a look at spaCy blowing Stanford CoreNLP out of the water (by GitHub stars; you can look at commits and more with the same tool): https://www.datascience.com/trends?trends=4812,7214,7165&tre...
When a new tool is announced, there's a lot of casual interest, hence the "explosion" at the beginning. After some time things reach a steady state, and you can see that interest in spaCy has started to fall below CoreNLP's over the last few weeks.
Sounds like marketing BS. What about OpenNLP and Stanford's NLP tools?