Anecdotally, I got better NER results with spaCy than with OpenNLP or CoreNLP using their respective default models, and spaCy was easier to install (though I'm biased, being more familiar with Python tooling and documentation style). I was ultimately implementing in Java, so I did use OpenNLP for sentence splitting, but I retrained its NER on data bootstrapped from spaCy: first classifying with the default/vanilla model, then manually correcting the labels where they were wrong. That's similar to what the Prodigy tool aims to facilitate.
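The bootstrap-then-correct workflow above can be sketched in plain Python. This is a hypothetical illustration, not the actual pipeline the commenter used: in practice the predicted spans would come from spaCy's default NER model (the `ents` attribute of a processed `Doc`); here they are hard-coded so the example is self-contained, and the `bootstrap_annotations` helper and its correction format are made up for this sketch.

```python
def bootstrap_annotations(text, predicted_spans, corrections=None):
    """Merge vanilla-model predictions with manual corrections into
    spaCy-style training data: (text, {"entities": [(start, end, label)]}).

    corrections maps a (start, end) character span to a fixed label,
    or to None to drop a false-positive span entirely.
    """
    corrections = corrections or {}
    entities = []
    for start, end, label in predicted_spans:
        fixed = corrections.get((start, end), label)
        if fixed is not None:
            entities.append((start, end, fixed))
    return (text, {"entities": entities})


text = "Apple opened a store in Paris."
# Suppose the vanilla model mislabeled "Apple" as PERSON; we correct it to ORG
# and keep the correct GPE prediction for "Paris" as-is.
predicted = [(0, 5, "PERSON"), (24, 29, "GPE")]
example = bootstrap_annotations(text, predicted, corrections={(0, 5): "ORG"})
```

The resulting tuples can then be fed to whatever trainer you're using; the commenter converted them for OpenNLP, but the same corrected spans would also work as spaCy training examples.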
They have some hard comparisons in some of their earlier blog posts on how spaCy stacks up against the other popular open-source NLP libraries. In my experience it has been much easier to use, and faster, than the likes of Stanford's library or NLTK. In general it's aimed at production or commercial use, whereas the other libraries I typically hear mentioned target a more academic audience.
OpenNLP is... well, I've never heard of anyone using it (except once, in an ensemble). I think NLTK is more widely used.
Stanford CoreNLP gives good accuracy and is pretty much the benchmark for English. BUT it isn't great software. It falls over if you pass it large amounts of text, the code is dreadful, it's hard to integrate (even from Java, because of its own wacky config system), various parts aren't integrated (e.g., SUTime), it doesn't have an embedding representation, and it is pretty slow.
Having said all that, I still use it sometimes. But spaCy is much nicer to use, and 99% of the time (probably more) the slightly lower accuracy is offset by things like having word embeddings readily available right alongside the word tags.
I think it's pretty fair to say Spacy is the leading open-source NLP tool.
I have to agree. Explosion/spaCy puts out a lot of marketing BS. That said, I think spaCy itself is actually pretty solid, and if you're in the NLP field you should give it a try.
Self-respecting language geeks keep up with the times. What's your case for "leading open-source"? Here's a look at spaCy blowing Stanford CoreNLP out of the water (by GitHub stars; you can look at commits and more with the same tool): https://www.datascience.com/trends?trends=4812,7214,7165&tre...
When a new tool is announced, there's a lot of casual interest, hence the "explosion" at the beginning. After some time things reach a steady state, and you can see that interest in spaCy has started to fall below CoreNLP's over the last few weeks.
Sounds like marketing BS. What about OpenNLP and Stanford's NLP tools?