n=50000 of tabular data is a good sample size, and results will likely have a low standard error, assuming no systematic bias (although it's not "big" data).
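
To put a rough number on "low standard error", here is a minimal sketch for the simplest case: estimating a proportion from n independent rows. The helper name and the worst-case p = 0.5 are my own assumptions for illustration, not anything from the original post, and the formula only holds if the rows are independent and unbiased.

```python
import math

def standard_error_of_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion, assuming n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)

n = 50_000
# p = 0.5 is the worst case: it maximizes p * (1 - p), so the SE is an upper bound.
se = standard_error_of_proportion(0.5, n)
# Approximate 95% margin of error using the normal approximation (z = 1.96).
margin_95 = 1.96 * se

print(f"SE = {se:.5f}, 95% margin = +/-{margin_95:.4f}")
# prints: SE = 0.00224, 95% margin = +/-0.0044
```

So even in the worst case the estimate is within about half a percentage point. Since the error shrinks with 1/sqrt(n), quadrupling the data only halves it.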

n=50000 of text data is different, since contextual structures and words repeat far less often (particularly proper nouns). It's interesting that the dataset only uses "hundreds", as mentioned in the original post.