
Are they releasing the weights for download? The links to Hugging Face in the readme are giving me a 404. The dataset they built on top of "The Pile" sounds interesting. I'm looking forward to evaluating their claim that 3-7 billion param models can perform on par with the 175 billion param GPT-3.



Did they claim this? I didn’t see that claim made in the above post.


"The richness of this dataset gives StableLM surprisingly high performance in conversational and coding tasks, despite its small size of 3 to 7 billion parameters (by comparison, GPT-3 has 175 billion parameters)."

So they did not explicitly say it is comparable, but implicitly compared the two. I'm curious to evaluate what "surprisingly high performance" means exactly.



