
The models: https://huggingface.co/stabilityai/stablelm-base-alpha-3b, https://huggingface.co/stabilityai/stablelm-base-alpha-7b

There are also tuned versions of these models: https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b, https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b. These versions are fine-tuned on various chat and instruction-following datasets.

The GitHub repo mentions that the models will be trained on 1.5T tokens, which is pretty huge in my opinion. The alpha models are trained on 800B tokens. The context length is 4096.



These models are huge. I assume they are not quantized down to 4 bits yet.


Quantized versions will pop up on Hugging Face very soon, if they aren't already there. It takes basically no time, much less than something like an Alpaca finetune.
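
For a rough sense of why 4-bit matters here, a back-of-the-envelope sketch of the weight memory at different precisions. The parameter counts (3e9 and 7e9) are assumptions taken from the "3b" and "7b" in the model names, not exact figures, and this counts the weights only (no activations, KV cache, or quantization overhead such as scales).

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Nominal sizes assumed from the model names; real counts may differ.
models = {"stablelm-base-alpha-3b": 3e9, "stablelm-base-alpha-7b": 7e9}

for name, n_params in models.items():
    for bits in (16, 8, 4):
        gib = weight_memory_gib(n_params, bits)
        print(f"{name} @ {bits}-bit: ~{gib:.1f} GiB")
```

Under these assumptions the 7b model goes from roughly 13 GiB at 16-bit to a bit over 3 GiB at 4-bit, which is the difference between needing a high-end GPU and fitting on a consumer card.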



