Hacker News | past | comments | ask | show | jobs | submit | aqader's comments

this is really cool, can’t wait to try it out for some ML pipeline development. kudos myles and akshay!




Depends on the model size. With a model like GPT-3, which has hundreds of billions of parameters, you can do few-shot learning. You'll still pay for the tokens processed, and response times will increase at least linearly with the size of your input.

Fine-tuning can get you similar results on smaller / faster models. The downside is you have to craft the dataset in the right way. There are trade-offs to both approaches but fwiw, I don't think Alpaca-7b can do few-shot learning.
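To make the trade-off concrete: few-shot prompting just means prepending labeled examples to every request, so prompt length (and therefore token cost and latency) grows with the number of shots. A minimal sketch, with a made-up sentiment task and template:

```python
# Few-shot prompt assembly: each added example grows the input,
# and API cost / latency scale with prompt length.
EXAMPLES = [
    ("I loved this movie!", "positive"),
    ("Total waste of time.", "negative"),
]

def few_shot_prompt(query, examples):
    # Render each labeled example, then append the unanswered query.
    parts = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

prompt = few_shot_prompt("Surprisingly good.", EXAMPLES)
print(prompt)
```

A fine-tuned model bakes those examples into the weights instead, so you pay the dataset-crafting cost once and send only the query at inference time.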


Almost. If your dataset contains questions and answers about your own project's documentation, then yes. The UX around how to prompt a fine-tuned model depends on the format of the dataset it was trained on.

One way to do this is to pass your documentation to a larger model (like GPT-3.5 or an OSS equivalent) and have it generate the questions/answers. You can then use that dataset to fine-tune something like LLaMA to get conversational / relevant answers.
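A rough sketch of that dataset-generation step: prompt the larger model to emit Q/A pairs for each documentation chunk, then write them out in the Alpaca-style instruction/input/output format used for fine-tuning. The prompt wording and the example pair below are illustrative, not from any real pipeline:

```python
import json

def qa_generation_prompt(doc_chunk):
    # Prompt sent to the larger model; it should return Q/A pairs
    # grounded in the given documentation chunk.
    return (
        "Write three question/answer pairs about the documentation below.\n\n"
        f"{doc_chunk}"
    )

def to_alpaca_record(question, answer):
    # Alpaca-style fine-tuning record: instruction / input / output.
    return {"instruction": question, "input": "", "output": answer}

# Example: pretend the larger model returned this pair for a chunk.
record = to_alpaca_record(
    "How do I install the package?", "Run `pip install mypkg`."
)
print(json.dumps(record))
```

Collecting one such JSON record per generated pair gives you a dataset in the same shape as the 52k Alpaca instruction set.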


to my understanding, fine-tuning is slow and would be quite a pain to update. embeddings seem to be the way to go. i don't understand it well enough, but it seems that with the langchain framework you can create embeddings of your own data and submit the relevant pieces to the GPT API, and i believe embeddings should work on a similar principle with llama. at least i did it with diffusers in stable diffusion.
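For what it's worth, the embeddings approach boils down to: embed each chunk of your data, embed the query, pick the nearest chunk by cosine similarity, and stuff it into the prompt as context. A toy sketch under that assumption, with 3-d vectors standing in for real embedding-model output:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embeddings standing in for a real embedding model's output.
docs = {
    "install guide": [0.9, 0.1, 0.0],
    "api reference": [0.1, 0.9, 0.1],
}
query_vec = [0.8, 0.2, 0.1]  # embedding of "how do I install this?"

# Retrieve the closest chunk and put it in the prompt as context.
best = max(docs, key=lambda k: cosine(query_vec, docs[k]))
prompt = f"Context: {best}\n\nQuestion: how do I install this?"
print(best)  # -> "install guide"
```

Updating your data then only means re-embedding the changed chunks, which is why this is cheaper to keep current than re-running a fine-tune.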


Yeah, this is running in 8bit mode. The 30b 8bit version we released seems to do a lot better, but it requires significantly more compute.

https://huggingface.co/baseten/alpaca-30b


For this demo, we're using the 8bit version here: https://huggingface.co/tloen/alpaca-lora-7b

We also fine-tuned and open-sourced a 30b version (trained on the cleaned 52k Alpaca dataset) that you can check out here: https://huggingface.co/baseten/alpaca-30b


Can you comment on the '8bit version' mentioned above? Does that mean the parameters are uint8s (converted from the original float16 params)? Looking at your PyTorch code, I see some float16 declarations.

I've been running alpaca.cpp 13b locally, and your 7b model performs much better than it does. I had assumed this was because alpaca.cpp converts the weights from float16 down to 4 bits, but is there some other fine-tuning you're doing that might also account for the better performance of chatLLaMA over alpaca.cpp?
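For context on what "8bit" typically means here: weights are stored as 8-bit integers plus a floating-point scale, and dequantized back to float on the fly. A simplified absmax-style sketch (real 8-bit inference, e.g. bitsandbytes' LLM.int8(), is considerably more involved, with per-row scales and outlier handling):

```python
def quantize_int8(weights):
    # Absmax quantization: map the largest-magnitude weight to +/-127.
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from int8 values and the scale.
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03]
q, scale = quantize_int8(w)
print(q)                     # values in the int8 range [-127, 127]
print(dequantize(q, scale))  # approximately the original weights
```

Quantizing to 4 bits works the same way but with only 16 levels, so the rounding error is larger, which is one plausible contributor to the quality gap you're seeing.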


did you use the cleaned and improved alpaca dataset from https://github.com/tloen/alpaca-lora/issues/28 ?


Yes, we did! The dataset has since been cleaned even more, so we're due to update the model.

