Depends on the model size. A model like GPT-3, with hundreds of billions of parameters, can do few-shot learning. You'll still pay for every token processed, and response times grow at least linearly with the size of your input.
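To make "few-shot" concrete: you just stuff a couple of worked examples into the prompt ahead of the real input. A rough sketch below (the sentiment task and model name are purely illustrative, and this uses the pre-1.0 openai Python client style). It also shows where the cost/latency hit comes from: every example is sent, billed, and processed on every single request.

    import openai

    # a few-shot prompt: worked examples followed by the new input.
    # all of this counts as input tokens on every request.
    prompt = """Classify the sentiment of each review.

    Review: The battery died after two days.
    Sentiment: negative

    Review: Setup took five minutes and it just works.
    Sentiment: positive

    Review: The screen is gorgeous but the speakers are tinny.
    Sentiment:"""

    response = openai.Completion.create(
        model="text-davinci-003",  # illustrative; any large completion model
        prompt=prompt,
        max_tokens=5,
        temperature=0,
    )
    print(response.choices[0].text.strip())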
Fine-tuning can get you similar results on smaller / faster models. The downside is you have to craft the dataset in the right way. There are trade-offs to both approaches but fwiw, I don't think Alpaca-7b can do few-shot learning.
Almost. If your dataset contains questions and answers about your own project's documentation, then yes. The UX around how to prompt a fine-tuned model depends on the format of the dataset it was trained on.
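Alpaca is a concrete example: it was trained on instruction/input/output records, so you get the best results by wrapping your question in the same template it saw during training. Roughly (the question itself is made up):

    # the prompt template Stanford Alpaca uses for instructions with no extra input
    ALPACA_TEMPLATE = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:\n"
    )

    prompt = ALPACA_TEMPLATE.format(
        instruction="How do I configure logging in myproject?"  # hypothetical question
    )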
One way to do this is to pass your documentation to a larger model (GPT-3.5 or an OSS equivalent) and have it generate the questions/answers. You can then use that dataset to fine-tune something like LLaMA to get conversational / relevant answers.
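A rough sketch of that pipeline, assuming `docs` is a list of documentation sections you've already split up (the prompt wording, the three-pairs-per-section choice, and the output filename are all arbitrary; this again uses the pre-1.0 openai client):

    import json
    import openai

    docs = ["...section 1 text...", "...section 2 text..."]  # your documentation chunks

    dataset = []
    for section in docs:
        # ask the larger model to invent Q/A pairs grounded in this section
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Write 3 question/answer pairs about the documentation "
                           "below. Output one JSON object per line with 'question' "
                           "and 'answer' keys.\n\n" + section,
            }],
            temperature=0.7,
        )
        for line in completion.choices[0].message.content.splitlines():
            line = line.strip()
            try:
                pair = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip anything the model didn't format as asked
            if "question" not in pair or "answer" not in pair:
                continue
            # Alpaca-style record, ready for instruction fine-tuning
            dataset.append({
                "instruction": pair["question"],
                "input": "",
                "output": pair["answer"],
            })

    with open("finetune_data.json", "w") as f:
        json.dump(dataset, f, indent=2)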
to my understanding, fine-tuning is slow and would be a pain to keep updated. embeddings seem to be the way to go. i don't understand it well enough, but it seems that with the langchain framework you can create embeddings of your own data and submit them to the GPT API, and i believe embeddings work on a similar principle in llama. at least i did it with diffusers in stable diffusion.
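for what it's worth, the embeddings approach is retrieval rather than training: you embed your docs once, find the chunks closest to the question, and paste them into the prompt. a bare-bones version of what langchain automates (doc chunks and model names are just placeholders, pre-1.0 openai client):

    import numpy as np
    import openai

    def embed(text):
        # one vector per piece of text from the OpenAI embeddings endpoint
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    # pretend these are chunks of your documentation
    chunks = [
        "Install with `pip install mytool`.",
        "Configuration lives in mytool.yaml.",
        "Run `mytool serve` to start the server.",
    ]
    chunk_vectors = [embed(c) for c in chunks]

    question = "How do I install it?"
    q_vec = embed(question)

    # cosine similarity picks the most relevant chunk
    sims = [float(np.dot(q_vec, v) / (np.linalg.norm(q_vec) * np.linalg.norm(v)))
            for v in chunk_vectors]
    best = chunks[int(np.argmax(sims))]

    # stuff the retrieved chunk into the prompt; no fine-tuning involved
    answer = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Answer using this context:\n{best}\n\nQuestion: {question}"}],
    )
    print(answer.choices[0].message.content)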
Can you comment on the '8bit version' from above? Does that mean these parameters are uint8s (converted from the original float16 params)? Looking in your pytorch code I see some float16 declarations.
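To clarify what I mean, the usual 8-bit setup I've seen (via bitsandbytes) stores the big linear-layer weights as int8 while embeddings, norms, and some activations stay in float16, which would explain float16 declarations showing up alongside 8-bit weights. No idea if that's what you're doing here, but something like:

    from transformers import AutoModelForCausalLM

    # int8 weights for the big matmuls, float16 for the rest;
    # requires the bitsandbytes package to be installed
    model = AutoModelForCausalLM.from_pretrained(
        "decapoda-research/llama-7b-hf",  # illustrative checkpoint name
        load_in_8bit=True,
        device_map="auto",
    )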
I've been running alpaca.cpp 13B locally and your 7B model performs much better than it does. I had assumed this was because alpaca.cpp converts the weights from float16 down to 4 bits, but is there some other fine-tuning you're doing that might also account for the better performance of chatLLaMA over alpaca.cpp?