
How?


You can fine-tune a 7B model in a couple of hours on a $200 3060 with https://github.com/johnsmith0031/alpaca_lora_4bit
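For a rough sense of why 4-bit quantization makes this fit on a consumer card, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It assumes "7B" means exactly 7 billion parameters and ignores activations, LoRA adapter weights, optimizer state, and the per-group scales that quantization adds, so real usage will be higher. The helper name is made up for illustration.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: "7B" is taken as exactly 7 billion parameters.
N_PARAMS = 7e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = weight_memory_gib(N_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {gib:.2f} GiB")
```

At 16 bits the weights alone come to roughly 13 GiB, while at 4 bits they are a bit over 3 GiB, which leaves headroom on the common 12 GB variant of the 3060 (an assumption; the comment does not say which variant).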



That dataset is licensed under CC BY-NC 4.0, which is not an open license. It also has a bunch of garbage in it; see https://github.com/gururise/AlpacaDataCleaned


I wonder what happens if you just feed that dataset back into another LLM to rewrite it and filter out the low-quality items? Is there still any connection to the original copyright? How would that even be proven?
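Mechanically, the rewrite-and-filter idea could look something like the sketch below. It assumes Alpaca-style records with "instruction", "input", and "output" keys. The quality heuristics are invented for illustration, and `rewrite` is a hypothetical stand-in for whatever LLM call you would actually use. It says nothing about the copyright question.

```python
from typing import Callable, Iterable

Record = dict[str, str]


def looks_low_quality(rec: Record) -> bool:
    """Illustrative heuristics only: drop empty or echo-style examples."""
    instruction = rec.get("instruction", "").strip()
    output = rec.get("output", "").strip()
    if not instruction or not output:
        return True
    # An output that merely repeats the instruction teaches nothing.
    return output.lower() == instruction.lower()


def rewrite_and_filter(
    records: Iterable[Record],
    rewrite: Callable[[str], str],
) -> list[Record]:
    """Filter out low-quality records, then pass the rest through `rewrite`.

    `rewrite` is a hypothetical placeholder for an LLM paraphrasing call.
    """
    cleaned = []
    for rec in records:
        if looks_low_quality(rec):
            continue
        new_rec = dict(rec)  # copy so the input records are not mutated
        new_rec["instruction"] = rewrite(rec["instruction"])
        new_rec["output"] = rewrite(rec["output"])
        cleaned.append(new_rec)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"instruction": "Name a primary color.", "input": "", "output": "Red."},
        {"instruction": "Summarize the text.", "input": "", "output": ""},
    ]
    # str.strip stands in for a real rewriting model here.
    print(rewrite_and_filter(sample, rewrite=str.strip))
```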



