Unweight: We compressed an LLM 22% without sacrificing quality (cloudflare.com)
5 points by subset 62 days ago | 1 comment


I love these optimization tales. Memory-throughput bottlenecks (extremely common, perhaps more so than they seem) are my favorite to tackle; there are often some juicy optimizations to be had there.

Do model weights have any spatial locality that can be exploited? If so, there are some more general pre-compression techniques that might be interesting to try, e.g. bitshuffle is one I've worked with (https://github.com/kiyo-masui/bitshuffle).
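
To make the bitshuffle idea concrete, here is a minimal pure-Python sketch of the bit-transpose step followed by zlib. It does not use the bitshuffle library's API (the real thing works on blocks with SIMD), and the data is a made-up assumption: 16-bit values whose high bits are constant and whose low 6 bits are random. The helper names (`bitshuffle`, `bitunshuffle`, `make_synthetic`) are hypothetical. Whether real model weights behave anything like this is exactly the open question.

```python
import random
import struct
import zlib


def bitshuffle(values, width=16):
    """Transpose bits so that plane k holds bit k of every value."""
    if len(values) % 8 != 0:
        raise ValueError("this sketch needs a multiple of 8 values")
    out = bytearray()
    for bit in range(width):
        for start in range(0, len(values), 8):
            packed = 0
            for j in range(8):
                packed |= ((values[start + j] >> bit) & 1) << j
            out.append(packed)
    return bytes(out)


def bitunshuffle(data, count, width=16):
    """Inverse of bitshuffle: rebuild `count` integers from the bit planes."""
    plane_bytes = count // 8
    values = [0] * count
    for bit in range(width):
        for i in range(count):
            packed = data[bit * plane_bytes + i // 8]
            values[i] |= ((packed >> (i % 8)) & 1) << bit
    return values


def make_synthetic(count=4096, seed=0):
    """Hypothetical data: constant high bits, 6 random low bits per value."""
    rng = random.Random(seed)
    return [0x3C00 | rng.getrandbits(6) for _ in range(count)]


values = make_synthetic()
raw = struct.pack("<%dH" % len(values), *values)  # plain little-endian layout
shuffled = bitshuffle(values)                     # same size, bits regrouped

raw_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(shuffled))
print("raw + zlib:       ", raw_size, "bytes")
print("bitshuffle + zlib:", shuffled_size, "bytes")
```

After the transpose, the constant bit positions become long runs of identical bytes and the random bits are isolated in their own planes, so a general-purpose compressor has less to untangle. On data where every bit is close to random, the transpose should buy little or nothing.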

Another fun fact: in some scenarios (it depends a lot on CPU and memory characteristics), gzip+memcpy+gunzip can be faster end-to-end than a plain memcpy. I forget where I first heard this, but my familiarity comes from the Blosc compression library.



