

0cf8612b2e1e on May 20, 2024 | on: 26× Faster Inference with Layer-Condensed KV Cache...

Even if it is “only” the 40% lower end, that is a gargantuan saving. So many groups are compute-constrained that every bit helps.

josephg on May 20, 2024

Sure, but a 40% improvement is much less than a 26x improvement. If 40% is the realistic figure, cite that. Changing the title to include an outlier of 26x is clickbaity.
