Hacker News | new | past | comments | ask | show | jobs | submit | naasking's comments

This sounds great! TurboQuant does KV cache compression using quantization via rotations, and ParoQuant [1] does weight compression using quantization via rotations. So we can get 4-bit weights that match bf16 precision, and the KV cache goes down to 3 bits per key. This brings larger models and long contexts into the range of "possibly runnable" on beefy consumer hardware.

[1] https://github.com/z-lab/paroquant
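The core idea behind rotation-based quantization can be sketched in a few lines. This is not ParoQuant's or TurboQuant's actual algorithm, just an illustration of why rotating before quantizing helps: an orthogonal rotation spreads outlier magnitude across all dimensions, so a uniform low-bit grid wastes far less range on a few extreme values.

```python
import numpy as np

def quantize(x, bits=4):
    # Uniform symmetric quantization to the given bit width.
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
# A weight vector with a few large outliers, as seen in real LLM layers.
w = rng.normal(size=256)
w[:4] *= 50

# Random orthogonal rotation (QR decomposition of a Gaussian matrix).
q, _ = np.linalg.qr(rng.normal(size=(256, 256)))

naive_err = np.linalg.norm(w - quantize(w))
# Rotate, quantize, rotate back: the rotation itself is lossless,
# and the quantization error on the rotated vector is much smaller
# because the outliers' energy has been spread out.
rotated_err = np.linalg.norm(w - q.T @ quantize(q @ w))
```

With outliers present, `rotated_err` comes out well below `naive_err` at the same 4-bit budget, which is the intuition behind both papers.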


> Yet, the majority of new apps and services that I see are all AI ecosystem stuff.

The same was true of all this computer science stuff too. We built parsers, compilers, calculators, ftp and http, all cool stuff that just builds up our own ecosystem. Look how that turned out.

An ecosystem has to hit a critical mass of sophistication before it breaks out to the mainstream. It's not going to take very long for AI.


> The semantics imposed on the bit strings does not exist anywhere in the arithmetic operations,

Correct, the semantics actually lives in the network of relations between the nodes. That has been one of the major lessons of LLMs, and it validates the systems reply to Searle's Chinese Room.


Interesting idea, but I hope people just start switching to ParoQuant and eliminate basically all quantization error relative to fp16/bf16, even down at 4 bits:

https://github.com/z-lab/paroquant


> 2. it makes good business sense.

What expected ROI are you basing this on? If it made good business sense on its own, it wouldn't need to be required by law.


Presumably creating a different class for parameter lists allows you to extend it with operations that aren't natural to tuples, like named arguments.
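For instance, a minimal sketch of what a dedicated parameter-list class buys you over a bare tuple (all names here are hypothetical, not from any particular library): it can carry positional and named arguments together and support operations like merging in defaults, which have no natural tuple equivalent.

```python
class ParamList:
    """A call's arguments as a first-class value."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def with_defaults(self, **defaults):
        # Named arguments already present win over the defaults.
        merged = {**defaults, **self.kwargs}
        return ParamList(*self.args, **merged)

    def apply(self, fn):
        return fn(*self.args, **self.kwargs)

params = ParamList(1, 2, sep="-")
result = params.with_defaults(sep=",", end="!").apply(
    lambda a, b, sep, end: f"{a}{sep}{b}{end}")
# result == "1-2!": the explicit sep="-" survives, end="!" is filled in.
```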

> just the Python process collecting and parsing the metrics of all programs consumed 30-40% of the processing power of the lower end boxes.

Rewrite just the parsing loop in something faster like C or Rust, instead of rewriting the whole thing.


> The human genome contains around 1.5GB of information and DeepSeek v3 weighs in at around 800GB, so it's a bit apples-to-oranges.

The apples-to-apples comparison is between the human genome and the code behind a particular LLM. The genome defines the structure that learns and thinks, just as the code does for the LLM.


Seems like they're relying on the loss as a measure, at least for now.

Great project. On the matter of data efficiency and regularization, I'd love to see someone try scaling GrokAlign!
