Cost grows much faster than linearly with die size (roughly exponentially under simple yield models), because a larger die means both fewer chips per wafer and lower yield.
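To put rough numbers on that, here is a minimal sketch assuming a Poisson yield model and a common dies-per-wafer approximation. The wafer diameter, defect density, and wafer cost are illustrative assumptions, not figures from this thread, and real fabs use more refined yield models.

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: gross wafer area over die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return gross - edge_loss

def poisson_yield(die_area_mm2, defects_per_cm2=0.1):
    """Poisson model: probability that a die of this area has zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)

def cost_per_good_die(die_area_mm2, wafer_cost=10_000.0,
                      wafer_diameter_mm=300.0, defects_per_cm2=0.1):
    """Wafer cost divided by the expected number of defect-free dies."""
    good_dies = (dies_per_wafer(die_area_mm2, wafer_diameter_mm)
                 * poisson_yield(die_area_mm2, defects_per_cm2))
    return wafer_cost / good_dies

if __name__ == "__main__":
    # Assumed die sizes in mm^2; wafer_cost is a placeholder value.
    for area in (100, 200, 400, 800):
        print(f"{area:4d} mm^2: yield {poisson_yield(area):.2f}, "
              f"cost per good die {cost_per_good_die(area):.2f}")
```

Under these assumptions an 8x larger die costs far more than 8x as much per good chip, since the exponential yield term and the edge loss compound.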
I expect that this kind of burned-in model is also very difficult to verify (how do you know if some of the weights are off?) and not amenable to partial disablement to increase yield. For CPUs, you just laser-disable the bad cores. You can't forgo part of a neural net.
You can ablate surprisingly large chunks of a model with next to no effect. You can try this easily: download an open-weight model and load it in PyTorch.
Obviously it’s not ideal, but you could likely have a single-digit percentage of all weights affected and still have a useful model. Many caveats apply here: the locality of the damaged weights matters, the distribution of the errors matters, whether they fail high or low matters, …
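A minimal sketch of that experiment, assuming PyTorch is installed. The helper `ablate_weights_` is a hypothetical name, and the tiny random MLP is only a stand-in for a real open-weight model, so this shows the mechanics rather than evidence about model quality. With a real model you would compare perplexity or benchmark accuracy before and after.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def ablate_weights_(model: nn.Module, fraction: float, seed: int = 0) -> int:
    """Zero a random `fraction` of entries in every weight matrix, in place.

    Returns the number of entries selected for zeroing. This models
    "stuck at zero" faults spread uniformly at random; clustered damage
    or stuck-high faults would need a different mask.
    """
    gen = torch.Generator().manual_seed(seed)
    zeroed = 0
    for param in model.parameters():
        if param.dim() < 2:  # skip biases and norm scales
            continue
        mask = torch.rand(param.shape, generator=gen) < fraction
        param.masked_fill_(mask.to(param.device), 0.0)
        zeroed += int(mask.sum())
    return zeroed

# Stand-in model (assumption): swap in a loaded open-weight model here.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.randn(32, 64)

with torch.no_grad():
    before = model(x)
n_zeroed = ablate_weights_(model, fraction=0.05)
with torch.no_grad():
    after = model(x)

# Relative change in the outputs caused by zeroing about 5% of the weights.
rel_change = ((after - before).norm() / before.norm()).item()
print(f"zeroed {n_zeroed} weights, relative output change {rel_change:.3f}")
```

Changing how `mask` is built (contiguous blocks instead of uniform noise, or filling with a large value instead of zero) is one way to probe the locality and fail-high/fail-low caveats.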
I mean, you probably can just turn off the defective parts of the network. You'd better believe that if this becomes popular, they will salvage yield by selling "dumber" chips at a discount.