Wrt algorithms, which ones are supported really is an implementation detail: it comes down to how flexible they make the crypto engines.

The number of crypto units would be SKU-specific, depending on the workload. A server box doing service mesh would presumably need one per concurrent flow.

What accelerator offload gets you is additional thermal budget to spend on general-purpose workloads. If you know that you will be doing SERDES and encryption/decryption, offloading those to an accelerator frees up watts of TDP (thermal design power) to spend elsewhere. This is also why we see big/little architectures: the OOO (out-of-order) cores suck up a lot of power, and in-order cores are just fine for latency-insensitive workloads.
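
To put rough numbers on the thermal budget argument, here is a minimal back-of-the-envelope sketch. Every wattage in it is a made-up placeholder, not a measurement of any real part, and it assumes the accelerator sits inside the same package TDP as the cores.

```python
def general_purpose_budget(tdp_w, fixed_work_on_cores_w, accelerator_w=0.0):
    """Watts left for general-purpose work inside a fixed package TDP."""
    return tdp_w - fixed_work_on_cores_w - accelerator_w


# All values below are hypothetical, chosen only to illustrate the arithmetic.
TDP_W = 100.0            # package thermal design power
SOFTWARE_COST_W = 20.0   # SERDES + enc/dec running on the OOO cores
ACCEL_COST_W = 5.0       # the same work done by a fixed-function block

without_offload = general_purpose_budget(TDP_W, SOFTWARE_COST_W)
with_offload = general_purpose_budget(TDP_W, 0.0, ACCEL_COST_W)
freed = with_offload - without_offload

print(f"general-purpose budget without offload: {without_offload} W")
print(f"general-purpose budget with offload:    {with_offload} W")
print(f"watts freed by offloading:              {freed} W")
```

The point is only the shape of the trade: as long as the accelerator does the fixed work for fewer watts than the cores would, the difference goes back into the budget for everything else.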