The ones I’ve stumbled upon seem to be: switch models based on task complexity, use tooling like ASTs and compression, disable unused MCPs, compact often, be verbose with input to give clear guidance…
I don't think compacting often is good for saving money. It generates more output tokens and then the input is no longer from cache, which is priced differently...typically very differently.
The ones I’ve stumbled upon seem to be: switch models based on task complexity, use tooling like ASTs and compression, disable unused MCPs, compact often, be verbose with input to give clear guidance…