Surprised that "controlling cost" isn't a section in this post. Here's my attempt.
---
If you get the hang of controlling costs, it's much cheaper. If you're exhausting the context window, I would not be surprised if you're seeing high costs.
Be aware of the "cache".
Tell it to read specific files (and only those!). If you don't, it'll read unnecessary files, repeatedly read sections of files, or even search through files.
Avoid letting it search; even halt it. Find / rg can produce thousands of tokens of output depending on the search.
Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT.
The cache also expires after 5-15 minutes or so (not sure), so avoid leaving sessions open and coming back later.
Never use /compact (that'll bust cache; if you need to, you're going back and forth too much or using too many files at once).
Don't let files get too big (it's good hygiene too), to keep context window sizes smaller.
Have a clear goal in mind and keep sessions to as few messages as possible.
Write / generate markdown files with the needed documentation using claude.ai, save them as files in the repo, and tell it to read that file as part of a question.
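Why editing, linting, and idling all hurt: as far as I know, Anthropic documents its prompt cache as prefix-based (a hit requires the prompt to be identical up to the cached point), with a default lifetime of about 5 minutes that is refreshed each time the prefix is reused. The toy model below is a sketch under that assumption, not Claude Code's actual implementation; `PrefixCache`, the 300-second TTL, and treating a lint pass as "the file block changed" are all simplifications for illustration.

```python
import hashlib


class PrefixCache:
    """Toy model of a prefix-based prompt cache (a simplification, not Claude Code internals).

    A prompt is a list of blocks (system prompt, file contents, messages). Each cache
    entry is keyed on a hash of every block up to and including that one, so changing
    an early block changes the key of every block after it.
    """

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.last_used: dict[str, float] = {}  # prefix hash -> time last written or hit

    @staticmethod
    def _prefix_keys(blocks: list[str]) -> list[str]:
        h = hashlib.sha256()
        keys = []
        for block in blocks:
            h.update(block.encode("utf-8"))
            h.update(b"\x00")  # separator so ["ab", "c"] and ["a", "bc"] differ
            keys.append(h.hexdigest())  # hash of the prefix ending at this block
        return keys

    def request(self, blocks: list[str], now: float) -> int:
        """Send a prompt at time `now`; return how many leading blocks were cache hits."""
        keys = self._prefix_keys(blocks)
        hits = 0
        for key in keys:
            ts = self.last_used.get(key)
            if ts is None or now - ts > self.ttl:
                break  # first miss: everything after it must be reprocessed
            hits += 1
        for key in keys:
            self.last_used[key] = now  # write new prefixes, refresh the TTL of old ones
        return hits


cache = PrefixCache()
print(cache.request(["system", "users.ts v1", "msg 1"], now=0))             # 0: cold cache
print(cache.request(["system", "users.ts v1", "msg 1", "msg 2"], now=60))   # 3: prefix reused
print(cache.request(["system", "users.ts v2", "msg 1", "msg 2"], now=120))  # 1: file was edited/linted
print(cache.request(["system", "users.ts v2", "msg 1", "msg 2"], now=900))  # 0: idle past the TTL
```

The third call is the "never edit files manually" case: only the system prompt survives, and everything after the changed file is paid for again. The last call is the "don't leave the session open and come back later" case.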
I'm at about $0.50-0.75 for most "tasks" I give it. I'm not a super heavy user, but it definitely helps me (it's like having a super focused smart intern that makes dumb mistakes).
If I need to feed it a ton of docs etc. for some task, it'll be more like a few dollars rather than < $1. But I really only do this to try a prototype with a library Claude doesn't know about (or only knows an outdated version of).
For hobby stuff, it adds up - totally.
For a company, massively worth it. Insanely cheap productivity boost (if developers are responsible / don't get lazy / don't misuse it).
If I have to be this cautious while using a tool, I might as well write the code myself lol.
I’ve used Claude Code extensively and it is one of the best AI IDEs. It just gets things done.
The only downside is the cost. I was averaging $35-$40/day. At this cost, I’d rather just use Cursor/Windsurf.
Not having to specify files is a humongous feature for me. Having to remember which file code is in is half the work once you pass a certain codebase size.
That sometimes works and sometimes doesn’t, and takes 10x the time. Same with Codex. I would have both and switch between them depending on which one you feel will get it right.
Yeah, I tried CC out and quickly noticed it was spending $5+ on simple, LLM-capable tasks. I rarely break $1-2 a session using Aider. Aider feels like more of a precision tool. I like having the ability to specify files manually.
I do find Claude Code to be really good at exploration though - like checking out a repository I'm unfamiliar with and then asking questions about it.
Aider is a great tool. I do love it. But I find I have to do more with it to get the same output as Claude Code (no matter what LLM I used with Aider). Sure it may end up being cheaper per run, but not when my time is factored in.
The flip side is I find Aider much easier to limit.
After switching to Aider, I realized the other tools have been playing elaborate games to pick cheaper models and to limit the files and messages in context, since bigger models and more context both increase their bills.
Get an OpenRouter account and you can play with almost all providers. I was burning money on Claude, then tried V3 (I blocked the DeepSeek provider for being flaky; let the laypeople mock them) and the experimental and GA Gemini models.
The cost of the task scales with how long it takes, plus or minus.
Substitute “cost” with “time” in the above post and all of the same tips are still valuable.
I don’t do much agentic LLM coding, but the speed (or lack thereof) was one of my least favorite parts. Any tricks that narrow scope or prevent reprocessing files over and over again or searching through the codebase are helpful even if you don’t care about the dollar amount.
Hard agree. Whether it's 50 cents or 10 dollars per session, I'm using it to quickly complete work that aims to unblock many orders of magnitude more value. But insofar as cheaper correct sessions correlate with sessions where the problem solving was more efficient anyhow, these are fairly solid tips.
I agree, but optimisation often reveals implementation details that help you understand the limits of current tech better. It might not be worth the time, but part of engineering is optimisation and another part is deep understanding of the tech. Sometimes it's worth optimising anyway if you want to take the engineering discipline to the next level within yourself.
I hadn’t thought about not running linters, but it makes obvious sense now, and it gives me insight into how Claude Code works that I can use in related engineering work.
Exactly. I've been using the ChatGPT desktop app not because of the model quality but because of the UX. It integrates seamlessly with my IDEs (IntelliJ and VS Code). Mostly I just do stuff like select a few lines, hit option+shift+1, and say something like "fix this". Nice short prompt, and I get the answer relatively quickly. Option+shift+1 opens ChatGPT with the open file already added to the context. It sees which lines are selected. And it also sees the output of any test runs in the consoles. So just saying "fix this" now carries rich context that I don't need to micromanage.
Mostly I just use the 4o model instead of the newer better models because it is faster. It's good enough mostly and I prefer getting a good enough answer quickly than the perfect answer after a few minutes. Mostly what I ask is not rocket science so perfect is the enemy of good here. I rarely have to escalate to better models. The reasoning models are annoyingly slow. Especially when they go down the wrong track, which happens a lot.
And my cost is a predictable $20/month. The downside is that the scope of what I can ask is more limited. I'd like it to be able to "see" my whole code base instead of just one file, without me having to micromanage what the model looks at. Claude can do that if you don't care about money. But if you do, you are basically micromanaging context. That sounds like monkey work that somebody should automate. And it shouldn't require an Einstein-sized artificial brain to do that.
There must be people experimenting with more limited, locally running AI models that do all the micromanaging and then escalate to remote models as needed. That's more or less what Apple pitched for Apple Intelligence at some point. Sounds like a good path forward. I'd be curious to learn about coding tools that do something like that.
In terms of cost, I don't actually think it's unreasonable to spend a few hundred dollars per month on this stuff. But I question the added value over the $20 I'm spending. I don't think the improvement is 20x; more like 1.5x. And I don't like the unpredictability, or having to think about how expensive a question is going to be.
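A minimal sketch of that split, assuming the cheap local step only picks context and the expensive remote model does the actual work. Keyword overlap stands in for the small local model here; `select_context`, `build_remote_prompt`, and the example files are hypothetical, and no real tool's API is implied.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased identifier-like words, e.g. 'getById(id)' -> {'getbyid', 'id'}."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def select_context(task: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Cheap 'local' step: rank files by word overlap with the task and keep the top k.

    Keyword overlap is a stand-in for a small local model; ties break on path name.
    """
    wanted = tokenize(task)

    def score(path: str) -> int:
        return len(wanted & tokenize(path + " " + files[path]))

    return sorted(files, key=lambda path: (-score(path), path))[:k]


def build_remote_prompt(task: str, files: dict[str, str], k: int = 2) -> str:
    """Only the selected files (plus the task) get sent to the expensive remote model."""
    chosen = select_context(task, files, k)
    sections = [f"### {path}\n{files[path]}" for path in chosen]
    return "\n\n".join(sections + [task])


files = {
    "src/controllers/users.ts": "export function getById(id) { /* permission check */ }",
    "src/views/home.tsx": "export const Home = () => null",
    "src/db/users.sql": "select * from users",
}
task = "fix the permission check in getById for users"
print(select_context(task, files, k=1))  # ['src/controllers/users.ts']
print(build_remote_prompt(task, files, k=1))
```

The design point is that the remote call only ever sees `k` files, so its cost stays bounded no matter how big the repo gets; a real local model would replace `score` with something smarter.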
I think a lot of the short term improvement is going to be a mix of UX and predictable cost. Currently the tools are still very clunky and a bit dumb. The competition is going to be about predictable speed, cost and quality. There's a lot of room for improvement here.
It usually does, just with a time delay and a strict condition that the firm you work at can actually commercialize your productivity. Apply your systems thinking skills to compensation and it will all make sense.
I don't think about controlling cost because I price my time at US$40/h and virtually all models are cheaper than that (with the exception of o1 or Gemini 2.5 pro).
If I spend $2 instead of $0.50 on a session but I had to spend 6 minutes thinking about context, I haven't gained any money.
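Spelled out with the numbers from this comment ($40/h, $0.50 vs $2 per session, 6 minutes spent thinking about context), it's a simple break-even calculation. The function names are made up for illustration:

```python
def net_savings(cheap_cost: float, expensive_cost: float,
                minutes_spent: float, hourly_rate: float) -> float:
    """Money saved by the cheaper session minus the value of the extra time it took."""
    money_saved = expensive_cost - cheap_cost
    time_cost = hourly_rate * minutes_spent / 60
    return money_saved - time_cost


def break_even_minutes(cheap_cost: float, expensive_cost: float, hourly_rate: float) -> float:
    """How many minutes of context-wrangling the saving is worth at this hourly rate."""
    return 60 * (expensive_cost - cheap_cost) / hourly_rate


print(net_savings(0.50, 2.00, minutes_spent=6, hourly_rate=40))  # -2.5
print(break_even_minutes(0.50, 2.00, hourly_rate=40))            # 2.25
```

So at $40/h, anything beyond about 2.25 minutes of managing context eats the $1.50 saving; 6 minutes is a $2.50 loss.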
If your expectation is to produce the same amount of output, you could argue when paying for AI tools, you're choosing to spend money to gain free time.
4 hours coding project X, or 3 hours and a short hike with your partner / friends, etc.
If what I'm doing doesn't have a positive expected value, the correct move isn't to use inferior dev tooling to save money, it's to stop working on it entirely.
It's interesting that this is a problem for people because I have never spent more than about $0.50 on a task with Claude Code. I have pretty good code hygiene and I tell Claude what to do with clear instructions and guidelines, and Claude does it. I will usually go through a few revisions and then just change anything myself if I find it not quite working. It's exactly like having an eager intern.
I assume they use a single conversation, so if you compress the prompt immediately you should only break the cache once, and still hit it on subsequent prompts?
Some tools take more effort to hold properly than others. I'm not saying there isn't a lot of room for improvement, or that the UX couldn't hold the user's hand more and enforce things like this in some "assisted mode", but at the end of the day, it's a thin, useful wrapper around an LLM, and LLMs require effort to use effectively.
I definitely get value out of it, more than any other tool like it that I've tried.
Think about what you would do in an unfamiliar project with no context and the ticket:
"please fix the authorization bug in /api/users/:id".
You'd start by grepping the code base and trying to understand it.
Compare that to: "fix the permission in src/controllers/users.ts in the function `getById`. We need to check that the user in the JWT is the same user that is being requested".
On a shorter timeline than you'd think, working with these tools won't look like this at all.
You'll be prompting, evaluating, and iterating on entirely finished pieces of software, seeing multiple attempts at each solution at once, with none of this deep-in-the-weeds bug fixing.
We're rapidly approaching a world where a lot of software will be made without hiring an engineer at all. Maybe not the hardest, most complex, or most novel software, but a lot of software that previously required a team of 3-15 won't have a single dev.
> So, AIs are overeager junior developers at best, and not the magical programmer replacements they are advertised as.
This may be a quick quip or a rant. But the things we say have a way of reinforcing how we think. So I suggest refining until what we say cuts to the core of the matter. The claim above is a false dichotomy. Let's put aside advertisements and hype. Trying to map between AI capabilities and human ones is complicated. There is high quality writing on this to be found. I recommend reading literature reviews on evals.
Don’t be a dismissive dick; that’s not appropriate for this forum. The above post is clearly trying to engage thoughtfully and offers genuinely good advice.
I’m thinking you might be a kind of person that requires very direct feedback. Your flagged comment was unkind and unhelpful. Your follow-up response seems to suggest that you were justified in being rude?
You also mischaracterize my comment two levels up. It didn’t wave you away by saying “just google it”. It said — perhaps not directly enough — that your comment was off track and gave you some ideas to consider and directions to explore.
> There is high quality writing on this to be found. I recommend reading literature reviews on evals.
This is, quite literally, "just google it".
And yes, I prefer direct feedback, not vague philosophical and pseudo-philosophical statements and vague references. I'm sure there's high quality writing to be found on this, too.
We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.
> not vague philosophical and pseudo-philosophical statements and vague references
If you stop being so uncharitable, more people might be inclined to engage you. Try to interpret what I wrote as constructive criticism.
Shall we get back to the object level? You wrote:
> AIs are overeager junior developers at best
Again, I'm saying this isn't a good framing. I'm asking you to consider you might be wrong. You don't need to hunker down. You don't need to counter-attack. Instead, you could do more reading and research.
> We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.
Aka "I will make some vague references to some literature, go Google it"
> Instead, you could do more reading and research.
Instead of vague "just google it", and vague ad hominems you could actually provide constructive feedback.
My disagreement with the claim "AIs are overeager junior developers at best" comes largely from both understanding what is happening under the hood and personal experience. Like many people, I have interacted for thousands of hours with ChatGPT, Claude, Gemini, and others, though my interaction patterns may be unusual -- not sure. I would characterize them as (a) set expectations with a detailed prelude; (b) frame problems carefully; (c) trust nothing; (d) push back relentlessly; (e) require 'thinking out loud'; (f) resist bundled solutions; (g) actively guide design and problem-solving dialogues; (h) actively mitigate sycophancy, overconfidence, and hallucination.
I've guided some junior / less experienced developers using many of the same patterns above. More or less, they can be summarized as "be more methodical". While I've found considerable variation in the quality of responses from LLMs, I would not characterize this variation as being anywhere close to that of a junior developer. I grant that I adjust my interaction patterns considerably to improve the quality of the experience.
LLMs vary across dimensions of intelligence and capability. Here's my current assessment -- somewhat off the cuff, but I have put thought into it -- (1) LLM recall is superhuman. (2) Contextual awareness is mixed, sometimes unpredictably bad. Getting sufficient context is hard, but IMO this is less of a failure of the LLM or RAG and more about its lack of embodiment in a particular work setting. (3) Speed is generally superhuman. (4) Synthesis is often superhuman. (5) Ready-to-go high-quality all-in-one software solutions are not there yet. (6) Failure modes are painful; e.g. going in circles or waffling.
I should also ask: what do you mean by "overeager"? I would guess you are referring to the tendency of many LLMs to offer solutions to problems despite lacking a way to validate their answers, perhaps even hallucinating API calls that don't exist?
The grandparent is talking about how to control cost by focusing the tool. My response was to a comment about how that takes too much thinking.
If you give a junior an overly broad prompt, they are going to have to do a ton of searching and reading to find out what they need to do. If you give them specific instructions, including files, they are more likely to get it right.
I never said they were replacements. At best, they're tools that are incredibly effective when used on the correct type of problem with the right type of prompt.
I have been quite skeptical of AI tools, and my experiences using them for developing software have been frustrating, but power tools usually come with a learning curve, while a "good product" with a clean, simplified interface often has reduced capability.
Vim, Emacs and Excel are obvious power tools which may require you to think but often produce unrivalled productivity for power users.
So I don't think the verdict that the product has a bad UI is fair. Natural language interfaces are such a step up from old-school APIs with countless flags and parameters.
Mh. Like, I'm deeply impressed what these AI assistants can do by now. But, the list in the parent comment there is very similar to my mental check-list of pair-programming / pair-admin'ing with less experienced people.
I guess "context length" in AIs is what I already tracked intuitively with people. It can be a struggle to connect the Zabbix alert, the ticket, and the situation on the system, even if you don't track down all the Zabbix code and scripts. Then we throw in Ansible configuring the thing, and then the business requirements from more or less controlled dev teams. And then you realize dev is controlled by impossible sales terms.
These are scope -- or I guess context -- expansions that cause people to struggle.
I pretty much one shot a scraper from an old Joomla site with 200+ articles to a new WP site, including all users and assets, and converting all the PDFs to articles. It cost me like $3 in tokens.
I guess the question then is: can't VS Code Copilot do the same for a fixed $20/month? It even has access to all the SOTA models like Claude 3.7, Gemini 2.5 Pro and GPT o3.
VS Code’s agent mode in Copilot (even in the Insiders nightly) is a bit rough in my experience: lots of 500 errors, stalls, and outright failures to follow tasks (as if there’s a mismatch between what the UI says it will include in context and what gets fed to the LLM).
I would have thought so, but somehow no. I have a cursor subscription with access to all of those models, and I still consistently get better results from claude code.
No, it's a few hundred lines of Python to parse weird and inconsistent HTML into JSON and CSV files, and then a sync script that calls the WP API to create all the authors as needed, update the articles, and migrate the images.
> Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT
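For anyone curious what the parsing half of a script like that can look like, here is a minimal sketch using only the standard library's `html.parser`. It is not the commenter's script: it assumes (hypothetically) that each old article page has one `<h1>` title and `<p>` body paragraphs, tolerates unclosed `<p>` tags, and leaves out the WP API sync entirely. `ArticleParser`, `parse_article`, `to_json`, and `to_csv` are illustrative names.

```python
import csv
import io
import json
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the first <h1> as the title and each <p> as a body paragraph.

    Assumes (hypothetically) every old article page has an <h1> title and <p> body
    text. Unclosed <p> tags, common in messy HTML, are closed by the next <p> or <h1>.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.title = ""
        self.paragraphs: list[str] = []
        self._current = None  # "h1" or "p" while inside one, else None
        self._buffer: list[str] = []

    def flush(self):
        """Finish the element we are collecting text for, if any."""
        if self._current is None:
            return
        text = " ".join("".join(self._buffer).split())  # collapse messy whitespace
        if self._current == "h1" and not self.title:
            self.title = text
        elif self._current == "p" and text:
            self.paragraphs.append(text)
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self.flush()  # implicitly close an unclosed <p>
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self.flush()

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)  # text inside nested tags like <b> is kept too


def parse_article(html: str) -> dict:
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # the page may end inside an unclosed <p>
    return {"title": parser.title, "paragraphs": parser.paragraphs}


def to_json(article: dict) -> str:
    return json.dumps(article, ensure_ascii=False, indent=2)


def to_csv(articles: list[dict]) -> str:
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["title", "paragraph_count"])
    for article in articles:
        writer.writerow([article["title"], len(article["paragraphs"])])
    return out.getvalue()


html = "<h1>Old  Title</h1><p>First &amp; best</p><p>Second <b>para</b><p>Third"
article = parse_article(html)
print(to_json(article))
print(to_csv([article]))
```

Keeping parsing and output separate like this makes it easy to re-run the parse against the whole export while iterating on the messy edge cases, before anything touches the new site.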
Yesterday I gave up and disabled my format-on-save config within VSCode. It was burning way too many tokens with unnecessary file reads after failed diffs. The LLMs still have a decent number of failed diffs, but it helps a lot.
GitHub copilot follows your context perfectly. I don't have to tell it anything about files. I tried this initially and it just screwed up the results.
> GitHub copilot follows your context perfectly. I don't have to tell it anything about files. I tried this initially and it just screwed up the results.
Just to make sure we're on the same page. There are two things in play. First, a language model's ability to know what file you are referring to. Second, an assistant's ability to make sure the right file is in the context window. In your experience, how does Claude Code compare to Copilot w.r.t (1) and (2)?