Surprised that "controlling cost" isn't a section in this post. Here's my attemp...

sagarpatil · on April 20, 2025

If I have to be so cautious while using a tool, I might as well write the code myself lol. I’ve used Claude Code extensively and it is one of the best AI IDEs. It just gets things done. The only downside is the cost. I was averaging $35-$40/day. At that cost, I’d rather just use Cursor/Windsurf.

BeetleB · on April 19, 2025

Oh wow. Reading your comment guarantees I'll never use Claude Code.

I use Aider. It's awesome. You explicitly specify the files. You don't have to do work to limit context.

jjallen · on April 19, 2025

Not having to specify files is a humongous feature for me. Having to remember which file code is in is half the work once you pass a certain codebase size.

carpo · on April 20, 2025

Use /context <prompt> to have aider automatically add the files based on the prompt. It's been working well for me.

m3kw9 · on April 20, 2025

That sometimes works, sometimes doesn’t, and takes 10x the time. Same with Codex. I would have both and switch between them depending on which one you feel will get it right.

boredtofears · on April 19, 2025

Yeah, I tried CC out and quickly noticed it was spending $5+ on simple, LLM-capable tasks. I rarely break $1-2 a session using Aider. Aider feels like more of a precision tool. I like having the ability to specify files manually.

I do find Claude Code to be really good at exploration though - like checking out a repository I'm unfamiliar with and then asking questions about it.

LeafItAlone · on April 19, 2025

Aider is a great tool. I do love it. But I find I have to do more with it to get the same output as Claude Code (no matter what LLM I used with Aider). Sure it may end up being cheaper per run, but not when my time is factored in. The flip side is I find Aider much easier to limit.

Game_Ender · on April 19, 2025

What are those extra things you have to do more of? I only have experience with Aider so I am curious what I am missing here.

simonw · on April 19, 2025

With Claude Code you can at least type "/cost" at any point to see how much it's spent, and it will show you when you end a session (with Ctrl+C) too.

The output of /cost looks like this:

  > /cost 
    ⎿  Total cost: $0.1331
       Total duration (API): 1m 13.1s
       Total duration (wall): 1m 21.3s
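If you want to log spend across sessions, here is a minimal sketch that pulls the numbers out of `/cost` text. It assumes the output keeps the layout shown above (the format isn't documented as stable), and `parse_cost_output` is a hypothetical helper, not part of Claude Code.

```python
import re

# Assumption: /cost prints "Total cost: $X" and "Total duration (API|wall): [Nm ]S.Ss".
COST_RE = re.compile(r"Total cost:\s*\$([0-9.]+)")
DURATION_RE = re.compile(r"Total duration \((API|wall)\):\s*(?:(\d+)m\s*)?([0-9.]+)s")


def parse_cost_output(text: str) -> dict:
    """Extract the dollar cost and the API/wall durations (in seconds)."""
    result = {}
    cost = COST_RE.search(text)
    if cost:
        result["cost_usd"] = float(cost.group(1))
    for kind, minutes, seconds in DURATION_RE.findall(text):
        total = int(minutes or 0) * 60 + float(seconds)  # minutes part is optional
        result[f"{kind.lower()}_seconds"] = round(total, 1)
    return result


SAMPLE = """\
> /cost
  ⎿  Total cost: $0.1331
     Total duration (API): 1m 13.1s
     Total duration (wall): 1m 21.3s
"""

print(parse_cost_output(SAMPLE))
```

Appending each parsed dict to a file at the end of a session would give a rough per-day spend log.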

BeetleB · on April 21, 2025

Aider shows how much you've spent after each command :-). It shows the cost of the command as well as the session.

aitchnyu · on April 22, 2025

After switching to Aider, I realized the other tools have been playing elaborate games to choose cheaper models and to limit files and messages in context, since both of those would otherwise increase their bills.

Jerry2 · on April 19, 2025

>I use Aider. It's awesome.

What do you use for the model? Claude? Gemini? o3?

BeetleB · on April 21, 2025

Currently using Sonnet 3.7, but mostly because I've been too lazy to set up an account with Google.

aitchnyu · on April 22, 2025

Get an OpenRouter account and you can play with almost all providers. I was burning money on Claude, then tried V3 (I blocked the DeepSeek provider for being flaky; let the laypeople mock them) and the experimental and GA Gemini models.

m3kw9 · on April 20, 2025

Gemini 2.5 Pro is my choice.

kiratp · on April 19, 2025

The productivity boost can be so massive that this amount of fiddling to control costs is counterproductive.

Developers tend to seriously underestimate the opportunity cost of their own time.

Hint: it’s many multiples of your total compensation broken down to 40-hour work weeks.

Aurornis · on April 20, 2025

The cost of the task scales with how long it takes, plus or minus.

Substitute “cost” with “time” in the above post and all of the same tips are still valuable.

I don’t do much agentic LLM coding, but the speed (or lack thereof) was one of my least favorite parts. Any tricks that narrow scope, or that avoid reprocessing files over and over again or searching through the codebase, are helpful even if you don’t care about the dollar amount.

pizza · on April 19, 2025

Hard agree. Whether it's 50 cents or 10 dollars per session, I'm using it to get work done for the sake of quickly completing work that aims to unblock many orders of magnitude more value. But insofar as cheaper correct sessions correlate with sessions where the problem solving was more efficient anyhow, they're fairly solid tips.

afiodorov · on April 19, 2025

I agree, but optimisation often reveals implementation details that help you understand the limits of current tech. It might not be worth the time, but part of engineering is optimisation and another part is deep understanding of the tech. It is sometimes worth optimising anyway if you want to take the engineering discipline to the next level within yourself.

I myself didn’t think about not running linters, but it makes obvious sense now and gives me insight into how Claude Code works, which I can use in related engineering work.

jillesvangurp · on April 20, 2025

Exactly. I've been using the ChatGPT desktop app, not because of the model quality but because of the UX. It integrates seamlessly with my IDEs (IntelliJ and VS Code). Mostly I just do stuff like select a few lines, hit Option+Shift+1, and say something like "fix this". Nice short prompt, and I get the answer relatively quickly. Option+Shift+1 opens ChatGPT with the open file already added to the context. It sees which lines are selected. And it also sees the output of any test runs in the consoles. So just saying "fix this" now carries a rich context that I don't need to micromanage.

Mostly I just use the 4o model instead of the newer, better models because it is faster. It's mostly good enough, and I prefer getting a good-enough answer quickly to getting the perfect answer after a few minutes. Mostly what I ask is not rocket science, so perfect is the enemy of good here. I rarely have to escalate to better models. The reasoning models are annoyingly slow, especially when they go down the wrong track, which happens a lot.

And my cost is a predictable $20/month. The downside is that the scope of what I can ask is more limited. I'd like it to be able to "see" my whole code base instead of just one file, without me having to micromanage what the model looks at. Claude can do that if you don't care about money. But if you do, you are basically micromanaging context. That sounds like monkey work that somebody should automate. And it shouldn't require an Einstein-sized artificial brain to do that.

There must be people that are experimenting with using locally running more limited AI models to do all the micromanaging that then escalate to remote models as needed. That's more or less what Apple pitched for Apple AI at some point. Sounds like a good path forward. I'd be curious to learn about coding tools that do something like that.
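One common shape for this idea is a cascade: a cheap local model answers first, and the request is escalated to a remote model only when the local answer looks unreliable. A minimal sketch under loud assumptions: `local` and `remote` are stand-in callables rather than any real tool's API, and the self-reported confidence score is just one possible escalation signal.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Answer:
    text: str
    confidence: float  # assumed self-reported score in [0, 1]
    model: str


# A "model" is any callable from prompt to Answer; a real client would wrap an API.
Model = Callable[[str], Answer]


def answer_with_escalation(prompt: str, local: Model, remote: Model,
                           threshold: float = 0.7) -> Answer:
    """Try the cheap local model first; escalate only if it is not confident."""
    first = local(prompt)
    if first.confidence >= threshold:
        return first
    return remote(prompt)


# Stub models, for illustration only.
def fake_local(prompt: str) -> Answer:
    return Answer("local answer", 0.9 if "rename" in prompt else 0.3, "local")


def fake_remote(prompt: str) -> Answer:
    return Answer("remote answer", 0.95, "remote")


print(answer_with_escalation("rename this variable", fake_local, fake_remote).model)  # local
print(answer_with_escalation("redesign the auth flow", fake_local, fake_remote).model)  # remote
```

The threshold is the cost knob: raising it sends more requests to the expensive model.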

In terms of cost, I don't actually think it's unreasonable to spend a few hundred dollars per month on this stuff. But I question the added value over the $20 I'm spending. I don't think the improvement is 20x better; more like 1.5x. And I don't like the unpredictability of this, or having to think about how expensive a question is going to be.

I think a lot of the short term improvement is going to be a mix of UX and predictable cost. Currently the tools are still very clunky and a bit dumb. The competition is going to be about predictable speed, cost and quality. There's a lot of room for improvement here.

charlie0 · on April 20, 2025

If this is true, why isn't our compensation scaling with the increases in productivity?

lazzlazzlazz · on April 20, 2025

It usually does, just with a time delay and a strict condition that the firm you work at can actually commercialize your productivity. Apply your systems thinking skills to compensation and it will all make sense.

jjmarr · on April 19, 2025

I don't think about controlling cost because I price my time at US$40/h and virtually all models are cheaper than that (with the exception of o1 or Gemini 2.5 Pro).

If I spend $0.50 instead of $2 on a session but had to spend 6 minutes thinking about context, I haven't gained any money.
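The arithmetic behind this, using the comment's own numbers: $1.50 saved per session ($2.00 vs $0.50) against 6 minutes of context work priced at US$40/h. `net_savings` is just an illustrative helper.

```python
def net_savings(expensive_usd: float, cheap_usd: float,
                minutes_spent: float, hourly_rate_usd: float) -> float:
    """Dollars saved by the cheaper session minus the value of the time spent
    trimming context to get it. Negative means the trimming cost more than it saved."""
    time_cost = hourly_rate_usd * minutes_spent / 60
    return (expensive_usd - cheap_usd) - time_cost


print(net_savings(2.00, 0.50, 6, 40))  # -2.5
```

At $40/h the trimming only pays off if it takes under 2.25 minutes (1.50 / (40/60)); at 6 minutes you end up $2.50 behind.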

owebmaster · on April 19, 2025

Important to remind people this is only true if you have a profitable product, otherwise you’re spending money you haven’t earned.

jasonjmcghee · on April 19, 2025

If your expectation is to produce the same amount of output, you could argue when paying for AI tools, you're choosing to spend money to gain free time.

4 hours coding project X, or 3 hours and a short hike with your partner/friends, etc.

jjmarr · on April 19, 2025

If what I'm doing doesn't have a positive expected value, the correct move isn't to use inferior dev tooling to save money, it's to stop working on it entirely.

oezi · on April 20, 2025

There might be value but you might not receive any of it. Most salaried employees won't see returns.

ngruhn · on April 20, 2025

Come on, every hobby has negative expected value. You're not doing it for the money but it still makes sense to save money.

jasonjmcghee · on April 19, 2025

If you do it a bit, it just becomes habit / no extra time or cognitive load.

Correlation or causation aside, the same people I see complain about cost, complain about quality.

It might indicate more tightly controlled sessions may also produce better results.

Or maybe it's just people that tend to complain about one thing, complain about another.

pclmulqdq · on April 19, 2025

It's interesting that this is a problem for people because I have never spent more than about $0.50 on a task with Claude Code. I have pretty good code hygiene and I tell Claude what to do with clear instructions and guidelines, and Claude does it. I will usually go through a few revisions and then just change anything myself if I find it not quite working. It's exactly like having an eager intern.

irthomasthomas · on April 20, 2025

I assume they use a conversation, so if you compress the prompt immediately you should only break cache once, and still hit cache on subsequent prompts?

So instead of Write Hit Hit Hit

It's Write Write Hit Hit Hit

bugglebeetle · on April 19, 2025

If I have to spend this much time thinking about any of this, congratulations, you’ve designed a product with a terrible UI.

jasonjmcghee · on April 19, 2025

Some tools take more effort to hold properly than others. I'm not saying there isn't a lot of room for improvement, or that the UX couldn't hold the user's hand more to force things like this in some "assisted mode", but at the end of the day it's a thin, useful wrapper around an LLM, and LLMs require effort to use effectively.

I definitely get value out of it, more than from any other tool like it that I've tried.

oxidant · on April 19, 2025

Think about what you would do in an unfamiliar project with no context and the ticket

"please fix the authorization bug in /api/users/:id".

You'd start by grepping the code base and trying to understand it.

Compare that to, "fix the permission in src/controllers/users.ts in the function `getById`. We need to check the user in the JWT is the same user that is being requested"
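The specific ticket maps onto a small, well-defined check. Here it is sketched in Python (the ticket refers to a TypeScript file; Python is used here only to keep the examples in one language). It assumes the JWT has already been signature-verified upstream and carries the user id in the standard `sub` claim; `get_by_id`, `JwtClaims`, and `Forbidden` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class JwtClaims:
    sub: str  # id of the user the (already verified) token was issued for


class Forbidden(Exception):
    """Raised when the caller may not read the requested user."""


def get_by_id(claims: JwtClaims, requested_id: str, users: dict[str, dict]) -> dict:
    # The fix the ticket describes: the user in the JWT must be the same
    # user that is being requested.
    if claims.sub != requested_id:
        raise Forbidden(f"token for {claims.sub!r} cannot read user {requested_id!r}")
    return users[requested_id]


users = {"42": {"name": "Ada"}, "7": {"name": "Bob"}}
print(get_by_id(JwtClaims(sub="42"), "42", users))  # {'name': 'Ada'}
```

A prompt this specific leaves the model almost nothing to search for, which is exactly why it is cheaper.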

troupo · on April 19, 2025

So, AIs are overeager junior developers at best, and not the magical programmer replacements they are advertised as.

lacker · on April 19, 2025

Let's split the difference and call them "magical overeager junior developer replacements".

whywhywhywhy · on April 20, 2025

On a shorter timeline than you'd think, working with these tools will look nothing like this.

You'll be prompting, evaluating, and iterating on entirely finished pieces of software, and you'll be able to see multiple attempts at each problem at once, none of this deep-in-the-weeds bug-fixing stuff.

We're rapidly approaching a world where a lot of software will be made without hiring an engineer at all: maybe not the hardest, most complex, or most novel software, but a lot of software that previously required a team of 3-15 won't have a single dev.

My current estimate is mid-2026.

hu3 · on April 20, 2025

My current estimate is 2030, because we can barely get a JS/TS application to compile after a year of dependency updates.

Our current popular stack is quicksand.

Unless we're talking about .NET Core, Java, Django, and other stable platforms like these.

xpe · on April 19, 2025

> So, AIs are overeager junior developers at best, and not the magical programmer replacements they are advertised as.

This may be a quick quip or a rant. But the things we say have a way of reinforcing how we think. So I suggest refining until what we say cuts to the core of the matter. The claim above is a false dichotomy. Let's put aside advertisements and hype. Trying to map between AI capabilities and human ones is complicated. There is high quality writing on this to be found. I recommend reading literature reviews on evals.

troupo · on April 19, 2025

[flagged]

drodgers · on April 19, 2025

Don’t be a dismissive dick; that’s not appropriate for this forum. The above post is clearly trying to engage thoughtfully and offers genuinely good advice.

troupo · on April 20, 2025

The above post produces some vague philosophical statements and equally vague "just google it" claims.

xpe · on April 21, 2025

I’m thinking you might be a kind of person that requires very direct feedback. Your flagged comment was unkind and unhelpful. Your follow-up response seems to suggest that you were justified in being rude?

You also mischaracterize my comment two levels up. It didn’t wave you away by saying “just google it”. It said — perhaps not directly enough — that your comment was off track and gave you some ideas to consider and directions to explore.

troupo · on April 22, 2025

> There is high quality writing on this to be found. I recommend reading literature reviews on evals.

This is, quite literally, "just google it".

And yes, I prefer direct feedback, not vague philosophical and pseudo-philosophical statements and vague references. I'm sure there's high quality writing to be found on this, too.

xpe · on April 22, 2025

We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.

> not vague philosophical and pseudo-philosophical statements and vague references

If you stop being so uncharitable, more people might be inclined to engage you. Try to interpret what I wrote as constructive criticism.

Shall we get back to the object level? You wrote:

> AIs are overeager junior developers at best

Again, I'm saying this isn't a good framing. I'm asking you to consider you might be wrong. You don't need to hunker down. You don't need to counter-attack. Instead, you could do more reading and research.

troupo · on April 22, 2025

> We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.

Aka "I will make some vague references to some literature, go Google it"

> Instead, you could do more reading and research.

Instead of a vague "just google it" and vague ad hominems, you could actually provide constructive feedback.

xpe · on April 25, 2025

Want to try a conversational reset? I'll start.

My disagreement with the claim "AIs are overeager junior developers at best" largely comes from both understanding what is happening under the hood and personal experience. Like many people, I have interacted for thousands of hours with ChatGPT, Claude, Gemini, and others, though my interaction patterns may be unusual (not sure). I would characterize them as: (a) set expectations with a detailed prelude; (b) frame problems carefully; (c) trust nothing; (d) push back relentlessly; (e) require 'thinking out loud'; (f) resist bundled solutions; (g) actively guide design and problem-solving dialogues; (h) actively mitigate sycophancy, overconfidence, and hallucination.

I've guided some junior / less experienced developers using many of the same patterns above. More or less, they can be summarized as "be more methodical". While I've found considerable variation in the quality of responses from LLMs, I would not characterize this variation as being anywhere close to that of a junior developer. I grant that I adjust my interaction patterns considerably to improve the quality of the experience.

LLMs vary across dimensions of intelligence and capability. Here's my current assessment -- somewhat off the cuff, but I have put thought into it -- (1) LLM recall is superhuman. (2) Contextual awareness is mixed, sometimes unpredictably bad. Getting sufficient context is hard, but IMO this is less of a failure of the LLM or RAG and more about its lack of embodiment in a particular work setting. (3) Speed is generally superhuman. (4) Synthesis is often superhuman. (5) Ready-to-go high-quality all-in-one software solutions are not there yet. (6) Failure modes are painful; e.g. going in circles or waffling.

I should also ask what you mean by "overeager". I would guess you are referring to the tendency of many LLMs to offer solutions to problems despite lacking a way to validate their answers, perhaps even hallucinating API calls that don't exist?

oxidant · on April 20, 2025

The grandparent is talking about how to control cost by focusing the tool. My response was to a comment about how that takes too much thinking.

If you give a junior an overly broad prompt, they are going to have to do a ton of searching and reading to find out what they need to do. If you give them specific instructions, including files, they are more likely to get it right.

I never said they were replacements. At best, they're tools that are incredibly effective when used on the correct type of problem with the right type of prompt.

troupo · on April 22, 2025

> If you give a junior an overly broad prompt, they are going to have to do a ton of

> they're tools that are incredibly effective when used on the correct type of problem with the right type of prompt.

So, a junior developer who has to be told exactly what to do.

As for the "correct type of problem with the right type of prompt", what exactly are those?

oezi · on April 20, 2025

As of April 2025. The pace is so fast that it will overtake seniors within years, maybe months.

jdiff · on April 20, 2025

That's been said since at least 2021 (the release date for GitHub Copilot). I think you're overestimating the pace.

apwell23 · on April 20, 2025

Overtake the CEO by 2026.

djtango · on April 20, 2025

I have been quite skeptical of AI tools, and my experiences using them for developing software have been frustrating. But power tools usually come with a learning curve, while a "good product" with a clean, simplified interface often has reduced capability.

Vim, Emacs, and Excel are obvious power tools that may require you to think but often produce unrivalled productivity for power users.

So I don't think the verdict that the product has a bad UI is fair. Natural language interfaces are such a step up from old-school APIs with countless flags and parameters.

tetha · on April 19, 2025

Mh. Like, I'm deeply impressed by what these AI assistants can do by now. But the list in the parent comment is very similar to my mental checklist for pair-programming / pair-admin'ing with less experienced people.

I guess "context length" in AIs is what I already tracked intuitively with people. It can already be a struggle to connect the Zabbix alert, the ticket, and the situation on the system, even if you don't track down all the Zabbix code and scripts. Then we throw in Ansible configuring the thing, and then the business requirements from more or less controlled dev teams. And then you realize dev is controlled by impossible sales terms.

These are scope -- or I guess context -- expansions that cause people to struggle.

sqs · on April 19, 2025

It's fundamentally hard. If you have an easy solution, you can go make an easy few billion dollars.

chewz · on April 19, 2025

My attempt is: do not use Claude Code at all; it is a terrible tool. It is bad at almost everything, starting with making simple edits to files.

And most of all, Claude Code is overeager to start messing with your code and run up unnecessary $$ instead of making a sensible plan.

This isn't a problem with Claude Sonnet; it is a fundamental problem with Claude Code.

winrid · on April 19, 2025

I pretty much one shot a scraper from an old Joomla site with 200+ articles to a new WP site, including all users and assets, and converting all the PDFs to articles. It cost me like $3 in tokens.

hu3 · on April 19, 2025

I guess the question then is: can't VS Code Copilot do the same for a fixed $20/month? It even has access to all the SOTA models like Claude 3.7, Gemini 2.5 Pro, and GPT o3.

mceachen · on April 19, 2025

VS Code’s agent mode in Copilot (even in the Insiders nightly) is a bit rough in my experience: lots of 500 errors, stalls, and outright failures to follow tasks (as if there’s a mismatch between what the UI says it will include in context and what gets fed to the LLM).

darksaints · on April 19, 2025

I would have thought so, but somehow no. I have a Cursor subscription with access to all of those models, and I still consistently get better results from Claude Code.

winrid · on April 19, 2025

I haven't tried Copilot, mostly because I don't use VS Code; I use JetBrains IDEs. How do they provide Claude 3.7 for $20/mo with unlimited usage?

KronisLV · on April 20, 2025

Copilot has a pretty good plugin for JetBrains IDEs!

Though their own AI Assistant and Junie might be equally good choices there too.

oezi · on April 20, 2025

By providing a bad UI so that you don't use it so much.

troupo · on April 19, 2025

was it a wget call feeding into html2pdf?

winrid · on April 19, 2025

No, it's a few hundred lines of Python to parse weird and inconsistent HTML into JSON and CSV files, and then a sync script that can call the WP API to create all the authors as needed, update the articles, and migrate the images.
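The scraper itself isn't shown. As a minimal sketch of just the first stage (HTML into JSON), using only Python's standard library and assuming each article's body is a run of `<p>` elements; the CSV output and the WordPress sync step are left out, and `ArticleParser` / `article_to_json` are hypothetical names.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class ArticleParser(HTMLParser):
    """Collects the <title> text and the text of each <p> element.
    Real legacy markup usually needs many more special cases."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.paragraphs: list[str] = []
        self._in_title = False
        self._current: Optional[list[str]] = None  # text chunks of the open <p>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._current = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._current is not None:
            text = " ".join("".join(self._current).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None:  # includes text inside <b>, <a>, etc.
            self._current.append(data)


def article_to_json(html: str) -> str:
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return json.dumps({"title": parser.title.strip(), "paragraphs": parser.paragraphs})


sample = ("<html><head><title> Old post </title></head><body>"
          "<p>First   line</p><p><b>Bold</b> text</p></body></html>")
print(article_to_json(sample))
```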

SoftTalker · on April 20, 2025

Plumbing to pipe shit from one sewer to another.

winrid · on April 20, 2025

Yep, don't wanna spend more of my life doing that than I have to!

gundmc · on April 19, 2025

> Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT

Yesterday I gave up and disabled my format-on-save config within VSCode. It was burning way too many tokens with unnecessary file reads after failed diffs. The LLMs still have a decent number of failed diffs, but it helps a lot.
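Here is a toy model of why this burns tokens. An edit tool that remembers what each file looked like when it was read will reject a search/replace edit once a formatter has rewritten the file, and recovering requires another full read. The `EditSession` class below is hypothetical and illustrates the failure mode the comment describes. It is not Claude Code's actual implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


class StaleFileError(Exception):
    """The file changed on disk after the assistant last read it."""


class EditSession:
    """Toy edit tool: remembers a hash of each file as last read and refuses
    search/replace edits if the file has changed since then."""

    def __init__(self) -> None:
        self._seen: dict[Path, str] = {}
        self.reads = 0  # each full read costs input tokens in a real session

    def read(self, path: Path) -> str:
        text = path.read_text()
        self._seen[path] = _digest(text)
        self.reads += 1
        return text

    def replace(self, path: Path, old: str, new: str) -> None:
        text = path.read_text()
        if _digest(text) != self._seen.get(path):
            raise StaleFileError(f"{path} changed since last read; re-read it first")
        if old not in text:
            raise ValueError("search string not found")
        updated = text.replace(old, new, 1)
        path.write_text(updated)
        self._seen[path] = _digest(updated)  # our own write keeps the snapshot fresh


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "app.py"
    src.write_text("x=1\n")
    session = EditSession()
    session.read(src)
    src.write_text("x = 1\n")  # simulated format-on-save after the read
    try:
        session.replace(src, "x = 1", "x = 2")
    except StaleFileError:
        session.read(src)  # the forced re-read is where the extra tokens go
        session.replace(src, "x = 1", "x = 2")
    print(src.read_text().strip(), "after", session.reads, "reads")
```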

datavirtue · on April 19, 2025

GitHub Copilot follows your context perfectly. I don't have to tell it anything about files. I tried this initially and it just screwed up the results.

xpe · on April 19, 2025

> GitHub copilot follows your context perfectly. I don't have to tell it anything about files. I tried this initially and it just screwed up the results.

Just to make sure we're on the same page: there are two things in play. First, a language model's ability to know what file you are referring to. Second, an assistant's ability to make sure the right file is in the context window. In your experience, how does Claude Code compare to Copilot w.r.t. (1) and (2)?
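Point (2) is a retrieval-and-budgeting problem that is separate from the model. Below is a minimal sketch of one simple policy: greedy packing by relevance score under a token budget. Where the scores come from (embeddings, grep hits, a repo map) is left open. The 4-characters-per-token estimate is a rough assumption, and this is not how Copilot or Claude Code actually select context.

```python
def pack_context(files: dict[str, str], scores: dict[str, float],
                 budget_tokens: int) -> list[str]:
    """Greedily add the highest-scoring files that still fit the token budget."""
    chosen, used = [], 0
    for name in sorted(files, key=lambda n: scores.get(n, 0.0), reverse=True):
        cost = len(files[name]) // 4 + 1  # rough chars-to-tokens estimate
        if used + cost <= budget_tokens:
            chosen.append(name)
            used += cost
    return chosen


files = {"users.ts": "a" * 400, "auth.ts": "b" * 800, "README.md": "c" * 2000}
scores = {"users.ts": 0.9, "auth.ts": 0.7, "README.md": 0.1}
print(pack_context(files, scores, budget_tokens=400))  # ['users.ts', 'auth.ts']
```

A wrong score here produces exactly the "it screwed up the results" experience: the model can only reason about the files that made it into the window.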