Hacker Times | new | past | comments | ask | show | jobs | submit | eslaught's comments | login

Please don't take up space in the comment section with accusations. You can report this at the email below and the mods will look at it:

> Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.

> https://hackertimes.com/newsguidelines.html


I find it kind of helpful and interesting to see a subset of these called out with a bit of data. It helps keep my LLM detector trained (the one in my brain, that is), and I think it helps a little with expressing the community consensus against this crap. In this case, I'm glad the GP posted something, as it's definitely not mistaken.

Without an empirical methodology it's hard to know how true this is. There are known and well-documented human biases (e.g., placebo effect) that could easily be involved here. And besides that, there's a convincing (but often overlooked on HN) argument to be made that modern LLMs are optimized in the same manner as other attention economy technologies. That is to say, they're addictive in the same general way that the YouTube/TikTok/Facebook/etc. feed algorithms are. They may be useful, but they also manipulate your attention, and it's difficult to disentangle those when the person evaluating the claims is the same person (potentially) being manipulated.

I'd love to see an empirical study that actually dives into this and attempts to show one way or another how true it is. Otherwise it's just all anecdotes.


I don't understand how the placebo effect is a human bias. Is it?

At least in some instances you could frame it that way: you believe that doctors and medicine are effective at treating disease, so when you are sick and a doctor gives you a bottle of sugar pills and you take them, you now interpret your state through the lens that you should feel better. It's a bias in how you perceive your condition.

That's not all the placebo effect is. But it's probably the aspect that best fits the framing as a bias.


It's much more than a bias.

You actually get better through placebo, as long as there's a pathway to it that is available to your body.

It's a really weird effect.

The fight isn't against triggering placebo, it's against letting it muddle study results.


I really love the back-and-forth in this mini-thread, I learned a lot about good thinking skills here. Thanks everyone.

> it is a career-ending failure

It depends highly on the field. In history, sure. The point of getting a history PhD is to become a history professor, and you can't do that if you don't get the PhD, and meanwhile history PhDs don't meaningfully open up any other job prospects, so attempting and failing to get a PhD provides negative value.

In CS and many engineering disciplines, there is a long history of people dropping out of PhDs and landing in industry. The industry is therefore much more accustomed to, and therefore accommodating to, people taking this path. Whether it's a maximally efficient use of time is another question, but it's certainly not wasted effort.

But I do agree that it's stressful nonetheless, because it still feels like a failure even if it isn't one in reality. I wrote about this when I wound down my own PhD journey [1]. In particular, after the control replication (2017) paper, I very nearly quit academia entirely, despite it being my biggest contribution to the field by far.

[1]: https://elliottslaughter.com/2024/02/legion-paper-history (written without any use of LLMs, for anyone who is wondering)


For context I've been an AI skeptic and am trying as hard as I can to continue to be.

I honestly think we've moved the goalposts. I'm saying this because, for the longest time, I thought that the chasm that AI couldn't cross was generality. By which I mean that you'd train a system, and it would work in that specific setting, and then you'd tweak just about anything at all, and it would fall over. Basically no AI technique truly generalized for the longest time. The new LLM techniques fall over in their own particular ways too, but it's increasingly difficult for even skeptics like me to deny that they provide meaningful value at least some of the time. And largely that's because they generalize so much better than previous systems (though not perfectly).

I've been playing with various models, as well as watching other team members do so. And I've seen Claude identify data races that have sat in our code base for nearly a decade, given a combination of a stack trace, access to the code, and a handful of human-written paragraphs about what the code is doing overall.
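For concreteness, here's a toy sketch (purely illustrative Python, not our actual code or the actual bug) of the unsynchronized read-modify-write shape that most data races reduce to:

```python
import threading

# A bare `counter += 1` is really a separate load, add, and store, so two
# threads can interleave between those steps and lose updates. Holding a
# lock around the read-modify-write makes the final count deterministic.
counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:  # remove this lock and increments can silently be lost
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: 4 threads x 10,000 increments, none lost
```

The hard part in a real codebase is, of course, that the racing accesses are nowhere near each other and nothing is labeled `counter`.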

This isn't just a matter of adding harnesses. The fields of program analysis and program synthesis are old as dirt, and probably thousands of CS PhDs have cut their teeth trying to solve them. All of those systems had harnesses, but they weren't nearly as effective, general, or broad as what current frontier LLMs can do. And on top of it all, we're driving LLMs with inherently fuzzy natural language, which by definition requires high generality to avoid falling over simply due to the stochastic nature of how humans write prompts.

Now, I agree vehemently with the superficial point that LLMs are "just" text generators. But I think it's also increasingly missing the point given the empirical capabilities that the models clearly have. The real lesson of LLMs is not that they're somehow not text generators, it's that we as a species have somehow encoded intelligence into human language. And along with the new training regimes we've only just discovered how to unlock that.


> I thought that the chasm that AI couldn't cross was generality. By which I mean that you'd train a system, and it would work in that specific setting, and then you'd tweak just about anything at all, and it would fall over. Basically no AI technique truly generalized for the longest time.

That is still true, though: transformers didn't cross into generality; they just let the problem you can train the AI on be bigger.

So, instead of making a general AI, you make an AI that has trained on basically everything. As soon as you move far enough away from everything that is on the internet, or close enough to something it's overtrained on (like memes), it fails spectacularly. But of course most things exist in some form on the internet, so it can do quite a lot.

The difference between this and a general intelligence like humans is that humans were "trained" primarily in the jungles and woodlands of thousands of years ago, yet we can still navigate modern society with those genes, using our general ability to adapt to and understand new systems. An AI trained on jungle and woodland survival wouldn't generalize to modern society the way humans do.

And this still makes LLMs fundamentally different from how human intelligence works.


> And I've seen Claude identify data races that have sat in our code base for nearly a decade

How do you know that Claude isn't just a very fast monkey with a very fast typewriter that throws things at you until one of them is true?


Iteration is inherent to how computers work. There's nothing new or interesting about this.

The question is who prunes the space of possible answers. If the LLM spews things at you until it gets one right, then sure, you're in the scenario you outlined (and much less interesting). If it ultimately presents one option to the human, and that option is correct, then that's much more interesting. Even if the process is "monkeys on keyboards", does it matter?

There are plenty of optimization and verification algorithms that rely on "try things at random until you find one that works", but before modern LLMs no one accused these things of being monkeys on keyboards, despite it being literally what these things are.
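For instance, a generate-and-test loop in exactly this spirit (toy Python; the "verifier" here is a made-up stand-in for a property that's cheap to check but awkward to solve analytically):

```python
import random

def satisfies(bits):
    # Made-up verifier: accepts 8-bit vectors with exactly six 1s
    # whose first and last bits differ.
    return sum(bits) == 6 and bits[0] != bits[-1]

def generate_and_test(n_bits=8, max_tries=100_000, seed=0):
    # "Monkeys on keyboards": propose candidates blindly and keep the
    # first one the verifier accepts. All the intelligence lives in the
    # verifier (the pruning), not the generator.
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = [rng.randint(0, 1) for _ in range(n_bits)]
        if satisfies(candidate):
            return candidate
    return None

solution = generate_and_test()
print(solution)
```

About 5% of random 8-bit candidates pass this particular verifier, so the loop terminates almost immediately; the same skeleton underlies randomized testing and many stochastic optimizers.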


Of course it doesn't matter, indeed. What I was hinting at is that if you forget all the times the LLM was wrong and just remember the one time it was right, it seems much more magical than it actually is.

Also, how were the data races significant if nobody noticed them for a decade? Were you all just coming to work and saying "jeez, I don't know why this keeps happening" until the LLM found them for you?


I agree with your points. Answering your one question for posterity:

> Also how were the data races significant if nobody noticed them for a decade ?

They only replicated in our CI, so it was mainly an annoyance for those of us doing release engineering (because when you run ~150 jobs you'll inevitably get ~2-4 failures). So it's not that no one noticed, but it was always a matter of prioritization vs other things we were working on at the time.
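Back of the envelope (assuming, for illustration, independent job failures, which is my simplification): 2-4 failures out of ~150 jobs implies a per-job flake rate of roughly 1.3-2.7%, which makes a fully green release run surprisingly rare:

```python
# Expected failures = jobs * p, so 2-4 failures out of ~150 jobs
# implies a per-job flake probability of roughly 1.3%-2.7%.
jobs = 150
for failures in (2, 4):
    p = failures / jobs
    clean_run = (1 - p) ** jobs  # chance the entire 150-job run is green
    print(f"p = {p:.1%}, P(all green) = {clean_run:.1%}")
```

Even at the low end, only about one release run in seven comes back fully green, which is how a "rare" flake becomes a permanent release-engineering annoyance.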

But that doesn't mean they got zero effort put into them. We tried multiple times to replicate, perhaps a total of 10-20 human hours over a decade or so (spread out between maybe 3 people, all CS PhDs), and never got close enough to a smoking gun to develop a theory of the bug (and therefore, not able to develop a fix).

To be clear, I don't think this "proves" anything one way or another, as it's only one data point. But given that this is a team of CS PhDs intimately familiar with tools for race detection and debugging, it's notable that the LLM tools meaningfully helped us debug this.


For someone claiming to be an AI skeptic, your post here, and posts in your profile certainly seem to be at least partially AI written.

For someone claiming to be an AI skeptic, you certainly seem to post a lot of pro-AI comments.

Makes me wonder if this is an AI agent prompted to claim to be against AI but then push an AI agenda, much like the fake "walk away" movement.


I have an old account, you can read my history of comments and see if my style has changed. No need to take my word for it.


Tangential off topic, but reminds me of seeing so many defenses for Brexit that started with “I voted Remain but…”

Nowadays when I read "I am an AI skeptic but" I already know the comment is coming from someone who has just drunk the Kool-Aid.


It's not just about the increase in volume, it's about the delta between the prompt and the generation.

If the generation merely restates the prompt (possibly in prettier, cleaner language), then usually it's the case that the prompt is shorter and more direct, though possibly less "correct" from a formal language perspective. I've seen friends send me LLM-generated stuff and when I asked to see the prompt, the prompts were honestly better. So why bother with the LLM?

But if you're using the LLM to generate information that goes beyond the prompt, then it's likely that you don't know what you're talking about. Because if you really did, you'd probably be comfortable with a brief note and instructions to go look the rest up on one's own. The desire to generate more comes from either laziness or else a desire to inflate one's own appearance. In either case, the LLM generation isn't terribly useful since anyone could get the same result from the prompt (again).

So I think LLMs contribute not just to a drowning out of human conversation but to semantic drift, because they encourage those of us who are less self-assured to lean into things without really understanding them. A danger in any time but certainly one that is more acute at the moment.


Is there something like this, but without (or with minimal) spoilers?


It's that, but it's also that the incentives are misaligned.

How many supposed "10x" coders actually produced unreadable code that no one else could maintain? But then the effort to produce that code is lauded while the nightmare maintenance of said code is somehow regarded as unimpressive, despite being massively more difficult?

I worry that we're creating a world where it is becoming easy, even trivial, to be that dysfunctional "10x" coder, and dramatically harder to be the competent maintainer. And the existence of AI tools will reinforce the culture gap rather than reducing it.


It's a societal problem; we are just seeing the effects in computing now. People have given up, everything is too much, the sociopaths won, they can do what they want with my body, mind, and soul. Give me convenience or give me death.


Not the same industry but at least one literary agent does this: if you physically print and mail your book proposal, they will respond with a short but polite, physical rejection letter if they reject you.

But I think it's a generational thing. The younger agents I know of just shut down all their submissions when they get overwhelmed, or they start requiring everyone to physically meet them at a conference first.


But this is what I don't get. Writing code is not that hard. If the act of physically typing my code out is a bottleneck to my process, I am doing something wrong. Either I've under-abstracted, or over-abstracted, or flat out have the wrong abstractions. It's time to sit back and figure out why there's a mismatch with the problem domain and come back at it from another direction.

To me this reads like people have learned to put up with poor abstractions for so long that having the LLM take care of it feels like an improvement? It's the classic C++ vs Lisp discussion all over again, but people forgot the old lessons.


> Writing code is not that hard.

It's not that hard, but it's not that easy. If it was easy, everyone would be doing it. I'm a journalist who learned to code because it helped me do some stories that I wouldn't have done otherwise.

But I don't like to type out the code. It's just no fun to me to deal with what seem to me arbitrary syntax choices made by someone decades ago, or to learn new jargon for each language/tool (even though other languages/tools already have jargon for the exact same thing), or to wade through someone's undocumented code to understand how to use an imported function. If I had a choice, I'd rather learn a new human language than a programming one.

I think people like me, who (used to) code out of necessity but don't get much gratification out of it, are one of the primary targets of vibe coding.


I'm pretty damn sure the parent, by saying "writing code", meant the physical act of pushing down buttons to produce text, not the problem-solving process that precedes writing said code.


This. Most people defer the solving of hard problems to when they write the code. This is wrong, and too late to be effective. In one way, using agents to write code forces the thinking to occur closer to the right level - not at the code level - but in another way, if the thinking isn’t done or done correctly, the agent can’t help.


Disagree. No plan survives first contact.

I can spend all the time I want inside my ivory tower, hatching out plans and architecture, but the moment I start hammering letters in the IDE my watertight plan suddenly looks like Swiss cheese: constraints and edge cases that weren't accounted for during planning, flows that turn out to be unfeasible without a clunky implementation, etc...

That's why writing code has become my favorite method of planning. The code IS the spec, and English is woefully insufficient when it comes to precision.

This makes agentic workflows even worse, because you'll only discover your architectural flaws much, much later in the process.


I also think this is why AI works okay-ish on tiny new greenfield webapps and absolutely doesn't on large legacy software.

You can't accurately plan every little detail in an existing codebase, because you'll only find out about all the edge cases and side effects when trying to work in it.

So, sure, you can plan what your feature is supposed to do, but your plan of how to do that will change the minute you start working in the codebase.


Yeah, I think this is the fundamental thing I'm trying to get at.

If you think through a problem as you're writing the code for it, you're going to end up heading up the wrong creek, because you'll have been furiously rowing, head down, the entire time, paying attention to whatever local problem you were solving, or whatever piece of syntax or library trivia or compiler-satisfaction game you were playing, instead of the bigger picture.

Obviously, before starting to write, you could sit down and write a software design document that worked out the architecture, the algorithms, the domain model, the concurrency, the data flow, the goals, the steps to achieve them, and so on; but the problem with doing that without an agent is that it then becomes boring. You've basically laid out a plan ahead of time and now you've just got to execute on the plan, which means (even though you might fairly often revise the plan as you learn unknown unknowns or iterate on the design) that you've kind of sucked all the fun and discovery out of the code-writing process. And it sort of means that you've essentially implemented the whole thing twice.

Meanwhile, with a coding agent, you can spend all the time you like building up that initial software design document, or specification, and then you can have it implement that. Basically, you can spend all the time in your hammock thinking through things and looking ahead, but then have that immediately directly translated into pull requests you can accept or iterate on instead of then having to do an intermediate step that repeats the effort of the hammock time.

Crucially, this specification or design document doesn't have to remain static. As you discover problems or limitations or unknown unknowns, you can revise it and then keep executing on it, meaning it's a living documentation of your overall architecture and goals as they change. This means that you can really stay thinking about the high level instead of getting sucked into the low level. Coding agents also make it much easier to send something off to vibe out a prototype, or to explore the code base of a library or existing project in detail to figure out the feasibility of some idea, meaning that verifying that your plan makes sense, which traditionally would have taken a lot of effort, has a much lower activation energy. So you're more likely to actually try things out in the process of building a spec.


I believe programming languages are the better language for planning architecture, the algorithms, the domain model, etc... compared to English.

The way I develop mirrors the process of creating said design document. I start with a high level overview, define what Entities the program should represent, define their attributes, etc... only now I'm using a more specific language than English. By creating a class or a TS interface with some code documentation I can use my IDEs capabilities to discover connections between entities.

I can then give the code to an LLM to produce a technical document for managers or something. It'll be a throwaway document because such documents are rarely used for actual decision making.

> Obviously, before starting writing, you could sit down and write a software design document that worked out the architecture, the algorithms, the domain model, the concurrency, the data flow, the goals, the steps to achieve it and so on;

I do this with code, and the IDE is much better than MS Word or whatevah at detecting my logical inconsistencies.


The problem is that you actually can't really model or describe a lot of the things that I do with my specifications using code without just ending up fully writing the low level code. Most languages don't have a type system that actually lets you describe the logic and desired behavior of various parts of the system and which functions should call which other functions and what your concurrency model is and so on without just writing the specific code that does it; in fact, I think the only languages that would allow you to do something like that would have to be like dependently typed languages or languages adjacent to formal methods. This is literally what the point of pseudocode and architecture graphs and so on are for.


Ah, perhaps. I understood it a little more broadly to include everything beyond pseudocode, rather than purely being able to use your fingers. You can solve a problem with pseudocode, and seasoned devs won't have much of an issue converting it to actual code, but it's not a fun process for everyone.


Yeah, I basically write pseudocode and let the AI take it from there.


But this is exactly my point: if your "code" is different than your "pseudocode", something is wrong. There's a reason why people call Lisp "executable pseudocode", and it's because it shrinks the gap between the human-level description of what needs to happen and the text that is required to actually get there. (There will always be a gap, because no one understands the requirements perfectly. But at least it won't be exacerbated by irrelevant details.)

To me, reading the prompt example half a dozen levels up, reminds me of Greenspun's tenth rule:

> Any sufficiently complicated C++ program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. [1]

But now the "program" doesn't even have formal semantics and isn't a permanent artifact. It's like running a compiler and then throwing away the source program and only hand-editing the machine code when you don't like what it does. To me that seems crazy and misses many of the most important lessons from the last half-century.

[1]: https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule (paraphrased to use C++, but applies equally to most similar languages)


Replying to a sibling comment:

The problem is that you actually have to implement that high-level DSL to get Lisp to look like that, and most DSLs are not going to be as concise and abstract as a natural-language description of what you want. That implementation is exactly where I'd want to use AI: to write that initial boilerplate from a high-level description of what the DSL should do, and then just make sure it resulted in the right thing.

And a Lisp macro DSL is not going to help with automating refactors, automatically iterating to take care of small compiler issues or minor bugs without your involvement so you can focus on the overall goal, remembering or discovering specific library APIs or syntax, etc.


I think of it more like moving from sole developer to a small team lead. Which I have experienced in my career a few times.

I still write my code in all the places I care about, but I don’t get stuck on “looking up how to enable websockets when creating the listener before I even pass anything to hyper.”

I do not care to spend hours or days to know that API detail from personal pain, because it is hyper-specific, in both senses of hyper-specific.

(For posterity, it’s `with_upgrades`… thanks chatgpt circa 12 months ago!)


It's not hard, but it's BORING.

I get my dopamine from solving problems, not trying to figure out why that damn API is returning the wrong type of field for three hours. Claude will find it out in minutes - while I do something else. Or from writing 40 slightly different unit tests to cover all the edge cases for said feature.


> it's time to sit back and figure out why there's a mismatch with the problem domain and come back at it from another direction

But this is exactly what LLMs help me with! If I decide I want to shift the abstractions I'm using in a codebase in a big way, I'd usually be discouraged by all the error, lint, and warning chasing I'd need to do to update everything else; with agents I can write the new code (or describe it and have it write it) and then have it set off and update everything else to align: a task that is just varied and context specific enough that refactoring tools wouldn't work, but is repetitive and time consuming enough that it makes sense to pass off to a machine.

The thing is that it's not necessarily a bottleneck in terms of absolute speed (I know my editor well and I'm a fast typist, and LLMs are in their dialup era), but it is a bottleneck in terms of motivation, when some refactor or change in algorithm I want to make requires a lot of changes all over a codebase that are boring to make but not quite rote enough to handle with sed or IDE refactoring.

It really isn't, for me, even mostly about the inconvenience of typing out the initial code. It's about the inconvenience of trying to munge text from one state to another, or handling big refactors that require a lot of little mostly-rote changes in a lot of places. But it's also about dealing with APIs or libraries where I don't want to have to constantly remind myself what functions to use, what to pass as arguments, what config data I need to construct to pass in, etc., or spend hours trawling through docs to figure out how to do something with a library when I can just feed its source code directly to an LLM and have it figure it out. There's a lot of friction and snags in writing code beyond typing that have nothing to do with having come up with a wrong abstraction, and that very often lead to me missing the forest for the trees when I'm in the weeds.

Also, there is ALWAYS boilerplate scaffolding to do, even with the most macrotastic Lisp; and let's be real: Lisp macros have their own severe downsides in return for eliminating boilerplate, and Lisp itself is not really the best language (in terms of ecosystem, toolchain, runtime, performance) for many or most tasks someone like me might want to do, and languages adapted to the runtime and performance constraints of their domain may be more verbose.

Which means that, yes, we're using languages that have more boilerplate and scaffolding to do than strictly ideally necessary, which is part of why we like LLMs, but that's just the thing: LLMs give you the boilerplate eliminating benefits of Lisp without having to give up the massive benefits in other areas of whatever other language you wanted to use, and without having to write and debug macro soup and deal with private languages.



Here's a paper from September 2025 that compares programs for (a) semantic equivalence (do they do the same thing) and (b) syntactic similarity (are the parse trees similar).

LLMs are more likely to judge programs (correctly or incorrectly) as being semantically equivalent when they are syntactically similar, even though syntactically similar programs can actually do drastically different things. In fact LLMs are generally pretty bad at program equivalence, suggesting they don't really "understand" what programs are doing, even for a fairly mechanical definition of "understand".

https://arxiv.org/pdf/2502.12466
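To make the distinction concrete, here's a made-up toy pair: two functions that differ by a single token yet compute different things, plus a third that shares almost no surface syntax with the first yet is semantically equivalent to it:

```python
# Syntactically near-identical, semantically different:
def sum_below_a(n):
    total = 0
    for i in range(n):
        total += i        # accumulates 0 + 1 + ... + (n - 1)
    return total

def sum_below_b(n):
    total = 0
    for i in range(n):
        total = i         # one token changed: just keeps the last i
    return total

# Syntactically dissimilar, semantically equivalent to sum_below_a:
def sum_below_c(n):
    return n * (n - 1) // 2   # closed form for 0 + 1 + ... + (n - 1)

print(sum_below_a(10), sum_below_b(10), sum_below_c(10))  # 45 9 45
```

A judge leaning on surface similarity would tend to pair the first two and miss the third, which is exactly the failure mode the paper measures.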

While this is a point in time study and I'm sure all these tools will evolve, this matches my intuition for how LLMs behave and the kinds of mistakes they make.

By comparison, the approach in this article seems narrow and doesn't explain a whole lot; more importantly, it doesn't give us any hypotheses we can actually test against these systems.

