
Can you expand on that? I've been wanting to try Claude for a while, but their payment processing wouldn't take any of my credit cards (they work everywhere else, so it's not the cards). I've heard I can work around this by installing their mobile app or something, but it was extra hurdles, so I didn't try very hard.

And I've been absolutely amazed with Codex. I started using it with ChatGPT 5.3-Codex, and it was so much better than online ChatGPT 5.2, even sticking to single-page apps, which both can do. I don't have any way to measure the "smarts" of the new 5.4, but it seems similar.

Anyways, I'll try to get Claude running if it's better in some significant way. I'm happy enough with the Codex GUI on macOS, but that's just one of several things that could be different between them.


Codex is not bad; I think it is still useful. But I find that it takes things far too literally and is generally less collaborative. It is a bit like working with a robot that makes no effort to understand why a user is asking for something.

Claude, IMO, is much better at empathizing with me as a user: It asks better questions, tries harder to understand WHY I'm trying to do something, and is more likely to tell me if there's a better way.

Both have plenty of flaws. Codex might be better if you want to set it loose on a well-defined problem and let it churn overnight. But if you want a back-and-forth collaboration, I find Claude far better.


That is interesting, and thank you.

I've had a list of pet projects that I've been adding to for years. For those, I just say the broad strokes and tell it to do its best. Codex has done a really good job for most of them, sometimes in one shot, and my list of experiments is emptying. Only one notable exception where it had no idea what I was after.

I also have my larger project, which I hope to actually keep and use. Same thing, though: it's really hard to explain what's going on, and it acts on bad assumptions.

So if Claude is better at that, then having two tools makes a lot of sense to me.


> I've been wanting to try Claude for a while, but their payment processing wouldn't take any of my credit cards (they work everywhere else, so it's not the cards). I've heard I can work around this by installing their mobile app or something, but it was extra hurdles, so I didn't try very hard.

Not Claude Code specifically, but you can try the Claude Opus and Sonnet 4.6 models for free using Google Antigravity.


Thank you for this. I had Antigravity already but was thinking of cancelling it because Gemini frustrates me. Using it with Claude, though, was very impressive, although I burned through my token budget in about five hours.

I think it would be cool if a language specifically for LLMs came about. It should have something like required preconditions and postconditions so that a deterministic compiler can verify the assumptions the LLM is claiming. Something like a theorem prover, but targeted specifically for programming and efficient compilation/runtime. And it doesn't need all the niceties human programmers tend to prefer (implicit conversions comes to mind).
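You can approximate this idea at run time in today's languages. A minimal Python sketch (the `contract` decorator and its semantics are hypothetical, purely to illustrate; the point of such a language would be that a compiler proves these conditions statically instead of checking them at run time):

```python
from functools import wraps

def contract(pre=None, post=None):
    """Run-time pre/postcondition checks -- a stand-in for what a
    verifying compiler would prove statically."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), "precondition failed"
            result = fn(*args, **kwargs)
            if post is not None:
                assert post(result), "postcondition failed"
            return result
        return wrapper
    return deco

@contract(pre=lambda xs: len(xs) > 0, post=lambda r: r >= 0)
def max_abs(xs):
    # Postcondition holds: an absolute value is never negative.
    return max(abs(x) for x in xs)
```

Here `max_abs([-3, 2])` passes both checks and returns 3, while `max_abs([])` trips the precondition. The LLM would emit the conditions as claims, and the toolchain would either verify them or reject the code.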


If you're that confident in the LLM's output, just train it to output some kind of intermediate language, or even machine code.

And if you're not that confident, shouldn't you still be optimising for humans, because humans have to check the LLM's output?


At least in programming, humans have to check the product of the LLM's output rather than the output itself.


I'm working on this now.

It's a profile-guided-optimization language, with memory safety like Rust's.

It's extremely easy to optimize assuming you either 1) profile it in production (obviously has costs) or 2) can generate realistic workloads to test against.

It's like Rust, in that it makes expressing common illegal states just outright impossible. Though it goes much further than Rust.

And it's easier to read than Swift or Go.

There's a lot of magic that happens with defaults that languages like Zig or Rust don't want, because they want every cost signal to be as visible as possible, so you can understand the cost of a line and a function.

LLMs with tests can - I hope - do this without that noise.

We shall see.


Do you have a repo?


Yes.

I'm almost ready to launch v0.1 - but the documentation is especially a mess right now, so I don't want to share yet.

I'll update this comment in a week or so [=


Appreciate it!


Of course I can't be certain, but I think the "mixture of experts" design plays into it too. Metaphorically, there's a mid-level manager who looks at your prompt and tries to decide which experts it should be sent to. If he thinks you won't notice, he saves money by sending it to the undergraduate intern.

Just a theory.


Notice that MoE isn't different experts for different types of problems. It's per token and not really connected to problem type.

So if you send some Python code, the first token in a function can go to one expert, the second to another, and so on.


Can you back this up with documentation? I don't believe that this is the case.


The router that routes the tokens between the "experts" is part of the training as well. The name MoE is really not a good one, as it makes people believe the split happens at a coarser level and that each expert is somehow trained on a different corpus, etc. But what do I know; there are new archs every week, and someone might have done an MoE differently.


It's not only per token, but also each layer has its own router and can choose different experts. https://huggingface.co/blog/moe#what-is-a-mixture-of-experts...
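A toy NumPy sketch of that per-token, top-k routing in a single MoE layer (purely illustrative; in a real model the router and experts are trained jointly, and each transformer layer has its own router):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 2

# One tiny "expert" per index: here just a random linear map.
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))  # learned jointly in practice

def moe_layer(tokens):
    """tokens: (seq_len, d). Each token is routed independently."""
    logits = tokens @ router              # (seq_len, n_experts)
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        # Pick the top-k experts for THIS token...
        idx = np.argsort(logits[i])[-top_k:]
        # ...and mix their outputs, weighted by a softmax over their logits.
        w = np.exp(logits[i][idx])
        w /= w.sum()
        out[i] = sum(wi * (tok @ experts[j]) for wi, j in zip(w, idx))
    return out

tokens = rng.normal(size=(5, d))
print(moe_layer(tokens).shape)  # one output vector per token
```

Note there is no notion of "problem type" anywhere: adjacent tokens of the same Python function can land on different experts, exactly as the comment above describes.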


Check out Unsloth's REAP models: you can outright delete a few of the lesser-used experts without the model going braindead, since they can all handle each token but some are better poised to do so.


Language changes over time, and I remember recent memes where a cute girl says something like "claiming you're moderate means you know conservatives don't get laid" (presumably because of abortion politics). It makes me wonder if the moderates actually became liberal or if they just don't want to use that word any more.

After all the polarization in "reality show politics", my diehard liberal friends seem less liberal to me, but they'll state which team they're on more fervently than ever.


Very simple code is UB:

    int handle_untrusted_numbers(int a, int b) {
        if (a < 0) return ERROR_EXPECTED_NON_NEGATIVE;
        if (b < 0) return ERROR_EXPECTED_NON_NEGATIVE;
        int sum = a + b;
        if (sum < 0) {
            return ERROR_INTEGER_OVERFLOW;
        }
        return do_something_important_with(sum);
    }
Every computer you will ever use has two's complement signed integers, and the standard recently recognized and codified this fact. However, the UB fanatics (heretics) insisted that keeping signed overflow undefined is an important opportunity for optimizations, so the compiler is allowed to delete that last if-statement, and your code quietly no longer checks for overflow.

There are plenty more examples, but I think this is one of the simplest.


For or against, I don't know why the "just predicting" or "stochastic parrots" criticism was ever insightful. People make one word after another and frequently repeat phrases they heard elsewhere. It's kind of like criticizing a calculator for making one digit after another.


It isn’t a criticism; it’s a description of what the technology is.

In contrast, human thinking doesn’t involve picking a word at a time based on the words that came before. The mechanics of language can work that way at times - we select common phrasings because we know they work grammatically and are understood by others, and it’s easy. But we do our thinking in a pre-language space and then search for the words that express our thoughts.

I think kids in school ought to be made to use small, primitive LLMs so they can form an accurate mental model of what the tech does. Big frontier models do exactly the same thing, only more convincingly.


> In contrast, human thinking doesn’t involve picking a word at a time based on the words that came before

Do we have science that demonstrates humans don't autoregressively emit words? (Genuinely curious / uninformed).

From the outset, it's not obvious that auto-regression through the state space of actions (i.e. what LLMs do when yeeting tokens) is the difference they have with humans. Though I guess we can distinguish LLMs from other models like diffusion/HRM/TRM that explicitly refine their output rather than commit to a choice and then run `continue;`.


Have you ever had a concept you wanted to express, known that there was a word for it, but struggled to remember what the word was? For human thought and speech to work that way it must be fundamentally different to what an LLM does. The concept, the "thought", is separated from the word.


Analogies are all messy here, but I would compare the values of the residual stream to what you are describing as thought.

We force this residual stream to project to the logprobs of all tokens, just as a human in the act of speaking a sentence is forced to produce words. But could this residual stream represent thoughts which don't map to words?

It's plausible; we already have evidence that glitch-token representations trend towards the centroid of the high-dimensional latent space, and that logprobs for tokens representing wildly-branching trajectories in output space (i.e. "but" vs "exactly" for specific questions) represent a kind of cautious uncertainty.


Fine, that would at least teach them that LLMs are doing a lot more than "predicting the next word", given that they can also be taught that a Markov model can do that in about 10 lines of simple Python, with no neural nets or any other AI/ML technology.
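Roughly the 10-line word-level Markov chain alluded to above, for the curious:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that followed it."""
    chain = defaultdict(list)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, n=10):
    """Walk the chain: repeatedly 'predict the next word'."""
    out = [start]
    for _ in range(n):
        choices = chain.get(out[-1])
        if not choices:
            break
        out.append(random.choice(choices))
    return " ".join(out)
```

Feed `build_chain` any text and `generate` will babble locally-plausible word sequences, which is a useful baseline for what "just predicting the next word" gets you with no learned representation at all.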


> In contrast, human thinking doesn’t involve picking a word at a time based on the words that came before.

More to the point, human thinking isn't just outputting text by following an algorithm. Humans understand what each of those words actually mean, what they represent, and what it means when those words are put together in a given order. An LLM can regurgitate the wikipedia article on a plum. A human actually knows what a plum is and what it tastes like. That's why humans know that glue isn't a pizza topping and AI doesn't.


> That's why humans know that glue isn't a pizza topping and AI doesn't.

It's the opposite. That came from a Google AI summary which was forced to quote a reddit post, which was written by a human.


No, but in this case it indicates some hypocrisy.


> It was annoying but if it hadn't happened Python would still be struggling with basic things like Unicode.

They should've just used Python 2's strings as UTF-8. No need to break every existing program, just deprecate and discourage the old Python Unicode type. The new Unicode type (Python 3's string) is a complicated mess, and anyone who thinks it is simple and clean isn't aware of what's going on under the hood.

Having your strings be a simple array of bytes, which might be UTF-8 or WTF-8, seems to be working out pretty well for Go.


I can't say I've ever thought "wow, I wish I had Go's Unicode approach". The bytes/str split is the cleanest approach of any runtime I've seen.


What you propose would have, among other things, broken the well established expectation of random access for strings, including for slicing, while leaving behind unclear semantics about what encoding was used. (If you read in data in a different encoding and aren't forced to do something about it before passing it to a system that expects UTF-8, that's a recipe for disaster.) It would also leave unclear semantics for cases where the underlying bytes aren't valid UTF-8 data (do you just fail on every operation? Fail on the ones that happen to encounter the invalid bytes?), which in turn is also problematic for command-line arguments.
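The random-access point is easy to see in current Python, where the two types are kept separate:

```python
s = "naïve"            # str: a sequence of code points
b = s.encode("utf-8")  # bytes: the UTF-8 encoding of the same text

print(len(s))  # 5 code points
print(len(b))  # 6 bytes ('ï' encodes as two bytes, 0xC3 0xAF)
print(s[2])    # 'ï'  -- indexing lands on a whole character
print(b[2])    # 195  -- indexing lands in the middle of a multi-byte sequence
```

With byte strings as the one string type, `s[2]` and slicing would have the second behavior, and every program doing character-level work would need to walk the bytes itself.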


I'm not a web dev. Are people really that opposed to writing a function which would solve problems in their project? It's two lines long...


> Assuming assert existed

It's a function you could write for yourself and give it whatever semantics you feel is best. No changes to the language required for this one.

