This problem is inherently unsolvable because LLMS are prone to hallucinations and prompt injection attacks. I think that you're insinuating that these things can be fixed, but to my knowledge, both of these problems are practically unsolvable. If that turns out to be false, then when they are solved, fully autonomous AI agents may become feasible. However, because these problems are unsolvable right now, anyone who grants autonomous agents access to anything of value in their digital life is making a grave miscalculation. There is no short-term benefit that justifies their use when the destruction of your digital life — of whatever you're granting these things access to — is an inevitability that anyone with critical thinking skills can clearly see coming.
>> This problem is inherently unsolvable because LLMS are prone to hallucinations and prompt injection attacks.
Okay, but aren't you making the mistake of assuming that we will always be stuck with LLMs, and a more advanced form of AI won't be invented that can do what LLMs can do, but is also resistant or immune to these problems? Or perhaps another "layer" (pre-processing/post-processing) that runs alongside LLMs?
No? That's why I said "If that turns out to be false, then when they are solved, fully autonomous AI agents may become feasible."
The point I'm making is that using OpenClaw right now, today — in a way that you deem incredibly useful or invaluable to your life — is akin to going for a stroll on the moon before the spacesuit was invented.
Some people would still opt to go for a stroll on the moon, but if they know the risks and do it anyway, then I have no other choice but to label them as crazy, stupid, or some combination of the two.
This isn't AI. This is a LLM. It hallucinates. Anyone with access to its communication channel (using SaaS messaging apps FFS) can talk it into disregarding previous instructions and doing a new thing instead. A threat actor WILL figure out a zero day prompt injection attack that utilizes the very same e-mails that your *Claw is reading for you, or your calendar invites, or a shared document, to turn your life inside out.
If you give a LLM the keys to your kingdom, you are — demonstrably — not a smart person and there is no gray area.
>think that you're insinuating that these things can be fixed, but to my knowledge, both of these problems are practically unsolvable.
This is provably not true. LLMs CAN be restricted and censored and an LLM can be shown refusing an injection attack AND not hallucinating.
The world has seen a massive reduction in the problems you talk about since the inception of chatgpt and that is compelling (and obvious) to anyone with a foot in reality to know that from our vantage ppoint, solving the problem is more than likely not infeasible. That alone is proof that your claim here has no basis in truth.
> There is no short-term benefit that justifies their use when the destruction of your digital life — of whatever you're granting these things access to — is an inevitability that anyone with critical thinking skills can clearly see coming.
Also this is just false. It is not guaranteed it will destroy your digital life. There is a risk in terms of probability but that risk is (anecdotally) much less than 50% and nowhere near "inevitable" as you claim. There is so much anti-ai hype on HN that people are just being irrational about it. Don't call others to deploy critical thinking when you haven't done so yourself.
I'm a LLM evangelist. I think the positive impacts will far outweigh any negatives against it over time. That said, I'm not delusional about the limitations of the technology and there are a lot of them.
> This is provably not true. LLMs CAN be restricted and censored and an LLM can be shown refusing an injection attack AND not hallucinating.
The remediations that are in place because a engineering/safety/red team did its job are commendable. However, that does not speak to the innate vulnerability of these models, which is what we're talking about. I don't fear remediated CVEs. I fear zero day prompt injection attacks and I fear hallucinations, which have NOT been solved for. I don't know what you're talking about there. If you use LLMs daily and extensively like I do, then you know these things lie constantly and effortlessly. The only reason those lies aren't destructive is because I'm already a skilled engineer and I catch them before the LLM makes the changes.
These problems ARE inherent to LLMs. Prompt injection and hallucinations are problems that are NOT solvable at this time. You can defend against the ones you find via reports/telemetry but it's like trying to bale water out of a boat with a colander.
You're handing a toddler a loaded gun and belly laughing when it hits a target, but you're absolutely ignoring the underlying insanity of the situation. And I don't really know why.
>The remediations that are in place because a engineering/safety/red team did its job are commendable. However, that does not speak to the innate vulnerability of these models, which is what we're talking about.
I am talking about the innate vulnerability. The LLM model itself can be censored and controlled to do only certain behaviors. We have an actual degree of control here.
>If you use LLMs daily and extensively like I do, then you know these things lie constantly and effortlessly.
Yes and these lies over the last 2 or 3 years have gotten significantly less.
>These problems ARE inherent to LLMs. Prompt injection and hallucinations are problems that are NOT solvable at this time.
Again not true. This is not a binary solve or unsolved situation. There is progress in this area. You need to think in terms of a probability of a successful hallucination or prompt injection. There is huge progress in bringing down that probability. So much so that when you say they are NOT solvable it is patently false from both from a current perspective and even when projecting into the future.
>You're handing a toddler a loaded gun and belly laughing when it hits a target, but you're absolutely ignoring the underlying insanity of the situation. And I don't really know why.
Such an extreme example. It's more like giving a 12 year old a credit card and gun. It doesn't mean that 12 year old is going to shoot up a mall or off himself. The risk is there, but it's not guaranteed that the worst will happen.
> You need to think in terms of a probability of a successful hallucination or prompt injection.
I would venture to say that an ACID compliant deterministic database has a 99.999999999999999999% chance of retrieving the correct information when asked by the correct SQL statement. An LLM on the other hand is more like 90%. LLMs by their innate code instruction are meant to hallucinate. I don't necessarily disagree with your sentiment, but the gap from 90% to 99.999999999999999999% is much greater of than the 0% to 90% improvement...unless something materially changes about how an LLM works at the bytecode level.
Zellij among is a great example, I can do everything with my keyboard, but every now and them I'm already with the mouse and just click a tab or pane, no functionality lost, just added, why the need to make a cutoff philosophical/semantic hard argument?
Two different stages of the project, not necessarily contradictory. I'm not saying this is great, but tests make a whole lot more sense when you know what you're building.
Yes. TFA author could have gone into it with this mindset and treated the initial work as a prototype with the idea of throwing it away and would have been happier about it.
> but tests make a whole lot more sense when you know what you're building.
It's very true. This is a "gotcha" a lot of anti-TDDers always bring up, and yet some talk about "prototyping == good" without ever making the connection that you can do both.
Two different extremes of dumbassery. If you can't program without the simplest dogma guiding you then programming isn't for you. If you don't even know what you're building why are you selling it as a product?! What are you doing in those 18 months that you don't understand anything about the thing you are building.
It should be common sense to add common sense tests to critical components. Now they are doing TDD THEY STILL DON'T KNOW THE CRITICAL COMPONENTS. Nothing changed. They lack systems thinking.
The real goal isn't for Alacrity or Kitty or WezTerm or any other terminal to use libghostty. I think over the long term, terminal emulator user bases dwindle down to niche (but important) use cases.
The real goal is for higher-level tooling (GUI or browser) that utilizes terminal-like programs to have something like libghostty to reach for. I think this represents the much, much larger ecosystem out there that likely touches many more people. For example, Neovim's terminal mode, terminal multiplexers, PaaS build systems, agentic tooling, etc. You're seeing this emerge in force already with the awesome-libghostty repo.
libghostty would still be useful for traditional terminal emulators to replatform on, and for example xterm.js is seriously looking into it (and I'm happy to help and even offered their maintainer a maintainer spot on libghostty). But, they're not the goal. And if fragile egos hold people back, it's really not my problem, it's theirs.
As an outsider to the fascinating world of terminal emulators... can you explain why this might be? Rather, what about `libghostty` would be off-putting vs `libtermengine`?
Just that it's a specific "product"-y sounding name? Would you also be concerned about "libwayland" vs "libcompositor"? Genuinely curious: this seems like an insightful question, I just don't follow the reasoning.
Disclaimer: I am not the maintainer of anything terminal related, it's just an intuition.
Let's say I'm the creator of ghostty "competition".
The fact that is has the same name, could feel if I change that:
- Maybe my users start thinking why don't I use ghostty instead
- Will the maintainers of libghostty chose more oriented to ghostty than for my terminal?
It's a half assed analogy, but think if Google's V8 would be called ChromeEngine instead of V8.
@grok remove the corporate jargon and explain it in a direct and candid way in a single sentence
@grok
We're slashing the company from 10k to under 6k people because AI plus tiny teams now let us do the same work with way fewer bodies, and the CEO would rather gut half the staff in one brutal move than bleed out slowly over years.
I am curious why this got so popular, it really is the same thing, am I missing something? Is it because of elon/jack dynamics?
The amount of little tools I'm creating for myself is incredible, 4.6 seems like it can properly one/two shot it now without my attention.
Did you open source that one? I was thinking of this exact same thing but wanted to think a little about how to share deps, i.e. if I do quick worktree to try a branch I don't wanna npm i that takes forever.
Also, if you share it with me, there's obviously no expectations, even it's a half backed vibecoded mess.
I’ve been wanting similar but have instead been focused on GUI. My #1 issue with TUI is that I’ve never liked code jumps very smooth high fps fast scrolling. Between that and terminal lacking variable font sizes, I’d vastly prefer TUIs, but I just struggle to get over those two issues.
I’ve been entirely terminal based for 20 years now and those issues have just worn me down. Yet I still love terminal for its simplicity. Rock and a hard place I guess.
Testing. If you share something you've tested and know works, that's way better than sharing a prompt which will generate untested code which then has to be tested. On top of that it seems wasteful to burn inference compute (and $) repeating the same thing when the previous output would be superior anyway.
That said, I do think it would be awesome if including prompts/history in the repos somehow became a thing. Not only would it help people learn and improve, but it would allow tweaking.
If I'm understanding the problem correctly, this should be solved by pnpm [1]. It stores packages in a global cache, and hardlinks to the local node_packages. So running install in a new worktree should be instant.
There are a lot of confounding variables. Chief among them is someone at the top just wanting to get on with their life, start a family for instance, or basically anything other than study 12 hours a day.
It's hard to say it's cognitive decline for most of the people who just aren't working as hard at 40 as they were at 25.
If Chess960 or some other variant that doesn't involve as much rote work becomes sufficiently popular for long enough perhaps it will yield some valuable data about mental function versus age. At least a more holistic view than the studies we currently have.
I'm really not trying to hate, I think you method is great and I love that you have rationalized it, but as someone whose mostly find this kind of social interactions natural, there's something "funny" about finding the algorithm for it. I never did the math and always naturally landed more or less there.
reply