hey there, shameless plug here, I built pinacle.dev exactly with this use case in mind: cheap $7 VMs that come with VS Code, Vibe Kanban, and everything else needed out of the box to keep vibe coding on the go, no need to leave your computer running
hello aszen, I work with draismaa, the way we have developed our simulations is by putting a few agents in a loop to simulate the conversation:
- the agent under test
- a user simulator agent, sending messages as a user would
- a judge agent, overseeing the conversation and stopping the simulation with a verdict once it reaches one
it then takes a description of the simulation scenario and a list of criteria for the judge to evaluate, and that's enough to run the simulation
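a minimal sketch of that loop, assuming the three agents are plain callables standing in for LLM calls (the names, signatures, and message shape here are illustrative, not the actual Scenario API):

```python
def run_simulation(agent, user_sim, judge, scenario, criteria, max_turns=10):
    """Alternate user-simulator and agent turns until the judge returns a verdict."""
    # the user simulator opens the conversation based on the scenario description
    history = [{"role": "user", "content": user_sim(scenario, [])}]
    for _ in range(max_turns):
        # the agent under test replies to the conversation so far
        history.append({"role": "assistant", "content": agent(history)})
        # the judge watches every turn; None means "keep the simulation going"
        verdict = judge(history, criteria)
        if verdict is not None:
            return verdict, history
        # the user simulator carries the conversation forward
        history.append({"role": "user", "content": user_sim(scenario, history)})
    return "inconclusive", history
```

the real thing drives actual LLM calls for each role, but the control flow is essentially this: two agents talking, one agent watching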
this allows us to tdd our way into building those agents: before adding something to the prompt, we can add a scenario/criteria first, see it fail, then fix the prompt and watch it play out nicely (or keep vibing a bit further) until the test is green
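the red/green flow looks roughly like this in pytest; the `judge` here is a trivial keyword stand-in for the LLM judge (an assumption purely so the example is self-contained), and the canned replies simulate the agent's behavior before and after the prompt fix:

```python
def judge(reply, criteria):
    # stand-in for the LLM judge: pass only if the reply covers every criterion
    return "pass" if all(c in reply.lower() for c in criteria) else "fail"

def test_agent_offers_refund_before_prompt_fix():
    # red: the current prompt never mentions refunds, so neither does the reply
    reply_before = "Sorry to hear that, please contact support."
    assert judge(reply_before, ["refund"]) == "fail"

def test_agent_offers_refund_after_prompt_fix():
    # green: after adding a refund policy to the prompt, the reply covers it
    reply_after = "Sorry about that! I've issued a refund to your card."
    assert judge(reply_after, ["refund"]) == "pass"
```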
we put this together in a framework called Scenario:
the cool thing is that we also built in a way to control the simulation, so you can be as flexible as you want: just let it play out on autopilot, or define what the user says, mock what the agent replies, and so on, to carry a situation forward
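one way to picture that control knob: each turn in a script is either pinned to a fixed message or left as `None` to run on autopilot (again, names and the script shape are my own illustration, not the library's API):

```python
def run_scripted(agent, user_sim, script):
    """script: list of (role, text) pairs; text=None delegates the turn to autopilot."""
    history = []
    for role, fixed in script:
        if fixed is not None:
            content = fixed                 # pinned turn: you define what was said
        elif role == "user":
            content = user_sim(history)     # autopilot: user simulator takes the turn
        else:
            content = agent(history)        # autopilot: agent under test takes the turn
        history.append({"role": role, "content": content})
    return history
```

so a fully autopilot run is just a script of `None`s, and a fully mocked one pins every turn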
and in the middle of these turns we can throw in any additional evaluation, for example checking whether a tool was called; it's really just a simple pytest/vitest assertion, and since it's a function callback any other eval can be plugged in too
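a mid-turn check is then just a function over the conversation state with ordinary assertions in it; the state shape and the `lookup_order` tool name here are assumptions for illustration:

```python
def check_tool_was_called(state):
    # ordinary pytest-style assertions, run as a callback between turns
    tool_calls = [m for m in state["messages"] if m.get("tool_call")]
    assert tool_calls, "expected the agent to have called a tool by this turn"
    assert tool_calls[-1]["tool_call"] == "lookup_order"
```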
That's true, we've been trying to help customers do evals for ages now, and it's super hard for everyone to build a really good dataset and define great quality metrics
so I just wanted to shamelessly plug this lib I built recently for this very topic; it's been much easier to sell to our clients than evals, because it's closer to e2e tests: https://github.com/langwatch/scenario
instead of 100 examples, it's easier for people to think of just the anecdotal example where the problem happens and let AI expand it, or replicate a situation from prod and describe the criteria in simple terms or code
I love the term! But I do think it's both, really: after all this time, LLMs are still very finicky, and even the order of the instructions still matters a lot, even with the right context, so you are still prompt engineering; ideally that will go away and only context engineering will remain
I agree, I really don’t like LangChain’s abstractions; the chains they call “composable” are not really, and you spend more time trying to figure out LangChain than actually building things with it, and after talking to many people it seems it’s not just me
Their code seems rushed all around; that seems to have worked out for initial popularity, but with their current abstractions I personally don’t think it’s a good long-term framework to learn and adopt
That’s why I built my own alternative to it, called LiteChain, where the chains are actually composable monads and the code is async and streamed by default, not duct-taped on. It’s very bare-bones yet, but I’m really putting effort into building a solid foundation first, with simple final abstractions for users that don’t get in the way, check it out:
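to give a feel for the "composable, async-streamed by default" idea, here's a toy sketch where every chain is an async-generator transformer and composition just pipes one stream into the next; this is my illustration of the concept, not LiteChain's actual API:

```python
import asyncio

def compose(*chains):
    """Pipe the output stream of each chain into the next, left to right."""
    async def piped(stream):
        for chain in chains:
            stream = chain(stream)   # each chain wraps the previous stream
        async for item in stream:
            yield item               # tokens flow through lazily, fully streamed
    return piped

async def upper(stream):
    async for token in stream:
        yield token.upper()

async def exclaim(stream):
    async for token in stream:
        yield token + "!"

async def main():
    async def source():
        for t in ["hello", "world"]:
            yield t
    # composed chains are themselves chains, so composition stays closed
    return [t async for t in compose(upper, exclaim)(source())]
```

the nice property is that `compose(upper, exclaim)` is itself just another chain, so pipelines nest without any framework machinery in the way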
Just saw the video you shared in the other comment using Prophecy, very cool
Generally I don’t care much about the embeddings, retrieval, connectors, etc. for playing with LLMs; I imagined much more robust tools were already available for that. My focus was more on the prompt development side: connecting many prompts together for a better chain-of-thought kind of thing, working out the memory and stateful parts, and so on. I think there might be a case for an “LLM framework” there, and also a case for a small lib to solve it instead of an ETL cannon
However, I am indeed not experienced with ETLs, so I have to play more with the available tools to see if and how I can do the things I was building using them