They haven't made the chart very clear, but it seems to have a configurable number of passes: at 2 passes it's better than Haiku and Sonnet, and at 16 passes it starts closing in on Opus without quite getting there, while consistently being cheaper than Sonnet.
pass@k means that you run the model k times and give it a pass if any of the answers is correct. I guess Lean is one of the few use cases where pass@k actually makes sense, since you can automatically validate correctness.
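For what it's worth, the commonly used unbiased pass@k estimator (from the Codex paper) doesn't literally resample k times; it samples n completions, counts the c correct ones, and computes the chance that a random subset of k contains at least one correct answer. A quick sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are
    correct, is a correct answer."""
    if n - c < k:
        return 1.0  # fewer wrong answers than slots: a correct one is guaranteed
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# e.g. 16 generations, 4 of them correct:
print(pass_at_k(16, 4, 1))  # 0.25
print(round(pass_at_k(16, 4, 8), 3))  # ~0.962
```

The numbers here (n=16, c=4) are just made up for illustration, not from the chart.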
Oh my bad. I'm not sure how that works in practice. Do you just keep running it until the tests pass? I guess with formal verification you can run it as many times as you need, right?
The text file part has the instructions for the LLM, but it can also have scripts along with it that the LLM can invoke. At least that's how I understand it.
It's the UX, whether the information is deliberately omitted or not. There at least used to be toggles, for example, with no indication that they meant anything more than a minor load-balancer configuration change, yet flipping one added something like $200/month to the bill. No indication at all that they have a meaningful monetary impact.
I'm being rather snarky here, but the main point of front-end JS UI frameworks is to exist and to survive in their environment. For this purpose they have evolved to form a parasymbiotic relationship with others in their environment, for example with influencers. The frameworks with the best influencers win out over older ones that do not have the novelty value anymore and fail to attract the best influencers.
Next is the Microsoft Sharepoint of the JavaScript world. It’s a terrible solution to just about anything, and yet gets crammed into places and forced on people due to marketing-led decision making.
My 10 minute Next build was replaced with a 1 minute 30 second Vite build.
And such an extraordinary difference usually means you're holding the tool wrong, but Next has years-old open issues for many of the causes here (like forced output tracing) and has simply ignored them. Possibly because the Next team's preferred deployment environment isn't affected?
The "just retry" approach is truly bothersome. I think it is at least partly an organizational issue, because it happens far more often when QA is a separate team.
I think not so long from now the exotic meal experience for the young ones will be real grilled chicken that looks like a chicken. Like zebra or crocodile meat was for us northerners.
From my own little box, I think that if lab-grown meat were available and affordable, I would never eat a bite of real chicken, pork or beef again. I know veganism is an option too, but... I grew up with meat and it's very difficult to give up.
Have you tried tempeh? It satisfies 95% of my chicken cravings since I found the right recipe and spices. It's also cheaper, nutritious, faster to cook and barely processed.
I have the same problem. The "What It Is" section starts with "Mycelium is a Clojure workflow framework built on Maestro" and that's a bit generic. Maybe it's something to test some AI-generated code, and then test whether the tests themselves are tested well enough, using Clojure, but I'm not entirely sure.
The main question, which is not obvious, is: what should I use it for?
This is far more brilliant than I thought. I know my purpose now, "AI" told me. It's to drink wine and eat macaroni!
The only problem is that larp as ai comes back with "no work yet. check back later :(" a lot, but once you run out of credits, that's it. So... did everyone run out of credits? I feel like something's up with it.
I'll take one addiction and a possible oral cancer for the company, thank you so much. No, I understand it's not guaranteed, but I am seriously flabbergasted by the careless actions of some companies...