Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8 Fable...

sempron64 · 2026-06-09T17:44:12 1781027052

The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.

h4ny · 2026-06-09T19:21:09 1781032869

Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?

fwipsy · 2026-06-09T19:25:35 1781033135

SVG generation is a good test because it's extremely easy to subjectively assess with visual reasoning where humans are strong. However, pelican on a bike specifically may be overused at this point.

Fuzzwah · 2026-06-10T03:49:46 1781063386

The "big beak!" comment in the svg source makes me think it's definitely a gamed "benchmark" at this point.

kayge · 2026-06-09T20:28:11 1781036891

Do you think the models are ready for the next level? I believe that would be: Pelican feeding Spaghetti to Will Smith.

quantumwoke · 2026-06-09T18:40:49 1781030449

Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.

brazukadev · 2026-06-09T19:41:40 1781034100

it is more an example of gaming (the HN system) than meme.

stratos123 · 2026-06-09T22:18:32 1781043512

I'd be very surprised if this is in the training data given that most models mess it up to this day. E.g. look at the ones from Opus.

tripleee · 2026-06-09T18:03:03 1781028183

I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.

yreg · 2026-06-09T18:54:34 1781031274

I really don't understand what's interesting about this test and why it is always on top.

simonw · 2026-06-09T18:55:05 1781031305

It's funny.

girvo · 2026-06-09T23:32:56 1781047976

It really is lol

mrandish · 2026-06-10T06:59:40 1781074780

As often happens with random oddball things which become traditions in web communities, the replies asking what it is or complaining about it begin to gain their own humor value.

depr · 2026-06-09T19:20:51 1781032851

Same reason you would always see the same top comments on reddit during a certain era.

yreg · 2026-06-10T03:10:25 1781061025

That’s what I think too, but we should actively go against such culture here because HN is not Reddit.

gunsle · 2026-06-10T05:30:58 1781069458

It basically is at this point, if you haven’t noticed. Complete with the same America bad, Elon bad, democrats good midwit progressive politics.

clydethefrog · 2026-06-10T16:02:57 1781107377

Almost all Musk-related negative news gets [flagged] and never hits the front page, so there is still a silent base on the other "team" apparently.

anhner · 2026-06-10T07:01:28 1781074888

Don't forget EU bad! Because they won't let Apple screw over consumers.

replwoacause · 2026-06-10T05:49:17 1781070557

Elon does suck. Objectively.

ankit_mishra · 2026-06-10T07:17:49 1781075869

Is this Straw Man and Ad Hominem?

inglor_cz · 2026-06-10T11:06:49 1781089609

It has become a funny meme, much like "My hovercraft is full of eels!"

luqtas · 2026-06-10T00:41:31 1781052091

because you still can't ask LLMs to port DOOM to hardware X or Y

WithinReason · 2026-06-09T19:51:38 1781034698

It's a meme, and HN loves upvoting memes. Just like Reddit!

port11 · 2026-06-09T19:49:07 1781034547

The ultimate measure of an LLM is whether it can produce a capable image of a pelican riding a bicycle. All other use cases are but a distraction!

scrollaway · 2026-06-09T18:47:11 1781030831

Do you seriously have a dedicated “bad takes on AI” hn account?

tripleee · 2026-06-09T19:18:28 1781032708

yeah, although I do combine it with "replies to snarky questions" for efficiency

jurgenaut23 · 2026-06-09T18:43:09 1781030589

True that

sarreph · 2026-06-09T17:20:27 1781025627

I'm beginning to wonder how useful a metric the pelican is, because surely the frontier labs must be training their models on pelican artistry given how well known your test is now?

bensyverson · 2026-06-09T17:56:21 1781027781

Simon has addressed this on virtually every new model release. He also has unpublished alternate prompts. But the larger point is: this is a fun experiment, not a serious and objective benchmark.

refulgentis · 2026-06-09T18:19:00 1781029140

It's silly and a joke and a surprisingly good benchmark and don't take it seriously but don't take not taking it seriously seriously and if it's too good we use another prompt but don't actually because then it's not the pelican post and there's obvious ways to better it and it's not worth doing because it's not serious.

Only coherent move at this point: hit the minus button immediately. There's never anything about the model in the thread other than simon's post.

stasomatic · 2026-06-09T19:03:05 1781031785

But what if they are better at flamingos? Are they optimized for pelicans? How about “draw me a four headed owl”? The meme, I get it, but I’d settle for a working bash script, tbh.

wongarsu · 2026-06-09T17:41:26 1781026886

I just run my own benchmark for "draw an SVG with $animal driving $vehicle". I won't post my choice of animal and mode of transport, but there are plenty of uncommon combinations to choose from. So far it's a fun and visually intuitive benchmark that does seem to correlate with model capabilities
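A minimal sketch of that kind of private prompt generator, assuming hypothetical animal and vehicle lists (the actual choices are deliberately not posted) and an assumed prompt wording:

```python
import random

# Hypothetical word lists for illustration only; the real picks are kept private
# so they cannot leak into training data.
ANIMALS = ["heron", "walrus", "badger", "tortoise"]
VEHICLES = ["unicycle", "kayak", "forklift", "hang glider"]


def make_prompt(rng: random.Random) -> str:
    """Fill the "$animal driving $vehicle" template with one random pairing."""
    animal = rng.choice(ANIMALS)
    vehicle = rng.choice(VEHICLES)
    return f"Draw an SVG of a {animal} driving a {vehicle}"


def make_prompts(n: int, seed: int = 0) -> list[str]:
    """Return n prompts; a fixed seed gives every model the same set."""
    rng = random.Random(seed)
    return [make_prompt(rng) for _ in range(n)]


if __name__ == "__main__":
    for prompt in make_prompts(3):
        print(prompt)
```

Seeding the generator keeps the comparison fair: each model sees identical prompts, and the results still have to be judged by eye.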

modriano · 2026-06-09T17:34:26 1781026466

I don't know. Just looking at the bike frames (specifically the fact that the AI generated bikes have rather unsteerable front forks), it's clear to me that frontier labs aren't spending much time tuning models to make bikes look coherent, which I assume is an easier task than making a pelican riding a bike look coherent.

HaZeust · 2026-06-09T17:23:02 1781025782

I've seen this reply to Simon's benchmark for 2 years running now, and yet you still see improvements and objectively-bad results over time from new releases, even when I'm sure every frontier AI team has/had a person at least partially dedicated to better bicycle-pelican SVG outputs. Alas.

sarreph · 2026-06-09T17:26:08 1781025968

I had intended to caveat that: I'm sure I'm not the first person to ask about this!

> you still see improvements

This is expected if they are training their models on it, right?

> objectively-bad results

Keen to learn when this has been the case, i.e. across version increments in major models.

simonw · 2026-06-09T17:29:15 1781026155

I've written about this a couple of times, most notably here: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

I've been enjoying seeing how the quality of individual models differs based on the amount of reasoning effort you give them. If they were baking in a good pelican you wouldn't expect them to differ so much.

(Google Gemini are the only lab that have very clearly paid attention to the quality of SVG animals-riding-vehicles, see their announcement for Gemini 3.1: https://twitter.com/JeffDean/status/2024525132266688757 )

sarreph · 2026-06-09T17:31:18 1781026278

Amazing, thank you Simon! Look forward to reading.

mrandish · 2026-06-10T06:32:36 1781073156

Hence it has become a meta-benchmark of relative progress in SVG image generation of a known target which has leaked into the training data and for which "every frontier AI team has/had a person at least partially dedicated to" at least checking if not optimizing.

llm_nerd · 2026-06-09T17:32:26 1781026346

I honestly assumed their comment was tongue in cheek humour, because positively no one actually cares how these models generate an SVG pelican riding a bicycle. It's some meme thing that this stuff always appears here.

BrokenCogs · 2026-06-09T17:38:54 1781026734

Yeah this is not a real benchmark, it's just a fun tradition every time a new model is released

pelipost123 · 2026-06-09T17:47:39 1781027259

"fun" / boringly predictable meme thread with 30+ replies already

brazukadev · 2026-06-09T19:43:26 1781034206

It is telling that people need to create throwaway accounts to criticize simonw's behavior on this website.

mrandish · 2026-06-10T06:53:06 1781074386

It's evolved from a funny, unserious benchmark to a tradition. When a major new model is released, I now always check the HN thread for Simon's Pelican post. I'll be sad when I don't find it.

When it started, comparing the progress between models was mildly interesting but everyone (including Simon) acknowledges it certainly leaked into the training data long ago.

notnullorvoid · 2026-06-09T19:49:57 1781034597

The way I see it, the benefit of the benchmark isn't to take Simon's results at face value. It's a template for your own benchmarks that are easy to visually evaluate.

iLoveOncall · 2026-06-09T19:10:34 1781032234

It was a completely useless test even before the labs trained for it.

mrandish · 2026-06-10T07:08:36 1781075316

Yes, it's always been published as a joke. You've explained why it was (and still is) funny meta-commentary on AI benchmarks.

ealready_value · 2026-06-09T17:15:24 1781025324

This is the reply I look for in all the new model announcements. It's fun to tell people that I judge models based on pelicans.

chorkpop · 2026-06-09T17:17:43 1781025463

Now someone post the link about how it’s impossible for humans to draw a bike from memory.

Atheros · 2026-06-10T06:27:25 1781072845

https://link.springer.com/article/10.3758/BF03195929

pixel_popping · 2026-06-09T17:34:35 1781026475

This is all we need, that moment the Pelican puts the leg behind the frame, we are all doomed.

upcoming-sesame · 2026-06-09T19:18:55 1781032735

I also look for this reply because I like seeing the follow-up reply saying that this is not a benchmark anymore because labs have gotten it in their training data.

that reply never fails to come, it's basically a meme at this point

redox99 · 2026-06-09T17:28:12 1781026092

It's interesting that they still get the head tube / handle bar part wrong.

aarjaneiro · 2026-06-09T17:44:48 1781027088

Or the hands not being wings

raffael_de · 2026-06-09T19:59:22 1781035162

I find it quite interesting that while the picture looks better the more advanced the model is, apparently none so far "understands" that the pelican's legs are on both sides of the bike / top bar.

LordDragonfang · 2026-06-09T20:14:54 1781036094

If you scroll to the bottom of the Fable-5 by effort page, Max effort actually gets this correct! (Along with being the only one I've seen so far to make a bicycle frame that matches the shape of what most bikes on Google images look like)

wasabi991011 · 2026-06-09T20:30:19 1781037019

And the only one linked here that includes a bicycle chain!

ethanlipson · 2026-06-09T17:19:13 1781025553

How much money do you think they spent fine-tuning on pelican SVG generation?

tarruda · 2026-06-09T17:26:21 1781025981

Not as much as Qwen, since apparently 3.6 35B surpassed Opus 4.7 https://x.com/simonw/status/2044830134885306701

csomar · 2026-06-09T17:23:24 1781025804

Probably none. They probably have much better targets to optimize for than an SVG pelican or even SVGs in general.

Reebz · 2026-06-10T03:52:55 1781063575

The Max version gets more details right. The bike frame looks good, the chain, the wings are appropriately styled instead of “arms”, and the knee is bent, etc. Obviously we’re hitting marginal returns now, but I see differences.

csomar · 2026-06-09T17:22:05 1781025725

Where is the clear improvement on Fable 5? The tail is misplaced.

leecommamichael · 2026-06-09T17:20:18 1781025618

Looks like Fable constructed the "max" "looking" pelican of the previous model for the "xhigh" output token count of the previous model.

smusamashah · 2026-06-09T20:59:28 1781038768

Can you please compare the code generated for similar-quality pelicans by other models? Code in your first link (Fable 5 Default) looks minimal yet very good.

mer_mer · 2026-06-10T01:19:01 1781054341

It's interesting that Gemini 3(.1?) Deep Think is still the best at this task and it's still not really generally available. Maybe Fable could match it at higher effort levels? https://simonwillison.net/2026/Feb/12/gemini-3-deep-think/

rkuska · 2026-06-09T17:39:54 1781026794

Is it possible to use the credits from subscription (https://support.claude.com/en/articles/15036540-use-the-clau...) for Fable?

XCSme · 2026-06-10T14:11:20 1781100680

It also does A LOT better for my hamster test: https://aibenchy.com/showcase/?q=claude#showcase=6efb87c28e3...

382hi · 2026-06-09T17:41:28 1781026888

I'm pretty sure they're optimizing the models around these sorts of tests.

makingstuffs · 2026-06-09T17:52:00 1781027520

I could be tripping but I’m sure that is very similar to the Deepseek one from not long ago. Clearly I am too lazy to go and find it for verification.

bergheim · 2026-06-09T19:40:59 1781034059

Anyone care about these pelicans that always come up anymore?

Clearly at this point they are part of the training data.

They even all look sort of ish the same. Daytime, colors,...

1attice · 2026-06-09T19:49:57 1781034597

Without being mean, I encourage you to go look at some of simonw's writing on this topic, which he has addressed repeatedly (and IMO satisfactorily.)

I know because I too had this initial take; however, upon analysis, it is not sound.

bergheim · 2026-06-09T19:55:17 1781034917

I know he is an AI influencer that promotes his blog any chance he gets.

I agree as well that he writes many interesting things.

benatkin · 2026-06-09T22:57:14 1781045834

The way they talked it up, having both legs on one side of the bike is like walking to the car wash

jerryliu12 · 2026-06-09T18:50:21 1781031021

Personally feel like it could be more ambitious with what it creates.

ceroxylon · 2026-06-09T23:03:20 1781046200

Yay, max level actually put one of the legs behind the frame!

mercacona · 2026-06-09T17:18:34 1781025514

Why always sunny days?

umeshunni · 2026-06-09T17:25:33 1781025933

Pelicans hate biking in the rain (as do I).

gavinray · 2026-06-09T18:57:05 1781031425

Fable 5 xhigh actually looks the best to me.

purple-leafy · 2026-06-09T19:48:20 1781034500

Do we need a pelican every single time a model is released? Beating a very dead horse.

Fun at first, seems disingenuous now. A site funnel

david_shi · 2026-06-09T17:40:10 1781026810

that's a great looking pelican

ge96 · 2026-06-09T17:32:53 1781026373

need more Alex Moulton style bikes

lacoolj · 2026-06-09T22:19:00 1781043540

dude, the max version looks like it's finally there. handle bar holding with wings, the left leg is behind the frame while the right is in front of it (correctly).

well done anthropic.

arthurcolle · 2026-06-10T02:19:38 1781057978

mediocre pelican. very disappointing

kylehotchkiss · 2026-06-09T17:32:42 1781026362

How many barrels of oil are burned per pelican at Fable levels?