The "visual inputs" samples are extraordinary, and well worth paying extra atten...

orangecat · on March 14, 2023

Wow. I specifically remember "AIs will never be able to explain visual humor" as a confident prediction from the before times of 2020.

_qua · on March 14, 2023

Yes! I remember the "Obama stepping on the scale" example that was used in that article. Would love to know how GPT-4 performs on that test.

LeanderK · on March 14, 2023

You mean this http://karpathy.github.io/2012/10/22/state-of-computer-visio...? Very funny to revisit. It's astounding how primitive our tools were compared to now. It feels like the Wright brothers' first flight vs. a jetliner. ImageNet was the new frontier. Simpler times...

kromem · on March 14, 2023

I think the interesting thing here is the very, very surprising result that LLMs would be capable of abstracting the things in the second-to-last paragraph from the experiences described in amalgamated written human data.

It's the thing that most people, even in this thread, don't seem to realize has emerged in research over the past year.

Give a Markov chain a lot of text about fishing and it will tell you about fish. Give GPT a lot of text about fishing and it turns out that it will probably learn how to fish.

World model representations are occurring in GPT. And people really need to start realizing there's already published research demonstrating that, as it goes a long way toward explaining why the multimodal parts work.

lysozyme · on March 15, 2023

Especially funny since the author, Andrej Karpathy, wrote at the end of the 2012 article that

>we are very, very far and this depresses me. What is the way forward? :( Maybe I should just do a startup

and was a founding member of OpenAI just a few years later, in 2015.

djmips · on March 15, 2023

And he just rejoined them in February.

_qua · on March 14, 2023

Didn't realize this was from 2012, but yes this is definitely what I was thinking of.

djmips · on March 15, 2023

They say there are 3 mirrors in the scene, but there are at least 5, one of which can only be seen indirectly through one of the other mirrors!

robocat · on March 14, 2023

If they are using popular images from the internet, then I strongly suspect the answers come from the text next to the known image. The man ironing on the back of the taxi has the same issue. https://google.com/search?q=mobile+phone+charger+resembling+...

I would bet good money that when we can test prompting with our own unique images, GPT-4 will not give answers of similar quality.

I do wonder how misleading their paper is.

EMM_386 · on March 14, 2023

Did you watch the livestream?

They literally sent it 1) a screenshot of the Discord session they were in and 2) an audience-submitted image.

It described the Discord image in incredible detail, including what was in it, what channels they were subscribed to, and how many users were there. And for the audience image, it correctly described it as an astronaut on an alien planet, with a spaceship on a distant hill.

And that image looked like it was AI created!

These aren't images it's been "trained on".

kromem · on March 14, 2023

99% of the comments here don't have an iota of a clue what they are talking about.

There's easily a 10:1 ratio of "it doesn't understand, it's just fancy autocomplete" to the alternative, in spite of published peer-reviewed research from Harvard and MIT researchers months ago demonstrating that even a simplistic GPT model builds world representations from which it draws its responses, rather than simply guessing by frequency.

Watch the livestream?! But why would they do that, when they already know it's not very impressive and not worth their time beyond commenting on it online?

I imagine this is coming from some sort of monkey brain existential threat rationalization ("I'm a smart monkey and no non-monkey can do what I do"). Or possibly just an overreaction to very early claims of "it's alive!!!" in an age when it was still just a glorified Markov chain. But whatever the reason, it's getting old very fast.

RC_ITR · on March 14, 2023

>published peer reviewed research from Harvard and MIT researchers months ago

Curious, source?

EDIT: Oh, the Othello paper. Be careful extrapolating that too far. Notice they didn't ask it to play the same game on a board of arbitrary size (something easy for a model with world understanding to do).

OkGoDoIt · on March 14, 2023

In the livestream demo they did something similar, but with a DALL-E-generated image of a squirrel holding a camera, and it was still able to explain why it was funny. Since the image was generated by DALL-E, it clearly doesn't appear anywhere on the internet with text explaining why it's funny. So I think this is perhaps not the only possible explanation.

yura · on March 15, 2023

It didn't correctly explain why it was funny, though. The joke is that it's a squirrel "taking a picture of his nuts", nuts here being literal nuts and not the nuts we expect with phrasing like that.

What is funny is neither GPT-4 nor the host noticed that (or maybe the host noticed it but didn't want to bring it up due to it being "inappropriate" humor).

OkGoDoIt · on March 16, 2023

That interpretation never occurred to me either, actually. I suppose that makes more sense. But since it did not occur to me, I can give GPT-4 some slack. It came up with the same explanation I would have.

r00fus · on March 14, 2023

Can it tell porn apart from, e.g., family pics? Could it pass the "I know it when I see it" test?

knicholes · on March 14, 2023

Some people are sexually aroused by feet. How would YOU define "porn?"

belter · on March 14, 2023

Does it know what a "man of culture" is?

TremendousJudge · on March 14, 2023

https://xkcd.com/468/

anything not on your list

callalex · on March 14, 2023

That’s exactly their point though. It requires intuition to decide if a picture of feet is sexualized or not. Hence the “I know it when I see it” standard they mentioned.

ttul · on March 14, 2023

I’d bet they pass images through a porn filter prior to even giving GPT-4 a chance to screw that up…

DesiLurker · on March 14, 2023

I suppose it could do it from porn snapshots, kinda like the porn-ID thing on Reddit. I can see more nefarious uses, like identifying car licence plates or faces from public cameras for digital stalking. I know it can be done right now with ALPRs, but those have to be purpose-built with specialty camera setups. If this makes it ubiquitous, that would be a privacy/security nightmare.

elicash · on March 14, 2023

Can it explain this one? https://www.reddit.com/r/seinfeld/comments/e82uuy/new_yorker...

int_is_compress · on March 14, 2023

Yeah, it's incredible. Looks like tooling in the LLM space is quickly following suit: https://twitter.com/gpt_index/status/1635668512822956032

davesque · on March 14, 2023

Am I the only one who thought that GPT-4 got this one wrong? It's not simply that it's ridiculous to plug what appears to be an outdated VGA cable into a phone; it's that the cable connector does nothing at all. I'd argue that's what's actually funny. GPT-4 didn't mention that part, as far as I could see.