I had GPT-4 do the entire conversion of a Vimium-based browser-control project from Python to TypeScript [0].
Unlike this demo, it uses a simpler interface (Vim-style key bindings over the browser), which makes control easier without a fine-tuned model: e.g., the model types a hint label like “s” instead of clicking at X,Y coordinates.
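For anyone unfamiliar with the Vimium approach, here is a rough sketch of why it is easier for a model to drive. Every clickable element gets a short typed label, and the model only has to emit that label. This is a simplified illustration, not Vimium's actual hinting algorithm or the project's code. The `Clickable` type, the hint alphabet, and the function names are all hypothetical. Giving every label the same length guarantees that no label is a prefix of another.

```typescript
// Hypothetical element description; a real extension would hold DOM nodes.
interface Clickable {
  id: string;
  text: string;
}

// Assumed hint alphabet (home-row-heavy, in the spirit of Vimium's defaults).
const HINT_CHARS = "sadfjklewcmpgh";

// Produce `count` distinct labels of equal length, so no label is a prefix of another.
function generateHintLabels(count: number, chars: string = HINT_CHARS): string[] {
  if (count <= 0) return [];
  const base = chars.length;
  // Smallest length L with base^L >= count (at least one character).
  let length = 1;
  while (Math.pow(base, length) < count) length++;
  const labels: string[] = [];
  for (let i = 0; i < count; i++) {
    // Write i in base `chars.length`, padded to `length` digits.
    let n = i;
    let label = "";
    for (let pos = 0; pos < length; pos++) {
      label = chars[n % base] + label;
      n = Math.floor(n / base);
    }
    labels.push(label);
  }
  return labels;
}

// Attach a label to every clickable element on the page.
function assignHints(elements: Clickable[]): Map<string, Clickable> {
  const labels = generateHintLabels(elements.length);
  const hints = new Map<string, Clickable>();
  elements.forEach((el, i) => hints.set(labels[i], el));
  return hints;
}

// Resolve what the model typed back to an element (undefined if nothing matches).
function resolveHint(hints: Map<string, Clickable>, typed: string): Clickable | undefined {
  return hints.get(typed.trim().toLowerCase());
}
```

With two elements on the page, the labels are "s" and "a". The model's whole "click" action is the single keystroke "a", rather than a pair of pixel coordinates it would have to ground visually.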
I was surprised how well it worked — it even passed the captcha on Amazon!
If the machines are smart enough, shouldn’t they be able to build better interfaces to existing software?
That aside, there seem to be two things at play in this demo:
1. Pixel-tuned GPT-4o
2. “Agent” in prod (supervisor loop + operator loop)
It will be interesting to see whether they open these up as separate tools in the future, or let them fall by the wayside like GPTs, DALL-E, etc.