I've been using Codex (GPT-5.4 extra high) to write custom FeatureScript in Onshape (3D mechanical CAD software). It's challenging to get it to do TDD that involves any visual reasoning. At the moment I've got tooling through Google Chrome DevTools MCP and Playwright to extract things and control the browser, and I use some custom features that help with formatting and controlling debugging outputs (text and visual overlays). It's mostly the text debugging outputs that are very helpful to Codex. It will often add debugging payloads when we're focused on a particular issue. I do occasionally take screenshots, paste them into Codex, and explain the issue that I'm seeing. It seems to understand a certain amount, especially if the issue can be seen in orthogonal views.