One of the contributors here. We have a statistical test script we can run on our branches to measure compile time, cost, and reliability. We want to try tree-of-thought prompting, and also something like https://www.reddit.com/r/ChatGPT/comments/14d7pfz/become_god.... That said, we found that when we asked GPT to first explain why a test case is failing and then correct that failure, rather than just asking it to correct the failure directly, costs unexpectedly went up and reliability went down.
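To make the comparison concrete, here is a minimal sketch of the two prompt styles being contrasted (direct fix vs. explain-then-fix). The function names and template wording are hypothetical, not the project's actual prompts:

```python
def direct_fix_prompt(code: str, failure: str) -> str:
    # Variant 1 (hypothetical): ask the model to repair the code in one step.
    return (
        "The following code fails a test.\n"
        f"Code:\n{code}\n"
        f"Test failure:\n{failure}\n"
        "Return a corrected version of the code."
    )

def explain_then_fix_prompt(code: str, failure: str) -> str:
    # Variant 2 (hypothetical): ask for a diagnosis first, then the fix.
    # This is the variant that, counterintuitively, raised cost and
    # lowered reliability in our tests.
    return (
        "The following code fails a test.\n"
        f"Code:\n{code}\n"
        f"Test failure:\n{failure}\n"
        "First, explain step by step why the test fails. "
        "Then return a corrected version of the code."
    )
```

The second variant produces longer completions (the explanation is billed as output tokens), which accounts for part of the cost increase on its own.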