That's funny, those are all the things I don't trust it to do. I actually use it the other way around: give it a big non-specific task, see if it works, specify better, retry, throw away 60-90% of the generated code, fix bugs in a bunch of places, and out comes an implemented feature.
I give the agent the following standing instructions:
"Make the smallest possible change. Do not refactor existing code unless I explicitly ask."
That directive cut down considerably on the number of extra changes I had to review. When it gets it right, the changes are now close to the right size.
The agent still tries to do too much, though, typically suggesting three tangents for every interaction.