Given your threat model, I’d recommend trying out Codex Web (or the Claude equivalent). The agent runs in the cloud and only has access to things you’ve explicitly given it. Network access to the cloud container is cut off once the agent starts running.
If web isn’t flexible or fun enough, I would try running Codex CLI locally inside a container. The container keeps the agent from reading the rest of your file system, and you can use Codex CLI’s sandboxing to deny network access. The codex sandbox subcommand lets you test for yourself what the sandbox restricts (the details of how the sandbox is implemented differ by OS). Since you mention distrusting the client software, if it’s any consolation, Codex CLI is open source: GitHub - openai/codex (a lightweight coding agent that runs in your terminal).
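To make the container route concrete, here’s a minimal sketch of the kind of invocation I mean. The image name (codex-sandbox) is a hypothetical placeholder, not an official image, and this isn’t the one true setup; the key ideas are --network none (no outbound network from the container) and mounting only the single project directory you want the agent to see.

```shell
# Hedged sketch: run a coding agent in a container that can see only one
# project directory and has no network access. "codex-sandbox" is a
# hypothetical image name; substitute whatever image you actually build.
DOCKER_CMD="docker run --rm -it --network none -v $PWD:/workspace -w /workspace codex-sandbox"

# Printed for inspection rather than executed; copy-paste it once you're
# happy with what it does.
echo "$DOCKER_CMD"
```

Combined with Codex CLI’s own sandbox, this gives you two independent layers: even if one is misconfigured, the other still limits what the agent can touch.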
There’s a bunch more stuff discussed in this thread, here are some quick thoughts:
Things are changing very quickly. Personally, I find there’s been a step-function change in what OpenAI coding agents can do since the release of GPT-5.2-Codex in mid December: https://openai.com/index/introducing-gpt-5-2-codex/. That’s not a lot of time!
I agree and am annoyed that there are more rubbish PRs being put up than ever before, but I certainly wouldn’t judge what today’s tools are capable of from random kids using random models to make drive-by PRs to solve issues they don’t understand.
Personally, there are certain kinds of code I have no intention of ever writing myself again, and quick hacks I now spin up that I would otherwise never have bothered with.
One thing I’ve gotten a lot of value out of is automatic Codex code reviews at work; it’s by far the best code review experience I’ve ever had. Setting this up on a project you work on is a great way to get a feel for how much you can trust these agents. (Note that there is a lot of variance among AI code review products, e.g. I don’t think the GitHub one is much good.)
Think of the agents as strange alien interns.
There are things you trust an intern to do, things you don’t quite yet, and you have to figure out what the appropriate level of oversight is for a task (and it isn’t always “read every line”). Be a little patient, and remember that if you never give your intern feedback, it probably won’t know how to do better. But overall, interns learn and grow, and you hope they’ll join full time when they graduate. These interns are also alien! All of us are still discovering how best to collaborate and communicate with them and leverage their relative strengths. Giving them the right tools or the right prompt can still be the difference between something sloppy and mid and something that is state of the art.
(And of course like any alien encounter, a lot of us humans are suspicious, the aliens and their inner workings are often misunderstood, we’re distrustful of the companies that provide access to these aliens, the aliens bring offerings that we’re still trying to figure out how best to use, some humans have decided to worship the aliens, etc etc)
On the environment, here’s a thing I’d written last year that still holds up: “WaPo is very wrong on ChatGPT energy use”