I think the ideal scenario would include a fair, merit-based AI reviewer working alongside human experts. AI could also act as a "reviewer of the reviewers," flagging potential flaws or biases in human evaluations to help ensure fairness, accountability and consistency.
One reason is that the training data is not unbiased, and it's very hard to make it less biased, let alone unbiased.
The other issue is that the AI companies behind the models are not interested in this. You can see the Grok saga playing out in plain sight, but the competitors are not much better. They patch a few things over but don't solve the problem at the root, and they have no incentive to.
The rule set is simple: "Don't be biased." What does that mean? And that is the problem. It's hard (read: impossible) to define in technical, formal terms. That's because bias is, at its root, a social problem, not a technical one. Therefore you won't be able to solve it with technology, any more than you can solve poverty, world peace, or racism that way.
The best you can hope for is to provide technical means to point out indicators of bias. But anything beyond that could, at worst, do more harm than good. ("The tool said this result is unbiased now! Keep your skepticism to yourself and let me publish!")
You don't have much experience in it, do you? The real-world peer review process could not be further from what you are describing.
Source: I've personally been involved in peer reviewing in fields as diverse as computer science, quantum physics and applied animal biology. I've recently left those fields in part because of how terrible some of the real-world practices are.
And of course anyone with sufficient knowledge in a domain is freed from human foibles like racism, sexism, etc. Just dry expertise, applied without bias or agenda.
I was joking, but probably so, to the extent that 80% of peer reviewers are men and 80% of authors of peer-reviewed articles are men. [0]
0. https://www.jhsgo.org/article/S2589-5141%2820%2930046-3/full...